Channel: Processors

Forum Post: RE: C6747 Floating Point Performance is the pits!


Trent,

The bullet point that you quote above is very poorly written. We have really good engineers, but not all of us did well in writing classes. The last highlighted sentence lists the two cases in the opposite order from the clauses of the previous sentence. It should be reversed to be clearer: "For example, when using the -mv6700 compiler switch, the MPYSP instruction will be used, and when using the -mv6000 compiler switch, the run-time support floating-point multiply will be used."

MPYSP is a floating-point hardware instruction. It is only available on the C67x, C674x, and C66x devices.

The -mv6740 switch is what you should be using. Thanks to tsz for pointing you to it.

Your assembly listing left out some of the code from the top of the biquad function, such as the branch target for the BDEC, so it is hard to tell whether anything above it is also poorly optimized.

The improvements from the C67x to the C674x brought in the C64x+ fixed-point architecture's program-flow improvements, especially the SPLOOP/SPKERNEL low-overhead looping. Your code does not use this, so you will need to improve the code, the compiler switches, or the information that is available to the compiler.

Since you have been on the forum a while, you are experienced with the architecture, so you should be able to find and skim through the optimization documents that we have, plus the online training material. Go to the TI Wiki Pages and search for "c6000 optimization" (no quotes) to find some Wiki articles and the C6000 Optimization Workshop.

As a short list: try removing the -g switch (I prefer to leave it in, but see whether removing it helps), use the restrict keyword, use pragmas such as MUST_ITERATE at the top of your loop to give loop-count information, and use _nassert to tell the compiler any alignment information that could help.
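To illustrate those hints, here is a minimal sketch of how they might look on a single-section biquad loop. The function name biquad_block, the coefficient and state layouts, the 8-byte alignment, and the "at least 8 iterations, multiple of 8" loop count are all assumptions made up for this example, not taken from Trent's code. The host-side fallbacks are only there so the sketch also builds with a non-TI compiler.

```c
#include <stdint.h>

/* _nassert is a TI compiler intrinsic; on any other compiler it is
   stubbed out so the sketch still builds (assumption for host testing). */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

/* Hypothetical direct-form-I biquad over a block of samples.
   coef  = {b0, b1, b2, a1, a2}   (assumed layout)
   state = {x1, x2, y1, y2}       (assumed layout)
   restrict promises the compiler that the buffers do not overlap. */
void biquad_block(const float *restrict in, float *restrict out,
                  const float *restrict coef, float *restrict state, int n)
{
    const float b0 = coef[0], b1 = coef[1], b2 = coef[2];
    const float a1 = coef[3], a2 = coef[4];
    float x1 = state[0], x2 = state[1], y1 = state[2], y2 = state[3];
    int i;

    /* Alignment promises: only valid if the caller really aligns the
       buffers to 8 bytes (assumed here). */
    _nassert(((uintptr_t)in  & 7) == 0);
    _nassert(((uintptr_t)out & 7) == 0);

    /* Loop-count promise: at least 8 iterations, always a multiple of 8
       (assumed here). Guarded so other compilers do not see the pragma. */
#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(8, , 8)
#endif
    for (i = 0; i < n; i++) {
        float x0 = in[i];
        float y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x0;   /* shift input history */
        y2 = y1; y1 = y0;   /* shift output history */
        out[i] = y0;
    }

    /* Save the filter history for the next block. */
    state[0] = x1; state[1] = x2; state[2] = y1; state[3] = y2;
}
```

Keep in mind that restrict, _nassert, and MUST_ITERATE are promises to the compiler rather than checks. If the buffers overlap, are not aligned as stated, or the loop count does not match, the generated code can misbehave, so only state what your application actually guarantees.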

I am not sure you shared any real numbers on your bad performance, or how it has improved from these changes that you have already made. I remember reading relative numbers, but that does not paint a very clear picture of what we are dealing with here.

If you are doing all of this processing in your DMA ISR, it sounds like you may not have much else going on in the application. Otherwise, we generally recommend not doing that but using other SYS/BIOS features to allow more process-controlled execution of whatever is the highest priority thread that needs to run. I suspect you have that worked out in your case, so this is just for other readers with more generalized applications.

Regards,
RandyP

