GenPol,
You were more specific in your first post than I realized. Thank you for the confirmation, though.
Did you develop your own linear assembly version of these functions? I only see scheduled assembly (.s) and C versions in the src folder. My version is 3.1.0.0, so perhaps you have something different.
Looking at the scheduled assembly, it is massively optimized, with very high utilization of the 8 functional units, and it disables interrupts for the entire duration of the function. If you change anything to make it interruptible, you are very likely to lose performance. Finding a way to do that without a large penalty would require examining the code in great detail.
The option that TI provides is to use the compiler's switches on the C versions, and you can do the same with your linear assembly version. You will likely not get performance as good as the scheduled assembly, but you could improve your interrupt latency.
The -mi switch can be used with linear assembly to limit the amount of time that interrupts are disabled. I tried a simple test using the dotp linear assembly code from the Compiler User Guide:
With -o3 optimization, the inner loop was 2 cycles with both cycles fully loaded on all 8 functional units. This inner loop was not interruptible because of the loop size.
With -mi 100, the inner loop was 6 cycles long so that it could be interrupted.
The inner loop of the -mi 100 version will likely take at least 3x as long to execute (6 cycles versus 2), so that tradeoff must be considered.
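To make the tradeoff concrete, here is a minimal sketch in C rather than linear assembly. The dotp function is a plain C stand-in for the User Guide example, and kernel_cycles is a hypothetical helper that only multiplies trip count by the loop length reported in the compiler's pipeline comments. It assumes both builds do the same work per trip and ignores prolog, epilog, and call overhead, so treat the numbers as a rough estimate. The build lines in the comment use the switches named in this thread; check the exact spellings against your compiler version.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Example build lines (assumed spellings, verify for your tools version):
 *   cl6x -o3 -k dotp.c           fastest loop, not interruptible
 *   cl6x -o3 -mi100 -k dotp.c    interrupts disabled for about 100 cycles at most
 * The -k switch keeps the .asm file so the pipeline comments can be read.
 */

/* Plain C dot product of two 16-bit vectors. */
int dotp(const short *a, const short *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Hypothetical helper: rough inner-loop cost, ignoring prolog and epilog. */
unsigned long kernel_cycles(unsigned long trips, unsigned long cycles_per_trip)
{
    assert(cycles_per_trip > 0);
    return trips * cycles_per_trip;
}
```

For 1000 trips this model gives 2000 cycles for the 2-cycle loop and 6000 cycles for the 6-cycle loop, which is the 3x figure above.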
My suggestions, in order of my preference:
1. Try the switches I have mentioned above and let us know how the results work for you.
2. We will ask a Moderator to move this thread to the Compiler Forum so the best people can comment on the compiler options available to you. They may need you to post your linear assembly version, and they may ask you about the compiler's pipeline comments in the assembly output (use the -k compiler switch to keep the assembly file). They may be able to address why SPLOOP is not being used.
3. With a measurement of the execution time of these functions while interrupts are disabled, examine the impact of this on the latency of your interrupt(s). You may be able to find another system solution to your interrupt issue, such as using the EDMA3 to keep data moving when a peripheral is ready or using the PRU to off-load some simple ISR functions.
4. FFT functions can be broken into smaller FFTs and then executed separately. This is a technique that I first learned about from our MultiCore team for running a very long FFT (up to 1,000,000 points) by breaking it up across several DSP cores. This technology might not be publicly available from TI (you can search TI.com for it), but it is probably available from some of the universities that share on the internet. You could use several smaller executions, and those would allow interrupts to occur in between those calls, reducing your latency.
5. Consider moving to the OMAP-L138, which would add the ARM9 processor for you to use for servicing these interrupts.
Regards,
RandyP