
Forum Post: Speed of vector complex multiply using Intrinsics


I am benchmarking some applications on the C6678 and have written a few vector math routines not provided by DSPLIB to speed up performance. One routine is a complex vector multiply. Unfortunately, my complex multiply runs about 40% slower than DSPLIB's FFT, and I don't understand how that can be, given that the multiply is 6N flops while the FFT is roughly 5N·log2(N) flops. Is the following code the best you can do for a complex vector multiply (where the storage order is re/im/re/im, not im/re/im/re)?

#pragma MUST_ITERATE(2,,2)
#pragma UNROLL(2)
for (i=0; i<n; i++) {
  /* aligned 8-byte loads: one re/im pair from each input per trip */
  __float2_t dv = _complex_mpysp(_amemd8(&da[i]),_amemd8(&db[i]));
  /* repack: low half to the high position, negated high half to the low
     position, then store with an aligned 8-byte write */
  _amemd8(&dc[i]) = _ftod(_lof(dv),-_hif(dv));
}
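For comparison, here is the plain scalar version of the operation the loop above is meant to vectorize (just a reference sketch; the pointer names and the re/im interleaved layout mirror my code above). Each complex element costs 4 multiplies and 2 adds, which is where the 6N flop count comes from; at N = 4096, for example, that is about 24.6 kflops for the multiply versus roughly 5·4096·12 ≈ 246 kflops for the FFT.

/* Scalar reference: c[i] = a[i] * b[i] for n complex floats stored re, im, re, im, ...
   4 multiplies + 2 adds per element = 6N flops total. */
void cvmul_ref(const float *a, const float *b, float *c, int n)
{
  int i;
  for (i = 0; i < n; i++) {
    float ar = a[2*i],     ai = a[2*i + 1];
    float br = b[2*i],     bi = b[2*i + 1];
    c[2*i]     = ar*br - ai*bi;   /* real part */
    c[2*i + 1] = ar*bi + ai*br;   /* imaginary part */
  }
}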

Here is my compile line, with some extraneous options cut out:

"C:/ti/C6000 Code Generation Tools 7.4.2/bin/cl6x" -mv6600 -c  -mv6600 --abi=eabi -k -O3 --define=C66_PLATFORMS --display_error_number --diag_warning=225  -Dxdc_target_types__="ti/targets/elf/std.h" -Dxdc_target_name__=C66 ../../source/ti_c6678/e_cvmul.c --output_file=e_cvmul.obj

And here is the ASM that is generated:

(assembly listing attached to the original post)

