Forum Post: RE: DMA vs. memcpy


David,

Memory transfer performance depends on your board and your application.

- There are 4 combinations of transfer src/dst using L2 SRAM and External Memory, or more if you consider L1D SRAM.
- There can be considerations for cache coherency that need to be included in the benchmarking, when external memory is being used.
- There are timing requirements, buffer sizes, and shared resources that may need to be considered for the full application.

The best thing to do is run the tests in your environment the way you need it to be done:

- Run memcpy with typical alignment and take TSCL readings before and after.
- Run QDMA with the same alignment and take TSCL readings before and after, using cache coherency commands as needed.
- If the alignment, length, and compiler settings work out, try the for-loop from rrlagic's post, which draws on a lot of experience.
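As a rough sketch of the memcpy measurement described above, something like the following could be used. It assumes the TI C6000 compiler (which predefines `_TMS320C6X` and declares `TSCL` in `c6x.h`); the helper names `cycles_start`, `cycles_now`, and `time_memcpy` are made up for illustration, and the `clock()` fallback is only there so the harness can be tried off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef _TMS320C6X
#include <c6x.h>   /* TI compiler header that declares the TSCL control register */
/* Assumption: the first write to TSCL starts the free-running counter. */
static void cycles_start(void) { TSCL = 0; }
static uint32_t cycles_now(void) { return TSCL; }
#else
#include <time.h>
/* Host fallback: units are clock() ticks, not CPU cycles. */
static void cycles_start(void) { }
static uint32_t cycles_now(void) { return (uint32_t)clock(); }
#endif

/* Time `reps` copies of `n_bytes` and return the smallest elapsed count seen.
   Taking the minimum filters out interrupts and cold-cache first runs. */
static uint32_t time_memcpy(void *dst, const void *src, size_t n_bytes, int reps)
{
    uint32_t best = UINT32_MAX;
    int i;

    assert(reps > 0);
    for (i = 0; i < reps; i++) {
        uint32_t t0, dt;

        t0 = cycles_now();
        memcpy(dst, src, n_bytes);
        dt = cycles_now() - t0;   /* unsigned subtraction tolerates counter wrap */
        if (dt < best)
            best = dt;
    }
    return best;
}
```

Call `cycles_start()` once at startup, then call `time_memcpy()` with buffers placed and aligned the way the real application has them. The same before/after pattern applies to the QDMA case, with the submit, completion poll, and any cache coherency calls placed between the two readings.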

15 years ago, we measured 5 words as the decision point: below 5 words use memcpy, above 5 words use EDMA.

With the faster memories on the DSK6455 and the faster internal buses inside the C6455, that number moved higher. I have not measured it since then, but I would use memcpy for <50 words, QDMA for >100 words, and measure carefully in between.
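One way to encode that rule of thumb is a small size-based selector. This is only a sketch: the 50 and 100 word thresholds are the unmeasured estimates from the paragraph above, and the names `copy_choice` and `choose_copy` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Which copy mechanism to use for a transfer of a given size. */
typedef enum { COPY_MEMCPY, COPY_MEASURE, COPY_QDMA } copy_choice;

/* Estimated thresholds in 32-bit words; replace with measured values. */
enum { MEMCPY_BELOW_WORDS = 50, QDMA_ABOVE_WORDS = 100 };

static copy_choice choose_copy(size_t n_words)
{
    assert(MEMCPY_BELOW_WORDS < QDMA_ABOVE_WORDS);

    if (n_words < MEMCPY_BELOW_WORDS)
        return COPY_MEMCPY;      /* setup overhead dominates a short transfer */
    if (n_words > QDMA_ABOVE_WORDS)
        return COPY_QDMA;        /* long transfer, let the DMA engine move it */
    return COPY_MEASURE;         /* gray zone: benchmark on the real board */
}
```

Once the gray zone has been measured on the target, the two thresholds can collapse into a single crossover value.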

You may find QDMA faster than memcpy or optimized loops for external memories that benefit from optimized longer transfer commands.

In the simple case in your application, you will want to post the QDMA as early as possible in your thread so you can execute as many instructions as possible before you start polling for completion. For memcpy in that case, you will want to wait as long as possible so the data will be as fresh as possible in the cache.
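The "post early, poll late" ordering can be illustrated with a toy model. Everything here is hypothetical: `mock_xfer` is not a real QDMA interface, it just pretends a transfer completes after a fixed number of ticks so that the polling time can be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a QDMA channel: the copy "completes" after a
   fixed number of ticks, where one tick models one unit of CPU time. */
typedef struct {
    void *dst;
    const void *src;
    size_t n_bytes;
    unsigned ticks_left;
} mock_xfer;

static void mock_submit(mock_xfer *x, void *dst, const void *src,
                        size_t n_bytes, unsigned latency_ticks)
{
    assert(latency_ticks > 0);
    x->dst = dst;
    x->src = src;
    x->n_bytes = n_bytes;
    x->ticks_left = latency_ticks;
}

/* Advance simulated time by one tick; finish the copy when latency expires. */
static void mock_tick(mock_xfer *x)
{
    if (x->ticks_left > 0 && --x->ticks_left == 0)
        memcpy(x->dst, x->src, x->n_bytes);
}

static int mock_done(const mock_xfer *x) { return x->ticks_left == 0; }

/* Post the transfer, do `work_units` of unrelated work, then poll.
   Returns the number of ticks spent polling, i.e. CPU time wasted waiting. */
static unsigned copy_overlapped(void *dst, const void *src, size_t n_bytes,
                                unsigned latency_ticks, unsigned work_units)
{
    mock_xfer x;
    unsigned spins = 0;
    unsigned i;

    mock_submit(&x, dst, src, n_bytes, latency_ticks);  /* post as early as possible */
    for (i = 0; i < work_units; i++)
        mock_tick(&x);                                  /* independent work goes here */
    while (!mock_done(&x)) {                            /* poll only when the data is needed */
        mock_tick(&x);
        spins++;
    }
    return spins;
}
```

In this model a transfer with a latency of 10 ticks costs 10 ticks of polling if nothing is done in between, 4 ticks if 6 units of other work come first, and none once the other work covers the whole latency.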

The best case is to use ping-pong buffers so you can be transferring one buffer's data while processing the other buffer's data. This can virtually eliminate the time spent on the transfers. In many applications, the EDMA can be set up to do the transfers once a buffer has filled and send an interrupt to start the processing on that buffer, even while the next buffer is filling. This is one of the biggest potential performance improvements in DSP applications.
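A minimal sketch of the ping-pong control flow, under stated assumptions: `xfer_start` and `xfer_wait` are hypothetical hooks that on the target would start an EDMA/QDMA transfer and wait for its completion (with cache coherency operations where external memory is involved). Here `memcpy` stands in for the transfer so the loop structure can be followed on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_WORDS 8

/* Hypothetical transfer hooks; memcpy is a synchronous stand-in for the DMA. */
static void xfer_start(int *dst, const int *src, size_t n_words)
{
    memcpy(dst, src, n_words * sizeof *dst);
}
static void xfer_wait(void) { /* on target: poll or pend on the completion flag */ }

/* Example processing step: sum the words in one buffer. */
static long process_block(const int *buf, size_t n_words)
{
    long sum = 0;
    size_t i;
    for (i = 0; i < n_words; i++)
        sum += buf[i];
    return sum;
}

/* Process n_blocks blocks of BLOCK_WORDS words each, alternating two buffers. */
static long process_stream(const int *stream, size_t n_blocks)
{
    int buf[2][BLOCK_WORDS];
    long total = 0;
    size_t b;
    int cur = 0;

    assert(stream != NULL);
    if (n_blocks == 0)
        return 0;

    xfer_start(buf[cur], stream, BLOCK_WORDS);          /* prime the first buffer */
    for (b = 0; b < n_blocks; b++) {
        xfer_wait();                                    /* block b is now in buf[cur] */
        if (b + 1 < n_blocks)                           /* fetch the next block into the other buffer */
            xfer_start(buf[cur ^ 1], stream + (b + 1) * BLOCK_WORDS, BLOCK_WORDS);
        total += process_block(buf[cur], BLOCK_WORDS);  /* overlaps the transfer on target */
        cur ^= 1;                                       /* swap ping and pong */
    }
    return total;
}
```

With a real DMA engine behind the hooks, the `process_block` call runs while the next block is still arriving, which is where the savings come from. An interrupt-driven version would replace `xfer_wait` with a completion callback or semaphore.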

Regards,
RandyP

