Hi Kenshow,
One possible reason you are seeing two different latencies is the effect of the cache. If the data sent from the DSP is already in the L1/L2 cache, the latency will be lower. To verify this, you could try disabling caching for the memory region that holds the test data and check whether the two measurements converge.
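If it helps, here is a rough, generic C sketch of how you might check for a cold vs. warm cache difference before changing any cache settings. It is not DSP-specific: `timed_sum` is a hypothetical helper name, and it assumes a POSIX-style `clock_gettime` is available, so on your target you would probably swap in the device's own cycle counter. The idea is just to time the same buffer read twice in a row and compare.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not from any vendor library): reads every byte of
 * buf once, stores the byte sum in *sum_out, and returns the elapsed time
 * in nanoseconds. The volatile qualifier keeps the compiler from
 * optimizing the reads away.
 */
static int64_t timed_sum(const volatile uint8_t *buf, size_t len,
                         uint32_t *sum_out)
{
    struct timespec t0, t1;
    uint32_t sum = 0;

    assert(buf != NULL && sum_out != NULL);

    /* Assumption: a POSIX monotonic clock; replace with a cycle counter
     * on the actual target. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sum_out = sum;
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling `timed_sum` twice on the same test buffer and comparing the two return values gives a first indication: if the first call is consistently slower than the second, the cache is a likely cause. With caching disabled for that memory, the two numbers should be much closer.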
Thanks,