
Forum Post: RE: c6678 EVM MSM RAM test question


My comments are inserted below, marked A1 and A2 & 3.

-Chad

Chad,

I have run several more tests. The results are:

Test 1: core 0 only

optimization(-O3):
L2 cache disable
***********TEST START***********
DDR write g_ll_overhead = 4021887
DDR read g_ll_overhead = 25057412
MSM write g_ll_overhead = 524311
MSM read g_ll_overhead = 986437
***********TEST OVER***********

L2 cache enable, 128KB
***********TEST START***********
DDR write g_ll_overhead = 5039691
DDR read g_ll_overhead = 2177260
MSM write g_ll_overhead = 524322
MSM read g_ll_overhead = 986421
***********TEST OVER***********

From this test result, I have a question:

Q1: Why, when I build with -O3, are both the read and write speeds of MSM RAM faster than DDR3, whether the L2 cache is disabled or enabled? (Only core 0 runs the project.)

A1: This goes along with what I was saying before. With -O3, the compiler unrolls the loop for you, hence more accesses per cycle. Prefetching will do a fair amount to improve the performance of DDR3 accesses, but it will not beat having the data in MSMC RAM.
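To make the unrolling point concrete, here is a minimal, portable sketch. The original test-1 loop is not shown in the thread, so this uses a hypothetical sequential read-and-sum over a buffer, with made-up function names: a plain loop and the same loop unrolled by four, which is roughly the kind of transformation -O3 may apply on its own. The unrolled version issues four independent loads per iteration, so the compiler has room to schedule them in parallel instead of one at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: one load and one add per iteration. */
uint32_t sum_plain(const uint32_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Unrolled by 4: four independent loads per iteration, each feeding its
   own accumulator so the adds do not all chain through one register. */
uint32_t sum_unrolled4(const uint32_t *buf, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += buf[i];
        s1 += buf[i + 1];
        s2 += buf[i + 2];
        s3 += buf[i + 3];
    }
    for (; i < n; i++)   /* remainder when n is not a multiple of 4 */
        s0 += buf[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result; only the number of loads in flight per iteration differs. Unsigned accumulators are used so that overflow wraps instead of being undefined.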

I used the C code below to test random-address reads with all 8 cores running at the same time:

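#include <stdio.h>
#include <stdlib.h>
#include <ti/csl/csl_tsc.h>   /* CSL_tscRead(); assumes CSL_tscEnable() was called at startup */

/* Assumed declarations (not shown in the original post): byte pointers to the test buffers */
extern volatile char *g_p_DDR;
extern volatile char *g_p_MSM;
volatile int mytemp;          /* volatile so -O3 cannot discard the loads */
long long g_ll_startcycle, g_ll_endcycle, g_ll_overhead;

void random_read_test(void)
{
    int i;
    /* rand() runs inside the timed loops, so its cost is included in both results */
    g_ll_startcycle = CSL_tscRead();
    for(i = 0; i < 0x100000; i++)
    {
        /* random byte offset in a 1 MB window, rounded down to 4 bytes so the int load is aligned */
        mytemp = *(volatile int*)(g_p_DDR + ((rand() % 0x100000) & ~3));
    }
    g_ll_endcycle = CSL_tscRead();
    g_ll_overhead = g_ll_endcycle - g_ll_startcycle;
    printf("DDR read g_ll_overhead = %lld\r\n", g_ll_overhead);

    g_ll_startcycle = CSL_tscRead();
    for(i = 0; i < 0x100000; i++)
    {
        mytemp = *(volatile int*)(g_p_MSM + ((rand() % 0x100000) & ~3));
    }
    g_ll_endcycle = CSL_tscRead();
    g_ll_overhead = g_ll_endcycle - g_ll_startcycle;
    printf("MSM read g_ll_overhead = %lld\r\n", g_ll_overhead);
}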
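Two details of the loops above may skew the comparison: rand() is called inside the timed region, and the C standard only guarantees RAND_MAX >= 32767, so rand() % 0x100000 may reach only a small part of the intended 1 MB window. The sketch below swaps in a xorshift32 generator (a few shifts and XORs per call) to produce word-aligned offsets across the full window. It is an alternative under stated assumptions, not the original test: the helper names, WINDOW_BYTES, and the seed are hypothetical.

```c
#include <stdint.h>

#define WINDOW_BYTES 0x100000u   /* 1 MB window, as in the original loops */

/* xorshift32 step: cheap compared with a library rand() call, with a full
   32-bit output range. The state must be nonzero. */
static inline uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Next random byte offset in [0, WINDOW_BYTES), rounded down to a multiple
   of 4 so an int load at base + offset stays aligned. */
static inline uint32_t next_offset(uint32_t *state)
{
    return (xorshift32(state) % WINDOW_BYTES) & ~3u;
}
```

Assuming g_p_DDR is a char pointer, the timed read would become `mytemp = *(volatile int *)(g_p_DDR + next_offset(&seed));`, with a hypothetical `uint32_t seed = 1;` set before the loop.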

Test results:

*********************************************************
core0-7:
no optimization:
L2 cache disable:

[C66xx_0] DDR read g_ll_overhead = 182241275
[C66xx_1] DDR read g_ll_overhead = 182235471
[C66xx_2] DDR read g_ll_overhead = 182991167
[C66xx_3] DDR read g_ll_overhead = 182235611
[C66xx_4] DDR read g_ll_overhead = 185485905
[C66xx_5] DDR read g_ll_overhead = 182237667
[C66xx_6] DDR read g_ll_overhead = 182235691
[C66xx_7] DDR read g_ll_overhead = 185485429

[C66xx_0] MSM read g_ll_overhead = 79199887
[C66xx_1] MSM read g_ll_overhead = 79456549
[C66xx_2] MSM read g_ll_overhead = 79590121
[C66xx_3] MSM read g_ll_overhead = 79593668
[C66xx_4] MSM read g_ll_overhead = 79503946
[C66xx_5] MSM read g_ll_overhead = 79359653
[C66xx_6] MSM read g_ll_overhead = 78745559
[C66xx_7] MSM read g_ll_overhead = 78750037


L2 cache enable, 128KB

[C66xx_0] DDR read g_ll_overhead = 69102760
[C66xx_1] DDR read g_ll_overhead = 69102244
[C66xx_2] DDR read g_ll_overhead = 69102456
[C66xx_3] DDR read g_ll_overhead = 69102692
[C66xx_4] DDR read g_ll_overhead = 69103128
[C66xx_5] DDR read g_ll_overhead = 69101970
[C66xx_6] DDR read g_ll_overhead = 69102486
[C66xx_7] DDR read g_ll_overhead = 69103742

[C66xx_0] MSM read g_ll_overhead = 79224192
[C66xx_1] MSM read g_ll_overhead = 79488368
[C66xx_2] MSM read g_ll_overhead = 79599546
[C66xx_3] MSM read g_ll_overhead = 79594841
[C66xx_4] MSM read g_ll_overhead = 79517403
[C66xx_5] MSM read g_ll_overhead = 79374221
[C66xx_6] MSM read g_ll_overhead = 78780542
[C66xx_7] MSM read g_ll_overhead = 78778309

**********************************************************
core0-7:
optimization(-O3):

L2 cache disable:

[C66xx_0] DDR read g_ll_overhead = 180781462
[C66xx_1] DDR read g_ll_overhead = 181028146
[C66xx_2] DDR read g_ll_overhead = 180776248
[C66xx_3] DDR read g_ll_overhead = 180776968
[C66xx_4] DDR read g_ll_overhead = 184167664
[C66xx_5] DDR read g_ll_overhead = 180779084
[C66xx_6] DDR read g_ll_overhead = 180777110
[C66xx_7] DDR read g_ll_overhead = 184167236

[C66xx_0] MSM read g_ll_overhead = 77093239
[C66xx_1] MSM read g_ll_overhead = 77361328
[C66xx_2] MSM read g_ll_overhead = 77498327
[C66xx_3] MSM read g_ll_overhead = 77502841
[C66xx_4] MSM read g_ll_overhead = 77421591
[C66xx_5] MSM read g_ll_overhead = 77271424
[C66xx_6] MSM read g_ll_overhead = 76642935
[C66xx_7] MSM read g_ll_overhead = 76652356


L2 cache enable, 128KB

[C66xx_0] DDR read g_ll_overhead = 66999428
[C66xx_1] DDR read g_ll_overhead = 67001706
[C66xx_2] DDR read g_ll_overhead = 67000144
[C66xx_3] DDR read g_ll_overhead = 67000286
[C66xx_4] DDR read g_ll_overhead = 66999732
[C66xx_6] DDR read g_ll_overhead = 66999872
[C66xx_5] DDR read g_ll_overhead = 67000218
[C66xx_7] DDR read g_ll_overhead = 66999546

[C66xx_0] MSM read g_ll_overhead = 77131099
[C66xx_1] MSM read g_ll_overhead = 77399198
[C66xx_2] MSM read g_ll_overhead = 77500053
[C66xx_3] MSM read g_ll_overhead = 77506983
[C66xx_4] MSM read g_ll_overhead = 77484238
[C66xx_5] MSM read g_ll_overhead = 77401469
[C66xx_6] MSM read g_ll_overhead = 76669513
[C66xx_7] MSM read g_ll_overhead = 77007213

From these test results, we can see that when all 8 cores run at the same time with the L2 cache enabled, random-address reads from MSM RAM are slower than from DDR (roughly 77 to 80 million cycles versus 67 to 69 million), whether or not the code is built with -O3.
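Dividing each total by the loop count (0x100000 = 1,048,576 iterations) turns the logs into approximate cycles per iteration, which is easier to compare. Each iteration also includes the rand() call and loop overhead, so these are not pure memory latencies. A minimal sketch of the arithmetic; the function name is hypothetical:

```c
#define ITERATIONS 0x100000LL   /* loop count used in the test code */

/* Average TSC cycles per loop iteration (includes rand() and loop overhead). */
double cycles_per_iteration(long long total_cycles)
{
    return (double)total_cycles / (double)ITERATIONS;
}
```

Applied to core 0's unoptimized runs, this gives about 173.8 (DDR) versus 75.5 (MSM) cycles per iteration with L2 cache disabled, and about 65.9 (DDR) versus 75.6 (MSM) with L2 cache enabled. The MSM figure barely changes when L2 is enabled, while the DDR figure drops sharply.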

Q2: I am not clear why this happens. What is the reason? Can you explain it?

Q3: If the data in my project will be accessed randomly rather than sequentially, should I put the processed data in DDR rather than in MSM RAM?

A2 & 3: There are multiple variables in play here. That said, I think the main cause is that all cores accessing the MSMC RAM every cycle causes MSMC RAM stalls, while prefetching into L2 at least gives a partial fill of each core's L2, with no contention from other cores for access to that L2.

Realistically, though, this test is not representative: you are not going to have random contention on MSMC every cycle.

Randomly accessed data in a typical system will perform better in MSMC than in DDR3 under normal operating conditions.

Best Regards,

Si

