Jim,
The performance table for the DSPlib kernels that you are referring to lists raw CPU cycles. The linker command file in the DSP library project provides the memory details that can be used for the test bench. However, there will be some difference between the raw CPU cycles and the cycles measured on the EVM with the code and data in L2SRAM, as placed by the command file.
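To see that difference for yourself, you can time a kernel call on the EVM. The following is only a minimal sketch, assuming the TI C6000 compiler, where c6x.h declares the TSCL time-stamp register and the first write to it starts the counter. dot_product, cycle_counter_start, cycle_counter_read and measure_dot_product are hypothetical names used for illustration, and dot_product is a stand-in, not a DSPlib function. When built on a host PC the sketch falls back to clock(), which returns clock ticks rather than CPU cycles.

```c
#include <stdint.h>
#include <time.h>

#ifdef _TMS320C6X
#include <c6x.h>  /* TI compiler header that declares the TSCL register */
/* Assumption: TI C6000 toolchain. The first write to TSCL starts the counter. */
static void cycle_counter_start(void) { TSCL = 0; }
static uint32_t cycle_counter_read(void) { return TSCL; }
#else
/* Host fallback for trying the sketch off-target: ticks, not CPU cycles. */
static void cycle_counter_start(void) { }
static uint32_t cycle_counter_read(void) { return (uint32_t)clock(); }
#endif

/* Hypothetical stand-in for the kernel under test (not a DSPlib function). */
static int32_t dot_product(const int16_t *x, const int16_t *y, int n)
{
    int32_t sum = 0;
    int i;
    for (i = 0; i < n; i++) {
        sum += (int32_t)x[i] * (int32_t)y[i];
    }
    return sum;
}

/* Times one kernel call and subtracts the cost of reading the counter.
 * Returns 0 if the elapsed count is smaller than that overhead. */
static uint32_t measure_dot_product(const int16_t *x, const int16_t *y,
                                    int n, int32_t *result)
{
    uint32_t t0, t1, t2, overhead, elapsed;

    cycle_counter_start();
    t0 = cycle_counter_read();
    t1 = cycle_counter_read();      /* t1 - t0 is the read overhead */
    *result = dot_product(x, y, n);
    t2 = cycle_counter_read();

    overhead = t1 - t0;
    elapsed = t2 - t1;
    return (elapsed > overhead) ? (elapsed - overhead) : 0;
}
```

On the EVM the returned count includes cache and memory stall effects that depend on where the command file places the code and data, which is why it will not match the raw numbers in the table exactly.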
I would recommend that you look at the examples in the MCSDK, which will give you an idea of how to partition your algorithms and memory.
Regards
Asheesh