There is a "hello world" example in ti\mcsdk_2_01_02_06\tools\boot_loader\examples\pcie that boots multiple cores. The code is placed in DDR3 (not MSMC). Core 0 is booted first, then it sends IPCGR interrupts to the other cores to boot them. I did not run into any issues in this case.
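To make the core 0 step more concrete, here is a minimal sketch of what "send IPCGR to the other cores" usually involves: write the entry point into each core's boot magic address, then set the interrupt bit in that core's IPCGR register. The addresses below assume a C6678-style memory map, and write32 and boot_secondary_cores are hypothetical names of mine, not taken from the MCSDK example. Please check the values against your device data manual.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed C6678-style addresses; verify against the device data manual. */
#define LOCAL_BOOT_MAGIC_ADDR 0x0087FFFCu /* last word of local L2 */
#define GLOBAL_L2_BASE        0x10000000u /* global alias of core 0 L2 */
#define IPCGR0_ADDR           0x02620240u /* IPC generation register, core 0 */
#define NUM_CORES             8u

/* Hypothetical 32-bit write callback (memory-mapped store on the DSP). */
typedef void (*write32_fn)(uint32_t addr, uint32_t value);

/* Global address of the boot magic word for a given core. */
uint32_t boot_magic_addr(uint32_t core)
{
    assert(core < NUM_CORES);
    return GLOBAL_L2_BASE + (core << 24) + LOCAL_BOOT_MAGIC_ADDR;
}

/* Address of the IPCGR register for a given core. */
uint32_t ipcgr_addr(uint32_t core)
{
    assert(core < NUM_CORES);
    return IPCGR0_ADDR + 4u * core;
}

/* Run by core 0: point every other core at the entry point and wake it.
 * Unlocking the kick registers beforehand may be required; omitted here. */
void boot_secondary_cores(uint32_t entry, write32_fn write32)
{
    for (uint32_t core = 1; core < NUM_CORES; core++) {
        write32(boot_magic_addr(core), entry); /* where the core jumps to */
        write32(ipcgr_addr(core), 1u);         /* set bit 0 to raise the IPC interrupt */
    }
}
```

On the DSP, write32 would be a plain volatile store; passing it in as a callback just lets the address arithmetic be checked on a host first.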
For using MSMC, Advantech has some example code in their Linux PCIe driver showing how they handle it, which may help.
The driver is available at http://www.advantech.com/products/PCIe-DSP-Card/sub_HALF-LENGTH_PCIE_CARD1.aspx for the DSPC-8681 product. I looked at my copy of version 0.7 (the latest is 0.8; I am not sure whether anything changed for local reset). Check \dsp_loader\app\src\dsp_loader.c, functions dio_put_dsp_in_reset() and dio_bring_dsp_out_reset().
/*-------------------------------------------------------------------------
* The following code is a workaround to flush the cache. Without this,
* any dirty cache lines in L1D may cause corruption of the downloaded
* image.
*/
ret = downloadSimpleLoopCode(dsp_id);
That is, the simple loop code is just 8 IDLE instructions placed in MSMC.
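As a rough illustration of such a stub (not the actual Advantech implementation, which I am not quoting here), the sketch below fills a buffer with 8 IDLE opcodes that the host could then copy into MSMC. The opcode value 0x0001E000 is my assumption for the C6000 IDLE encoding, and build_idle_stub is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IDLE_OPCODE 0x0001E000u /* assumed C6000 IDLE encoding; please verify */
#define IDLE_COUNT  8u          /* the post mentions 8 IDLE instructions */

/* Fill dst with IDLE_COUNT IDLE opcodes.
 * Returns the number of 32-bit words written, or 0 if dst is too small. */
size_t build_idle_stub(uint32_t *dst, size_t capacity_words)
{
    assert(dst != NULL);
    if (capacity_words < IDLE_COUNT) {
        return 0;
    }
    for (size_t i = 0; i < IDLE_COUNT; i++) {
        dst[i] = IDLE_OPCODE;
    }
    return IDLE_COUNT;
}
```

The host would write this buffer to the MSMC address the cores start from while they are held in reset, then release them before downloading the real image.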
Regards, Eric