I don't know exactly why DINT/RINT did not fix the original problem. I did see on another thread that someone experienced the same problem, and they claimed that something in the BIOS would still cause interrupts.
The original problem was that I was loading into a register and then continuing to use that register for a few more cycles, assuming the load had not yet written to it. When an interrupt occurred, that could be an incorrect assumption: servicing the interrupt allows the pending load to complete, so on return the register already holds the new value. That is the conclusion I came to with the help of Andy and Randy.
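For anyone unfamiliar with the hazard, here is a hypothetical sketch in C6000-style assembly (register names and functional units are made up for illustration, this is not my actual loop). A load has delay slots, and straight-line code can read the destination register's old value during those slots:

```asm
        LDW   .D1   *A4++, A5   ; A5 is not written until the delay
                                ;   slots expire
        MPY   .M1   A5, A6, A7  ; intentionally reads the OLD A5
                                ; if an interrupt is taken between these
                                ;   two instructions, the pipeline drains,
                                ;   the LDW completes, and A5 holds the
                                ;   NEW value on return -- the assumption
                                ;   breaks
```

As I understand it, this is why code that relies on the old register value inside load delay slots is not interrupt-tolerant.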
To loosely prove this theory, I freed up a register so I could alternate loads into a second register pair that was not read as soon after the load instruction. This reduced the errors by about 90%. I then stretched out the pipelined/unrolled inner loop so that there were zero places where the invalid assumption was relied on, and the errors disappeared.
Looking back on it, if I had spilled some of the save-on-use registers to the stack, I could have avoided losing any performance at all. At this point I have a lot of other work to do, and I don't know if I will get back to this. It is good enough for now.
Since everyone keeps recommending that I rely on the C/C++ compiler, I wanted to make a comment. This Viterbi decoder consumes roughly 80% of the available cycles at the DSP's worst-case loading. We already had a Viterbi decoder written in assembly, and with it we had nearly zero cycles to spare. For this reason there would be no point in writing a C decoder. I happen to enjoy the theory of Viterbi decoding, and I wanted to leverage my knowledge of it to achieve performance much greater than the compiler could (better by a factor of 16; trust me, I tried it in C++ first). A Viterbi decoder, I think, is an excellent candidate for hand-coded assembly because it is so parallelizable.
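To illustrate why a Viterbi decoder parallelizes so well (this is a generic textbook sketch, not my decoder), the inner kernel is the add-compare-select "butterfly". Every butterfly in a trellis stage is independent of the others, so a wide VLIW DSP can schedule many of them per cycle in hand-written assembly:

```python
def acs(pm_a, pm_b, branch):
    """One add-compare-select: two old path metrics (pm_a, pm_b) merge
    into one new state. Returns (new_metric, decision_bit), keeping the
    smaller metric (min-sum convention, antipodal branch metrics)."""
    cand0 = pm_a + branch   # path arriving via state a
    cand1 = pm_b - branch   # path arriving via state b
    if cand0 <= cand1:
        return cand0, 0     # survivor came from state a
    return cand1, 1         # survivor came from state b

# One trellis stage is just independent butterflies over all states,
# which is why the loop unrolls and software-pipelines so cleanly.
metrics = [acs(3, 5, 1), acs(10, 0, 1)]
```

The decision bits are what get pushed into the traceback memory; none of the butterflies in a stage depend on each other, only on the previous stage's metrics.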
Thanks again Randy and Andy for the feedback. Maybe someone can explain why RINT/DINT didn't work?