Using DCMPY, I can see how to get 16 MAC/cycle (4 complex multiplies per cycle) for correlation and filtering.
Can the more optimal 32 MAC/cycle be achieved easily for these applications with, say, CMATMPY?
I can see how it might work but how realistic is this?
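To make the idea concrete, here is roughly how I picture it, written as a plain C model rather than real intrinsic code. It assumes CMATMPY computes a 1x2 by 2x2 complex product (four complex multiplies, so 16 real multiplies per instruction); the names `cplx`, `vec2_mat2x2`, `corr_direct` and `corr_blocked` are hypothetical helpers of mine, not TI APIs, and fixed-point scaling and packing are ignored. Two adjacent outputs are computed together by pairing two taps with a small 2x2 block of input samples.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx;   /* hypothetical complex type */

static long mac_count = 0;                 /* counts real multiplies */

static cplx cmul(cplx a, cplx b) {
    mac_count += 4;                        /* 4 real multiplies per complex multiply */
    cplx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

static cplx cadd(cplx a, cplx b) {
    cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Assumed model of a 1x2 by 2x2 complex product: out = v * m.
   4 complex multiplies = 16 real multiplies per call. */
void vec2_mat2x2(const cplx v[2], cplx m[2][2], cplx out[2]) {
    out[0] = cadd(cmul(v[0], m[0][0]), cmul(v[1], m[1][0]));
    out[1] = cadd(cmul(v[0], m[0][1]), cmul(v[1], m[1][1]));
}

/* Reference: y[n] = sum_k h[k] * x[n + k]; x holds N + K - 1 samples. */
void corr_direct(const cplx *x, const cplx *h, int K, int N, cplx *y) {
    for (int n = 0; n < N; n++) {
        cplx acc = { 0, 0 };
        for (int k = 0; k < K; k++)
            acc = cadd(acc, cmul(h[k], x[n + k]));
        y[n] = acc;
    }
}

/* Same result, two outputs and two taps per step (N and K must be even). */
void corr_blocked(const cplx *x, const cplx *h, int K, int N, cplx *y) {
    for (int n = 0; n < N; n += 2) {
        cplx acc0 = { 0, 0 }, acc1 = { 0, 0 };
        for (int k = 0; k < K; k += 2) {
            cplx v[2] = { h[k], h[k + 1] };
            cplx m[2][2] = { { x[n + k],     x[n + k + 1] },
                             { x[n + k + 1], x[n + k + 2] } };
            cplx out[2];
            vec2_mat2x2(v, m, out);
            acc0 = cadd(acc0, out[0]);
            acc1 = cadd(acc1, out[1]);
        }
        y[n] = acc0;
        y[n + 1] = acc1;
    }
}

/* Self-check on small made-up data: returns 1 if both versions agree. */
int blocked_matches_direct(void) {
    enum { K = 4, N = 6 };
    cplx x[N + K - 1], h[K], yd[N], yb[N];
    for (int i = 0; i < N + K - 1; i++) { x[i].re = i + 1; x[i].im = 2 - i; }
    for (int k = 0; k < K; k++)         { h[k].re = k - 1; h[k].im = k + 2; }
    corr_direct(x, h, K, N, yd);
    corr_blocked(x, h, K, N, yb);
    for (int n = 0; n < N; n++)
        if (yd[n].re != yb[n].re || yd[n].im != yb[n].im) return 0;
    return 1;
}
```

If that model is right, one such product per multiplier unit per cycle on two units would give the 32 MAC/cycle figure, but the 2x2 block has to be rebuilt from overlapping input samples at every step, which is the part I am unsure can be done without extra cycles.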
Thanks
Chris