Aravind,
Well, using fir2() you could execute roughly n_firs block FIR calls per second, each call filtering one buffer of nx samples:
n_firs = cpu / (nx * (3 + nh/2) + 25)
where:
cpu = clock rate, Hz
nx = buffer size (samples per call)
nh = number of filter coefficients
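So, for cpu=120e6, nx=1024, nh=400, I get n_firs of about 577. Each call filters nx=1024 samples, so that is 577 * 1024 = 590,848 samples per second in total. Split across four channels, you could theoretically execute a 400-tap FIR on every sample at about 147.7 kHz per channel. In reality, though, the processor has other things to do as well.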
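If you want to try other buffer sizes or filter lengths, here is a rough Python sketch of the same arithmetic. The function names are my own and are not part of any DSP library, and it assumes the cycle count per call is exactly the denominator of the formula above, with n_firs rounded down to whole calls.

```python
def fir2_calls_per_second(cpu_hz, nx, nh):
    """Whole fir2() calls per second, using the cycle estimate from the formula above."""
    # Assumed cycles for one call on a buffer of nx samples with nh coefficients.
    cycles_per_call = nx * (3 + nh / 2) + 25
    # Round down: a partially completed call does not count.
    return int(cpu_hz // cycles_per_call)


def max_sample_rate_per_channel(cpu_hz, nx, nh, channels):
    """Theoretical per-channel sample rate in Hz if the CPU did nothing but filter."""
    # Each call filters nx samples; the total is shared equally among the channels.
    return fir2_calls_per_second(cpu_hz, nx, nh) * nx / channels


n_firs = fir2_calls_per_second(120e6, 1024, 400)
rate = max_sample_rate_per_channel(120e6, 1024, 400, 4)
print(f"{n_firs} calls/s, {rate / 1e3:.1f} kHz per channel")
# prints "577 calls/s, 147.7 kHz per channel"
```

This is only an upper bound. Interrupts, data movement, and whatever else the application does will all eat into it.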
Hope this helps,
Bill