Renjith,
udelay() is the standard way to insert short delays. It is used extensively throughout the kernel, including in nand_base.c.
This means that the stalls I am experiencing could show up in countless places outside the NAND driver.
As for the precision of udelay(), variations of around 10% are acceptable, but that cannot explain the anomaly of over 10,000% that I described.
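
To make the comparison between a 10% and a 10,000% deviation concrete, here is a minimal userspace sketch of how a delay overrun can be measured and expressed as a percentage. It is not the kernel's udelay() and not the code from the driver: the helper names now_ns(), busy_wait_ns() and overrun_percent() are hypothetical, and the busy loop on CLOCK_MONOTONIC only stands in for a calibrated delay loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical stand-in for a busy-wait delay: spin until at least
 * requested_ns have elapsed, then return the time actually spent.
 */
static uint64_t busy_wait_ns(uint64_t requested_ns)
{
    uint64_t start = now_ns();
    uint64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < requested_ns);

    return elapsed;
}

/* Overrun of the measured delay relative to the requested one, in percent. */
static double overrun_percent(uint64_t requested_ns, uint64_t measured_ns)
{
    assert(requested_ns > 0);
    return ((double)measured_ns - (double)requested_ns) * 100.0 /
           (double)requested_ns;
}
```

For a 1000 ns request, a measured 1100 ns is a 10% overrun, while a measured 101000 ns is a 10,000% overrun. Passing the value returned by busy_wait_ns() to overrun_percent() gives the same figure for a real measurement.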
Unfortunately, switching to a timer-based function such as usleep_range() degraded performance dramatically. In any case, udelay(), being a simple busy loop, cannot be avoided across the entire kernel.
Do you have any idea why the writes stall even though memory barriers are used?