I'm having the same problem here, but what makes me wonder is that it only happens on my desktop. When I try transferring large files from my internal HDD (Seagate 500GB) to my external HDDs (WD 120GB and Seagate FreeAgent 1TB), the "delayed write failed" error happens. My desktop runs Windows XP SP3 Pro 32-bit. This doesn't occur on my laptop, which is running Windows Vista 32-bit SP2.
You can lose data because of this, or even get a corrupted filesystem.
Actually it means you have lost data. The system tried to write it to disk, was unable to, and has given up. Whatever data it was trying to write is gone, unless it's something like a Word document you tried to save and you still have the document open and can therefore save it somewhere else.
Well, due to journaling it doesn't necessarily mean you lost file data; the journal should correct the lost buffers, and the filesystem should simply 'rewind' to an earlier state in history, for example 30 seconds before the lost buffers occurred (the last journal commit). So it doesn't have to mean you actually get corruption, though FAT32 doesn't offer any protection of this kind and will lose data.
However, even NTFS journaling might not be enough in some cases; especially if the journal replay loses buffers too, you can get into trouble. The journal replay occurs when the device becomes 'attached' again. I'm not sure how Windows works in this regard, though I studied the usage of geom_journal in FreeBSD thoroughly. The SoftUpdates system is also quite nice: it's an alternative to journaling that achieves a similar effect, but without the write penalty. However, it requires the storage sublayer to obey the 'flush' command, something hard-drive write buffers often don't do; they just ignore it and lie about the flush having finished. This is very bad, of course.
The point of a journalling system is to keep the file system metadata in a consistent state. If a program tried to write data to the disk and the disk couldn't do the job, journalling should work to make sure that the file system is in a valid "before-the-data-was-written" state. The actual data that was being written is still not on the disk, and so it will be lost unless the copy in RAM can be written somewhere else.
Of course some of this depends on whether the write failure occurred when writing a data block vs. metadata information, but you get my drift.
And yes, hardware that lies about "flush" is very, very bad indeed. This may be one of the differences between "consumer" and "enterprise" class drives, although I haven't seen it explicitly mentioned.
True about the first part. Enterprise drives do the same, however. It's possible to disable write buffering, but that would lower performance significantly, down to under 1MB/s.
Instead, having a journal bigger than the combined size of all write-back buffers would cope with this; or, in the case of FreeBSD's soft updates, having enough delay between the separate commits (metadata, directories, files) would protect against data loss with write buffering enabled.
But if you work with an Areca controller with 256MB+ of write-back buffer cache, you need a very big journal to protect against that, unless you use a BBU (battery backup unit) to guarantee the buffers are not lost.
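As a rough back-of-the-envelope sketch of the sizing argument above (the drive count, per-drive cache size, and safety factor here are hypothetical numbers, not from the original posts):

```python
# To survive lost write-back buffers without a BBU, the journal
# should cover every cache layer that can lie about a flush.
MB = 1024 * 1024
controller_cache = 256 * MB   # e.g. an Areca card's on-board DRAM
drive_caches = 8 * 32 * MB    # hypothetical: eight drives, 32MB cache each
safety_factor = 2             # headroom for writes in flight

min_journal = (controller_cache + drive_caches) * safety_factor
print(min_journal // MB, "MB")  # → 1024 MB
```

The exact factor is debatable; the point is only that the journal must be sized against the *sum* of the write-back buffers below it, which gets large quickly with a big controller cache.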