Disk corruption reported (Event 55) frequently

John_Blight

Jan 23, 2008
We have Western Digital Lifebook Pro external hard drives connected to a Windows Server 2003 machine for backup. The System Event Log frequently reports corruption on them:

Event Type: Error
Event Source: Ntfs
Event Category: Disk
Event ID: 55
Date: 17/01/2008
Time: 02:36:39
User: N/A
Computer: <ComputerName>
Description:
The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume J:.


This has happened on several such drives, and I'm becoming suspicious that it isn't a physical problem with the drives at all. Having said that, running chkdsk does typically report the following (for example):

The type of the file system is NTFS.
Volume label is BU1.

CHKDSK is verifying files (stage 1 of 5)...
864960 file records processed.
File verification completed.
6 large file records processed.
0 bad file records processed.
0 EA records processed.
0 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 5)...
3029455 index entries processed.
Index verification completed.
5 unindexed files processed.
CHKDSK is verifying security descriptors (stage 3 of 5)...
864960 security descriptors processed.
Cleaning up 164 unused index entries from index $SII of file 9.
Cleaning up 164 unused index entries from index $SDH of file 9.
Cleaning up 164 unused security descriptors.
Fixing mirror copy of the security descriptors data stream.
Security descriptor verification completed.
58113 data files processed.
CHKDSK is verifying file data (stage 4 of 5)...
864944 files processed.
File data verification completed.
CHKDSK is verifying free space (stage 5 of 5)...
33821987 free clusters processed.
Free space verification is complete.
CHKDSK discovered free space marked as allocated in the
master file table (MFT) bitmap.
Windows has made corrections to the file system.

366290000 KB total disk space.
229851596 KB in 806809 files.
207712 KB in 58115 indexes.
0 KB in bad sectors.
942744 KB in use by the system.
65536 KB occupied by the log file.
135287948 KB available on disk.

4096 bytes in each allocation unit.
91572500 total allocation units on disk.
33821987 allocation units available on disk.
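
As a sanity check on the cluster size, the totals in the report can be cross-checked with a few lines of Python. This is only a sketch: the figures are copied from the chkdsk summary above, and the helper name `units_to_kb` is made up for illustration.

```python
# Figures copied from the chkdsk summary above.
BYTES_PER_UNIT = 4096
TOTAL_UNITS = 91_572_500
FREE_UNITS = 33_821_987

# Per-category usage in KB, as listed by chkdsk.
USAGE_KB = {
    "files": 229_851_596,
    "indexes": 207_712,
    "bad sectors": 0,
    "system": 942_744,
    "available": 135_287_948,
}


def units_to_kb(units, bytes_per_unit=BYTES_PER_UNIT):
    """Convert allocation units to KB (1 KB = 1024 bytes)."""
    return units * bytes_per_unit // 1024


total_kb = units_to_kb(TOTAL_UNITS)
free_kb = units_to_kb(FREE_UNITS)

print(total_kb)                 # 366290000, matches "total disk space"
print(free_kb)                  # 135287948, matches "available on disk"
print(sum(USAGE_KB.values()))   # 366290000, the categories add up to the total
```

The numbers are consistent with 4096-byte allocation units, so the report itself gives no sign of a smaller cluster size.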


1. It always seems to cite 'file 9'. We have some large BKFs; one, for example, is over 20 GB. Are such large files likely to be problematic?

2. The drives that have reported errors are connected to a Belkin Firewire 800 card. I realise that Windows doesn't implement Firewire 800 correctly, but might the card be a problem?

3. Does anyone know the mechanism by which Windows detects such errors?
Might the problem be one of communication rather than a disk defect? (We also get the following in the Event Log:

Event Type: Error
Event Source: sbp2port
Event Category: None
Event ID: 9
Date: 17/01/2008
Time: 02:36:48
User: N/A
Computer: <ComputerName>
Description:
The device, \Device\Sbp2\WD&External HDD Device&0&009, did not respond
within the timeout period.)

4. I have replaced such a drive with a new one, but that is behaving in the same way.

5. There's a documented issue with cluster sizes smaller than 4096 bytes (KB932578) that's mentioned elsewhere on this forum, but it would seem not to apply here: the chkdsk report above shows 4096 bytes in each allocation unit.
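
On question 3, one way to test the communication theory is to see how close together the Ntfs 55 and sbp2port 9 events fall. The sketch below uses only the two events quoted above; the 60-second window and the function names are assumptions chosen for illustration, and a real check would need the full log exported from Event Viewer.

```python
from datetime import datetime, timedelta

# The two events quoted above, reduced to (source, event ID, date, time).
EVENTS = [
    ("Ntfs", 55, "17/01/2008", "02:36:39"),
    ("sbp2port", 9, "17/01/2008", "02:36:48"),
]


def parse_stamp(date_text, time_text):
    """Parse the day/month/year date and 24-hour time shown in the log."""
    return datetime.strptime(f"{date_text} {time_text}", "%d/%m/%Y %H:%M:%S")


def correlated_pairs(events, window=timedelta(seconds=60)):
    """Pair each Ntfs 55 with every sbp2port 9 logged within `window` of it.

    The window size is an arbitrary assumption, not a documented threshold.
    """
    ntfs = [parse_stamp(d, t) for s, i, d, t in events if (s, i) == ("Ntfs", 55)]
    sbp2 = [parse_stamp(d, t) for s, i, d, t in events if (s, i) == ("sbp2port", 9)]
    return [
        (n, s, (s - n).total_seconds())
        for n in ntfs
        for s in sbp2
        if abs(s - n) <= window
    ]


for ntfs_time, sbp2_time, gap in correlated_pairs(EVENTS):
    print(f"Ntfs 55 at {ntfs_time}, sbp2port 9 at {sbp2_time}, gap {gap:+.0f} s")
```

For the two events quoted, the timeout is logged 9 seconds after the corruption report. If that pattern holds across the whole log, it would point towards the connection rather than the disk surface, though a single pair proves nothing.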


Any thoughts would be appreciated.
 

John_Blight

Jan 23, 2008
Thanks for the replies.

Since posting initially, I've switched to a USB 2 connection. If that's successful, I'll try a new Firewire 800 cable (I have one, so why didn't I think of that before? ;-)).
 

John_Blight

Jan 23, 2008
The problem seems to be with using more than one drive connected through Firewire 800.

I have one drive working successfully, but a second drive connected in this way frequently reports corruption. I've tried a different port on the card, different cables, and chaining the second drive via a port on the first, none of which solves the problem. I've also applied the firmware update that was reported to fix an issue with multiple drives connected through Firewire, but it made no difference. Perhaps my issue is a further symptom of such problems.

I've since reverted to USB, and have had no further problems (for a week or two, at least). It's a shame I'm unable to use the advantages offered by Firewire 800.

I've raised it with Western Digital support.
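
For anyone wanting to reproduce this without waiting for the nightly backup, a write-and-read-back test on each connection type is one option. The following is a minimal sketch, not a proper diagnostic tool: the function name, file name and sizes are all assumptions, and the read-back may be served from the operating system's cache rather than the drive, so a pass is weaker evidence than a failure.

```python
import hashlib
import os


def write_and_verify(path, size_mb=64, chunk_mb=1):
    """Write random data to `path`, read it back, and compare SHA-256 digests.

    Returns True when the data read back matches what was written.
    """
    chunk_size = chunk_mb * 1024 * 1024

    written = hashlib.sha256()
    with open(path, "wb") as handle:
        for _ in range(size_mb // chunk_mb):
            chunk = os.urandom(chunk_size)
            written.update(chunk)
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            read_back.update(chunk)

    return written.hexdigest() == read_back.hexdigest()
```

Calling `write_and_verify(r"J:\verify.bin", size_mb=1024)` in a loop with both drives attached, first over Firewire and then over USB, would show whether mismatches or I/O errors appear only on one connection type. The path here is hypothetical; J: is simply the volume named in the Event 55 message.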
 

OP Dave

Jan 12, 2012
We have investigated similar problems on every version of Windows, and find that you can solve nearly all of them by turning off the dysfunctional and badly implemented Windows feature called "Write-Behind Caching" or "Lazy Write". Particularly in Windows 7, if you do NOT do this, you can expect FREQUENT file corruption and data loss. For instructions on how to do this in various versions of Windows, see:
datasmithpayroll.com/news/news8.htm

OP Dave