WD Scorpio Black & ASRock A785GMH/128M: extreme freezes in Windows

xen111

Commendable
Jul 29, 2016
55
0
1,640
I have not attempted Linux on this hard disk yet. It came fresh from the shop and I decided to install Windows on it. It produces some coil whine.

I have had at least 3-4 versions of Windows installed on it. Very regularly, during even minor reads, the HDD LED on the case goes fully bright and the application or system freezes. It's as if IO blocks for up to 10 seconds or longer. I am sending the disk in for RMA because of the coil whine anyway. I am also trying to send the motherboard back to the (second-hand) vendor that sold it to me, because of the failing RAID (see other topic).

But I just wonder where I should put the blame here.

For example, when I am in the Edge browser on Windows 10 and I click somewhere, the system or program stalls for at least 10 seconds, often much longer. I may click on an edit field to type some text and the browser will freeze for 20 seconds.

But the same thing happened in Windows 8 and Windows 7. I have not yet installed Linux here.

I was not planning to either, at least not as the boot (root) device. I have not really experienced any such issues in Linux using a different hard disk.

I am using the same SATA cable that I used for the other hard disk. This is currently the only hard disk actually connected in the system. Right now the system seems rather responsive. The problems seem to happen mostly during the first 10 or 20 minutes after booting.

The SMART short self-test completes without errors. An earlier CrystalDiskMark run produced good figures (> 100 MB/s sequential reads and writes, ~1.5 MB/s 4K reads and writes).

What could possibly cause this slowness? Note that I am not necessarily asking for it to be solved. The hard disk is going back anyway, but a replacement will probably not behave any differently.

It is hard to test these sorts of things without running a system on it.

If there is nothing wrong with how the HDD itself operates, could there be an issue with the SATA controller? According to WD the drive has a SATA 600 (SATA 3) interface, while the motherboard is SATA 300 (SATA 2). The only thing I could test right now is to install a Sil3114 RAID controller in the motherboard's PCI slot and attach the disk to that.

Or does Windows do something weird right after boot? I have never experienced that before.

I look forward to your answers. Regards.
 
Hey there, @xen111!

I saw your post in another thread, so I'm here to offer some troubleshooting advice. :) I'd recommend you try the WD drive in another computer instead, and see how it will behave from there. I'd also advise you to use WD's Data Lifeguard Diagnostic for Windows and run the QUICK and EXTENDED tests to determine the health and SMART status of the drive. Keep in mind that the WD Scorpio Black is an older model, so if the drive still performs poorly in another system, you should definitely get in touch with the reseller to RMA and replace the HDD.

A mechanical hard drive cannot saturate the bandwidth of SATA II (3 Gb/s, i.e. about 300 MB/s of usable data) anyway, so the interface difference shouldn't have any effect on the HDD's performance.
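The 300 MB/s figure comes from SATA's 8b/10b line coding: every 8 data bits travel as 10 bits on the wire, so only 80% of the line rate carries data. Here is a minimal sketch of that arithmetic. The function name is illustrative, not from any tool. Compared with the > 100 MB/s sequential figure from CrystalDiskMark earlier in the thread, SATA II leaves plenty of headroom.

```shell
#!/bin/bash
# Usable payload bandwidth of a SATA link in MB/s (hypothetical helper).
# 8b/10b coding: 10 line bits per 8 data bits, then 8 bits per byte.
sata_payload_mbs() {   # sata_payload_mbs <line rate in Mbit/s>
    echo $(( $1 * 8 / 10 / 8 ))
}

for rate in 1500 3000 6000; do   # SATA I, II, III line rates
    printf '%5d Mbit/s -> %4d MB/s\n' "$rate" "$(sata_payload_mbs "$rate")"
done
```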

Keep me posted with the troubleshooting!
SuperSoph_WD
 

xen111

Thanks. There are a lot of WD guys here :). I normally go with Samsung, but my NAS drive is a Red, and I thought I'd try a different brand when getting a 7200 rpm drive.

I'd recommend you try the WD drive in another computer instead, and see how it will behave from there.

Identical. But then, one system is SB700 and the other is SB710. I have no systems without a 700 or 710 :p.

Also, I cannot try another OS without reinstalling. I may do this: imaging the drive is easy under Linux, and I can wipe and restore it very quickly. However, that would mean having to try Windows 7 and Windows 8 again. Windows 8 had the same issues. Windows 7 maybe less, but I think it had them too; while installing 10 I went through various upgrade paths.
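Here is a minimal sketch of the image-and-restore approach under Linux. It assumes GNU dd and gzip. The function names are hypothetical and the arguments are placeholders. Compressing with gzip -1 keeps the image small where large parts of the disk contain zeros.

```shell
#!/bin/bash
# Sketch: save a whole disk to a compressed image and write it back later.
# Both functions are hypothetical helpers; device/file arguments are placeholders.
# Writing to the wrong target device destroys its contents, so double-check it.

image_disk() {      # image_disk <source device> <image.gz>
    dd if="$1" bs=4M status=progress | gzip -1 > "$2"
}

restore_image() {   # restore_image <image.gz> <target device>
    # iflag=fullblock: reads from a pipe can come back short; gather full 4M blocks
    gzip -dc "$1" | dd of="$2" bs=4M iflag=fullblock conv=fsync status=progress
}
```

For example, image_disk /dev/sdX /mnt/backup/win10.img.gz before experimenting, then restore_image with the same arguments reversed. Both paths here are placeholders.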

Installing the AMD Catalyst drivers (15.7.1 is the latest "legacy" version) made no difference. Switching the SATA mode in the BIOS to IDE caused no problems for Windows and did seem to solve the hangs, but the system remains very sluggish. The defining characteristic is that the HDD LED stays a solid, steady blue (in this case) and does not flicker. The moment the IO block is over, the LED flickers again, even under heavy activity.

So a non-flickering LED indicates that the system is blocked in the IO queue.

I have experienced this before on Linux with a Samsung/Seagate ST1000LM024, and I think also with an ST500LM012. But what really happened then was that I had a slow SSD (a 16 GB Transcend mSATA), and Linux does not have the best IO queues. I was using the slow SSD as a system cache drive. Adding it as a cache caused the same kind of behaviour I see here. Even after I removed the cache and used the drive on its own, I got the same kind of hangs with that SSD. Linux buffers dirty cache (pending writes), and the limits on those buffers are poorly defined and, in my view, badly chosen. I have not tried using the drive as a cache again since setting better values.

Linux uses a single, global set of dirty-cache limits for all devices. By default it lets dirty data sit for up to 30 seconds before it has to be written out. It also sets the dirty cache threshold to 10% of total system RAM on a 64-bit system. On a 32-bit system the cache is capped at about 1 GB, and the amount that can fill up before writing actually starts is 10% of that. So on 32-bit the 1 GB cap is a fixed maximum, while on a 64-bit system there may not even be a maximum, at least not a clear one.

Now suppose you have a slow SSD (14 MB/s writes) and you write a gigabyte to it. The kernel may very well take the whole gigabyte into its buffers and then need more than a minute before it has all cleared (1024 MB at 14 MB/s is about 73 seconds). If those buffers do fill up, no other write activity can occur on the system.
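The drain time for the 1 GB at 14 MB/s example is plain arithmetic. Here is a minimal sketch using bash integer arithmetic, with results rounded down. The function names are hypothetical, and so is the 8 GB machine. The kernel documentation describes the ratio as a percentage of available memory (free plus reclaimable pages), not strictly of total RAM, so the threshold helper is only an approximation.

```shell
#!/bin/bash
# Seconds needed to flush a given amount of dirty data, rounded down.
drain_seconds() {        # drain_seconds <dirty MB> <write speed in MB/s>
    echo $(( $1 / $2 ))
}

# Approximate dirty threshold in MB for a RAM size and percentage
# (the kernel really uses available memory, not total RAM).
dirty_threshold_mb() {   # dirty_threshold_mb <RAM MB> <percent>
    echo $(( $1 * $2 / 100 ))
}

drain_seconds 1024 14                              # 1 GB to the 14 MB/s SSD: prints 73
drain_seconds "$(dirty_threshold_mb 8192 10)" 14   # 10% of a hypothetical 8 GB box: prints 58
```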

That's the simple explanation of why Linux is a poor bastard here. Linus himself actually disagrees with these defaults, yet they are still in place (3 years after the debate).

I intend to recompile my kernel with a fixed dirty cache of at most 100 MB and see if it still happens.
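As an alternative to recompiling, the kernel also accepts byte-based limits at runtime through the sysctls vm.dirty_bytes and vm.dirty_background_bytes. Writing one of these makes its *_ratio counterpart read as 0. Here is a minimal sketch that prints such a fragment. The function name and the 25 MB background value are assumptions, not taken from the thread. The background value should stay below the main limit.

```shell
#!/bin/bash
# Print a sysctl fragment capping dirty page cache at fixed sizes (given in MB)
# instead of a percentage of memory. Hypothetical helper; prints, does not apply.
dirty_limits_conf() {   # dirty_limits_conf <limit MB> <background MB>
    printf 'vm.dirty_bytes = %d\n' $(( $1 * 1024 * 1024 ))
    printf 'vm.dirty_background_bytes = %d\n' $(( $2 * 1024 * 1024 ))
}
```

To check the current values, run sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes. To try a 100 MB cap, something like dirty_limits_conf 100 25 | sudo tee /etc/sysctl.d/90-dirty.conf followed by sudo sysctl --system should apply it without a reboot. The file name is a placeholder.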

What I can say today is that this disk performs very poorly on this chipset family (SB7xx), with or without the AMD drivers. It has the same kind of hangs I experienced back then on Linux with other drives. However, the hangs I get today are all around 25 seconds, sometimes shorter. On Linux (with those other drives) they could last more than 2 minutes.

It is probably the same symptom of the same kind of problem.

But I don't know why this happens with this WD drive. As far as I know, the system it is currently in runs the ST500LM012 in Windows 8 without any such problems. The drive made a coil-whine noise in the beginning, but that shouldn't really affect its operation, and from what I can tell the coil whine has subsided. I think I am still going to RMA it.

I will probably have at least 3 RAID controllers to test within a couple of days. I will not be testing the first (PCI) one, but I can test the x1 and the x4 ones. I hope Windows will boot for once... It always complains that the BOOTMGR information is incorrect, and then bootrec /rebuildbcd doesn't even work; it does not find any installation.

Or something like that. It's about the BCD: it can't handle the disk being attached to a different controller.

So tomorrow I might be able to install Windows 10 on this drive attached to some crap Sil3124 controller. Then I could see what happens.

Steps to reproduce:

1. Boot Windows
2. Start 10 different items from the Windows 10 Win+X menu at once
3. Watch the system completely freeze.

If my replacement drive exhibits the same behaviour, then we can safely say that these drives are not compatible with the SB7xx.

That makes them a rather poor choice in that sense... I hope it is a drive failure. I will run the tool now. The SMART short test has already completed successfully, though, and the extended test got interrupted by something.
 

xen111

I have now attached the drive to a cheap SATA RAID controller. These are firmware RAID controllers, so Linux usually has no trouble with them. There are no good tools in Linux for my purpose, so I wrote a little script that runs a test similar (in appearance) to what WD Data Lifeguard does.

Code:
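#!/bin/bash
# Read the disk one megabyte at a time with direct I/O (bypassing the page
# cache) and print the current offset in place, so a stall shows up as a
# number that stops changing.

dev=${1:-/dev/sdg}                      # device to test (default: /dev/sdg)
esc=$'\033['                            # ANSI CSI prefix
size=$(blockdev --getsize64 "$dev") || exit 1
total=$(( size / 1048576 ))             # device size in whole MiB

printf '\n%s?25l' "$esc"                # hide the cursor
trap 'printf "%s?25h\n" "$esc"' EXIT    # show the cursor again when the script exits

for (( cur = 0; cur < total; cur++ )); do
    if ! dd if="$dev" of=/dev/null bs=1M count=1 skip="$cur" iflag=direct 2>/dev/null; then
        printf '\nRead error at megabyte %d\n' "$cur"
        exit 1
    fi
    printf '%20d%sG' "$cur" "$esc"      # print the offset, then jump back to column 1
done
printf '\nDone at %d megabytes\n' "$total"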

This script hides the cursor and then reads a single megabyte at each successive offset, discarding it into the "big hole" (/dev/null). After each read it reports the number of the megabyte just read.

Like Data Lifeguard, it lets you see the "flow" of the reading. DL works per sector; I work per megabyte (to save on dd invocations).

If the reading made the drive stall, you would see it as a number that stops changing. The number is printed in place: after each print, an ANSI escape sequence moves the cursor back to the start of the line.
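The in-place update relies on CSI G (Cursor Horizontal Absolute), which moves the cursor to column 1 when given no parameter. Here is a minimal standalone sketch of just that trick. The function name is hypothetical.

```shell
#!/bin/bash
# Print a number right-aligned in 20 columns, then move the cursor back to
# column 1 with ESC [ G, so the next number overwrites this one.
show_in_place() {   # show_in_place <number>
    printf '%20d\033[G' "$1"
}

for n in 1 2 3; do
    show_in_place "$n"
    sleep 0.2
done
printf '\n'
```

A plain carriage return (\r) would work equally well on most terminals. CSI G is the ANSI way of expressing the same movement.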

It is running now, however (at 63 GB after a few minutes), and there is no sign whatsoever that the reading is not fluid. The pace seems completely constant. At 73 GB now. On Windows this same test would take hours because of all the stalling.

It has now covered more than 10% of the disk in roughly 10 minutes or less. So it is quite clear this is a Windows driver issue (or something of the kind).

I stopped the test to verify that the motherboard really was in AHCI mode, which it was. So Linux does not experience the problem, and every version of Windows I have tried does.

Here is another question: "Scorpio Black SATA 1.5GB/S jumper setting". I quote:

Having it run at 3GB/S causes hangs and very slow boot up time.

In Linux my drive is definitely running at 3 Gb/s, and it is not experiencing issues. In Windows I don't know. I'm not sure I can test it either, because my Windows 8 installation (on another drive) got corrupted again (let's say destroyed) when I tried to boot it off a different controller.
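On the Linux side, the negotiated link speed can be read from sysfs. libata exposes a sata_spd attribute per link under /sys/class/ata_link, and the kernel log's "SATA link up 3.0 Gbps" line shows the same information. Here is a minimal sketch. The function name is hypothetical, and the sysfs directory is a parameter so the function can be pointed at a test directory.

```shell
#!/bin/bash
# List the current speed of every SATA link known to libata.
# Links with nothing attached typically report "<unknown>".
sata_link_speeds() {   # sata_link_speeds [sysfs dir, default /sys/class/ata_link]
    local base=${1:-/sys/class/ata_link} link
    for link in "$base"/link*; do
        [ -r "$link/sata_spd" ] || continue   # also skips an unmatched glob
        printf '%s: %s\n' "${link##*/}" "$(cat "$link/sata_spd")"
    done
}
```

For Linux only, the libata.force kernel parameter (for example libata.force=1.5Gbps) should cap the link speed. That roughly mirrors what the jumper does in hardware, but it has no effect on Windows.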

Now ONCE MORE the Windows boot configuration is CORRUPTED by no other act than trying to BOOT IT. And I am pretty sure bootrec /rebuildbcd is going to fail and I will need to install Windows, again.
 
Wow, amazing job, @xen111! You definitely have the best of expertise I've come across so far when it comes to troubleshooting!
I agree that it could be a Windows driver issue; however, a driver issue wouldn't normally single out one HDD. You should definitely try a clean install of Windows. Instead of connecting the WD Scorpio internally, you could also try troubleshooting it externally. Either way, run the QUICK and EXTENDED tests again until they complete successfully. If doing this through Windows causes issues, try the DOS version of WD Data Lifeguard Diagnostics. You should be able to find it in the Product Downloads. It lets you run a much more detailed diagnostic on the drive without booting into Windows.

Good luck! Let me know how it is going.
SuperSoph_WD
 

xen111

Oh thanks, I didn't know about the DOS software. I like how complimentary you are to people, unlike Linux people, who are always the opposite :p. As for how it is going: the drive is currently running on an nVidia nForce 570 SLI motherboard (also AM2), which is even older than the other motherboard. I have no issues in Linux (obviously) but also none in Windows, apart from Windows 10 being rather slow. But Windows 10 is slow on the same disk on the other motherboard as well, even apart from the freezes. I believe I have also tested it on a cheap RAID controller, and there were no issues!

I am very confident it is related to the motherboard's SATA implementation, but a newer motherboard that was ALSO SB710 did NOT have the same issues. That was an Asus mATX board bought just last year, running a Phenom X6. So it is a combination of the Windows driver and the motherboard.

That said, Windows 10 on that same system, using the Delock SATA RAID controller, is still not very fast (I have almost never had such a slow system in my life). I did not experience that slowness before on other systems, at least not initially. That includes a 4-year-old laptop without ANY Windows 10 support drivers (not even for its chipset or graphics chip). And yet my much more powerful 5050e system (although also very old), with a much faster hard disk, runs it so slooooow.

It doesn't matter whether I attach it to the controller or not. Windows 10 is just dreadfully slow and always seems to stall on disk reads, even if the delays are no longer 20-30 seconds. So I don't know what is going on; maybe both systems still have an issue with SATA.

After all, the RAID controller is also SATA 300, not SATA 600, so there may still be an issue there. I was actually trying to find a post where (I thought) someone hinted at how to set the link speed lower.