
Looking to increase throughput, and reduce time spent.......

January 6, 2007 10:26:19 AM

Looking to increase throughput, and reduce time spent seeking.

(1) Need to increase throughput.

(2) Desire a reduction in the time spent seeking, as a percentage


Current heavy users:
Full Backup going on C:\

A virus scan going (just a quick one, using Windows Live OneCare)

Extracting a 5 GB RAR archive from another PC, over a network, (PC in question only has 100 Mbps, rest of my LAN is 1 Gbps), to remove, or clean, a virus infected file from the RAR archive.


I am doing the above all at once.

I am not 100% certain (read: convinced) that NCQ [Native Command Queuing] is working.

The backup is only going at 4.5 MB/sec, from and to the same HDD, but different partitions. From there it will get compressed, then moved to external storage (be it another PC, DVD+/-R, external HDDs, etc).

My main questions are:

(1) How do I know NCQ is working? Can I test it with HDTach or a similar tool ?

(2) Heaps of I/O on the one drive:

(2a) The backup is performing all reads + writes to the same HDD
(2b) The extraction is performing only writes (over the network, with my CPU uncompressing the data and virus scanning it).
(2c) The Virus Scan is performing only reads, plus using some CPU time, etc

Pausing the Virus Scan appears to have a minimal difference.


Looking for advice, recommendations, and just a chat in general.
January 6, 2007 10:31:38 AM

Here is a screenshot of my Storage Sub-System related stuff:

The 'port' the drive is on is ABOVE the one highlighted:



Ultra DMA Mode 5 reported for HDD
SATA attached

Ultra DMA Mode 2 reported for CD/DVD-ROM combo drive.
IDE/ATAPI/PATA attached

Using the Intel driver from: 10/04/2006 ; 8.0.0.1008
Which has passed: Microsoft Windows Hardware Compatibility Publisher
January 6, 2007 10:58:41 AM

Here are the devices 'by connection':



I have also confirmed that the OS Disk Cache policy is set to: Write Back


Would going RAID-5 help ?
Assuming the GA-965P-DQ6 supports Intel 'Matrix-RAID' RAID-5
If so how much ?

Would using an external HDD solution help ?
If so how much ?
eSATA, USB 2.0, Firewire and 1 Gbps LAN to another PC are options

Would using 10,000 rpm HDD(s) help ?
If so how much ?
Assuming backup from/to the same HDD, as well as external 7,200 rpm HDD(s)


I am planning to run SiSoft SANDRA 2007 on the HDD once the backup is complete to see how 4.5 MB/sec compares to the figures SANDRA gets.

CPU usage is sitting around 25%
Memory usage around 30% (I have 3.24 GB usable memory).
January 6, 2007 11:41:49 AM

OK reading:

http://europe.giga-byte.com/FileList/Manual/motherboard...

Page 38 of 104

Indicates I need to check AHCI Mode is enabled, or I'll miss out on NCQ (which I could really use right now, assuming it is not currently enabled).

There are also updated IDE/SATA/RAID drivers on the Gigabyte and Intel websites, which I'll try later too.
January 6, 2007 12:43:59 PM

Backup + Verify just finished:

Average Speed was: 2,856,245 bytes/sec (apx)

(This figure only counts the reads the backup performed, which were only about 40% of the total disk I/O going on. That means there was only about 6.81 MB/sec of disk I/O in total; the rest of the time was spent seeking, without any command queuing by the looks of it.)

Total backup was only: 19.0063 GB in size

Backup took: 1 hour, 59 min, 5 sec
Verify took: 41 min, 40 sec

The Verify was 2.85x faster.
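The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes "GB" here means binary gigabytes (1024^3 bytes), which is what makes the quoted bytes/sec figure line up, and it takes the 40% share of disk I/O as the rough estimate given in the post.

```python
# Figures quoted in the post above.
backup_bytes = 19.0063 * 1024**3       # assumption: "GB" means 1024^3 bytes
backup_secs = 1 * 3600 + 59 * 60 + 5   # 1 hour, 59 min, 5 sec
verify_secs = 41 * 60 + 40             # 41 min, 40 sec

# Average rate at which the backup read data.
read_rate = backup_bytes / backup_secs
read_rate_mib = read_rate / 1024**2

# The post estimates the backup reads were about 40% of all disk I/O.
backup_share = 0.40
total_rate_mib = read_rate_mib / backup_share

# How much faster the verify pass ran than the backup pass.
verify_speedup = backup_secs / verify_secs

print(f"backup read rate : {read_rate:,.0f} bytes/sec ({read_rate_mib:.2f} MB/sec)")
print(f"total disk I/O   : {total_rate_mib:.2f} MB/sec (estimated)")
print(f"verify vs backup : {verify_speedup:.2f}x")
```

The verify pass only reads, so its higher rate is consistent with the backup pass losing time to head movement between the source and destination partitions.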


I understand that the HDD heads are spending more time seeking than actually transferring data in this case, thus the post.

Figured I may as well 'share' the knowledge (I already know what the findings will be, but was not expecting my system to have been set up without AHCI - thus NCQ is not even active).

Will be interesting to see what sort of difference it makes, and if HDTach can 'record' a scenario for benchmarking other systems with a very similar pattern.
January 6, 2007 1:22:56 PM

I'm using the Intel ICH8R south-bridge, but AHCI was (and still is) disabled in the BIOS, so none of the advanced AHCI / SATA features are working.

Including, but not limited to, Hot-Plug and NCQ.

My I/O queue length sits around 8 to 64, hovering around 20 to 32 most of the time.

I know the tasks would finish faster if I let them stream, but that isn't going to happen when doing backups involving reads/writes to the same physical disk.
Especially when the logical partitions in question are at opposite ends of the physical disk.

Does anyone have an HDTach (or is it HDDTach) or IOMeter chart handy for a Seagate ST3320620AS ?
(320 GB, 7200 rpm, SATA [3.0 Gbps], NCQ)
Not 100% sure on Hot-Plug support but that is more of an OS + Driver + Controller + Firmware feature anyway.


My questions are above, feel free to concentrate on them :wink:
January 6, 2007 2:25:57 PM

Quote:

Thus, if the BIOS was changed from "Standard IDE" to "AHCI" and then the F6 sequence was done correctly to load the correct driver software during Windows Setup, the system STILL failed if the HDDs were not also switched from 1.5 Gbps to 3.0 Gbps.


What exactly do you mean by "the system STILL failed" ?
January 7, 2007 5:58:22 AM

Quote:
(1) Need to increase throughput.

(2) Desire a reduction in the time spent seeking, as a percentage

(1 increasing throughput) Raptor or SAS would be a good way. Another would be Raid.

(2 seek times) 10k rpm raptors or 10k SAS would decrease seek times from 8-9ms on 7.2k rpm drives to ~3ms.

Another thing worth mentioning is differential backups. Don't know how much data changes each time you do a backup, but if there isn't much change then a differential backup would really help you reduce transfer times.
Quote:

Would going RAID-5 help ?
Assuming the GA-965P-DQ6 supports Intel 'Matrix-RAID' RAID-5
If so how much ?

Raid 1 would be best in your case because of the large file transfers.

If you do Raid 5 then you will need a real hardware controller. Otherwise that 25% cpu load will be sitting around 35%.
Also, with RAID 5 you will need 4+ disks before the write performance is up to par. Your read speeds with RAID 5 will however be much better. Sustained throughput will reach into the 150 MB/s range for 4+ disks and only increases with the addition of more drives.
Quote:
Current heavy users:
Full Backup going on C:\

A virus scan going (just a quick one, using Windows Live OneCare)

Extracting a 5 GB RAR archive from another PC, over a network, (PC in question only has 100 Mbps, rest of my LAN is 1 Gbps), to remove, or clean, a virus infected file from the RAR archive.


I am doing the above all at once.

I am not 100% certain (read: convinced) that NCQ [Native Command Queuing] is working.

NCQ is not useful for you because you are doing sequential reads, especially when doing a backup. When doing a backup you have to read the data in sequential order regardless. With NCQ on, the drive checks each request to see if it can optimize the order, but it cannot, and in return you receive a performance hit. NCQ will give you a boost only if you are doing non-sequential reads, which you are not doing.

Write-back cache is also not useful for you, for the same reasons. Write-back stores your data in cache and waits to perform operations on that data before writing it to the hard drive. Once again, you are doing long sequential reads/writes and not changing the data along the way, so it is better to disable this feature so the data is written to the hard drive without delay, which will remove another performance hit.
January 7, 2007 7:08:16 AM

Don't forget also.. If you are virus scanning RAR files, to completely check for viruses the program has to unpack, check, then re-pack all the files. While this may not be too computationally challenging, it will eat up a LOT of memory.. especially in the cache and I/O subsystem. This may be "log-jamming" your memory management as the CPU tries to juggle between reading 1st HDD, unpacking, scanning, then re-packing, and writing to 2nd HDD.

This could definitely be slowing down your system as it is having to do smaller "chunks" to avoid overflowing the memory, which would definitely reduce throughput.

If this is a one time thing, there's really nothing you can do about it. However, if you're doing this on a regular basis I'd suggest cutting down the virus scanning to once or twice a week (preferably on your days off). If that's not practical, look into "smart" virus scanning programs that will only scan new/altered files to save you the overhead.
January 7, 2007 8:06:34 AM

Turning Write-Back caching off caused a massive performance hit - I will not be trying that again (Well maybe just to time it, to record just how much slower it was).

I am not performing purely sequential I/O, as stated above my I/O queue sits between 4 and 64 quite often.

Sure the backup is sequential but with all the other stuff going on at once the drive heads spend at least 80% of their time moving and under 20% of their time transferring data.

With an OS Disk Cache between 2.0 - 2.5 GB the write-back cache made a massive increase in performance vs your suggestion. I just need to command queue the data from the flushed write-back cache (when it starts to flush, which is only after about 2 GB of reads have occurred).


================================================

The virus scanner wasn't dealing with the archive, I was:

- I extracted all files in the archive, with real-time scanning, over a 100 Mbps network link.

- As each file was written to disk the real-time scanner cleaned the virus infections. (The data was not in an archived state at this time).

- Then I re-packed the archive, with the bulk of the original data, since the infection was not massive.

When scanning within archives it does not need to re-pack, because it isn't changing anything: just unpack, scan, next file, unpack, scan, [repeat until end of archive]. I am not aware of any Anti-Virus products that modify archive files for this reason; most just Ignore or Quarantine archives.

Quote:
This may be "log-jamming" your memory management as the CPU tries to juggle between reading 1st HDD, unpacking, scanning, then re-packing, and writing to 2nd HDD.


As above - I only have 1 HDD in this scenario.

================================================

Thoughts:


If I could “I/O Meter” 'record' this scenario would anyone else be willing to play it back, to get a better feel for what is going on ?

The scenario would be a multi-gigabyte file (I think), not sure on the ceiling for I/O Meter, but it'd take me awhile to upload to someone else. - Sure it'd compress a bit...

It should be plain as day that NCQ will make a +30% to +50% difference in this scenario.

Turning the write-back cache off is the most bizarre suggestion I've ever heard but I will benchmark it nonetheless, since it came up.
January 7, 2007 8:39:38 AM

I expect a 10,000 rpm HDD vs a 7,200 rpm drive would lead to a 2.12x to 2.94x performance increase in this scenario. This is assuming both HDDs are the same size. (300 - 320 GB)

Instead of spending 80% of the time seeking, and 20% of the time transferring data, it would spend 57.6% of the time seeking and 42.4% of the time transferring data.

This is assuming the drive heads can move from the inner to outer zone of the HDD +39% faster (relative to size of the HDD, and other factors).

Conversely, if the scenario were spending only 20% of the time seeking, and 80% of the time transferring data, I'd only gain between 1.07x and 1.486x the performance.


Question is - How cost effective is this solution ?


Regarding the RAID-1 comment, don't you mean RAID-0 ?
(Honest enough typo, unless you're serious about mirroring vs striping ?)

I also do use Incremental Backups, However I wish to run a Full Backup once a week - once a month at least if storage space permits.

================================================

Here is HD Tach normally, and HD Tach when I am performing similar tasks using the last 3rd of the HDD.

January 7, 2007 9:52:20 AM

A 10k HDD won't be 2.12x faster :/ that would need a 15,264 rpm HDD... if you multiply 7,200 rpm by 2.12

...but according to my calculator it should be 38.8% faster than a 7200 drive...:/
January 7, 2007 10:13:26 AM

It will if the scenario is 100% data transfer - which no scenario can ever be. There must always be an element of seek time in every scenario.

Thus it will be much faster than you're thinking with that calculation - Which is only ever a true & accurate calculation if, and only if, you are comparing data transfer rate, with 0 seek operations, on drives with platters that are exactly the same dimensions, with exactly the same zone layout, and the heads move exactly +38.888...% faster.

However since 10,000 rpm HDDs and 7,200 rpm HDDs don't have platters that are exactly the same dimensions the formula is moot.

As above, if 80% of the time is spent seeking, and that can be reduced to 57.6% of the time, then data transfer has increased from 20% of the time, to 42.4% of the time.

20% vs 42.4% = 2.12 - Worst case: 1.5264 x, Best case: 2.94 x
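The arithmetic behind these figures can be reproduced as follows. This is a sketch of the back-of-envelope model used in this post, not a validated drive model: it assumes seek time shrinks in proportion to spindle speed (7,200 / 10,000 = 0.72), which other replies in this thread dispute. The function names are made up for illustration. For comparison only, the second function shows what the same inputs give if the amount of work is held fixed and only the seek portion of the time shrinks, which is an assumption not made in the post.

```python
def transfer_share_gain(seek_share, seek_scale):
    """Model from the post: scale the share of time spent seeking,
    hand the freed share to data transfer, and compare transfer shares."""
    old_transfer = 1.0 - seek_share
    new_transfer = 1.0 - seek_share * seek_scale
    return new_transfer / old_transfer


def fixed_work_speedup(seek_share, seek_scale):
    """Comparison (not from the post): same amount of work, only the
    seek part of the elapsed time shrinks."""
    new_time = seek_share * seek_scale + (1.0 - seek_share)
    return 1.0 / new_time


rpm_ratio = 10_000 / 7_200   # about 1.389, the "+38.888...%" in the thread
seek_scale = 7_200 / 10_000  # 0.72, assumed scaling of seek time

for seek_share in (0.80, 0.20):
    gain = transfer_share_gain(seek_share, seek_scale)
    print(f"seek share {seek_share:.0%}: "
          f"post's model {gain:.3f}x, "
          f"times 0.72 = {gain * seek_scale:.4f}x, "
          f"times 1.389 = {gain * rpm_ratio:.3f}x, "
          f"fixed-work comparison {fixed_work_speedup(seek_share, seek_scale):.3f}x")
```

The 2.12x, 1.5264x and 2.94x figures fall out of the first function for an 80% seek share, and 1.07x and 1.486x for a 20% seek share. The fixed-work comparison comes out noticeably lower, so which model fits a real drive is something only a benchmark can settle.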

This is why 10,000 rpm drives are so popular with enthusiasts, they are getting more performance than perhaps they know.

I however am more concerned with cost effectiveness so sure, if it is cost effective to get a 10 K-rpm drive to try out, then so be it.
January 7, 2007 5:06:16 PM

Quote:
Regarding the RAID-1 comment, don't you mean RAID-0 ?
(Honest enough typo, unless you're serious about mirroring vs striping ?)

yeah thats a typo, should be Raid 0

I still disagree on NCQ, you are only performing three tasks. Two of which are extremely sequential.

Extracting a RAR is very sequential.
Backups are sequential.
The only thing that isnt sequential is the virus scan. This as you have discovered is not the slowdown as it is only active when you are doing new writes. Reads from the HDD are usually not scanned. I say usually as it depends on how high your AV settings are jacked up to.

NCQ is meant to reorder commands to reduce the head movement, which reduces wear and tear on the drive. Its not specifically meant to increase performance, which only happens in rare instances, none of which seems to apply to your situation. Perhaps if you had SQL running or were doing file sharing then NCQ might show some sort of improvement.

In any means the only way to increase throughput would be to combine drives in Raid 0/5/10/etc.
The only way to increase seek times is a faster spindle. 7.2k/10k/15k ~8.5ms/~3.4ms/~1.5ms respectively.

Unfortunately the law of diminishing returns makes these solutions less cost effective as a single drive but you need to weigh cost effectiveness with increased usability vs slowdown to find the right solution for you.
January 7, 2007 7:05:49 PM

Quote:
Regarding the RAID-1 comment, don't you mean RAID-0 ?
(Honest enough typo, unless you're serious about mirroring vs striping ?)

yeah thats a typo, should be Raid 0

I still disagree on NCQ, you are only performing three tasks. Two of which are extremely sequential.

Extracting a RAR is very sequential.
Backups are sequential.
The only thing that isnt sequential is the virus scan. This as you have discovered is not the slowdown as it is only active when you are doing new writes. Reads from the HDD are usually not scanned. I say usually as it depends on how high your AV settings are jacked up to.

NCQ is meant to reorder commands to reduce the head movement, which reduces wear and tear on the drive. Its not specifically meant to increase performance, which only happens in rare instances, none of which seems to apply to your situation. Perhaps if you had SQL running or were doing file sharing then NCQ might show some sort of improvement.

In any means the only way to increase throughput would be to combine drives in Raid 0/5/10/etc.
The only way to increase seek times is a faster spindle. 7.2k/10k/15k ~8.5ms/~3.4ms/~1.5ms respectively.

Unfortunately the law of diminishing returns makes these solutions less cost effective as a single drive but you need to weigh cost effectiveness with increased usability vs slowdown to find the right solution for you.

The only way to increase seek times is a faster spindle. 7.2k/10k/15k ~8.5ms/~3.4ms/~1.5ms respectively.
Your math is a bit off.

7200rpm = 1rev / 8.333 msec = average spindle latency of 4.166ms. Access time is on average 12.2 ms.
10000rpm = 1rev / 6 msec = ASL 3ms. Access time is on average 7.5ms.
15000rpm = 1rev / 4 msec = ASL 2ms. Access time is on average 5.5ms.

Seek times for hard drives are around 3.5ms for 15000rpm, 4.5ms for 10000rpm, and around 8ms for 7200rpm drives.
Access time is rotational latency + seek time.
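As a quick check of the numbers above, here is a small sketch of the same calculation: one revolution takes 60,000 / rpm milliseconds, the average rotational latency is half a revolution, and access time is that latency plus seek time. The seek figures are the approximate ones quoted in this post, and the function name is only illustrative.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_rev = 60_000 / rpm
    return ms_per_rev / 2


# Approximate seek times quoted in the post, in milliseconds.
seek_ms = {7_200: 8.0, 10_000: 4.5, 15_000: 3.5}

for rpm, seek in seek_ms.items():
    latency = avg_rotational_latency_ms(rpm)
    access = latency + seek  # access time = rotational latency + seek time
    print(f"{rpm:>6} rpm: latency {latency:.3f} ms, "
          f"seek {seek:.1f} ms, access {access:.1f} ms")
```

Only the latency term depends on spindle speed; the seek term is an independent input here, which is the distinction being argued in the surrounding posts.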

Source: StorageReview, and own 15K SCSI drive. I get 5.6ms seek times.
January 7, 2007 8:34:50 PM

:roll: That was why i included the "~" sign which means roughly. And it really depends on where the head is in relation to the data it is seeking. Averages incorporate worst case scenarios, so the numbers can be much lower with a drive thats head isnt parked.

The point was spindle speeds have a correlation to seek/latency/access times.
Not that a rough estimate off the top of my head was a ms off. :wink:
January 7, 2007 9:03:00 PM

Quote:
:roll: That was why i included the "~" sign which means roughly. And it really depends on where the head is in relation to the data it is seeking. Averages incorporate worst case scenarios, so the numbers can be much lower with a drive thats head isnt parked.

The point was spindle speeds have a correlation to seek/latency/access times.
Not that a rough estimate off the top of my head was a ms off. :wink:


I just have issues with your statement:
"The only way to increase seek times is a faster spindle. 7.2k/10k/15k ~8.5ms/~3.4ms/~1.5ms respectively. "
The numbers for seek time are roughly correct (7200rpm being spot on, but 10K and 15K seek times far off), but the concepts are wrong (see below for explanation).

"The point was spindle speeds have a correlation to seek/latency/access times.
Not that a rough estimate off the top of my head was a ms off. :wink:"

But a ms means a lot, when you're dealing with 3ms seek times :-D

Wrong/Right/Right.
Seek times don't have anything to do with spindle speed. Seek time is the time spent moving the head to the correct cylinder. Platter diameter (Not surprisingly, 10000 and 15000rpm hard drives tend to use smaller platters!), voice coil speed (ie. magnet and voice coil strength), and drive mechanics determine seek times.
Rotational Latency directly depends on spindle speed.
Access time is the two above times combined. It has a good dependency on spindle speed, but is not directly determined by spindle speed.
January 7, 2007 9:34:25 PM

thanks forum troll but you just broke something down and prooved the same point.

im not wasting time arguing with a troll over a ms when the point is increased spindle speeds equates to better access times.




DONT FEED THE TROLLS
January 7, 2007 9:55:29 PM

Quote:
thanks forum troll but you just broke something down and prooved the same point.

im not wasting time arguing with a troll over a ms when the point is increased spindle speeds equates to better access times.




DONT FEED THE TROLLS


Quote:
thanks forum troll but you just broke something down and prooved the same point.

Apparently, you don't seem to understand the subtleties of hard drive performance parameters; seek time, latency, and access time do NOT mean the same thing.

Look, stupid, you said SEEK times and I called you out on it.
Quote:
"The only way to increase seek times is a faster spindle. 7.2k/10k/15k ~8.5ms/~3.4ms/~1.5ms respectively. "

Don't even try to do a "tangential segue" by saying that you were addressing "access times" when you stated "seek times". Give up. You're not as good as BaronMatrix at this tactic, who can sometimes slide it by me.

Quote:
"The point was spindle speeds have a correlation to seek/latency/access times.
Not that a rough estimate off the top of my head was a ms off. Wink"

Wrong again, even when I pointed it out to you the first time. :roll: *sigh* Some people just never learn.

Spindle speed has no connection to seek times.
http://www.storagereview.com/guide2000/ref/hdd/perf/per...
Quote:
Seek time is almost entirely a function of the design and characteristics of the hard disk's actuator assembly. It is affected slightly by the read/write head design since the size of the heads affects movement speed.

Why don't you show me where spindle speed comes into play for seek times.

While you're there, you ought to check out:
http://www.storagereview.com/guide2000/ref/hdd/perf/per...
http://www.storagereview.com/guide2000/ref/hdd/perf/per...


Quote:
im not wasting time arguing with a troll over a ms when the point is increased spindle speeds equates to better access times.

Ah, the classic "defensive counter accusatory" tactic. You call me a troll, when I'm trying to point out one of your mistakes. You might want to look at yourself to determine whether YOU are exhibiting troll-like behavior.

I'm not wasting time arguing with someone with an incredibly overinflated ego who can't accept that they made a mistake. Your head's gotten a bit too big lately, maybe you should take a chill pill, n00b. Face it, you made a mistake. A real man would admit it and move on, a more educated man. Are you man enough to stop resorting to dumb@ss BaronMatrix tactics? I think not. Why don't you try to prove me wrong? It would make the world a better place.

DON'T FEED THE TROLLS back at you. If you have the maturity to take the flamewar out of the thread, we *could* PM each other.

TabrisDarkPeace, I apologize for hijacking the thread. It is irresistible for me to prevent the spread of misinformation.

My on-topic contribution to the thread is, that you should use IOmeter to benchmark your drive.
http://www.iometer.org/doc/downloads.html

An example of the difference between NCQ and no NCQ:
http://www.storagereview.com/articles/200510/Testbed4_6...

The NCQ drives increase in performance with an increase in outstanding requests, the no-NCQ drives remain at a constant level of performance.
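To illustrate why more outstanding requests give a queuing drive more room to optimize, here is a toy simulation. It is only a sketch under simplifying assumptions: requests are random track numbers, the "drive" always services the nearest of the requests currently in its queue, and rotational position is ignored (real NCQ firmware takes it into account). The function name, track range, and seed are made up for illustration, and the output says nothing about any real drive.

```python
import random


def head_travel(requests, queue_depth, start=0):
    """Total head movement (in tracks) when the drive may pick the nearest of
    up to `queue_depth` outstanding requests. queue_depth=1 is plain FIFO."""
    pending = []
    position = start
    travel = 0
    incoming = iter(requests)
    exhausted = False
    while True:
        # Top the queue up to the allowed number of outstanding requests.
        while len(pending) < queue_depth and not exhausted:
            try:
                pending.append(next(incoming))
            except StopIteration:
                exhausted = True
        if not pending:
            break
        # Service whichever queued request is closest to the head.
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        travel += abs(nearest - position)
        position = nearest
    return travel


rng = random.Random(2007)  # fixed seed so runs are repeatable
requests = [rng.randrange(100_000) for _ in range(2_000)]

fifo = head_travel(requests, queue_depth=1)
for depth in (1, 4, 32):
    travel = head_travel(requests, depth)
    print(f"queue depth {depth:>2}: head travel {travel:>11,} tracks "
          f"({travel / fifo:.2f} of FIFO)")
```

With a queue depth of 1 there is nothing to reorder, which matches the flat line for non-NCQ drives; deeper queues let the toy drive cut head travel on random requests. A purely sequential request stream would show little or no difference in this model.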
January 7, 2007 11:57:10 PM

keep on trolling. perhaps youll find a word or two more out there that someone accidently wrote out that you can use to inspire your paranoia and write a lengthy trolling response for little reason.

seek/latency/access times, one little slip and you act like a tool. get over it spaz.
January 8, 2007 12:49:25 AM

Quote:
keep on trolling. perhaps youll find a word or two more out there that someone accidently wrote out that you can use to inspire your paranoia and write a lengthy trolling response for little reason.
Keep on acting like an egotistical n00b. Perhaps when someone points out another of your mistakes, you can repeatedly call them a troll for no valid reason.

Quote:
seek/latency/access times, one little slip and you act like a tool. get over it spaz.

One little mistake, and you act like a fool.
Here are some tips to further your own efforts to look like a fool (thanks for the amusement!)
When you make a mistake, at all costs, do not admit to making a mistake -- act defensively instead. Use tangents to shift the argument in another direction, and counter-accuse your opposition by calling them a troll. Rinse and repeat until the opposition backs off.
Oh WAIT! You already do that!

Thanks for providing me amusement, you've responded a lot more immaturely than I ever would have anticipated!

You made a mistake, and you're acting like a child because someone's actually noticed. Get over it and admit to making a mistake, instead of resorting to flaming, you idiotic egotistical trolling n00b.
January 8, 2007 1:00:34 AM

actually if you look up the post a little further youll see where i wrote raid1 instead of raid0. i boned up to that mistake right away, partially because to poster suggested so like a half normal person instead of writing a half page flame post like a tool.

spastic troll.
January 8, 2007 1:05:17 AM

Quote:
actually if you look up the post a little further youll see where i wrote raid1 instead of raid0. i boned up to that mistake right away, partially because to poster suggested so like a half normal person instead of writing a half page flame post like a tool.

spastic troll.


actually if you look up the post a little further youll see where i talked about access time and seek time. You didn't own up to that mistake ever, and it doesn't matter if I wrote a half page flame post or not.

spastic troll. tool. idiot. childish n00b. egotastical fool.
January 8, 2007 1:08:24 AM

simply asking if i meant access times or latency would have been more of an adult move instead of ranting for a half page. keep on trolling.
January 8, 2007 1:15:23 AM

Quote:
simply asking if i meant access times or latency would have been more of an adult move instead of ranting for a half page. keep on trolling.


Excuse my arrogance, but how much of my first post in here is your post and not mine? Yeah, kind of makes your "half page rant" argument baseless.

You lost your right to be treated civilly (ie. like an adult) as soon as you started namecalling.

Keep on trolling.
January 8, 2007 1:58:56 AM

May I suggest an experiment? Run the three jobs single threaded, one after the other, not concurrently. I think your total throughput will improve nicely.
On a single drive, the three tasks are interfering with each other. With multiple partitions, even a single job is interfering with itself.
NCQ should not be an issue for you. It is designed for the server environment with many short unrelated random requests.
Raid-0 should not be an issue. It implies you have more than one physical drive available, and for your situation, it would be best to allocate each of the appropriate files to a separate physical device.