Solved

SSD vs HDD Linux performance compared - minimal advantage at best?

Last response: in Storage
January 23, 2014 3:24:24 AM

Hello storage experts:

************** Introduction:

I recently built a system:

Supermicro X10SAE-O ATX DDR3 1600 LGA 1150 Motherboard
CPU: Intel Core i3-4130 LGA 1150
RAM: Kingston KVR1333D3E9S/4G 4GB 1333MHz DDR3 ECC CL9 (2)
HDD: WD WD10EZEX 1 TB, 3.5 Inch, 7200 RPM, SATA 6, 64 MB Cache (boot disk)
SSD: Samsung 840 EVO-Series 250GB SATA III SSD (not boot disk)
PSU: Corsair CX Series 430 Watt Power Supply CX430M
OS: Ubuntu Linux 12.10
SSD fstab: /dev/sdb /home ext4 defaults,errors=remount-ro,noatime,nodiratime 0 1

Case: Cooler Master FOR-500-KKN1 Force 500
DVD-RW: Asus DRW-24B1ST 24X (Black) (x 2)
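
For reference, here is how the /home mount options from the fstab line above can be verified after booting (a small sketch; the fallback to / is only so the command runs on machines without a separate /home):

```shell
# Verify /home picked up the fstab options above (findmnt is in util-linux).
# Falls back to the root filesystem if /home is not a separate mount.
findmnt -n -o SOURCE,FSTYPE,OPTIONS /home 2>/dev/null \
    || findmnt -n -o SOURCE,FSTYPE,OPTIONS /
# Note: with noatime set, nodiratime is redundant (noatime covers directories too).
```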

http://www.pcworld.com/article/2048120/benchmarks-dont-... shows a typical glowing report about how great an SSD is. Reports like that intrigued me into getting the SSD. But I figured I should do some tests to measure how much faster it really was than the HDD.

************* Methods:

I did a comparison on three real-world, time-consuming tasks with a lot of disk I/O that I carry out pretty often as part of my software development. These tasks are the whole reason I got the SSD; it would be of no use if it did not speed them up. First I ran each task on the hard disk drive (HDD). Then I ran the exact same task, with the exact same files, on the solid-state drive (SSD). I measured how many seconds each task took to run (the "real" result from the Unix time command).
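
To be concrete about the measurement, here is a minimal sketch of how each timing was taken (run_task is a hypothetical stand-in for the real tasks, which are driven by make and shell scripts):

```shell
#!/bin/bash
# Sketch of the timing method. run_task is a placeholder for one real task.
TIMEFORMAT='%R'      # make bash's time builtin print only the "real" seconds

run_task() {
    sleep 0.2        # placeholder workload
}

# time writes to stderr, so capture stderr to get the number.
elapsed=$( { time run_task; } 2>&1 )
echo "real seconds: $elapsed"
```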

The details of the tasks don't exactly matter, but here they are:

Task A) Repeatedly compile some 300,000 lines of source code to make an executable that runs on the desktop. Each compilation produces one version of the Vitalnet data analysis software. For example, one version analyzes Texas birth data, one analyzes California death data, another analyzes Iowa cancer data, etc. There are some 20 versions created, so some 20 compiles. Each compile uses the same C source files, with different gcc defines to produce different results.

Task B) Convert some 250 web pages needed for ehdp.com web site. The web pages are in a "source code" format, and get converted to the final .htm format. Uses shell scripts and lots of temporary files. Again, same files get repeatedly converted, with different defines.

Task C) Convert help files for the Vitalnet desktop versions. Each desktop version has about 50 help files, so about 1000 help files are converted. Uses shell scripts and lots of temporary files.

************ Results:

 HDD   SSD   Task  (times in seconds, "real" from time command)
-------------------------------------------------
 134   134   A) compile source code
  19    19   B) convert web pages
 100   100   C) convert help files
-------------------------------------------------

All the tasks use all the cores of the i3-4130 CPU. I use "make -j 4" and xjobs to accomplish that, and double-checked with the top(1) command that all cores are working equally.
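
A sketch of how the parallelism can be set up and checked (nproc is in coreutils; mpstat is in the sysstat package):

```shell
# Match -j to the number of logical CPUs, then confirm during the build
# that every core is loaded.
jobs=$(nproc)                 # logical CPUs; 4 on an i3-4130 (2 cores + HT)
echo "would build with: make -j$jobs"
# While the build runs, in another terminal:
#   top              (press 1 to see per-core utilization)
#   mpstat -P ALL 1  (per-core statistics, once per second)
```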

*********** Discussion:

So with these real-world tasks, the SSD does not help at all on the Linux computer, at least with this configuration and these tasks. The results are EXACTLY the same. And yes, I am sure the first number is for the HDD and the second for the SSD.

I might have guessed this beforehand. I work a lot with large data files, and did other tests on Linux disk I/O years ago. Based on those tests, Linux seems very fast at reading the disk, pulling in very large files in almost no time. The first read is fast, but the real effect shows up once a file has been read: Linux caches read files in RAM, so the second time a file is read it is basically instantaneous, even for a huge file. Also, Linux writes to memory first (not straight to the disk), so writing is very fast too. My understanding is that the data gets written from memory to the disk a little later, the user hardly notices the delay, and everything feels very fast. I do not know the inner technical details, but the effect is easy to demonstrate: "cat" a large file twice, and the second time the result is instantaneous.
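
Here is a small script that demonstrates the caching effect (file name and size are just examples):

```shell
# The second read of a file is served from the page cache in RAM.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null   # 64 MB test file
TIMEFORMAT='%R'
t1=$( { time cat "$f" > /dev/null; } 2>&1 )   # first read
t2=$( { time cat "$f" > /dev/null; } 2>&1 )   # second read: from cache
echo "first: ${t1}s  second: ${t2}s"
rm -f "$f"
```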

Some may suspect that I messed up, and that results exactly the same are not believable. I understand that, because I have read so many glowing recommendations for SSD storage. Well, it is possible I messed up, and I would be open to any suggestions. But I have used Unix for decades, and am very familiar and adept.

Here are more details on how I did the comparison. The HDD test was run with the SSD not mounted, so there is no question about that result. For the SSD test, I renamed the /home directory to /home.old (to get the HDD files out of the way). Next, I created a new /home directory with the same permissions, did "mount /home", and verified lost+found was present there. Next, I used rsync to make an exact duplicate of the /home.old files in /home. I used the Unix time command and read the "real" time as the result. All the temporary files, .o files, source files, and output files are under the /home directory; the /tmp directory is not used. I set TMPDIR to a directory under /home, to make sure nothing was being written to /tmp, and I verified that everything was taking place in the /home directory (not in /home.old).
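
Here is the procedure condensed into a sketch, using a scratch directory in place of the real /home so it is safe to run anywhere (cp -a stands in for the rsync I actually used):

```shell
# The test-setup procedure, simulated in a scratch directory.
base=$(mktemp -d)
mkdir -p "$base/home/user" && echo data > "$base/home/user/file.txt"
mv "$base/home" "$base/home.old"          # 1) move the HDD copy aside
mkdir "$base/home"                        # 2) recreate the mount point
# mount /home                             # 3) on the real box: mounts the SSD per fstab
cp -a "$base/home.old/." "$base/home/"    # 4) exact duplicate of the files
diff -r "$base/home.old" "$base/home" && echo "copies identical"
export TMPDIR="$base/home/tmp" && mkdir -p "$TMPDIR"   # 5) keep temp files off /tmp
```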

To test the idea that file caching is behind the identical performance on these repetitive tasks, I rebooted the computer and, from a fresh start (no source files cached), did just one compile. With the SSD, the single compile took 3.77 seconds; the HDD single compile took 5.62 seconds. That's a big difference in percentage terms: the HDD takes about 50% longer. But the absolute difference is about two seconds, not a big deal. It is consistent with the idea that the SSD does much better on the FIRST read, but makes no difference once the file is cached.
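
As an aside (not something I did in the test above), the same cold-cache condition can be produced without a reboot by asking the kernel to evict the page cache:

```shell
# Evicting the page cache without a reboot (needs root; "3" drops the page
# cache plus dentries and inodes). After this, the next read is cold again.
sync                                    # flush dirty pages to disk first
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
    echo "page cache dropped"
else
    echo "need root to drop caches"
fi
```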

I did another "time cat file > /dev/null" test with a 4 GB file after a fresh boot. The HDD took 3.83 seconds the first time, 0.057 seconds the second time (file is cached). The SSD took 0.76 seconds the first time, 0.059 seconds the second time (file is cached). So when reading a really big file for the first time, the SSD is MUCH faster, taking only about 20% of the time the HDD does. But again, the absolute difference is just a few seconds, and the second time the giant file is read, the times are identical.

Possibility 1) With a more powerful cpu (I have i3-4130), it's possible the disk would become the bottleneck, and the SSD would be useful on the repetitive tasks. But based on my understanding and observations of the Linux caching behavior, I doubt that.

Possibility 2) SSD is minimally helpful at best on a Linux system with real-world repetitive tasks, at least those in this particular development environment. This is a disappointment, I shelled out money, did not get desired result. It would speed up reading really big files the first time, but that seems about it.

Possibility 3) I'm missing something, not taking something into account, would like to hear how I can take advantage of the SSD.

When I look back at some of those glowing SSD reviews, I have to wonder if their benchmarks are realistic. Also, a review that only promises faster boot times is not promising much of an advantage, in my opinion. I did not compare boot times on the Linux computer between SSD and HDD, because boot time is normally of small importance to me, and it would be a bother to install Linux on the SSD just for the comparison. How often do I boot a computer, and is it such a big deal if it boots in 10 vs 20 seconds? I do see an advantage in booting a laptop faster, where I usually want quick access.

Anyway, I am planning to move the SSD off the Linux machine, because it provides little or no speedup, at least for the real-world programming / data tasks I do. I have not tried the SSD on the Windows 7 computer (under construction) yet, where I will mostly do word processing, email, etc. Maybe it will help there.

I know all this analysis is perhaps too "heavy" for this forum, but I hope someone takes the time to read it and give me some feedback.

Daniel

January 23, 2014 9:10:22 AM

The other thing I should mention is that the 250 GB SSD has only about 50 GB of files on it. So there is no performance problem from the SSD being close to full.
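
The fill level is easy to check (on this machine it would be "df -h /home", the SSD; the fallback to "." is just so the command runs anywhere):

```shell
# Show how full the filesystem holding /home is; fall back to the current
# directory's filesystem if /home does not exist on this machine.
df -h /home 2>/dev/null || df -h .
```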

I look forward to any responses, anyone else who has done other real-world tests, anyone who might point out something I'm missing.

Best solution

February 22, 2014 12:22:02 PM

Daniel-

Just a quick response to let you know my experience. I too am a software developer, and I have a very serious interest in making my builds run faster. I have a quad-core i7, and I typically run -j8 when I'm willing to lock up my machine for the build. Usually -j4.

Builds do have a lot of drive I/O. But you might be limited by other characteristics of your machine. I agree with your point that the caching you describe might be what's having an impact, and therefore you see no serious benefit for SSD vs HDD. You seem pretty well versed, so I'll assume that all the .o files were going to the SSD drive in your test.

But anyways, to the point. I have to run Windows for quite a bit of work (tools, documentation). I needed to run a Windows virtual machine because I cannot keep re-booting into Windows just to do some stupid Word doc. What I found was that with the virtual drive stored on my HDD, the boot-up time and run time were intolerable. Now, with the virtual machine stored on the SSD drive, it is extremely fast in comparison to what it was. That makes it tolerable (not blazingly fast).

Also, I am using the SSD drive for my root partition. If you're just doing a test, this might be a little scary to undertake. But I wanted to improve the performance of my machine dramatically. SWAP and ROOT really need to be on the SSD drive to improve your total experience, IMHO. I've just re-verified that both my swap and root are on the SSD drive. For me, Linux boot times went from 30-40 seconds (Ubuntu 12.04) down to closer to 15 seconds or less. Not that I boot that often, but that was a sure sign of improvement for me (my current up-time is 63 days, since the last time I had to replace the power supply!).
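
In case it helps, this is a sketch of how the swap / root placement can be re-verified (standard util-linux tools):

```shell
# Which devices hold the root filesystem and swap?
findmnt -n -o SOURCE /                        # device backing the root fs
swapon --show 2>/dev/null || cat /proc/swaps  # active swap areas, if any
lsblk -o NAME,TYPE,SIZE,MOUNTPOINT 2>/dev/null | head -n 15   # disk overview
```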

The drive I am using is a Samsung SSD 840 PRO Series, only 128 GB. So my HDD is the primary storage for my .o files and other content.

Anyways, I hope that helps a little. In summary:

1) SSD drive for me improves boot times dramatically. I believe it has impacted general execution (SWAP), but I have a lot of RAM so I'm not really sure.
2) I am not using an SSD for my primary build storage
3) I am using the SSD for storing my windows virtual machine, and it has dramatically improved performance of that virtual machine.

Good Luck.

February 23, 2014 9:27:42 PM

dixter1 -

Thanks for the helpful information.

I ended up moving the SSD to the other build, the new Windows 7 machine, which I run separately, not as a virtual machine. Windows 7 boots VERY fast from the SSD, in just a second or two. I like that a lot, but it's hardly worth the extra cost. I have not tried any systematic performance test on the Windows 7 computer, as it would be a waste of time for me: even if there were no performance difference, I am not going to go to the trouble of returning the SSD. Also, I don't do the build processes on the Windows PC, so I don't have the convenient real-world test suite there. I'm certain it dramatically decreases the boot time, so at least it is good for something. Related to that, it is very believable that storing a virtual machine on an SSD would make it come up a lot faster.

The new build is a dramatic improvement over what I had before. But in retrospect, for my situation, it would have been better to spend the extra money on a more powerful CPU with more cores / threads (instead of the SSD and extra RAM). But at the time I ordered, I didn't understand about "make -j" and "xjobs". Using all the cores / threads is very powerful for many different software development operations. And it's not all that difficult, especially just using the make -j option (duh).

Yes, all the .o files were going to the SSD in the tests.

I did not put the swap on the SSD. Maybe that might have had an effect, but I doubt it, my impression is that the physical RAM is used first, and it never got anywhere near used up. But the only way to know for sure would be to test, which I did not do.
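
For what it's worth, here is the kind of quick check I mean (a sketch using the standard free and vmstat tools):

```shell
# Does physical RAM ever come close to being exhausted? That is when swap
# placement (SSD vs HDD) would start to matter.
free -m | awk '/^Mem:/  {printf "RAM:  %d of %d MiB used\n", $3, $2}'
free -m | awk '/^Swap:/ {printf "Swap: %d of %d MiB used\n", $3, $2}'
# During a heavy build, watch for sustained nonzero si/so (swap in/out):
#   vmstat 5
```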

Thanks,
Daniel
