SSD Game Performance Compared with hIOmon

ao1

My first post, although I have written some articles for Tom’s under my real name, Richard Hart. If you have read any of my reviews you will know that I use hIOmon as part of the “benchmark” suite, but the “real world” tasks have to date been focused on write activity. I’ve been looking at read-intensive tasks that are relatively easy to replicate. The dynamic nature of the file system makes this task quite challenging, but some preliminary testing has revealed some interesting results and insights into how the file system operates, which I thought I would share.

So what are the challenges?

One of the first read-intensive tasks I selected was a demo version of Crysis. After installing it I found something odd: when playing the game straight after it had been installed, hIOmon was not registering any I/O activity at the physical volume level. How could that be? Looking at the hIOmon export files it became evident that all the data required to play the game was already in the file system cache as a result of the installation process!

Below is a partial screenshot of (an adulterated) hIOmon export file. In the line outlined in red it can be seen (as an example) that ShaderCache.pak generated 1,429 read IOPs. The Read System Cache IOP count is 1,428, which is the number of I/O operations that were successfully performed in the system file cache using the fast I/O path rather than the longer I/O request packet path. There was therefore no requirement to request data from the physical volume level.

http://imageshack.us/a/img441/1692/iopreadcount.png
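
To make the relationship between those two counters concrete, here is a minimal sketch of the arithmetic (Python; the variable names are my own and the numbers are simply the ones from the export row shown above):

```python
# Rough arithmetic behind the ShaderCache.pak example above.
# The field names are illustrative, not hIOmon's exact headers.
read_iop_count = 1429          # total read I/O operations observed
read_system_cache_iops = 1428  # reads satisfied from the file system cache (fast I/O path)

cache_served_share = read_system_cache_iops / read_iop_count
physical_reads = read_iop_count - read_system_cache_iops

print(f"Served from system cache: {cache_served_share:.1%}")        # ~99.9%
print(f"Reads that needed the physical volume: {physical_reads}")   # 1
```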

The next catch was read-ahead I/O activity, which can be triggered by simply hovering the mouse over a desktop icon, or by the file system deciding to retrieve a chunk of unrequested data based on a small read request, in anticipation of future use.

Finally, it is necessary to reboot after the game has been played: once played, Crysis can remain in the file system cache, so playing it again without rebooting can result in virtually all the I/O activity being served from the file system cache.

Here are some key metrics when the game is loaded from the physical volume to the logical disk file level (a quick calculation based on these figures follows the lists below).

Volume Level

• 1,802 MiB Read Total Xfer.
• 500,761 Read IOP count
• 443,485 System Cache IOP count
• 421 Cache Miss IOP Count

Physical Volume Level

• 806 MiB Read Total Xfer.
• 56,344 Read IOP count

The Cache Miss IOP count records I/Os that could not be read from cache and had to be re-read from the physical volume.
Once the game is loaded into the file system cache and replayed, the only I/O activity at the physical volume level is due to system cache misses, which results in a grand total of:

• 1.21 MiB Read Total Xfer.
• 12 Read IOP count
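
As a quick sanity check on the first, cold load, the ratios work out roughly as follows (a back-of-the-envelope sketch only; hIOmon reports these counters directly):

```python
# Back-of-the-envelope from the cold-load figures above.
logical_read_iops = 500_761     # read IOP count at the logical volume level
cache_iops = 443_485            # reads satisfied from the system file cache
logical_xfer_mib = 1_802        # data read at the logical volume level
physical_xfer_mib = 806         # data actually read from the physical volume

print(f"Reads served from cache:   {cache_iops / logical_read_iops:.1%}")        # ~88.6%
print(f"Data that reached the SSD: {physical_xfer_mib / logical_xfer_mib:.1%}")  # ~44.7%
```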

So, trying to avoid these potential pitfalls, I monitored five different games with varying percentages of random I/O activity, as listed below (a sketch of how the two percentages relate follows the list):

Crysis

Random I/O Percentage = 49%
Total Percentage of Data Xfered by Random I/O operations = 20%

Batman Arkham City

Random I/O Percentage = 45%
Total Percentage of Data Xfered by Random I/O operations = 17%

F1 2012

Random I/O Percentage = 65%
Total Percentage of Data Xfered by Random I/O operations = 46%

Hard Reset

Random I/O Percentage = 42%
Total Percentage of Data Xfered by Random I/O operations = 17%

Sleeping Dogs

Random I/O Percentage = 68%
Total Percentage of Data Xfered by Random I/O operations = 47%
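
Note that the two percentages differ because random reads tend to be smaller than sequential ones, so they carry less of the data. A rough sketch of how both figures could be tallied from per-I/O records (the record format and sample values here are my own, not hIOmon's export layout):

```python
# Each record: (transfer size in bytes, whether the operation was random).
# The sample data is made up purely to illustrate the arithmetic.
ios = [
    (4096, True),     # small random read
    (4096, True),
    (131072, False),  # large sequential read
    (131072, False),
]

random_iops_pct = sum(1 for _, rnd in ios if rnd) / len(ios) * 100
random_xfer_pct = (sum(size for size, rnd in ios if rnd)
                   / sum(size for size, _ in ios) * 100)

print(f"Random I/O percentage:            {random_iops_pct:.0f}%")  # 50%
print(f"Data xfered by random operations: {random_xfer_pct:.0f}%")  # ~3%
```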


http://imageshack.us/a/img543/4456/hiomon.png


All of the drives tested are 256 GB. The Crucial M4 does surprisingly well, whilst the Vertex 4 and Neutron GTX do not. I’ll add a post later to explain the Disk I/O Ranger metrics that resulted in some drives scoring better than others.
 

ao1

Below are some screenshots of the hIOmon Disk I/O Ranger that provided the basis for the graph results I linked earlier.

The hIOmon DXTI is calculated by taking the observed amount of data transferred by the I/O operations (converted to megabytes for scaling) and dividing it by the combined sum of the actual response times of those I/O operations.
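
In other words, it is a "data moved per unit of wait" figure, so a higher score is better. A minimal sketch of the idea (the input values are made up for illustration, and the exact scaling hIOmon applies is not reproduced here):

```python
# DXTI-style index: data transferred (MB) divided by the summed response times.
# Illustrative numbers only; hIOmon derives both values from its own counters.
data_xfered_mb = 100.0       # total data moved by the monitored read operations
sum_response_time_s = 0.5    # combined response time of those operations, in seconds

dxti_like = data_xfered_mb / sum_response_time_s
print(f"DXTI-style index: {dxti_like:.1f}")  # more data per unit of response time = higher score
```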

For Hard Reset the Crucial M4 came out with the highest DXTI score (238.775). In the screenshot below the majority of data was xfered in the 200 < 500 us range.


http://imageshack.us/a/img528/4433/crucialm4hardreset.png

The Neutron GTX came in with the lowest DXTI score (143.573). In the screenshot below the majority of data was xfered in the 500 us < 1 ms range, hence the lower DXTI score.

http://imageshack.us/a/img29/6812/neutrongtxhardreset.png

For the F1 2012 game the Plextor M5Pro came out fastest with a DXTI score of 382.873. The majority of data is xfered in the 500 us < 1 ms range.

http://imageshack.us/a/img706/6999/plextorm5prof1.png

The Vertex 4 came in slowest with a DXTI score of 272.815. The majority of data was xfered in the 1 ms < 5 ms range, hence the lower DXTI score.

http://imageshack.us/a/img842/7418/vertex4f1.png

The Vertex 4 (256 GB) and Neutron GTX are exceptionally fast when it comes to write performance, but read performance across all five games was lacklustre due to higher response times. The Crucial M4, on the other hand, does much better than I had expected.
 

ao1

I’ve got a sneak preview of the new version of hIOmon, which adds metrics for data xfer sizes.

In the screenshot link below I have captured metrics for Modern Warfare 3 multiplayer (I'm monitoring the physical device in this instance, so any I/O activity associated with the OS or background services is also included). A sketch of how such a size breakdown can be tallied follows the screenshot.

Key Read Metrics:

• Total number of I/O operations: 42,312
• Total amount of data xfered: 889.82 MiB
• Percentage of random I/O operations: 49.44%
• Percentage of data xfered via random I/O operations: 9.99%
• Percentage of I/O operations above queue depth one: 1.18%

Top three data xfer sizes:

1. 16 KiB – 32,044 occurrences representing 75.73% of the total I/O operations of which 2.15% were random
2. 4 KiB – 5,106 occurrences representing 12.07% of the total I/O operations of which 6.83% were random
3. 8 KiB – 1,256 occurrences representing 2.97% of the total I/O operations of which 0.98% were random

http://imageshack.us/a/img824/7021/mw3mp.png
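
For anyone who wants to produce a similar breakdown from their own per-I/O trace, here is a rough sketch of the tally (the input format is an assumption on my part; hIOmon computes these metrics itself):

```python
from collections import Counter

# Each record: (transfer size in KiB, random flag). Sample values are made up.
ios = [(16, False), (16, False), (16, True), (4, True), (8, False)]

size_counts = Counter(size for size, _ in ios)
random_counts = Counter(size for size, rnd in ios if rnd)
total = len(ios)

# Top three transfer sizes, with the random share expressed against all I/Os
# (mirroring the way the percentages above are presented).
for size, count in size_counts.most_common(3):
    pct = count / total * 100
    rnd_pct = random_counts.get(size, 0) / total * 100
    print(f"{size} KiB - {count} occurrences, {pct:.2f}% of I/Os, {rnd_pct:.2f}% random")
```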
 

ao1

Physical Volume Data Xfer Sizes

Batman Arkham City (Fastest Benchmarked SSD – Crucial M4)

Top three data xfer sizes
1. 4 KiB – 31.24% of I/Os. 26.56% random
2. 32 KiB – 4.42% of I/Os. 3.72% random
3. 128 KiB – 3.88% of I/Os. 0.25% random

Max data xfer size – 2 MiB (0.13% of I/Os. 0.01% random)

Crysis (Fastest Benchmarked SSD – Crucial M4)

Top three data xfer sizes
1. 4 KiB – 85.58% of I/Os. 3.75% random
2. 8 KiB – 2.88% of I/Os. 1.46% random
3. 32 KiB – 2.5% of I/Os. 2.23% random

Max data xfer size – 2 MiB (0.03% of I/Os. 0% random)

F1 2012 (Fastest Benchmarked SSD – Plextor M5Pro)

Top three data xfer sizes
1. 512 KiB – 20.21% of I/Os. 3.77% random
2. 32 KiB – 16.57% of I/Os. 11.95% random
3. 256 KiB – 15.84% of I/Os. 10.13% random

Max data xfer size – 512 KiB

Hard Reset (Fastest Benchmarked SSD – Crucial M4)

Top three data xfer sizes
1. 4 KiB – 30.51% of I/Os. 23.17% random
2. 128 KiB – 27.22% of I/Os. 3.44% random
3. 16 KiB – 13.13% of I/Os. 5.2% random

Max data xfer size – 128 KiB

Sleeping Dogs (Fastest Benchmarked SSD – Plextor M5Pro)

Top three data xfer sizes

1. 16 KiB – 64.92% of I/Os. 36.84% random
2. 4 KiB – 13.54% of I/Os. 11.08% random
3. 8 KiB – 5.8% of I/Os. 3.6% random

Max data xfer size – 64 KiB (0.28% of I/Os. 0% random)
 

ao1

From observations using the hIOmon DTS it can be seen that read operations are mostly sequential, with only a small percentage being random. I have also observed this in general usage, so I thought I would run some Iometer benchmarks using a mixture of random and sequential reads, with a heavy bias towards sequential.

SSDs utilise write combining to convert random writes into sequential writes, and consequently if you benchmark an SSD it will (typically) return similar results for either 100% random or 100% sequential writes at queue depth 1.
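
As a toy illustration of the write-combining idea only (not how any particular controller actually implements it): scattered logical writes are appended to the next free flash pages in arrival order, and a mapping table records where each logical block ended up.

```python
# Toy log-structured write sketch: "random" logical writes land on
# sequential physical pages; a mapping table tracks the placement.
mapping = {}            # logical block address -> physical page
next_free_page = 0

def write(lba: int, data: bytes) -> None:
    global next_free_page
    mapping[lba] = next_free_page   # remember where this logical block now lives
    next_free_page += 1             # the physical write pattern stays sequential

for lba in [907, 13, 4521, 88]:     # scattered logical addresses
    write(lba, b"...")

print(mapping)  # pages 0, 1, 2, 3 assigned in arrival order
```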

So what about reads? It seems that mixing random and sequential read operations makes a huge difference to performance!

http://imageshack.us/a/img39/3971/iometerresults.png
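
Iometer drives this through its access specifications; purely as a standalone sketch of the same kind of access pattern (an 80/20 sequential/random read mix at queue depth 1 against a test file, with the path, block size and split all placeholders), something like the following would generate it. Note that it goes through the Windows file cache unless unbuffered/direct I/O is used, so it only illustrates the pattern rather than being a device-level benchmark.

```python
import os
import random

# Sketch of an 80% sequential / 20% random read pattern at queue depth 1.
# PATH, BLOCK and SEQ_RATIO are placeholders, not values from the tests above.
PATH = "testfile.bin"
BLOCK = 64 * 1024
SEQ_RATIO = 0.8

file_size = os.path.getsize(PATH)
blocks = file_size // BLOCK

with open(PATH, "rb", buffering=0) as f:
    offset = 0
    for _ in range(10_000):
        if random.random() >= SEQ_RATIO:              # ~20% of reads jump to a random block
            offset = random.randrange(blocks) * BLOCK
        f.seek(offset)
        f.read(BLOCK)
        offset = (offset + BLOCK) % file_size         # otherwise continue sequentially
```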