Enthusiast, Workstation, Data Center Performance
Optane Consumer Workloads
Intel broke out the PCMark Vantage benchmark to highlight the difference between the enthusiast-oriented Intel 750 and the Intel Optane SSD prototype. Optane provides a large improvement in theoretical bandwidth across the tests, particularly in the gaming and Vista startup tests. Of course, we would prefer to see a newer and more accurate benchmark, such as PCMark 8. Also, developers haven't yet optimized today's games and applications for flash storage, let alone for something as exotic as 3D XPoint.
Intel prefers to use SPECwpc v2.0 to characterize performance improvements for workstation use cases, and again we see a big jump in the professional-class workloads. We expect much higher performance in both enthusiast and workstation use cases, but performance-per-dollar, and whether the technology delivers a measurable improvement in our everyday computing experience, will weigh heavily.
Optane Rendering
Intel positions its SideFX Houdini test as a client-class benchmark, but it's more of a workstation use case. We covered the benchmark results at Computex, but Intel provided more data on the benchmark at IDF. At a high level, the test consists of rendering a seven-second clip of 1.1 billion water particles. The task requires 35 hours with Intel's fastest SSD (presumably the DC P3608), but only nine hours with the Optane SSD.
The IDF 2016 demo included a recording of the CPU usage for both test platforms during the test. The 3D XPoint-powered system, on the right, shows much higher CPU utilization during the benchmark. CPU usage is an important measurement because it reveals the true benefit of faster I/O, particularly in data center scenarios. With 3D XPoint, the system spends less time waiting on I/O, which frees the expensive CPU to perform more work. In a typical deployment, this can translate into more VMs per server or more resources available to applications.
During a separate presentation, Intel also ran the same benchmark but rendered only four frames for an easier comparison. An Intel 750 required 40.6 minutes, while the Optane SSD needed only 13 minutes to complete the test. The orange portion of the bar trace at the top (Houdini 3D Rendering slide) indicates the amount of time the system spends waiting for incoming I/O.
As expected, I/O wait time is far more prevalent with the Intel 750, at 70% of the CPU's active time, while the Optane-powered demo rig waited on I/O only 20% of the time. Interestingly, the slide lists the Optane nvme0n1 device (a Linux designation for an NVMe device) as 268GB, which is much higher than Intel's disclosed 140GB prototype capacity. The Intel 750 averaged 45,500 IOPS during the test, while the Optane SSD delivered 135,000. The final slide, which overlays the two traces, shows that the vast reduction in wait time yields a 3x performance increase.
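To put the full-clip render times in perspective, here is a quick back-of-the-envelope sketch in Python. It only uses the 35-hour and nine-hour figures Intel quoted; the `speedup` helper is our own illustrative name, not anything from Intel's tooling.

```python
# Full Houdini clip render times quoted by Intel at IDF.
fastest_ssd_hours = 35.0   # Intel's fastest SSD (presumably the DC P3608)
optane_hours = 9.0         # Optane SSD prototype

def speedup(baseline, improved):
    """Return how many times faster `improved` is than `baseline` (time-based)."""
    return baseline / improved

clip_speedup = speedup(fastest_ssd_hours, optane_hours)
hours_saved = fastest_ssd_hours - optane_hours
print(f"{clip_speedup:.1f}x faster, {hours_saved:.0f} hours saved")
```

That works out to roughly a 3.9x speedup, or 26 fewer hours per clip.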
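The four-frame numbers also allow a rough consistency check. The sketch below computes the render-time and IOPS ratios, then estimates how much of each run the CPU spent not waiting on I/O. It assumes the wait percentages apply to the entire wall-clock run, which the slides don't state explicitly, so treat the non-wait estimate as approximate.

```python
# Four-frame Houdini render: Intel 750 vs. Optane prototype (figures from Intel's slides).
runs = {
    "Intel 750": {"minutes": 40.6, "io_wait": 0.70, "iops": 45_500},
    "Optane":    {"minutes": 13.0, "io_wait": 0.20, "iops": 135_000},
}

def non_wait_minutes(run):
    # Rough estimate: assumes the I/O-wait share applies to the whole wall-clock run.
    return run["minutes"] * (1.0 - run["io_wait"])

base, opt = runs["Intel 750"], runs["Optane"]
time_speedup = base["minutes"] / opt["minutes"]
iops_ratio = opt["iops"] / base["iops"]

print(f"Render-time speedup: {time_speedup:.2f}x")
print(f"IOPS ratio: {iops_ratio:.2f}x")
for name, run in runs.items():
    print(f"{name}: ~{non_wait_minutes(run):.1f} min not waiting on I/O")
```

The render-time ratio comes out to about 3.1x and the IOPS ratio to about 3.0x, both in line with the 3x figure. Under the stated assumption, both systems spend a similar amount of time doing non-I/O work (roughly 10 to 12 minutes), which is consistent with the idea that the gain comes mostly from cutting I/O wait.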
Optane Data Center Workloads
Synthetic tests are great because they highlight how much performance is available to the application, but unfortunately, many applications aren't tuned to fully exploit 3D XPoint. RocksDB is one of the few programs that can unlock the performance of speedy non-volatile media. RocksDB is an open-source key-value database that offers increased performance and scalability compared to traditional databases. Other databases, most notably Redis and Aerospike, can also unlock the performance of bleeding-edge non-volatile media.
Facebook and Intel are working closely together, and during a Flash Memory Summit (FMS) keynote, Facebook outlined how it is redesigning its stack for 3D XPoint. Facebook is adopting the technology because of its radical performance improvements. During the RocksDB test, the Optane prototype provided 3X more throughput and a 10X latency advantage in 99th percentile measurements. It is somewhat surprising that Intel and Facebook chose the lower-endurance DC P3600 for the demo, because the higher-endurance DC P3700 is the stouter alternative for this workload.
The Optane prototype delivered 274,554 read-centric key "gets" per second, compared to 70,782 for the DC P3600. For 99th percentile latency, the DC P3600 weighs in at 1.9ms, while the Optane provides an impressive 0.125ms. Notably, Facebook and Intel did not provide performance information for "puts," the write-centric portion of the workload.
Of course, faster performance means the system uses its cores more efficiently to maximize the CPU investment, but it also translates into big savings on application licensing. Some application licenses cost tens of thousands of dollars per core (some Oracle licenses can reach $50K per core), so extracting more work from each licensed core can easily pay for the higher price of the storage solution.
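For readers less familiar with tail-latency metrics, the sketch below shows one common way to compute a 99th percentile (the nearest-rank method) and then derives the ratios from the reported RocksDB figures. We don't know which percentile method Facebook's tooling uses; nearest-rank is our assumption, and the sample latency list is made-up illustrative data, not measurements.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Illustrative only: 100 made-up "get" latencies in ms (not measured data).
example_latencies_ms = [0.1] * 98 + [0.9, 2.5]
p99 = percentile_nearest_rank(example_latencies_ms, 99)  # 0.9

# Figures reported for the RocksDB demo.
gets_per_sec = {"DC P3600": 70_782, "Optane": 274_554}
p99_ms = {"DC P3600": 1.9, "Optane": 0.125}

throughput_ratio = gets_per_sec["Optane"] / gets_per_sec["DC P3600"]
latency_ratio = p99_ms["DC P3600"] / p99_ms["Optane"]
print(f"Throughput: {throughput_ratio:.1f}x, p99 latency: {latency_ratio:.1f}x lower")
```

The example shows why the 99th percentile matters: a handful of slow requests barely move the average but dominate the tail. The reported figures work out to roughly 3.9x the throughput and about 15x lower 99th percentile latency, so the "3X" and "10X" headline numbers appear to be conservative roundings.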
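The licensing argument is simple arithmetic, sketched below. Apart from the $50K-per-core license figure quoted above, every input is a hypothetical value chosen for illustration. The model also makes the optimistic assumption that per-core application throughput scales with the roughly 3x storage gain; real workloads rarely scale that cleanly.

```python
import math

# Hypothetical workload: every input is an assumption for illustration,
# except the $50K-per-core license figure quoted in the article.
LICENSE_PER_CORE = 50_000          # dollars per core (upper-end Oracle example)
REQUIRED_THROUGHPUT = 1_000_000    # hypothetical ops/sec the application must sustain
PER_CORE_BASELINE = 25_000         # hypothetical ops/sec per core on slower storage
SPEEDUP = 3.0                      # optimistic: per-core throughput scales with storage gain

def licensed_cores(required, per_core):
    """Cores (and thus per-core licenses) needed to hit the throughput target."""
    return math.ceil(required / per_core)

baseline_cores = licensed_cores(REQUIRED_THROUGHPUT, PER_CORE_BASELINE)
faster_cores = licensed_cores(REQUIRED_THROUGHPUT, PER_CORE_BASELINE * SPEEDUP)
savings = (baseline_cores - faster_cores) * LICENSE_PER_CORE
print(f"{baseline_cores} -> {faster_cores} cores, ${savings:,} in licenses")
```

With these made-up inputs, the core count drops from 40 to 14, and the license savings alone dwarf the price premium of any single SSD.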