The DirectStorage Advantage: Phison I/O+ SSD Firmware Preview

Comparison Products

We compared the Phison evaluation sample (ES) with the new I/O+ firmware to other prominent high-end PCIe 4.0 SSDs. The firmware, in this case, is designed for Phison’s E18 controller coupled with Micron’s 176-Layer TLC flash. The same hardware configuration is also present on the Kingston KC3000 and Sabrent Rocket 4 Plus that we included in the test pool, but they use standard firmware. This type of flash is also present on the HP FX900 Pro and Crucial P5 Plus, but those drives use different SSD controllers.

We also have proprietary designs with the SK hynix Platinum P41, the WD Black SN850, and the Samsung 980 Pro. WD will refresh the SN850 soon with a firmware update that will optimize the SSD for gaming, creating the SN850X.

DirectStorage Testing - Iometer

Games have to be coded to exploit the advantages of the DirectStorage API, but those games aren't yet available for testing. As such, we simulated performance using Iometer workloads that are representative of the access patterns an SSD will see with a DirectStorage-enabled game. DirectStorage works best with larger chunks of data, so we tested with 32KB transfers instead of the typical 4KB that we use to measure performance in operating systems. This is a 100% read benchmark because DirectStorage benefits most from random read performance.

We tested texture streaming with a varied amount of outstanding I/O requests using a combination of queue depth (QD) and threading. These workloads span from creating 64 outstanding I/Os (OIO) at QD32 and two threads (T2), to 1024 OIO at QD128 and eight threads (T8). 
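
To make the arithmetic behind those figures explicit, total outstanding I/Os are the per-thread queue depth multiplied by the number of threads. The short Python sketch below is illustrative only; the function name is ours, and only the two endpoint configurations come from the test description.

```python
def outstanding_ios(queue_depth: int, threads: int) -> int:
    """Total outstanding I/Os when each thread keeps `queue_depth` requests in flight."""
    return queue_depth * threads


# The two endpoint configurations described in the text.
low = outstanding_ios(queue_depth=32, threads=2)    # QD32, T2
high = outstanding_ios(queue_depth=128, threads=8)  # QD128, T8

print(low, high)  # 64 1024
```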

DirectStorage gaming workloads will also often be sustained, so we graphed performance over a prolonged period of time. We tested for up to five hours, emulating an intense gaming session. This demonstrates an NVMe SSD’s ability to maintain performance without the potential for significant interruptions or lag. The SSD has to be kept cool during this intense benchmark, so we equipped our test SSD with a heatsink. 

The Phison ES unsurprisingly comes in first in the three snapshot tests, with performance reaching a peak of over 7GBps. This is significantly higher than the Kingston KC3000 and Sabrent Rocket 4 Plus, which, like the rest of the test group, have standard firmware. The WD SN850 comes in just below the crop of E18 drives, suggesting that the SN850X, with a firmware update and new flash, might be more competitive here. The Crucial P5 Plus is clearly not optimized for this sort of workload.
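
For a sense of scale, throughput at a fixed transfer size converts directly into I/O operations per second, and Little's law relates that rate to the average time each request spends in flight. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes decimal gigabytes, 32KiB transfers, and, hypothetically, that the 7GBps peak coincides with 1024 outstanding I/Os, which the results do not state.

```python
def iops_from_throughput(bytes_per_second: float, block_bytes: int) -> float:
    """I/O operations per second needed to move `bytes_per_second` in `block_bytes` chunks."""
    return bytes_per_second / block_bytes


def mean_time_in_flight_s(outstanding_ios: int, iops: float) -> float:
    """Little's law: requests in flight = completion rate x average time in flight."""
    return outstanding_ios / iops


BLOCK_BYTES = 32 * 1024  # assumption: "32KB" means 32KiB
PEAK_BPS = 7e9           # assumption: 7GBps in decimal units

iops = iops_from_throughput(PEAK_BPS, BLOCK_BYTES)
# Assumption: the peak rate is reached with 1024 outstanding I/Os.
latency_ms = mean_time_in_flight_s(1024, iops) * 1000

print(f"{iops:,.0f} IOPS, about {latency_ms:.2f} ms average time in flight")
```

Under those assumptions, the peak works out to roughly 214,000 IOPS, with each request spending a little under 5 ms in flight on average.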

As you can see in the final three slides in the above album, the ES is unmatched in sustained performance. The KC3000 sticks with it for a decent length of time, but the experience is more jittery with prominent drops. The Sabrent Rocket 4 Plus also does fairly well, slower but more consistent than the KC3000. In fact, only the SN850 is reasonably consistent and fast, but it's still much slower than the ES. There’s no doubt that the new I/O+ firmware works incredibly well for this type of workload.
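
One way to put numbers on "jittery" versus "consistent" is to summarize a throughput trace by its coefficient of variation and by its worst sample relative to the mean. The following is a minimal sketch using made-up sample values, not our measured data; the function and field names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def consistency_summary(samples_mbps: Sequence[float]) -> Dict[str, float]:
    """Summarize a throughput trace: mean, coefficient of variation, worst dip."""
    avg = mean(samples_mbps)
    return {
        "mean": avg,
        "cv": pstdev(samples_mbps) / avg,          # 0.0 means perfectly steady
        "worst_ratio": min(samples_mbps) / avg,    # 1.0 means no dips below the mean
    }


# Synthetic traces for illustration only.
steady = consistency_summary([5000.0, 5000.0, 5000.0, 5000.0])
jittery = consistency_summary([6000.0, 6000.0, 2000.0, 6000.0])

print(steady)
print(jittery)
```

Both synthetic traces average 5000 MBps, but the second has a much higher coefficient of variation and a worst sample at only 40% of its mean, which is the kind of drop that would show up as a stutter.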

While we're focused on sustained random reads here, Phison says this firmware can also make good use of the flash’s high write speeds. As we have seen in our FireCuda 530 review, in its native mode, Micron’s 176-Layer TLC flash is often twice as fast as older flash (when the SLC cache is exhausted). We will test this tendency on future products, but it's outside the scope of the preview.

Trace Testing - 3DMark Storage Benchmark

With our DirectStorage emulation test out of the way, we'll assess the impact in the rest of our standard benchmarks. The remaining benchmarks aren't specifically coded to exploit the benefits of BypassIO, but there are some caveats to explore.

Built for gamers, 3DMark’s Storage Benchmark focuses on real-world gaming performance. Each round in this benchmark stresses storage based on gaming activities including loading games, saving progress, installing game files, and recording gameplay video streams.

Performance remains strong with the new firmware in 3DMark. The SK hynix Platinum P41, though, beats all of the competition. The I/O+ firmware will come in PCIe 5.0 contenders eventually, including drives with Phison’s E26 controller. There are already drives announced using this controller, and they'll come with even faster flash.

Trace Testing – PCMark 10 Storage Benchmark

PCMark 10 is a trace-based benchmark that uses a wide-ranging set of real-world traces from popular applications and everyday tasks to measure the performance of storage devices.

The Phison E18 controller isn’t quite as fast in the PCMark benchmarks, as demonstrated by the Phison ES, KC3000, and Rocket 4 Plus results. This is one of the few benchmarks where the P5 Plus does really well. All of the drives trail the stellar Platinum P41.

Transfer Rates – DiskBench

We use the DiskBench storage benchmarking tool to test file transfer performance with a custom 50GB dataset. We copy 31,227 files of various types, such as pictures, PDFs, and videos, to a new folder and then follow up with a read test of a newly written 6.5GB zip file.

These drives are mostly at the same level during the read workloads, limited by a bandwidth cap with these high-end PCIe 4.0 controllers. However, the Phison ES does quite well during file copies and doesn't stray from what we would normally expect from the E18 SSD controller. The Platinum P41 is on top yet again as it makes very good use of its SLC caching scheme.

Synthetic Testing - ATTO / CrystalDiskMark

ATTO and CrystalDiskMark (CDM) are free and easy-to-use storage benchmarking tools that SSD vendors commonly use to assign performance specifications to their products. Both of these tools give us insight into how each device handles different file sizes.

Phison controllers tend to do well in ATTO’s sequential performance tests, so there are few surprises here with the ES. Overall, the results are excellent.

This also often translates to good returns in the sequential benchmarks in CrystalDiskMark, but the ES drive seems to falter a bit with the QD1 read workload. This is likely an anomaly in our testing or something that Phison could smooth out in the firmware.

The I/O+ firmware is optimized for random reads, specifically 32KB+ chunks often paired with a very high queue depth. Given that focus, some performance loss with 4KB random workloads at QD1 is not a significant trade-off, as the baseline performance is still adequate. Phison says that performance in non-optimized workloads should, at worst, be comparable and may be slightly better, so we expect the finalized firmware to smooth this out.

We also see that the P5 Plus and Samsung 980 Pro, and at times even the SN850, can do relatively poorly in some benchmarks. This could be due to using older flash or, as with the P5 Plus, a work-in-progress optimization for its newer proprietary controller. WD has managed to get a lot out of its flash with solid firmware design, and Crucial can source excellent flash from its counterpart Micron. Perhaps the real story is that SK hynix has translated its OEM experience quite well, like WD, while also having cutting-edge flash. Samsung feels downright stodgy in comparison.

Sustained Write Performance and Cache Recovery

Official write specifications are only part of the performance picture. Most SSDs implement a write cache, which is a fast area of (usually) pseudo-SLC programmed flash that absorbs incoming data.  Sustained write speeds can suffer tremendously once the workload spills outside of the cache and into the "native" TLC or QLC flash. We use Iometer to hammer the SSD with sequential writes for 15 minutes to measure both the size of the write cache and performance after the cache is saturated. We also monitor cache recovery via multiple idle rounds.
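
As a rough illustration of how cache size can be read off a sustained-write trace, the sketch below sums the data written before throughput first falls below a fraction of its initial plateau. This is not our actual analysis script: the function name, the 50% threshold, and the synthetic trace are all assumptions chosen for the example.

```python
from typing import Optional, Sequence


def estimate_cache_gb(
    throughput_mbps: Sequence[float],
    interval_s: float = 1.0,
    drop_fraction: float = 0.5,
    plateau_samples: int = 5,
) -> Optional[float]:
    """Estimate write-cache size (decimal GB) from evenly spaced throughput samples.

    The plateau is the mean of the first `plateau_samples` readings. The cache is
    considered exhausted at the first sample below `drop_fraction` of that plateau.
    Returns None if the trace is too short or no such drop is seen.
    """
    if len(throughput_mbps) < plateau_samples:
        return None
    plateau = sum(throughput_mbps[:plateau_samples]) / plateau_samples
    threshold = plateau * drop_fraction

    written_mb = 0.0
    for sample in throughput_mbps:
        if sample < threshold:
            return written_mb / 1000.0
        written_mb += sample * interval_s
    return None


# Synthetic trace: 20 s at 6000 MBps in cache, then 40 s at 1500 MBps in native flash.
trace = [6000.0] * 20 + [1500.0] * 40
print(estimate_cache_gb(trace))  # 120.0
```

A real trace is noisier than this, so in practice the threshold and the plateau window would need tuning per drive.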

Although our testing today focuses on performance in read workloads, we want to take a quick look at Phison’s claims that write performance may also improve.

Sustained write performance with the ES is very similar to what we see with the Sabrent Rocket 4 Plus, which was updated to use the same 176-layer flash. The Sabrent Rocket 4 Plus comes with a smaller SLC cache than some other drives with this controller, like the Kingston KC3000, so it absorbs fewer writes but has higher and more consistent performance outside of the cache. This is a good match for a firmware that wants to achieve solid sustained performance.

Although overall write throughput is about the same, sustained writes appear more consistent than on the Rocket 4 Plus once they hit the native TLC. This isn't a huge deal and won’t impact general use, but it does mean that background improvements to the firmware can help tune this controller for heavier workloads. This is not surprising as there is a data center (DC) variant of the E18 controller, and the E26 will be used in both client and enterprise environments. Overall, the performance consistency is good.

Power Consumption and Temperature

We use the Quarch HD Programmable Power Module to gain a deeper understanding of power characteristics. Idle power consumption is an important aspect to consider, especially if you're looking for a laptop upgrade as even the best ultrabooks can have mediocre storage.

Some SSDs can consume watts of power at idle while better-suited ones sip just milliwatts. Average workload power consumption and max consumption are two other aspects of power consumption, but performance-per-watt is more important. A drive might consume more power during any given workload, but accomplishing a task faster allows the drive to drop into an idle state more quickly, ultimately saving energy.
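
The "race to idle" argument can be checked with simple arithmetic: energy is power multiplied by time, summed over the active and idle portions of a fixed window. The sketch below uses two hypothetical drives with invented power and completion-time figures, not values from our Quarch measurements, and the function name is our own.

```python
def window_energy_joules(
    active_watts: float,
    active_seconds: float,
    idle_watts: float,
    window_seconds: float,
) -> float:
    """Energy used over a fixed window: the task runs first, then the drive idles."""
    if active_seconds > window_seconds:
        raise ValueError("task does not finish within the window")
    idle_seconds = window_seconds - active_seconds
    return active_watts * active_seconds + idle_watts * idle_seconds


# Hypothetical drives doing the same job inside a 60-second window.
fast = window_energy_joules(active_watts=6.0, active_seconds=10.0,
                            idle_watts=0.05, window_seconds=60.0)
slow = window_energy_joules(active_watts=4.0, active_seconds=20.0,
                            idle_watts=0.05, window_seconds=60.0)

print(f"fast drive: {fast:.1f} J, slow drive: {slow:.1f} J")
```

In this made-up case the drive that draws more power under load still uses less energy overall (62.5 J versus 82.0 J) because it finishes in half the time.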

We also monitor the drive’s temperature via the S.M.A.R.T. data and an IR thermometer to see when (or if) thermal throttling kicks in and how it impacts performance. Remember that results will vary based on the workload and ambient air temperature.

Although this drive is not the most efficient, it does do significantly better with the new firmware. Efficiency may be a secondary concern for high-end desktop gaming, but it’s still nice to see improvements here.

We recommend using a heatsink for PCIe 4.0 drives, especially high-end ones. While this drive does not act out of the ordinary with our normal test suite, the DirectStorage simulations did make us concerned about throttling. Therefore, a heatsink will be a worthwhile investment.

Test Bench and Testing Notes

CPU: Intel Core i9-11900K
Motherboard: ASRock Z590 Taichi
Memory: 2x8GB Kingston HyperX Predator DDR4 5333
Graphics: Intel UHD Graphics 750
CPU Cooling: Alphacool Eissturm Hurricane Copper 45 3x140mm
Case: Streacom BC1 Open Benchtable
Power Supply: Corsair SF750 Platinum
OS Storage: WD_Black SN850 2TB
Operating System: Windows 10 Pro 64-bit 20H2

We use a Rocket Lake platform with most background applications and services, such as indexing, Windows updates, and anti-virus, disabled in the OS to reduce run-to-run variability. Each SSD is prefilled to 50% capacity and tested as a secondary device. Unless noted, we use active cooling for all SSDs.

Conclusion

The Phison evaluation sample with I/O+ firmware performed mostly as expected, matching drives with similar hardware while demonstrating impressive improvements in the Iometer tests. We would expect this firmware to be overall superior with some further refinement — even if they are minor, improvements aimed at scheduling and flash management should bring general uplifts in classic workloads. You are nevertheless fine with the old firmware as performance is already quite good.

The real improvements come with the simulated DirectStorage tests, which emphasize next-gen workloads. It may sound crazy to test with 1024 outstanding I/Os, and the unoptimized SSDs found this sort of testing quite difficult. However, to meet DirectStorage criteria, drives must be able to sustain high queue depth random reads over a large data range and time period. Microsoft Windows finally has an API that allows for better efficiency and performance out of SSDs, which, while touted for games, has wider potential application. Firmware changes will be necessary to get the most out of existing hardware, although it may yet be some time before drives are pushed this hard in actual workloads. However, Phison does point out that Forspoken's first public demo ran at medium detail and required a steady 4GBps stream from the SSD, meaning we could see similar or higher performance demands from some optimized game titles.

Showing the edge cases demonstrates performance when the drive is under intense pressure. The I/O+ firmware did deliver incredible sustained performance. We would like to see a lot more from this in terms of various benchmarks, but as a preview, it does offer a glimpse at the direction things are going. If these workloads seem closer to enterprise than consumer tests, that’s because they are; the “any SSD is adequate” naysayers have met their match. This is a win-win as it should encourage the development of software that can harness all that PCIe solid state storage horsepower.

Of course, consumer SSDs are generally not as robust as those found in the enterprise market. Luckily, BypassIO and these benchmarks are focused on reads. While that does introduce the issue of read disturb — which previously has not been significant for consumer use — it does avoid some pitfalls associated with write workloads. Consumer SSDs have SLC caches and are not optimized for sustained writes in most cases. However, mitigating the performance and wear effects of sustained workloads will become even more important, and it seems Phison has anticipated this.

We are excited about this direction in storage. There’s a lot more to come — we'll be sure to test new drives with new benchmarks, particularly real-world gaming, as they arrive. 

Shane Downing
Freelance Reviewer

Shane Downing is a Freelance Reviewer for Tom’s Hardware US, covering consumer storage hardware.

  • -Fran-
    Just out of curiosity... How would a SAS or sATA HDD behave in these tests? Even in RAID 0 would be interesting. More than anything, just to know how far these two are from each other.

    EDIT: "these tests" as in the QD32+ and 64KB+ blocks.

    Regards.
  • alceryes
    I think DirectStorage will only make a mediumish splash with gaming, and only in the mid to low tier space.
    PCIe 4+ is lightning quick. Put together a PCIe 4+ performance system and you're loading up NVMe-optimized games in 7 seconds or less anyway. Yes, DS could potentially take that down to 3 seconds but, meh.

    The unexpected gains of DS will be in the mid-performance gaming systems. Not only will games load much quicker on middling-performance storage mediums but, if you were sometimes hitting a CPU bottleneck due to a mid-performance CPU, DS may be exactly what you need to relieve 3-4% of the CPU workload by moving the asset decompression stage from the CPU to the GPU.

    ...and, it's free so, yeah, good stuff all around.
  • salgado18
    alceryes said:
    I think DirectStorage will only make a mediumish splash with gaming, and only in the mid to low tier space.
    PCIe 4+ is lightning quick. Put together a PCIe 4+ performance system and you're loading up NVMe-optimized games in 7 seconds or less anyway. Yes, DS could potentially take that down to 3 seconds but, meh.

    The unexpected gains of DS will be in the mid-performance gaming systems. Not only will games load much quicker on middling-performance storage mediums but, if you were sometimes hitting a CPU bottleneck due to a mid-performance CPU, DS may be exactly what you need to relieve 3-4% of the CPU workload by moving the asset decompression stage from the CPU to the GPU.

    ...and, it's free so, yeah, good stuff all around.
    The big deal is not full level loading, but constant incremental loading of assets during games. Something like what Rage tried to do before SSDs. As an example, don't load all textures at once, load only the ones you need at the current scene, and when the player moves you load what you need. That would be very hard on the CPU, but with DS it would be a lot more efficient.
  • elforeign
    With the SK Hynix P41, are there any firmware improvements in the pipeline to access the benefits of Directstorage or will it require a new drive with a new controller fit for purpose? I recently bought one for a new build and am using it as my primary SSD, but I lack insight into this technology.
  • itsmedatguy
    salgado18 said:
    The big deal is not full level loading, but constant incremental loading of assets during games. Something like what Rage tried to do before SSDs. As an example, don't load all textures at once, load only the ones you need at the current scene, and when the player moves you load what you need. That would be very hard on the CPU, but with DS it would be a lot more efficient.

    It's interesting I believe that Unreal 5 is doing something like this using an atlas to lookup assets, which seems to have lowered the overhead for streaming in what's needed, because Unreal 5 seems capable of doing this kind of thing off of a standard 2.5" SSD
  • salgado18
    itsmedatguy said:
    It's interesting I believe that Unreal 5 is doing something like this using an atlas to lookup assets, which seems to have lowered the overhead for streaming in what's needed, because Unreal 5 seems capable of doing this kind of thing off of a standard 2.5" SSD
    A stupid example to represent the idea, in old GTA's you only got a few of the cars on the streets, which caused the game to never show a car, but once you got it the game showed that car a lot suddenly. In UE5 Matrix demo, every car is unique, because they are loaded on the fly. I think that's the big advancement of this tech.
  • gggplaya
    alceryes said:
    I think DirectStorage will only make a mediumish splash with gaming, and only in the mid to low tier space.
    PCIe 4+ is lightning quick. Put together a PCIe 4+ performance system and you're loading up NVMe-optimized games in 7 seconds or less anyway. Yes, DS could potentially take that down to 3 seconds but, meh.

    The unexpected gains of DS will be in the mid-performance gaming systems. Not only will games load much quicker on middling-performance storage mediums but, if you were sometimes hitting a CPU bottleneck due to a mid-performance CPU, DS may be exactly what you need to relieve 3-4% of the CPU workload by moving the asset decompression stage from the CPU to the GPU.

    ...and, it's free so, yeah, good stuff all around.


    I think you'll see a benefit in higher clutter object density in scenes and in larger open-world games. Also, more unique objects throughout the map.
  • alceryes
    salgado18 said:
    The big deal is not full level loading, but constant incremental loading of assets during games. Something like what Rage tried to do before SSDs. As an example, don't load all textures at once, load only the ones you need at the current scene, and when the player moves you load what you need. That would be very hard on the CPU, but with DS it would be a lot more efficient.
    Partial level loading has been a thing for decades(?)
    But, quicker asset access will benefit things like pop-in and more detail distant textures, definitely.
  • gggplaya
    alceryes said:
    Partial level loading has been a thing for decades(?)
    But, quicker asset access will benefit things like pop-in and more detail distant textures, definitely.

    Correct, but loading more map sections is typically disguised as a long tunnel, a long road or highway, a warp portal, etc. DirectStorage and super-fast SSDs will eliminate the need for that.
  • hannibal
    What I would like to see is Phison with and without this firmware update. What goes up, what goes down...