DirectStorage Performance Compared: AMD vs Intel vs Nvidia

Microsoft's DirectStorage 1.1 application programming interface (API) is now available for Windows PCs with recent graphics cards and NVMe solid-state drives, so it is time to find out which hardware best handles GPU decompression, one of the most exciting features of DirectStorage 1.1. Fortunately, Compusemble has developed a suitable benchmark, and PC Games Hardware (PCGH) used it to uncover some interesting findings.

Microsoft's DirectStorage 1.1 has several performance-boosting features, but the API's main objectives are to reduce the CPU load of handling NVMe requests, which frees CPU cycles for other workloads, and to move game asset decompression onto highly parallel GPUs with little OS intervention. In addition, because assets travel from storage in compressed form and are expanded only after the transfer, the effective rate of delivered data can exceed the raw bandwidth of the storage medium (i.e., the SSD), which greatly reduces loading times.
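
The bandwidth point can be illustrated with a rough model. This is only a sketch, assuming delivered throughput equals the raw link rate multiplied by the compression ratio, capped by how fast the decompressor can run. The function name, the 2:1 ratio, and the 20 GB/s decompressor cap are illustrative assumptions, not figures from Microsoft or PCGH; only the 7.9 GB/s and 0.6 GB/s link rates come from the benchmark setup.

```python
def effective_throughput(link_gb_s: float, compression_ratio: float,
                         decompressor_gb_s: float) -> float:
    """Return the rate of decompressed data delivered, in GB/s.

    Simplified model (an assumption, not the DirectStorage implementation):
    compressed bytes stream at the link rate, each byte expands by
    compression_ratio, and the decompressor caps the output rate.
    """
    return min(link_gb_s * compression_ratio, decompressor_gb_s)


# Hypothetical inputs: 2:1 compression and a decompressor that tops out at 20 GB/s.
nvme_rate = effective_throughput(7.9, 2.0, 20.0)  # PCIe 4.0 x4 link
sata_rate = effective_throughput(0.6, 2.0, 20.0)  # SATA link

print(f"PCIe 4.0 x4: {nvme_rate:.1f} GB/s delivered over a 7.9 GB/s link")
print(f"SATA:        {sata_rate:.1f} GB/s delivered over a 0.6 GB/s link")
```

Under these assumed inputs, the delivered rate is higher than the link rate in both cases, which is the effect the paragraph describes. Real compression ratios vary by asset type.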

Meanwhile, GPUs handle DirectStorage decompression differently, so PCGH decided to find out which of the latest GPUs (AMD's Radeon RX 7900 XT, Intel's Arc A770, or Nvidia's GeForce RTX 4080) is best at asset decompression. The site ran Compusemble's benchmark on each graphics card and, for comparison, on Intel's Core i9-12900K CPU.

Hardware                      PCIe 4.0 x4 (7.9 GB/s)   PCIe 3.0 x4 (3.9 GB/s)   SATA (0.6 GB/s)
Radeon RX 7900 XT             14.6 GB/s                12.6 GB/s                1.47 GB/s
Arc A770 16GB                 16.8 GB/s                13.9 GB/s                1.64 GB/s
GeForce RTX 4080              15.3 GB/s                12.7 GB/s                1.47 GB/s
Core i9-12900K @ 5.20 GHz     5.2 GB/s                 5.2 GB/s                 1.47 GB/s

The first thing that stands out is that on NVMe drives, all three GPUs decompress at least 2.4 times faster than the Core i9-12900K processor; on SATA, the drive itself appears to be the bottleneck and the results are nearly identical. Meanwhile, Intel's Arc A770 is noticeably ahead of AMD's Radeon RX 7900 XT and Nvidia's GeForce RTX 4080 in GPU asset decompression. In the best-case scenario, the A770 can transfer and decompress assets at 16.8 GB/s, whereas the RX 7900 XT comes third at 14.6 GB/s (13% behind the leader).
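
The ratios quoted here can be checked directly against the table. The short script below is a sketch that copies the PCGH figures and recomputes the GPU-versus-CPU speedup and the gap to the leading card; the dictionary layout and the function names are my own, chosen only for illustration.

```python
# Decompressed throughput in GB/s, copied from the PCGH table above.
GPU_RESULTS = {
    "Radeon RX 7900 XT": {"pcie4": 14.6, "pcie3": 12.6, "sata": 1.47},
    "Arc A770 16GB": {"pcie4": 16.8, "pcie3": 13.9, "sata": 1.64},
    "GeForce RTX 4080": {"pcie4": 15.3, "pcie3": 12.7, "sata": 1.47},
}
CPU_RESULT = {"pcie4": 5.2, "pcie3": 5.2, "sata": 1.47}  # Core i9-12900K


def speedup_vs_cpu(gpu: str, link: str) -> float:
    """How many times faster the GPU is than the CPU on the given link."""
    return GPU_RESULTS[gpu][link] / CPU_RESULT[link]


def gap_to_leader(gpu: str, link: str) -> float:
    """Fraction by which the GPU trails the fastest GPU on the given link."""
    leader = max(rates[link] for rates in GPU_RESULTS.values())
    return 1.0 - GPU_RESULTS[gpu][link] / leader


# Smallest GPU advantage across both NVMe configurations.
worst_nvme_speedup = min(
    speedup_vs_cpu(gpu, link)
    for gpu in GPU_RESULTS
    for link in ("pcie4", "pcie3")
)

print(f"Smallest NVMe speedup over the CPU: {worst_nvme_speedup:.2f}x")
print(f"RX 7900 XT behind the leader on PCIe 4.0: "
      f"{gap_to_leader('Radeon RX 7900 XT', 'pcie4'):.0%}")
```

The smallest NVMe speedup works out to about 2.42x (the RX 7900 XT on PCIe 3.0), and the RX 7900 XT trails the A770 by about 13% on PCIe 4.0, in line with the figures in the text.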

Whether an AMD, Intel, or Nvidia GPU is used, actual loading times drop by an order of magnitude, from 5 seconds to 0.5 seconds, according to PCGH. Given how close the GPUs' decompression rates are, it does not really matter which one is used: all of them are up to the task and are generally good enough to significantly improve the gaming experience.

Anton Shilov
Freelance News Writer

Anton Shilov is a Freelance News Writer at Tom’s Hardware US. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • -Fran-
    Another important question to ask is how much GPU power it takes away while doing it. If this causes stutters when loading on the fly in the middle of a map or something, it may not be a good thing to use in the GPU directly? Maybe?

    Regards.
  • DougMcC
    And of course what you really want is to never have that data in system memory at all, the real holy grail will be for the pci bus to be replaced with something that allows the GPU to load directly from an attached storage device without the cpu or system ram being involved at all.
  • jkflipflop98
    DougMcC said:
    And of course what you really want is to never have that data in system memory at all, the real holy grail will be for the pci bus to be replaced with something that allows the GPU to load directly from an attached storage device without the cpu or system ram being involved at all.

    I don't know for sure, but I would guess you're describing version 2.0 of DirectStorage.
  • hotaru251
    -Fran- said:
    If this causes stutters when loading on the fly in the middle of a map or something
    it doesn't.

    the tech is already used in consoles and they don't have any issue.
  • JamesJones44
    -Fran- said:
    Another important question to ask is how much GPU power it takes away while doing it. If this causes stutters when loading on the fly in the middle of a map or something, it may not be a good thing to use in the GPU directly? Maybe?

    Regards.

    I've had the same question since DirectStorage was originally announced. I've not yet found anyone who's done significant testing to find out how much RAM and GPU cycles are reduced to see if it's really "faster" overall. I put "faster" in quotes because in a vacuum it is (aka tested on its own), but as a whole I haven't found a good answer.
  • PiranhaTech
    -Fran- said:
    Another important question to ask is how much GPU power it takes away while doing it. If this causes stutters when loading on the fly in the middle of a map or something, it may not be a good thing to use in the GPU directly? Maybe?

    Regards.
    In theory, it could take a good amount of GPU power, but the GPU might be waiting for asset loading anyways. It might use more GPU power, but the GPU might be idle or waiting for data anyways, therefore it can actually reduce the amount of stuttering.

    It might be something the game programmers can optimize for. This especially sounds like something a console developer can leverage. So, if testing shows that the traditional loading pipeline is better, they can use it.
  • DavidLejdar
    Considering that many a rig doesn't have such a CPU, the difference would likely be even bigger there. On the other hand, many do not have such a GPU either, so then the question still is how well it works on an average rig - including issues such as whether a 4 GB GPU would actually not have as much space for all the data the game wants to load into VRAM directly (instead of using the system memory as buffer for decompressed data).

    -Fran- said:
    Another important question to ask is how much GPU power it takes away while doing it. If this causes stutters when loading on the fly in the middle of a map or something, it may not be a good thing to use in the GPU directly? Maybe?

    Regards.

    Game developers have the option to provide the user with a settings option to force CPU decompression for all of the workload. And if I understand it correctly, devs can designate some data to go to the CPU for decompression, which some may want to do for mentioned "on the fly loading". Or they may not want to do that, and still use some form of transition, such as a simple corridor, in which the GPU doesn't have much to do for the output and can process the data for the next area.

    DougMcC said:
    And of course what you really want is to never have that data in system memory at all, the real holy grail will be for the pci bus to be replaced with something that allows the GPU to load directly from an attached storage device without the cpu or system ram being involved at all.

    If a GPU has plenty of RAM, there sure is an argument for it. SDRAM still has a way lower latency than NVMe SSDs though, by a factor of around 100, so hardly a bottleneck (if enough GB). And for data the GPU doesn't need right now, to be buffered in the system memory, that still makes sense, as it can be accessed faster there than from a storage device.
  • Kamen Rider Blade
    I'm still waiting for a Radeon SSG-like PCIe x4 interface mounted on the back of the video card to help shorten the traces between the storage and the video card.

    Also have bifurcation on PCIe 5.0 into x12 lanes and x4 lanes, so that the GPU can get x12 bandwidth while the SSD gets x4 lanes' worth of bandwidth.

    That should really help shorten the latency by cutting out the directing of traffic to the CPU and having an ultra-short route from storage to GPU.
  • InvalidError
    -Fran- said:
    Another important question to ask is how much GPU power it takes away while doing it. If this causes stutters when loading on the fly in the middle of a map or something, it may not be a good thing to use in the GPU directly? Maybe?
    What about the stutters and asset pops from having the CPU bogged down by asset decompression and the GPU having to wait for the CPU 3X as long? I suspect most gamers have nowhere near an i9-12900k or AMD equivalent either.
  • gg83
    DougMcC said:
    And of course what you really want is to never have that data in system memory at all, the real holy grail will be for the pci bus to be replaced with something that allows the GPU to load directly from an attached storage device without the cpu or system ram being involved at all.
    NvLink