Forspoken Showcases DirectStorage for PCs to Boost NVMe SSD Performance

(Image credit: Luminous Production)

At GDC 2022, AMD and Luminous Productions will present the upcoming PC game Forspoken, which uses a host of AMD-developed FidelityFX technologies to improve image quality as well as Microsoft's new DirectStorage API, which should significantly cut load times. Forspoken is the first game to adopt DirectStorage technologies to reduce CPU load by passing data directly from an NVMe SSD to the GPU.

Microsoft's DirectStorage API was developed to lower CPU utilization when handling game-related NVMe requests and to save expensive CPU cycles for other work. Instead of issuing a costly individual NVMe request for every asset a GPU needs, the API submits large batches of I/O requests in parallel, and the compressed assets are decompressed by a DirectX 12-compliant GPU with little intervention from the OS and low CPU utilization. In addition to lowering per-request NVMe overhead, an application gets finer-grained control over when it is notified of I/O request completion, instead of having to react to every individual request.
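The batching idea can be illustrated with a toy model. The sketch below is plain Python and does not use the DirectStorage API; `read_asset`, `load_batch`, and the asset names are hypothetical stand-ins. It only shows the pattern described above: queue many reads at once, then notify the application a single time when the whole batch is done.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def read_asset(name: str) -> bytes:
    """Hypothetical stand-in for one storage read; returns fake compressed bytes."""
    return f"compressed:{name}".encode()


def load_batch(names, on_batch_done):
    """Submit every read up front, then notify the caller once for the whole batch."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(read_asset, n) for n in names]
        wait(futures)  # block until every request in the batch has completed
    results = {n: f.result() for n, f in zip(names, futures)}
    on_batch_done(results)  # one notification, not one per request
    return results


notifications = []
assets = load_batch(["rock.tex", "tree.tex", "hero.mesh"], notifications.append)
print(len(assets), "assets loaded,", len(notifications), "notification")
```

In this model the application code runs once per batch rather than once per asset, which is the part of the design that saves CPU time as the number of requests grows.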

By using DirectStorage instead of traditional methods of sending assets like textures first to the CPU and then to the GPU, game developers can reduce load times, improve the quality of visuals, and use spare CPU cycles for things like more advanced physics or sophisticated game AI. If you've ever wondered why a PC with a dozen or more CPU threads and a fast SSD still takes seemingly forever to load, DirectStorage aims to fix that problem.

DirectStorage requires an NVMe SSD, a DirectX 12-compliant GPU, and Windows 10 or 11. Microsoft's Xbox Series X|S consoles already support DirectStorage, so usage of the new API will increase as games that take advantage of the technology are ported to the PC. Keep in mind, though, that PCs use a wide range of different components, so adding DirectStorage to Windows games takes more time for design and testing. The most intriguing part about Forspoken is that it's not coming to Xbox (at least for now) and is bound for Windows-based PCs as well as Sony's PlayStation 5.

In addition to DirectStorage, Forspoken by Luminous Productions also supports a host of other innovative technologies, including those that belong to AMD's FidelityFX package. Specifically, it uses FidelityFX screen-space ambient occlusion, screen-space reflections, raytraced shadows, and Super Resolution. 

Forspoken is set to be available on PC and PlayStation 5 on May 24, 2022.

Anton Shilov
Freelance News Writer

Anton Shilov is a Freelance News Writer at Tom’s Hardware US. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • InvalidError
    Once DirectStorage becomes mainstream, it'll really put the squeeze on GPUs with only 4.0x4, even 4.0x8 could be a tight squeeze.
  • JarredWaltonGPU
    InvalidError said:
    Once DirectStorage becomes mainstream, it'll really put the squeeze on GPUs with only 4.0x4, even 4.0x8 could be a tight squeeze.
    I suspect it won't matter that much, even on something like the RX 6500 XT. When a game loads from storage, which will normally be limited to x4 PCIe 4.0 for the SSD, the GPU isn't doing a ton of other stuff. It's waiting for data from the CPU/RAM/SSD to arrive, often compiling shaders for the GPU architecture as well (on first run). The GPU should be able to accept data basically as fast as the SSD can send it, even with an x4 connection on the GPU.
  • fevanson
    InvalidError said:
    Once DirectStorage becomes mainstream, it'll really put the squeeze on GPUs with only 4.0x4, even 4.0x8 could be a tight squeeze.

    It will be limited by the nvme SSD speed (PCIE 3.0 x4 or PCIE 4.0 x4), in the current architecture the datapath is NVME->Ram->GPU Memory and the gpu performs the decompression through compute shaders.
  • JarredWaltonGPU
    fevanson said:
    It will be limited by the nvme SSD speed (PCIE 3.0 x4 or PCIE 4.0 x4), in the current architecture the datapath is NVME->Ram->GPU Memory and the gpu performs the decompression through compute shaders.
    This is incorrect (unless by "current" you mean "future DirectStorage" use). The current non-DirectStorage flow is NVME -> RAM -> CPU (decompress) to RAM -> Copy to GPU VRAM:

    For DirectStorage, the CPU is totally removed from the equation and it's just NVME -> RAM -> GPU VRAM -> GPU decompress to VRAM:
  • fevanson
    JarredWaltonGPU said:
    This is incorrect (unless by "current" you mean "future DirectStorage" use). The current non-DirectStorage flow is NVME -> RAM -> CPU (decompress) to RAM -> Copy to GPU VRAM:

    For DirectStorage, the CPU is totally removed from the equation and it's just NVME -> RAM -> GPU VRAM -> GPU decompress to VRAM:

    Yes I meant the current architecture of DirectStorage.
  • JarredWaltonGPU
    fevanson said:
    Yes I meant the current architecture of DirectStorage.
    Either way I figured including the two diagrams would be useful for any others that might be interested.

    I'm very curious to see how this actually plays out in practice, though. Seems like the biggest bottleneck might be the CPU decompression, which could be eliminated even with a SATA SSD. I'd love to see modern games where load times get down into the <10 second range from desktop to game, though!
  • InvalidError
    fevanson said:
    It will be limited by the nvme SSD speed (PCIE 3.0 x4 or PCIE 4.0 x4), in the current architecture the datapath is NVME->Ram->GPU Memory and the gpu performs the decompression through compute shaders.
    Get it?

    With DirectStorage normalizing the heavy movement of data from storage to RAM to GPU where system RAM can still be used as a cache for compressed (file system) data as it has traditionally been for the last 20+ years, compressed asset streaming from system memory could still benefit from 5.0x16 even if you have a SATA SSD, just going to have a bit more asset pop on first-time load.

    With no caching going on, having a GPU interface that is 4X as fast as the NVMe SSD still means 1/4th as much second-hop latency relaying the data and more spare bandwidth for CPU-GPU traffic which is still needed.
  • SelfDestructive
    InvalidError said:
    Once DirectStorage becomes mainstream, it'll really put the squeeze on GPUs with only 4.0x4, even 4.0x8 could be a tight squeeze.
    Keep in mind that geometry and texture assets through DirectStorage remain compressed until they reach the GPU, which means you essentially get 2x the bandwidth over PCIe from RAM to VRAM (assuming a 2:1 compression ratio).


    The CPU is also still a part of the equation, of course. It still needs to issue I/O requests and copy that data from storage to RAM, and then from RAM to VRAM. We're not at the point where the CPU is out of the equation yet. But that's the ultimate plan, as they introduce dedicated decompression/I/O chips into the pipeline.
  • Daniel.a.Fries
    I'm honestly really looking forward to these truly next-gen technologies making it to video games. Too bad the game was delayed to October 11th.

    I would love a demo sooner though (like Ground Zeroes with MGSV).
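The PCIe bandwidth points raised in the comments can be checked with some quick arithmetic. The sketch below is only a rough model: it uses the theoretical per-lane signalling rates and 128b/130b encoding of PCIe 3.0 through 5.0, ignores protocol overhead, and takes the 2:1 compression ratio from the comment above as an assumption rather than a measured figure. The function names are illustrative.

```python
# Per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_SEC = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and later


def link_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_SEC[gen] * ENCODING / 8 * lanes


def effective_gb_per_s(gen: int, lanes: int, compression_ratio: float = 2.0) -> float:
    """Uncompressed asset data delivered per second when compressed data crosses the link.

    The default 2:1 ratio is an assumption; real ratios depend on the assets.
    """
    return link_gb_per_s(gen, lanes) * compression_ratio


for lanes in (4, 8, 16):
    raw = link_gb_per_s(4, lanes)
    eff = effective_gb_per_s(4, lanes)
    print(f"PCIe 4.0 x{lanes}: {raw:.1f} GB/s raw, {eff:.1f} GB/s of assets at 2:1")
```

Under these assumptions, a PCIe 4.0 x4 link carries roughly 7.9 GB/s in each direction, which is the same ceiling a PCIe 4.0 x4 NVMe SSD has. That matches the argument that an x4 GPU link can accept data about as fast as a single such SSD can supply it, while data cached in system RAM could still benefit from a wider link.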