Microsoft's DirectStorage API to Support PCIe 3.0 NVMe SSDs, DirectX 12 GPUs

(Image credit: Corsair)

Last year, Microsoft promised to bring DirectStorage, an application programming interface (API) currently used exclusively on its Xbox Series X game console, to PCs. The company has yet to reveal all of DirectStorage's hardware requirements, but a presentation it recently showed to developers suggests that any reasonably modern PC with a contemporary GPU and SSD will be able to support it.

(Image credit: Microsoft/Reddit)

Microsoft's DirectStorage API was designed with one goal: to reduce the CPU load of handling NVMe requests and free up CPU cycles for other workloads. To do so, the API submits large batches of I/O requests in parallel with little intervention from the OS, which lowers per-request NVMe overhead. It also gives applications finer-grained control over when they are notified of I/O completion, so they do not have to react to every individual request.
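
The batching idea can be illustrated with a small Python sketch. This is only a user-space analogy and not the DirectStorage API: the function names (`read_range`, `submit_batch`), the thread pool, and the scratch file are all assumptions made for illustration, and the sketch does not reproduce the reduced OS overhead the article describes. It only shows the pattern of queuing many reads at once and being notified a single time for the whole batch.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` (one I/O request)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def submit_batch(path, requests, workers=4):
    """Queue every request up front, then block once until all are done.

    The caller is notified a single time for the whole batch instead of
    reacting to each request as it completes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_range, path, off, n) for off, n in requests]
        wait(futures)  # single completion point for the batch
        return [f.result() for f in futures]


# Demo data: a 64 KiB scratch file standing in for a game asset archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 256)
    demo_path = tmp.name

# Sixteen 4 KiB reads, submitted together as one batch.
demo_requests = [(i * 4096, 4096) for i in range(16)]
demo_chunks = submit_batch(demo_path, demo_requests)
os.remove(demo_path)
```

Here the application handles one completion for sixteen reads; a per-request design would instead wake the caller sixteen times.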

Since Microsoft controls the feature set of its Xbox Series X console, including its GPU and SSD, supporting DirectStorage on that system was fairly straightforward: Microsoft designed the API with the latest Xbox in mind. PCs, however, use components with widely varying capabilities, so adding DirectStorage support to Windows 10 systems takes more time, design work, and testing.

Microsoft has yet to officially disclose the DirectStorage hardware requirements for PCs. However, a software developer who saw a recent Microsoft DirectStorage presentation, and shared some slides from it, said in a Reddit post that the new API will be supported by all DirectX 12-compatible GPUs and by SSDs that use a PCIe 3.0 interface and support NVMe. It is unclear whether all versions of the NVMe protocol will be supported, or whether SSDs will need to meet any other requirements (such as a minimum speed).

DirectStorage support on Windows 10 should significantly improve storage performance and reduce loading times in games that use the API (for example, games that also have an Xbox Series X version). Still, it remains to be seen exactly when Microsoft will bring DirectStorage support to Windows 10.

  • hotaru.hino
    I'm wondering why is it necessary to copy from storage to system RAM, then to VRAM?
  • ThatMouse
    I'm not a game developer, but isn't it already possible to use the GPU to decompress files? Why is this something only the CPU can do? It also looks like they are storing the compressed files in VRAM which would eat up VRAM for files the game cannot use.
  • Krotow
    It was actually expected. If RTX IO and whatever AMD is planning as an alternative work over PCIe in general, then they should work not only over PCIe 4.0 but also over PCIe 3.0. It will be interesting to see what other uses besides gaming this introduces. It will certainly be useful for video encoding, and also for file compression and decompression, as mentioned above. Probably plenty of other uses too. Wish GPUs were available :)
  • InvalidError
    hotaru.hino said:
    I'm wondering why is it necessary to copy from storage to system RAM, then to VRAM?
    Because standard IO functions expect system memory addresses as the source/destination argument.

    Also, due to BAR size limit pre-resizable BAR, there is no way for the NVMe SSD to know whether the GPU's BAR is mapped to the correct VRAM block so if you are going to waste microseconds doing context switches for the kernel/drivers to make all of the checks to move the BAR around if needed and make sure nothing else was attempting to access it, you are better off using system memory as a go-between and not worry about it. It would make sense if resizable BAR was a prerequisite for Direct Storage to greatly simplify things.
  • Gillerer
    ThatMouse said:
    I'm not a game developer, but isn't it already possible to use the GPU to decompress files? Why is this something only the CPU can do?

    I think the current (game asset) compression methods are only suited to general purpose CPU cores, and as stated in the slides, new ones need to be developed for GPUs to be any good at it (or be able to perform without affecting the game performance negatively).

    ThatMouse said:
    It also looks like they are storing the compressed files in VRAM which would eat up VRAM for files the game cannot use.

    The data has to be in memory before the GPU (or CPU) can process it. Without memory as buffer, the CPU/GPU would spend much too much time waiting for data to trickle in from the (relatively) slow NVMe device and PCIe bus. If there is memory pressure, the compressed data can always be discarded when it's no longer needed.

    I could also see the benefit of keeping compressed data in VRAM, and instead discarding unused uncompressed data sets. If decompression is easy and doesn't affect game performance, any data needed could then be quickly decompressed again without ever going to NVMe or the PCIe bus. This would be an actually working version of the "double your RAM" scam compression software of the past.

    *

    Also, they're not storing compressed files (as in up to 1GB files in the game installation) in memory, but compressed data. A game can pick and choose which parts of a file to read to memory, based on the locations of the assets it needs. This means the memory usage is only as much as the level requires. The large file sizes in games are mainly due to how file systems are so much better at handling few huge files than thousands of small ones; especially if using a hard drive.

    *

    I worry that Microsoft will attempt to tie DirectStorage to UWP somehow... :-(
  • hotaru.hino
    InvalidError said:
    Because standard IO functions expect system memory addresses as the source/destination argument.
    This doesn't make sense to me because if VRAM is memory mapped in the virtual address space, it should be directly addressable anyway. Or at least the portion that's been mapped.

    InvalidError said:
    Also, due to BAR size limit pre-resizable BAR, there is no way for the NVMe SSD to know whether the GPU's BAR is mapped to the correct VRAM block so if you are going to waste microseconds doing context switches for the kernel/drivers to make all of the checks to move the BAR around if needed and make sure nothing else was attempting to access it, you are better off using system memory as a go-between and not worry about it. It would make sense if resizable BAR was a prerequisite for Direct Storage to greatly simplify things.
    But this provides a better answer.
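
Gillerer's point about games reading only the parts of a large file that they need, and keeping that data compressed until it is used, can be sketched in Python. This is a hypothetical illustration, not how any particular game or DirectStorage itself packs assets: the pack layout, the `build_pack` and `load_asset` helpers, the asset names, and the use of zlib are all assumptions.

```python
import io
import zlib


def build_pack(assets):
    """Pack named assets into one blob and return (blob, index).

    The index maps each asset name to (offset, compressed_length), so a
    reader can later fetch one asset without touching the rest.
    """
    blob = io.BytesIO()
    index = {}
    for name, data in assets.items():
        packed = zlib.compress(data)
        index[name] = (blob.tell(), len(packed))
        blob.write(packed)
    return blob.getvalue(), index


def load_asset(pack_file, index, name):
    """Seek to one asset, read only its compressed bytes, then decompress."""
    offset, length = index[name]
    pack_file.seek(offset)
    return zlib.decompress(pack_file.read(length))


# Hypothetical assets; real game data would be far larger.
demo_assets = {
    "rock_texture": b"R" * 10_000,
    "tree_mesh": b"T" * 5_000,
    "menu_music": b"M" * 20_000,
}
demo_blob, demo_index = build_pack(demo_assets)

# io.BytesIO stands in for a pack file opened in "rb" mode.
demo_file = io.BytesIO(demo_blob)
demo_tree = load_asset(demo_file, demo_index, "tree_mesh")
```

Only the bytes belonging to `tree_mesh` are read and decompressed; the other assets stay untouched in the pack, which is why memory use tracks what a level needs and not the size of the file on disk.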