Nvidia RTX 5090 apparently handles DirectStorage GPU decompression better than RTX 4090
Performance still favors CPU decompression at times.

Microsoft's DirectStorage API has been known to exhibit odd performance quirks with GPU decompression. However, YouTuber Compusemble claims that Nvidia's latest GeForce RTX 5090 handles GPU decompression better than the prior-generation RTX 4090, showing a smaller performance gap between GPU decompression and CPU decompression.
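For context on what's actually being toggled in these tests: DirectStorage exposes a process-wide configuration that determines whether GDeflate decompression runs on the GPU or on the runtime's built-in CPU threads. Below is a minimal sketch of that switch, assuming Microsoft's DirectStorage SDK (dstorage.h); it shows how a game could implement a CPU/GPU decompression toggle in principle, not necessarily how these two titles do it.

```cpp
// Minimal sketch: choosing the DirectStorage decompression path at startup.
// Requires Microsoft's DirectStorage SDK (dstorage.h / dstorage.lib);
// error handling omitted for brevity.
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

ComPtr<IDStorageFactory> CreateFactory(bool forceCpuDecompression)
{
    // The configuration must be set before the first factory is created.
    DSTORAGE_CONFIGURATION config = {};
    config.DisableGpuDecompression = forceCpuDecompression; // TRUE routes GDeflate to CPU threads
    DStorageSetConfiguration(&config);

    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));
    return factory;
}
```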
The YouTuber benchmarked two games with DirectStorage (DS) support, Ratchet and Clank: Rift Apart and Spider-Man 2. Each was tested with both DirectStorage GPU decompression and CPU decompression on an RTX 5090 at 4K, 1440p, and 1080p. Note that no RTX 4090 runs were shown alongside these tests for direct comparison, and the framing starts from the prior assumption that DS GPU decompression costs performance.

In Ratchet and Clank: Rift Apart at native 4K, the RTX 5090 produced nearly identical average frame rates with GPU and CPU decompression. Averages were just 0.96% higher with CPU decompression, while 1% lows were just over 9% higher in CPU decompression's favor. That suggests that when the GPU is already under heavy load, as it is at 4K, running decompression on the GPU can still cost performance.
1440p flips the story: the margins were slimmer, but overall performance, particularly the 1% lows, favored GPU decompression. DirectStorage GPU decompression on the RTX 5090 delivered 1.19% higher average frame rates and 5.26% better 1% lows than CPU decompression. At 1080p, GPU decompression won again, with 0.48% higher averages and 11.11% better 1% lows.
Game / Resolution | Average FPS (GPU vs. CPU decompression) | 1% Low FPS (GPU vs. CPU decompression) |
---|---|---|
Ratchet and Clank: 4K | -0.95% | -8.48% |
Ratchet and Clank: 1440p | +1.19% | +5.26% |
Ratchet and Clank: 1080p | +0.48% | +11.11% |
Spider-Man 2: 4K | -7.58% | -9.53% |
Spider-Man 2: 1440p | -3.41% | -12.50% |
Spider-Man 2: 1080p | -3.19% | -8.45% |
On the other hand, Spider-Man 2 favored CPU decompression at all three resolutions. At 4K, the RTX 5090 delivered 8.2% higher average FPS and 10.53% better 1% lows with CPU decompression. At 1440p, CPU decompression was 3.53% ahead in average FPS and 14.29% ahead in 1% lows. Finally, at 1080p, CPU decompression yielded 3.3% higher average FPS and 9.23% higher 1% lows. (The table above expresses the same results as GPU decompression's deficit relative to CPU decompression, which is why its percentages differ slightly.)
Compusemble claims the RTX 5090's performance is more consistent between GPU and CPU decompression than the older RTX 4090's. An older video he recorded backs up that statement, showing the RTX 4090 with a greater performance drop-off when toggling between CPU and GPU decompression, specifically in Spider-Man 2. With the RTX 4090 at 4K in Spider-Man 2, average frame rates were 10.34% higher and 1% lows 17.95% higher with CPU decompression. At 1440p, the average frame rate was 6.25% higher and 1% lows 18.87% higher with CPU decompression. At 1080p, the average frame rate was 3.25% higher and 1% lows 25.86% higher with CPU decompression.
The RTX 5090 has more raw compute than the RTX 4090 (105 TFLOPS FP32 vs. 83 TFLOPS), significantly more memory bandwidth (1.8 TB/s vs. 1.0 TB/s), and 33% more VRAM, all of which could contribute to the reduced impact of GPU decompression. The RTX 5090 pairs a 512-bit memory interface with 28 Gbps GDDR7 modules, giving it 78% more memory bandwidth than the RTX 4090. GPU decompression leans heavily on the memory subsystem, since compressed assets must be streamed into GPU memory before decompression can run.
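As a sanity check, those bandwidth figures follow directly from bus width and per-pin data rate; the quick calculation below uses the published specs of both cards (nothing here is measured).

```cpp
// Back-of-the-envelope check of the memory bandwidth gap cited above.
#include <cstdio>

int main()
{
    // bandwidth (GB/s) = bus width (bits) / 8 bits-per-byte * data rate (Gbps per pin)
    double rtx5090 = 512.0 / 8.0 * 28.0; // GDDR7: 1792 GB/s, i.e. ~1.8 TB/s
    double rtx4090 = 384.0 / 8.0 * 21.0; // GDDR6X: 1008 GB/s, i.e. ~1.0 TB/s
    std::printf("RTX 5090: %.0f GB/s | RTX 4090: %.0f GB/s | +%.0f%%\n",
                rtx5090, rtx4090, (rtx5090 / rtx4090 - 1.0) * 100.0);
}
```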
Of course, shader throughput still comes into play, so it makes sense that at lower resolutions, where the 5090 is more likely to be CPU limited, it also has the spare cycles to handle decompression. But whatever the cause (architecture could also play a role), the 5090 doesn't seem to mind GPU decompression as much as the 4090 did. Still, what we want to see is more games using DirectStorage to improve load times as well as overall performance; where that isn't possible, slightly longer load times are preferable to inconsistent framerates.
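For readers curious what "using DirectStorage" looks like at the code level, here is a hedged sketch of the kind of request that triggers GPU decompression: a GDeflate-compressed asset is enqueued as a single read-plus-decompress operation, and the runtime dispatches the decompression work itself. The file name, sizes, and resources are placeholders, and the pattern follows Microsoft's DirectStorage samples rather than either game's actual code.

```cpp
// Sketch: reading one GDeflate-compressed asset straight into a GPU buffer.
// 'factory', 'device', and 'buffer' are assumed to already exist; error
// handling and fence-based completion waiting are omitted for brevity.
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void LoadCompressedAsset(IDStorageFactory* factory, ID3D12Device* device,
                         ID3D12Resource* buffer,
                         UINT32 compressedSize, UINT32 uncompressedSize)
{
    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"asset.gdeflate", IID_PPV_ARGS(&file)); // placeholder path

    DSTORAGE_QUEUE_DESC queueDesc = {};
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    DSTORAGE_REQUEST request = {};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;   // bytes on disk
    request.UncompressedSize          = uncompressedSize; // bytes after decompression
    request.Destination.Buffer.Resource = buffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit(); // decompression runs on the GPU unless disabled via configuration
}
```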

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.
HyperMatrix:
Direct Storage still negatively impacts performance in GPU-bound scenarios with the RTX 5000 series and should be disabled.
I’ve been disabling DirectStorage for a while now after I saw my CPU sitting at 20% utilization while the GPU was maxed out. No point taxing your most limited resource.
mikeztm:
HyperMatrix said: “Direct Storage still negatively impacts performance in GPU-bound scenarios with the RTX 5000 series and should be disabled.”
There’s no way to turn it off if the game uses GDeflate compression. That means the whole game is compressed in the DirectStorage format and can only be read via the DirectStorage API. Monster Hunter Wilds falls into this category, and by now I would say DirectStorage is hurting the industry more than it’s helping. PCs aren’t ready for console-style texture streaming compression, since we have neither unified memory nor a hardware decompression unit.
DirectStorage with GPU decompression is basically emulating the hardware unit in the Xbox Series and PS5 using software shader code. It’s awful to run on either the CPU or the GPU. We need that unit to be part of our GPUs, just like media decoders/encoders work today.
SethNW:
Yeah, it’s almost like it depends on how CPU- or GPU-bound you are... If you’re running an overkill graphics card and are CPU bound, offloading some work to the GPU will increase performance. And if the GPU is maxed out, adding more work to it will only reduce the performance of its other tasks, and offloading to the CPU will help. It’s almost like you don’t want to add more work to whatever is already maxed out on utilization... Or in other words, I am shocked and amazed. :-D
mikeztm:
SethNW said: “Yeah, it’s almost like it depends on how CPU- or GPU-bound you are...”
This workload should be optimized for the GPU, and the GPU should run circles around the CPU, so that you would never want it to run on the CPU. Instead, we got a workload that chokes the GPU.
The Xbox and PS5 do this with a dedicated decompression unit, so it never hits the GPU or CPU. This is basically a technique to avoid requiring a PCIe Gen5 NVMe drive as a minimum spec. But our PCs aren’t ready for this until we get that hardware unit built in.