Microsoft Explains GPU Hardware Scheduling: Aims to Improve Input Lag


If you've been following tech news lately, chances are you've heard that hardware-accelerated GPU scheduling has become a thing. Microsoft baked it into the May 2020 update, Nvidia implemented it, AMD did too, and we've tested it on Nvidia's GPUs. However, so far there has been very little explanation about what it actually does, and why it's relevant. Our own testing on Nvidia graphics cards yielded mixed results, making us wonder whether it's really worth your time.

Today, Microsoft posted a blog to answer these questions, so naturally, we're here to tell you the what and how in (mostly) our own words.

Primary Improvement: Input Latency

Skipping the history lesson of how Microsoft's WDDM GPU scheduler came to be, in games today it functions as the piece of software that assigns tasks to the GPU. This scheduler runs on the CPU as a high-priority thread that "coordinates, prioritizes, and schedules the work submitted by various applications." Why high priority? Because you want the GPU to receive its jobs as soon as you trigger your character to shoot the bad guys.

And that's where the problems come in. The WDDM scheduler creates overhead on the system and introduces latency. Some overhead is unavoidable, but running the scheduler on the CPU adds an extra step when every millisecond matters.

In an optimal scenario, as the GPU is rendering one frame, the CPU would be busy planning out the next frame. In practice, this is exactly how today's WDDM scheduler works, but working with such small tasks on a frame-by-frame basis creates a massive load on the CPU, which is why performance when running games at a low resolution and high framerates is very CPU dependent.

To reduce this overhead, today's games can batch up the commands for several frames at once and send them to the GPU together. This used to be an optional feature where you could manually pick the number of frames buffered, but it has since become a balancing act that happens in the background without your knowledge. This pre-planning of frames is known as frame buffering, and you can undoubtedly already see why it's problematic: when the CPU needs to reduce its load, what you see on-screen can and will run a few frames behind your inputs.
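The tradeoff described above can be sketched with a toy model. Everything here is illustrative (the per-submission cost is a made-up number, not a measurement): batching amortizes the scheduler's per-submission cost across more frames, but each extra buffered frame pushes your input further behind what's on screen.

```python
# Toy model of the batching tradeoff: amortized scheduling overhead
# versus added input latency. SUBMIT_COST_MS is a hypothetical figure.

SUBMIT_COST_MS = 0.5  # assumed CPU cost of one submission to the scheduler


def per_frame_stats(batch_size: int) -> tuple[float, int]:
    """Return (scheduling overhead in ms per frame, input latency in frames)."""
    overhead_ms = SUBMIT_COST_MS / batch_size  # cost amortized across the batch
    latency_frames = batch_size                # input rendered ~N frames later
    return overhead_ms, latency_frames


for n in (1, 2, 4):
    overhead, latency = per_frame_stats(n)
    print(f"batch of {n}: {overhead:.3f} ms/frame scheduling cost, "
          f"input visible after ~{latency} frame(s)")
```

Submitting every frame individually (batch of 1) pays the full scheduling cost each frame but keeps latency minimal; batching four frames cuts the per-frame cost to a quarter while quadrupling the input delay, which is exactly the tension Microsoft's blog describes.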

The Dilemma: Low Input Latency, or Reduced CPU Load

When we tested the effects of GPU Hardware Scheduling, we were using a system with an Nvidia RTX 2080 Ti and an Intel Core i9-9900K, which was the best gaming CPU money could buy (until the Core i9-10900K came around).

However, with a processor this powerful, scheduling GPU frames isn't a demanding task for your central piece of silicon, and you won't really have to choose between reducing input latency and reducing CPU load. The i9-9900K is so powerful, why not let the chip schedule the frames one by one? It's not as if you'd mind your expensive CPU proving its money's worth.

But not everyone has a $500 CPU to play games on, and that's where GPU hardware scheduling should make a bigger difference when gaming.

GPU Hardware Scheduling Should Benefit Low-End CPUs More

Nvidia's Pascal and Turing GPUs, and AMD's RDNA graphics cards, all have purpose-built hardware scheduling baked into their silicon. This GPU scheduler is much more efficient at scheduling work than your CPU, and it doesn't require going back and forth over the PCIe bus.

However, switching from software scheduling on the CPU to hardware scheduling on the GPU fundamentally changes one of the pillars of the graphics subsystem. It affects the hardware, the operating system, the drivers, and how games are coded, which is why it has taken this long to arrive.

The transition to hardware-accelerated GPU scheduling isn't going to be an easy one. That's why Microsoft isn't yet enabling the feature by default, but rather as an opt-in setting. You can find it under Settings -> System -> Display -> Graphics Settings, but you'll need to be on the latest version of Windows 10 (the May 2020 update, version 2004) and have the right AMD or Nvidia drivers installed on your system.
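For those who prefer scripting the change, the Settings toggle is widely reported to map to the `HwSchMode` registry value under the `GraphicsDrivers` key. Treat this as a sketch rather than gospel, back up your registry first, and note that a reboot is required either way:

```shell
:: Enable hardware-accelerated GPU scheduling (2 = on, 1 = off).
:: Run from an elevated prompt, then reboot for the change to apply.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v HwSchMode /t REG_DWORD /d 2 /f
```

If your GPU or driver doesn't support the feature, the value is simply ignored, which matches the behavior of the missing Settings toggle on unsupported systems.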

Enabling the feature today shouldn't lead to any issues, but as with any technology this young, try not to be surprised if it does. Given the opt-in nature, it could be months (years?) before we realize the full benefits of GPU hardware scheduling. But if all goes as planned, pairing a powerful graphics card with a mid-tier CPU is about to make a whole lot more sense for gaming.

  • EtaLasquera
    R5 1600AE Stock
    GTX1060 3GB
    16GB RAM 2400Mhz
    1TB NVME
    1% Improvement with hardware scheduler.

    This is low or mid tier cpu?
    Reply
  • TechyInAZ
    EtaLasquera said:
    R5 1600AE Stock
    GTX1060 3GB
    16GB RAM 2400Mhz
    1TB NVME
    1% Improvement with hardware scheduler.

    This is low or mid tier cpu?

    By improvement what do you mean? This feature helps latency, not frame rate.

    That CPU is pretty powerful for that GPU, so that's probably another reason why nothing changed.
    Reply
  • hotaru251
    hmm I should try this on my i3-4130.

    dual core so it should have a larger impact correct?

    also does this just lessen burden on cpu (so doesnt use as much %?) and push it to make ur gpu do a bit extra while gaming?
    Reply
  • bit_user
    TechyInAZ said:
    This feature helps latency, not frame rate.
    Not sure about that.

    What MS' blog post seems to say is that in order to achieve better efficiency, games were coded in a way that increased latency. Switching the GPU scheduling to use more hardware assist doesn't change that, but it could reduce the efficiency benefit of batching, which would enable games to submit work more frequently, thereby reducing latency. In other words, it opens the door to latency-reductions in games, though the game is what ultimately determines whether any benefit is realized.

    Here's the relevant bit:
    an application would typically do GPU work on frame N, and have the CPU run ahead and work on preparing GPU commands for frame N+1. This buffering of GPU commands into batches allows an application to submit just a few times per frame, minimizing the cost of scheduling and ensuring good CPU-GPU execution parallelism.

    An inherent side effect of buffering between CPU and GPU is that the user experiences increased latency. User input is picked up by the CPU during “frame N+1” but is not rendered by the GPU until the following frame. There is a fundamental tension between latency reduction and submission/scheduling overhead. Applications may submit more frequently, in smaller batches to reduce latency or they may submit larger batches of work to reduce submission and scheduling overhead.

    TechyInAZ said:
    That CPU is pretty powerful for that GPU, so that's probably another reason why nothing changed.
    If my interpretation is correct, then any potential improvement would be specific to the work-submission behavior of the game. Games written to work efficiently with CPU-based scheduling are unlikely to show much benefit from HW-assisted scheduling. So, to see benefits, you'd need either:
    A badly-written game.
    A game written to favor low-latency more than high-FPS.
    A game which has specifically been tuned to take advantage of HW-assisted GPU scheduling.
    Again, this is just my interpretation of the MS blog post. I clicked on it in hopes of learning more details, but the main nugget is just this:
    Windows continues to control prioritization and decide which applications have priority among contexts. We offload high frequency tasks to the GPU scheduling processor, handling quanta management and context switching of various GPU engines.
    Reply
  • TerryLaze
    bit_user said:
    Not sure about that.

    What MS' blog post seems to say is that in order to achieve better efficiency, games were coded in a way that increased latency. Switching the GPU scheduling to use more hardware assist doesn't change that, but it could reduce the efficiency benefit of batching, which would enable games to submit work more frequently, thereby reducing latency. In other words, it opens the door to latency-reductions in games, though the game is what ultimately determines whether any benefit is realized.

    Here's the relevant bit:


    If my interpretation is correct, then any potential improvement would be specific to the work-submission behavior of the game. Games written to work efficiently with CPU-based scheduling are unlikely to show much benefit from HW-assisted scheduling. So, to see benefits, you'd need either:
    A badly-written game.
    A game written to favor low-latency more than high-FPS.
A game which has specifically been tuned to take advantage of HW-assisted GPU scheduling.
Again, this is just my interpretation of the MS blog post. I clicked on it in hopes of learning more details, but the main nugget is just this:
    Actually the important part is this:
    However, throughout its evolution, one aspect of the scheduler was unchanged. We have always had a high-priority thread running on the CPU that coordinates, prioritizes, and schedules the work submitted by various applications.
Because windows will outright stop already running tasks (like the game threads for example) to run a higher priority task and it doesn't care if there are enough resources, it's just how windows works, so if this can make the driver run at a lower priority we will get much less stutter in games.
    https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities
    If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice) and assigns a full time slice to the higher-priority thread.
    Reply
  • bit_user
    TerryLaze said:
Because windows will outright stop already running tasks (like the game threads for example) to run a higher priority task and it doesn't care if there are enough resources, it's just how windows works, so if this can make the driver run at a lower priority we will get much less stutter in games.
    That's only an issue if the game is heavily trying to use all cores. Of course, on low core-count CPUs, that might actually be the case.

    However, I think it ties in with what I was saying - games probably avoid submitting work too frequently, specifically because doing so would unblock this high-priority thread, which then interrupts their frame computations just to forward the work onto the GPU.
    Reply
  • wifiburger
    Other sites also noted not much gains with i9 but on the 3900x showed gains.
    Tomb Raider showed 1% gains at 4k for me.
    I kept it on, since the cpu seems to reach / hold better boost clocks under gaming with it on.
    Reply