Multi-threaded rendering? "But," you’re saying, "we’ve had multi-core CPUs for several years now and developers have learned to use them. So multi-threading their rendering engines is nothing new with Direct3D 11." Well, this may come as a surprise to you, but current engines still use only a single thread for rendering. The other threads are used for sound, decompression of resources, physics, etc. But rendering is a heavy user of CPU time, so why not thread it, too? There are several reasons, some of them related to the way GPUs operate and others to the 3D API. So Microsoft set about solving the latter and working around the former.
Threading the rendering process seems attractive at first, but when you look at it a little closer, you realize that there’s only one GPU (even when several of them are connected via SLI or CrossFire, their purpose is to create the illusion that there’s only a single, virtual GPU) and consequently only one command buffer. When a single resource is shared by several threads, mutual exclusion (a mutex) is used to prevent those threads from writing commands simultaneously and stepping on each other’s toes. That means that the advantages of using several threads are canceled out by the critical section, which serializes the code. No API can solve this problem, because it’s inherent in the way the CPU and GPU communicate. But Microsoft is offering an API that can try to work around it. Direct3D 11 introduces secondary command buffers that can be saved and used later.
So, each thread has a deferred context, where the commands it writes are recorded in a display list that can then be inserted into the main processing stream. Obviously, when a display list is called by the main thread (the “Execute” in the “Multi-threaded Submission” diagram below), the main thread has to be sure that the thread that owns the list has finished filling it. So there’s still synchronization, but this execution model at least allows some of the rendering work to be parallelized, even if the resulting acceleration won’t be ideal.
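To make the record-then-execute idea concrete, here is a minimal sketch in portable C++. It is a simulation, not Direct3D code: the real calls (`ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList` and `ExecuteCommandList`) need a Windows device, so `Command`, `CommandList`, `record_commands` and `render_frame` are hypothetical names invented for illustration, and a "command" is just a string.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins: a "command" is a string, a display list is a vector.
using Command = std::string;
using CommandList = std::vector<Command>;

// Each worker records into its own list, so recording needs no lock.
CommandList record_commands(int worker_id, int draw_count) {
    CommandList list;
    list.reserve(draw_count);
    for (int i = 0; i < draw_count; ++i) {
        list.push_back("worker" + std::to_string(worker_id) +
                       ":draw" + std::to_string(i));
    }
    return list;
}

// Records on worker threads, then "executes" every list on the calling
// thread by appending it to a single main stream, in worker order.
std::vector<Command> render_frame(int worker_count, int draws_per_worker) {
    assert(worker_count >= 0 && draws_per_worker >= 0);

    std::vector<CommandList> lists(worker_count);
    std::vector<std::thread> workers;
    for (int w = 0; w < worker_count; ++w) {
        // Every thread writes only to its own slot, lists[w].
        workers.emplace_back([&lists, w, draws_per_worker] {
            lists[w] = record_commands(w, draws_per_worker);
        });
    }

    // Synchronization point: a list must be complete before it is executed.
    for (auto& t : workers) {
        t.join();
    }

    std::vector<Command> main_stream;
    for (const auto& list : lists) {
        main_stream.insert(main_stream.end(), list.begin(), list.end());
    }
    return main_stream;
}
```

Calling `render_frame(3, 2)` records six commands on three threads and replays them on the caller in a fixed order. The only wait is the `join` before execution, which plays the role of the synchronization described above. The single shared stream is touched by one thread only.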
Another problem with the previous Direct3D versions had to do with the creation of resources such as textures. In versions 9 and 10 of the API, resource creation had to take place in the rendering thread. Developers got around the problem by creating a thread that read and decompressed the texture from the disk and filled the resource (the Direct3D object), which itself was created in the main thread.
But as you can see, a large share of the workload was still on the main thread, which was already overloaded. That doesn’t give the load balance needed for good execution times. So, Microsoft has changed the threading model with Direct3D 11: the Device object can be called from several threads at once, so a loading thread can create the resources itself instead of handing that step back to the main thread. Synchronization within the functions of a Device is more finely managed than in Direct3D 10 and is much more economical with CPU time.
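The following sketch shows what loading resources directly from worker threads can look like when the synchronization lives inside the Device. Again, this is a portable simulation under stated assumptions rather than the Direct3D 11 interface: the `Device` class, `create_texture`, `decode_texture` and `load_textures` are hypothetical names, and "decoding" just fills a small buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device whose creation functions are
// synchronized internally, so any thread may call create_texture().
class Device {
public:
    // Stores the decoded pixels and returns a handle to the new "texture".
    int create_texture(std::vector<std::uint8_t> pixels) {
        std::lock_guard<std::mutex> lock(mutex_);  // short critical section
        const int handle = next_handle_++;
        textures_.emplace(handle, std::move(pixels));
        return handle;
    }

    std::size_t texture_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return textures_.size();
    }

private:
    mutable std::mutex mutex_;
    int next_handle_ = 0;
    std::unordered_map<int, std::vector<std::uint8_t>> textures_;
};

// Stand-in for the expensive part: reading and decompressing a file.
std::vector<std::uint8_t> decode_texture(int id, std::size_t bytes) {
    return std::vector<std::uint8_t>(bytes, static_cast<std::uint8_t>(id));
}

// Each loader thread decodes AND creates its resource; the calling thread
// only waits for the results.
std::vector<int> load_textures(Device& device, int count) {
    assert(count >= 0);
    std::vector<int> handles(count, -1);
    std::vector<std::thread> loaders;
    for (int i = 0; i < count; ++i) {
        loaders.emplace_back([&device, &handles, i] {
            // The decode runs outside the lock; only creation is serialized.
            handles[i] = device.create_texture(decode_texture(i, 16));
        });
    }
    for (auto& t : loaders) {
        t.join();
    }
    return handles;
}
```

The design point is that the lock covers only the brief bookkeeping inside `create_texture`, while the costly decode runs fully in parallel. That is the kind of fine-grained synchronization the paragraph above attributes to the Direct3D 11 Device.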