Nvidia has applied for a patent that describes a hierarchical processor array. The application was filed earlier this year as an extension of the existing patent 7,634,637. The idea is to use two or three tiers of processing cores with dedicated functions, which alleviates a core design problem that results in increasingly wide and ineffective graphics rendering pipelines.
Those pipelines include various shaders, such as a vertex shader unit, a geometry shader and a pixel shader, among others, and each of these shaders is getting wider at every level of parallel execution hardware. Nvidia says that "each massively parallel stage in a stage-by-stage pipeline tends to provide little granularity of control of portions of each parallel stage", and that each "massively parallel stage becomes unwieldy and prohibitively time-consuming to design". Additionally, "the level of utilization may decrease, as the massively parallel stage struggles during operation to find sufficiently wide units of work to fully occupy the data path."
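The utilization concern can be illustrated with some simple arithmetic. The sketch below is not taken from the patent: the function name, the batch size and the lane counts are all hypothetical, and it assumes a stage that always issues work in full-width passes.

```python
import math


def utilization(work_items: int, width: int) -> float:
    """Fraction of lanes doing useful work when `work_items` items are
    issued on a stage that is `width` lanes wide (assumed model: the
    stage always runs whole passes, so leftover lanes sit idle)."""
    if work_items == 0:
        return 0.0
    passes = math.ceil(work_items / width)
    return work_items / (passes * width)


batch = 40  # hypothetical number of work items available at once
for width in (8, 32, 128):  # hypothetical stage widths
    print(f"width={width:3d}  utilization={utilization(batch, width):.4f}")
```

Under this toy model, the same batch of 40 items fills an 8-lane stage completely but leaves most of a 128-lane stage idle, which is the effect the quoted passage describes.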
To keep parallelization efficient, the company describes a processor with multiple levels of processing hierarchy, with "multiple classes of graphics operations being associated with a different stage of graphics processing." However, each level would also include at least one module that is capable of processing all graphics functions. A single top-level component would distribute certain classes of work to lower-level classes of processors. The patent specifically mentions a third-level class in the processor hierarchy that would be reserved for general-purpose computations, as well as "at least one" specialized graphics function module that "is capable of performing a class of graphics operations carried out based on frame buffer data for scan out to a display."
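As a rough illustration of the structure described above, here is a minimal sketch of one hierarchy level in which a dispatcher hands work to either a specialized module or a module that can handle every graphics class. All names (Module, Dispatcher, the class labels) are hypothetical, and the routing policy (prefer the most specialized module, then the least busy one) is an assumption made for the example, not something the patent application is reported to specify.

```python
from dataclasses import dataclass, field

# Illustrative labels for classes of graphics work (assumed, not from the patent).
GRAPHICS_CLASSES = ("vertex", "geometry", "pixel")


@dataclass
class Module:
    """A lower-level processing module that accepts some classes of work."""
    name: str
    handles: frozenset
    done: list = field(default_factory=list)

    def run(self, kind, payload):
        if kind not in self.handles:
            raise ValueError(f"{self.name} cannot process {kind!r} work")
        self.done.append((kind, payload))


@dataclass
class Dispatcher:
    """Top-level component that distributes work to lower-level modules."""
    modules: list

    def submit(self, kind, payload):
        candidates = [m for m in self.modules if kind in m.handles]
        if not candidates:
            raise ValueError(f"no module handles {kind!r} work")
        # Assumed policy: most specialized module first, then least loaded.
        target = min(candidates, key=lambda m: (len(m.handles), len(m.done)))
        target.run(kind, payload)
        return target.name


level = Dispatcher([
    Module("general", frozenset(GRAPHICS_CLASSES)),  # handles all classes
    Module("pixel_only", frozenset({"pixel"})),      # specialized module
])
print(level.submit("pixel", payload=0))
print(level.submit("vertex", payload=1))
```

In this toy model, a faster variant could be built by appending more Module instances to the list, without changing the dispatcher.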
According to the patent application, the resulting core design is "advantageously configured to execute a large number of threads in parallel, where the term 'thread' refers to an instance of a particular program executing on a particular set of input data." For example, a thread could be the vertex shader program processing a single vertex, or the pixel shader processing an individual pixel.
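The definition of a thread as one program instance applied to one set of input data can be sketched as follows. This is only an analogy: it uses CPU threads from Python's standard library rather than GPU hardware, and vertex_program and run_threads are hypothetical names introduced for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def vertex_program(vertex, offset=(1.0, 0.0, 0.0)):
    """Hypothetical stand-in for a vertex shader: translate one vertex."""
    return tuple(v + o for v, o in zip(vertex, offset))


def run_threads(program, inputs):
    """Run one logical thread per input element: the same program
    executes on each element, and results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(program, inputs))


vertices = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
print(run_threads(vertex_program, vertices))
# prints [(1.0, 0.0, 0.0), (2.0, 2.0, 3.0)]
```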
Besides greater processing efficiency, the document states that a hierarchical structure of multithreaded core arrays also speeds up the creation of "derivative chip designs." Faster GPUs could be built simply by "adding additional components at one or more of the levels of the hierarchy".