Nvidia Files Patent For Hierarchical Graphics Processors

Source: USPTO | 20 comments

Nvidia is looking for ways to keep data parallelization in graphics processors efficient.

Nvidia has applied for a patent, filed earlier this year as an extension of existing patent 7,634,637, that describes a hierarchical processor array. The idea is to use two or three tiers of processing cores with dedicated functions, alleviating a core design problem that results in increasingly wide and ineffective graphics rendering pipelines.

Those pipelines include various shaders, such as a vertex shader unit, a geometry shader, and a pixel shader, among others, and each of these shaders is getting wider at every level of parallel execution hardware. Nvidia says that "each massively parallel stage in a stage-by-stage pipeline tends to provide little granularity of control of portions of each parallel stage", and that each "massively parallel stage becomes unwieldy and prohibitively time-consuming to design". Additionally, "the level of utilization may decrease, as the massively parallel stage struggles during operation to find sufficiently wide units of work to fully occupy the data path."

To keep parallelization efficient, the company describes a processor with multiple levels of processing hierarchy, with "multiple classes of graphics operations being associated with a different stage of graphics processing." However, each level would also include at least one module that is capable of processing all graphics functions. There would also be one top-level component that is able to distribute certain classes of work to lower-level classes of processors. The patent specifically mentions a third-level class in the processor hierarchy that would be reserved for general-purpose computations, as well as "at least one" specialized graphics function module that "is capable of performing a class of graphics operations carried out based on frame buffer data for scan out to a display."
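The distribution scheme described above can be pictured with a small Python sketch. It is only an illustration under assumptions: the class names (`TopLevel`, `Module`), the work-class labels, and the "prefer the most specialized, least busy module" rule are all hypothetical and are not taken from the patent application. It shows a top-level component routing classes of work to lower-level modules, with one general module that can handle every class.

```python
from dataclasses import dataclass, field

# Hypothetical work classes; the quoted patent text does not name them.
VERTEX, GEOMETRY, PIXEL = "vertex", "geometry", "pixel"
ALL_CLASSES = frozenset({VERTEX, GEOMETRY, PIXEL})


@dataclass
class Module:
    """A lower-level processing module that accepts some classes of work."""
    name: str
    accepts: frozenset
    done: list = field(default_factory=list)

    def run(self, work_class, payload):
        # Record the work item instead of actually rendering anything.
        self.done.append((work_class, payload))


@dataclass
class TopLevel:
    """Top-level component that distributes work to lower-level modules."""
    modules: list

    def dispatch(self, work_class, payload):
        candidates = [m for m in self.modules if work_class in m.accepts]
        if not candidates:
            raise ValueError(f"no module accepts {work_class!r}")
        # Assumed policy: prefer the most specialized module, then the
        # one with the least recorded work. A general module is the fallback.
        target = min(candidates, key=lambda m: (len(m.accepts), len(m.done)))
        target.run(work_class, payload)
        return target.name


def build_example():
    """Two dedicated modules plus one module that handles every class."""
    return TopLevel([
        Module("vertex_unit", frozenset({VERTEX})),
        Module("pixel_unit", frozenset({PIXEL})),
        Module("general_unit", ALL_CLASSES),
    ])
```

In this sketch, a "derivative" design would simply be a `TopLevel` built with more modules in its list, which loosely mirrors the idea of adding components at a level of the hierarchy.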

According to the patent application, the resulting core design is "advantageously configured to execute a large number of threads in parallel, where the term 'thread' refers to an instance of a particular program executing on a particular set of input data." For example, a thread could be the vertex shader program processing a single vertex, or the pixel shader processing an individual pixel.
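The "one program, many sets of input data" definition of a thread can be sketched in a few lines of Python. This is an analogy only: `vertex_program`, its `scale` parameter, and the use of a CPU thread pool are assumptions made for illustration, and the sketch does not model how a GPU actually schedules threads.

```python
from concurrent.futures import ThreadPoolExecutor


def vertex_program(vertex, scale=2.0):
    """Hypothetical 'shader program': scales one vertex position."""
    x, y, z = vertex
    return (x * scale, y * scale, z * scale)


def run_threads(program, inputs):
    # One logical thread per input item: same program, different data.
    # Executor.map returns results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(program, inputs))


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
transformed = run_threads(vertex_program, vertices)
```

Each call to `vertex_program` on one vertex plays the role of one thread in the patent's sense; a pixel shader would follow the same pattern with pixels as the input items.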

Besides greater processing efficiency, the document states that a hierarchical multithreaded core array also speeds up the creation of "derivative chip designs." Faster GPUs could be built simply by "adding additional components at one or more of the levels of the hierarchy".

This thread is closed for comments
  • 3 Hide
    hannibal , April 30, 2012 7:15 AM
    Holy Bat smoke! Bring the popcorn! Be ready for patent fights!
    This can not be good for customers...
  • -5 Hide
    Pherule , April 30, 2012 8:46 AM
    The patent trolls have struck again...
  • -6 Hide
    digiex , April 30, 2012 9:30 AM
    Nvidia is not a patent troll, they are actually the crown holder of the fastest GPU.
  • 4 Hide
    Anonymous , April 30, 2012 10:23 AM
    /\ /\ sorry i wish it was the fastest since Crysis, Metro 2033 and compute disagree. The 7970 is underclocked!
  • -1 Hide
    theconsolegamer , April 30, 2012 10:30 AM
    Ok, so Nvidia wants to patent the GPU...... like are you serious?
  • 6 Hide
    sixdegree , April 30, 2012 11:12 AM
    AMD should patent "innovative power regulator in graphics processor" while Intel patents "Data throttling through PCI Express to graphics processor".
  • -3 Hide
    xingmaodr , April 30, 2012 2:04 PM
    Awesome

  • 5 Hide
    ojas , April 30, 2012 2:57 PM
    I think they've patented a method of increasing gpu efficiency, rather than "patent a GPU". I don't know too much about GPUs on an architectural level, but this is what i got out of the patent as described in the article.
  • -6 Hide
    Soda-88 , April 30, 2012 4:16 PM
    Frankythetoe: /\ /\ sorry i wish it was the fastest since Crysis, Metro 2033 and compute disagree. The 7970 is underclocked!

    7970's die size is 352mm^2 whereas GTX680 is 294mm^2 (that's smaller than 560Ti). nVidia didn't have to release its flagship (GK110) because 680 (should've been called 660) was trading blows with AMD's flagship card. Obviously, nVidia didn't wanna shoot itself in the foot by not seizing this opportunity to gobble as much cash as possible with the '680'.
  • 7 Hide
    sykozis , April 30, 2012 4:34 PM
    Pherule: The patent trolls have struck again...

    How is nVidia a patent troll? nVidia has been on the receiving end of the trolling as many times as Samsung has. nVidia is simply setting up for an entirely new architecture and properly filing for patents related to their work. This is actually what the patent system was set up for. If AMD or Intel want to follow suit, nVidia will license the patent to them.
  • 2 Hide
    blazorthon , April 30, 2012 7:04 PM
    Soda-88: 7970's die size is 352mm^2 whereas GTX680 is 294mm^2 (that's smaller than 560Ti). nVidia didn't have to release its flagship (GK110) because 680 (should've been called 660) was trading blows with AMD's flagship card. Obviously, nVidia didn't wanna shoot itself in the foot by not seizing this opportunity to gobble as much cash as possible with the '680'.


    ... The 7970 doesn't have a die size and neither does the 680. The GPUs inside each graphics card, the Tahiti and GK104, have a die size because they have dies. The 7970 and 680 are the entire card, not just the GPU. I still don't understand why some people refer to graphics cards as if they are GPUs. It's like calling an entire computer a CPU.

    Furthermore, the Tahiti in the 7970 is a compute oriented chip whereas the GK104 in the GTX 680 is a gaming oriented chip. That the Tahiti even comes as close as it does is quite a feat for AMD considering that it isn't even a gaming oriented chip like the GK104. Compare the two in compute performance. The Tahiti wins by over 50% in single precision compute performance and the Tahiti is nearly 6 times faster than the GK104 in double precision compute performance. This is like Fermi versus Cayman, except not as bad. The GF110 from the GTX 580 has a 530mm2 die and the Cayman from the Radeon 6970 has a 374mm2 or so die, yet GF110 only somewhat outperforms it in gaming. Compute, on the other hand, shows the GF110 vastly outperforming the 6970.

    This is the same, except AMD doesn't want to make a huge 500mm2+ die to beat Nvidia, especially with the Tahiti being right behind the GK104 (680 versus 7970) despite the Tahiti not being a gaming oriented chip.

    Considering the deplorable yields of the GK104, the GK100 would have had FAR worse yields (it is supposed to have a 550mm2 die) due to it being even larger. Nvidia can't even keep the 680s in stock and it would have been FAR worse if they had GK100 dies instead of GK104s. Also, what's the point of having GK100 if the GK104 is already this fast? What, do you want to run triple 1080p displays on a single GPU for some ridiculous reason? There's no need for it because what we have today is already more than fast enough for current games. Four 680s or 7970s can handle triple 2560x1600 or triple 1080p3D right now in ANY and ALL games. The only resolutions higher than that are monitors that would cost more than the rest of the entire computer system. There's no point to making a faster single GPU card right now, especially considering how badly that would affect already horrible supply.

    Why should the 680 be called the 660 just because it has GK104 instead of GK100 (and yes, it's GK100, not GK110, so get it right)? The GK104, even before it was assigned to the 680, would have been the 670 or 660 TI, not just the 660 (660 would have probably had a cut down version of the GK104 instead of the full chip). Furthermore, the GK104-equipped GTX 680 performs on par with the GTX 590 for gaming, just like the GTX 680 should do if it is to follow the trend of Nvidia's top video card from their new architectures plus die shrinks roughly equaling the previous dual GPU card (for example, GTX 480 and GTX 295) in performance.

    So, GK104 managed to do what it needed to do with the GTX 680, GK100 would have had even worse yield problems than GK104 does, and GK100 would have had more performance than reasonably usable, so Nvidia made the best possible decision for themselves in this situation by using GK104 in the GTX 680.

    Also, Tahiti itself has more than the 2048 cores that the 7970 has access to, so not only could it have been made smaller, but the full version could be released by AMD at any time, closing the gap between AMD's top single GPU card (presumably the 7980 if AMD does this) and the 680. By then, Nvidia might be able to get GK100s going in a GTX 680 TI or GTX 685, and the gap will be widened again when the greater performance of even faster GPUs is more important. Then it will be more reminiscent of the GTX 580 versus the Radeon 6970, except this time the compute advantage switches from Nvidia to AMD.
  • 3 Hide
    _Cubase_ , April 30, 2012 11:39 PM
    Nvidia seems to have invented a unique method of utilizing existing hardware, and want to patent it before developing and integrating it. Fair enough.
    A patent troll finds a commonplace method, and tries to patent it without actually having to invent or implement anything.
  • -1 Hide
    dreadlokz , May 1, 2012 12:23 AM
    One day patents will die and the tech will be free! This will be the first day after the apocalypse =)
  • 3 Hide
    kronos_cornelius , May 1, 2012 5:30 AM
    Nvidia is definitely no patent troll... To accuse the company of such is to confuse the meaning of the word.

    I don't think I get the patent from this short description. Given that they have general processing cores (CUDA) and graphics cores(regular GL), is this patent only for graphics rendering ? or general processing ?
  • -7 Hide
    antilycus , May 1, 2012 5:54 AM
    AMD/ATI Copies (always have, always will) NVIDIA invents. There is an extremely large difference. If you don't know, just look at their drivers. AMD's SUCK. SUCK SUCK. SUCK on the most ultimate level. NVIDIA's simply don't. (nt4.dll is one nasty driver error that NVIDIA should be shot for, but still super tiny compared to the bazillion driver errors that go with the bloated Catalyst drivers).

    Seriously. I adore AMD... especially their processors and I love what they are doing with their GPU's but they are always following, never leading when it comes to GPU technology. Once they start engineering instead of stealing (nvidia's tech) then we can talk. But until then.... you are simply biased if you don't see that NVIDIA is a better graphics card company than ATI/AMD.
  • -2 Hide
    markem , May 1, 2012 12:36 PM
    Wow, so nvidia wants to follow apple (The biggest patent troll), maybe nvidia should learn something from current events (aka samsung vs apple law suits), especially if they have not even created the tech or maybe intel and amd should start applying for patents, then we will see how far CPU and gpu progresses...

    nVidia wants to kill progression and competition as is apple currently (In the end - troll always loses)
  • 2 Hide
    blazorthon , May 1, 2012 2:07 PM
    markem: Wow, so nvidia wants to follow apple (The biggest patent troll), maybe nvidia should learn something from current events (aka samsung vs apple law suits), especially if they have not even created the tech or maybe intel and amd should start applying for patents, then we will see how far CPU and gpu progresses... nVidia wants to kill progression and competition as is apple currently (In the end - troll always loses)


    How is this trolling? This is Nvidia patenting a new basis for their next architecture. This is no worse than Nvidia patenting Fermi and Kepler.

    kronos_cornelius: Nvidia is definitely no patent troll... To accuse the company of such is to confuse the meaning of the word. I don't think I get the patent from this short description. Given that they have general processing cores (CUDA) and graphics cores(regular GL), is this patent only for graphics rendering ? or general processing ?


    This should be used for gaming and compute workloads. Compute often scales the cores far better than gaming does, so technically, it's probably more for gaming than compute.

    I'll try to explain this a little. If I make a GPU with 1 core, then it won't have multi-core scaling issues. If I make a GPU with 64 cores, then it probably still won't have scaling problems. However, every time you add more cores, the distance between each core (and the other hardware in the GPU, all measured in transistors between each core and the other hardware) increases. Eventually, this leads to scaling problems. IE if you have a GPU with 1024 cores and they are in a monolithic block, then the ones furthest away from whatever they need to talk to (such as memory controllers and other hardware, or even the other cores on the other side of the chip) have to wait long times to talk to those other components. This decreases scaling (among other causes for the scaling decreases).

    One thing that AMD and Nvidia have done to decrease this problem is use the shader cores in blocks (IE Kepler has blocks of 192 shaders, Fermi had blocks of 64 shaders, I don't remember Cayman's and GCN's block sizes, but they were probably something like that). By using multiple smaller blocks instead of a single, large block, scaling improves somewhat because the cores only need to talk to other pieces of the block instead of hardware on the other side of the GPU.

    However, with so many blocks becoming necessary or very large blocks (Kepler's 192 core blocks are very large for such blocks, yet it still needed 8 of these blocks for the GK104, each almost as large by transistor count as one of the lower end Fermi GPUs), this is no longer enough to keep scaling up. A hierarchical structure can improve this scaling more. For example, look at Bulldozer. Sure, it's called a modular architecture, but think about it. Instead of cores, it has modules. That's like adding an additional hierarchical level to the CPU, rather than just throwing more cores into the same level. It's not really the same as what Nvidia is trying to do (more comparable to the change from a single block of cores to multiple blocks), but just look at how it affected scaling. Under Windows 8, an OS optimized for this, Bulldozer shows incredible performance scaling with many cores (even if each core is slower than a Phenom II core, which was already pretty slow for this day and age when compared to a Nehalem core, let alone a Sandy or Ivy Bridge core).

    What this style of computing is supposed to be like (as far as I can tell) is that it will change the way the cores and hardware talk to each other and process the data. It will make the scheduling process into a more hierarchical structure at the hardware level. IE, instead of having the hardware strewn about the GPU, with each block being a semi-independent component that does more or less all of the processing within the schedule for the data that goes into the block (a portion of the screen's pixels), each *block* will do all of the processing for a specific part of the schedule. One block will do something like the first part of a work, then shuffle the data to the next block for it to do the next part of the work, and so on. This could help improve the scaling because it should reduce the amount of data that gets shuffled all over each block and the GPU and make the data shuffling more uniform and more from certain points to others (instead of it going from all over the GPU to some other place all over the GPU, it will go from one block to the next block in the line, providing a more ordered and focused path from start to finish).

    So, this would improve parallelism (pretty much multi-core scaling, IE scaling between splitting the load of each step in the schedule between multiple more or less identical parts, such as splitting the load of a job between multiple shader cores) because everything that needs to talk will be right next to each other and the data transfers can be more optimized for because they will be more uniform and predictable. For example, a bus such as those extremely fast ring buses that Intel uses (another notable example would be the Cell chip in the PS3, if I remember correctly) could be put into use.

    However, I'm pretty much grasping at straws at this point, so if an engineer who knows more about this could come here and clear things up for us, I think I can speak for at least most of the commenters on this article if I say that we would be grateful.
  • -5 Hide
    Anonymous , May 2, 2012 1:34 AM
    antilycus: AMD/ATI Copies (always have, always will) NVIDIA invents. There is an extremely large difference. If you don't know, just look at their drivers. AMD's SUCK. SUCK SUCK. SUCK on the most ultimate level. NVIDIA's simply don't. (nt4.dll is one nasty driver error that NVIDIA should be shot for, but still super tiny compared to the bazillion driver errors that go with the bloated Catalyst drivers).

    Seriously. I adore AMD... especially their processors and I love what they are doing with their GPU's but they are always following, never leading when it comes to GPU technology. Once they start engineering instead of stealing (nvidia's tech) then we can talk. But until then.... you are simply biased if you don't see that NVIDIA is a better graphics card company than ATI/AMD.

    A bit of information on how top companies stay on top. First be the only worthwhile thing in said area. Then market it as the only one in said area. Time passes.... Your company is basically the standard now. Make it very hard for competition to rise up against you by....setting up "pitfalls" they have to jump through to catch up. Time is the biggest factor. Time gives the revenue to funnel into R&D when a competitor peeks its head.

    Can someone tell me the last time you saw a commercial for an AMD/ATI processor, or a game that starts with an AMD logo telling you it's the shiznit?

  • 2 Hide
    alextheblue , May 2, 2012 1:48 AM
    antilycus: Seriously. I adore AMD...
    Really, where? Where you claim that they only copy and follow Nvidia? Where you bash their drivers while ignoring the fact that they've come a long way and most people have no issues anymore? Or where you ignore Nvidia driver issues like "Whoops we turned fan control off for some of you, have fun lighting your GPU on fire"? Yeah, you forgot about that one, didn't you. I think it was 196.75? It was a WHQL release, too.

    Guess what? They both screw up. Don't put Nvidia up on such a pedestal. Neither one of them has flawless driver records. But they're both better than Intel, in the graphics driver department. Regarding "copying" you have no idea what goes into designing a modern GPU or how long the process takes. If all they did was copy they'd be generations behind. Not to mention that they sometimes beat Nvidia to market with a new architecture, and vice versa.
  • 1 Hide
    alphaalphaalpha , May 2, 2012 2:26 AM
    dreadlokz: One day patents will die and the tech will be free! This will be the first day after the apocalypse =)


    No patents at all would be a worse system than keeping the patent system in its current form, although only somewhat worse. That would destroy the right to intellectual property, meaning that anyone could use anything they wanted to. If I make a new architecture and make some GPUs, then I would be pretty angry if some jerk tries to use my architecture without my permission, especially if they try to ruin my business with my architecture. I would like to have a system of laws protecting my invention even if it's a semi-broken system of laws.