Larrabee: Intel's New GPU

The Vector Unit And The New Instruction Set

But as you can well imagine, the Pentium cores aren’t what gives Larrabee its processing power. To compete with GPUs on their home turf, you need a lot more than an FPU or even SSE. So Intel equipped each core with a vector unit operating on 16 elements simultaneously (compared to four for SSE or the Cell’s SPUs). These units can operate on integers, single-precision floating-point numbers, and double-precision floating-point numbers. Double-precision throughput is half the single-precision rate, but that penalty is still smaller than on current GPUs, which are between two and four times slower in AMD’s case and practically 10 times slower in Nvidia’s when moving from single to double precision.
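The halving of double-precision throughput can be read as simple lane arithmetic. The sketch below assumes (the article does not say so explicitly) that the 16 single-precision elements are 32 bits each and that the same register is simply split into 64-bit lanes for doubles; all names are illustrative.

```python
# Lane arithmetic for the vector unit, under the assumptions stated above.
SINGLE_BITS = 32      # size of one single-precision element
DOUBLE_BITS = 64      # size of one double-precision element
SINGLE_LANES = 16     # elements per vector, as given in the article
SSE_LANES = 4         # elements per SSE vector, as given in the article

# Assumed register width: 16 lanes of 32 bits each.
register_bits = SINGLE_LANES * SINGLE_BITS

# The same register holds half as many 64-bit elements.
double_lanes = register_bits // DOUBLE_BITS

# Slowdown when moving from single to double precision.
dp_slowdown = SINGLE_LANES / double_lanes

# Elements processed per instruction relative to SSE.
width_vs_sse = SINGLE_LANES / SSE_LANES

print(register_bits, double_lanes, dp_slowdown, width_vs_sse)
```

Under these assumptions the slowdown factor is 2, which can be set against the factors of two to four (AMD) and roughly 10 (Nvidia) quoted above.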

Rather than extending the SSE instruction set (again) to support the new vector unit, the Intel engineers created a new one, called Larrabee New Instructions (LRBni). Intel is rather vague about the supported instructions at the moment, but we should learn more at the upcoming Game Developers Conference (GDC). Intel plans several press conferences at the trade show, during which Michael Abrash of RAD Game Tools and Intel’s Tom Forsyth should share details about the instruction set.

We do already know several things, however. The instruction set supports up to three operands, which enables multiply-and-add (MAD) instructions as well as non-destructive operations. In SSE, by contrast, one of the source registers is overwritten with the result. And unlike the VMX instruction set found in the Cell’s PowerPC Processing Element (PPE), which operates only on registers, LRBni lets one of the operands be read directly from the L1 cache, so the cache can serve as an extended register file. The unit is also very flexible: it can reorganize the data in a register or perform various conversions involving the “exotic” formats frequently found in GPUs with no loss of performance or, in the worst case, only a slight one. These conversions can be executed as the data is loaded from cache memory, so the data can be stored in a compact form, which maximizes the amount of data the cache can hold.
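To make the difference concrete, here is a minimal Python model of the behavior described above. It is not LRBni code: the function names (`mad`, `mul_inplace`, `load_unorm8`) and the choice of an 8-bit normalized format are assumptions made purely for illustration.

```python
LANES = 16  # vector width given in the article

def mad(a, b, c):
    """Three-operand multiply-add: returns a*b + c per lane.

    None of the inputs is modified (a non-destructive operation).
    """
    assert len(a) == len(b) == len(c) == LANES
    return [x * y + z for x, y, z in zip(a, b, c)]

def mul_inplace(dst, src):
    """Two-operand style: dst is overwritten with dst*src."""
    assert len(dst) == len(src) == LANES
    for i in range(LANES):
        dst[i] *= src[i]

def load_unorm8(buf, offset):
    """Hypothetical load-with-conversion: 16 bytes become 16 floats in [0, 1]."""
    return [byte / 255.0 for byte in buf[offset:offset + LANES]]

a = [float(i) for i in range(LANES)]
b = [2.0] * LANES

# The third operand lives in "cache" in compact form: 16 bytes, not 64.
packed = bytes([255] * LANES)
c = load_unorm8(packed, 0)

result = mad(a, b, c)      # a, b and c are all still usable afterwards

scratch = list(a)          # the destructive form needs a copy to preserve a
mul_inplace(scratch, b)
```

The extra copy in the last two lines is the cost the three-operand form avoids: with a destructive instruction, any source value that is needed again has to be duplicated first.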

Another interesting feature of the unit is its ability to execute scatter/gather operations, which are typically problematic in a GPU. SIMD units are generally very restrictive when it comes to memory access: a vector is read from memory at a single address, which often has to meet particular alignment constraints. Larrabee is much more flexible. It’s possible to load or store the 16 elements of a vector from 16 different memory addresses contained in another vector. Obviously, totally incoherent memory accesses will hurt cache efficiency, and in the worst case, up to 16 cycles will be necessary to perform this type of operation (at most one cache line is read per cycle).
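The semantics and the cost model can be sketched in a few lines of Python. This is an illustration only: the 64-byte cache line and 4-byte element sizes are assumptions, the function names are made up, and counting one cycle per distinct cache line is a simplified reading of the "one line per cycle" limit.

```python
LANES = 16
CACHE_LINE_BYTES = 64   # assumed cache line size
ELEM_BYTES = 4          # assumed element size (single precision)

def gather(memory, indices):
    """Load one element per lane from the index held in that lane."""
    return [memory[i] for i in indices]

def scatter(memory, indices, values):
    """Store one element per lane to the index held in that lane."""
    for i, v in zip(indices, values):
        memory[i] = v

def lines_touched(indices):
    """Distinct cache lines covered, used here as a rough cycle estimate."""
    return len({(i * ELEM_BYTES) // CACHE_LINE_BYTES for i in indices})

memory = [float(i) for i in range(256)]

contiguous = list(range(LANES))            # 16 neighboring elements
strided = [i * 16 for i in range(LANES)]   # one element per cache line

loaded = gather(memory, strided)
best_case = lines_touched(contiguous)
worst_case = lines_touched(strided)

scatter(memory, strided, [-1.0] * LANES)
```

With these assumed sizes, the contiguous pattern fits in a single line while the strided pattern touches 16 different lines, which corresponds to the 16-cycle worst case.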

  • thepinkpanther
    very interesting, i know nvidia cant settle for being the second best. As always its good for the consumer.
  • IzzyCraft
    Yes interesting, but intel already makes like 50% of every gpu i rather not see them take more market share and push nvidia and amd out although i doubt it unless they can make a real performer, which i have no doubt on paper they can but with drivers etc i doubt it.
  • I wonder if their aim is to compete to appeal to the gamer market to run high end games?
  • Alien_959
    Very interesting, finally some more information about Intel upcoming "GPU".
    But as I sad before here if the drivers aren't good, even the best hardware design is for nothing. I hope Intel invests more on to the software side of things and will be nice to have a third player.
  • crisisavatar
    cool ill wait for windows 7 for my next build and hope to see some directx 11 and openGL3 support by then.
  • Stardude82
    Maybe there is more than a little commonality with the Atom CPUs: in-order execution, hyper threading, low power/small foot print.

    Does the duo-core NV330 have the same sort of ring architecture?
  • "Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and is built into the Larrabee processors."

    just thought i'd point out that with the current amd vs intel fight..if intel takes away the x86 licence amd will take its multithreading and ht tech back leaving intel without a cpu and a useless gpu
  • liemfukliang
    Driver. If Intel made driver as bad as Intel Extreme than event if Intel can make faster and cheaper GPU it will be useless.
  • IzzyCraft
    Hope for an Omega Drivers equivalent lol?
  • phantom93
    Damn, hoped there would be some pictures :(. Looks interesting, I didn't read the full article but I hope it is cheaper so some of my friends with reg desktps can join in some Orginal Hardcore PC Gaming XD.