Larrabee: Intel's New GPU

The Vector Unit And The New Instruction Set

But as you can well imagine, the Pentium cores aren’t what gives Larrabee its processing power. To compete with GPUs on their home turf, you need a lot more than an FPU or even SSE. So Intel equipped each core with a vector unit operating on 16 elements simultaneously (compared to four for SSE or the Cell’s SPUs). These units can operate on integers, single-precision floating-point numbers, and double-precision floating-point numbers. Moving to double precision halves the throughput, but that penalty is still smaller than on current GPUs, which are between two and four times slower in the case of AMD’s parts, and practically 10 times slower in the case of Nvidia’s, when going from single to double precision.
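Intel hasn’t published the LRBni mnemonics yet, so as a rough scalar sketch of what a single 16-wide operation covers (the function name and shape here are illustrative, not real intrinsics):

```c
#define VLEN 16  /* Larrabee vector width: sixteen 32-bit lanes */

/* Scalar model of one 16-wide multiply-add: dst = a * b + c.
   A single LRBni vector instruction would cover all 16 lanes at once;
   SSE would need four 4-wide operations, with separate multiply and
   add steps, to do the same work. */
static void vmadd16(float dst[VLEN], const float a[VLEN],
                    const float b[VLEN], const float c[VLEN])
{
    for (int i = 0; i < VLEN; ++i)
        dst[i] = a[i] * b[i] + c[i];
}
```

In double precision, the same unit would produce eight results per operation instead of sixteen, which is where the factor-of-two throughput drop comes from.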

Rather than extending the SSE instruction set (again) to support the new vector unit, Intel’s engineers created a new one, called Larrabee New Instructions (LRBni). Intel is still rather vague about the instructions supported, but we should learn more at the upcoming Game Developers Conference (GDC). Intel is planning several press conferences at the show, during which Michael Abrash of RAD Game Tools and Intel’s Tom Forsyth should reveal details of the instruction set.

We do already know several things, however. The instruction set supports up to three operands, enabling multiply-and-add (MAD) instructions as well as non-destructive operations, unlike SSE, where one of the source registers is overwritten by the result. And unlike the VMX instruction set found in the Cell’s PowerPC Processing Element (PPE), for example, which operates only on registers, one of the operands can be read directly from the L1 cache, effectively turning it into an extended register file. The unit is also very flexible: it can reorganize the data in a register or convert between the “exotic” formats frequently found in GPUs without loss of performance, or in the worst case with only a slight reduction. These conversions can be executed as the data is loaded from cache, so data can be stored in memory in a compact form, maximizing the quantity of data the cache can hold.
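The difference between destructive two-operand and non-destructive three-operand forms can be sketched in scalar C (again, illustrative shapes, not the real instruction encodings):

```c
#define VLEN 16

/* SSE-style destructive two-operand multiply: acc = acc * b.
   The original contents of acc are lost, so preserving them costs an
   extra register-to-register copy before the operation. */
static void mul_2op(float acc[VLEN], const float b[VLEN])
{
    for (int i = 0; i < VLEN; ++i)
        acc[i] *= b[i];
}

/* LRBni-style non-destructive three-operand multiply: dst = a * b.
   Both sources survive. On Larrabee, one source operand could also be
   read directly from the L1 cache instead of a register. */
static void mul_3op(float dst[VLEN], const float a[VLEN],
                    const float b[VLEN])
{
    for (int i = 0; i < VLEN; ++i)
        dst[i] = a[i] * b[i];
}
```

The three-operand form is what makes a single-instruction MAD possible: with two operands, dst = a * b + c would need at least two instructions and a copy.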

Another interesting feature of the unit is its ability to execute scatter/gather operations, which are typically problematic for a GPU. SIMD units are generally very restrictive about memory access: a vector is read from a single address, often with strict alignment constraints. Larrabee is much more flexible. The 16 elements of a vector can be loaded from, or stored to, 16 different addresses contained in another vector. Obviously, completely incoherent memory accesses will hurt the cache, and in the worst case up to 16 cycles will be necessary to perform this type of operation (a maximum of one cache line is read per cycle).
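A scalar sketch of what gather and scatter do, under the same illustrative-naming caveat as above:

```c
#define VLEN 16

/* Scalar model of a 16-wide gather: load 16 elements from 16
   arbitrary indices held in an index vector. */
static void gather16(float dst[VLEN], const float *base,
                     const int idx[VLEN])
{
    for (int i = 0; i < VLEN; ++i)
        dst[i] = base[idx[i]];
}

/* The matching scatter: store 16 elements to 16 arbitrary indices.
   On Larrabee the worst case is up to 16 cycles, since at most one
   cache line is touched per cycle; indices that share cache lines
   should complete faster. */
static void scatter16(float *base, const int idx[VLEN],
                      const float src[VLEN])
{
    for (int i = 0; i < VLEN; ++i)
        base[idx[i]] = src[i];
}
```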

95 comments
  • thepinkpanther
    very interesting, i know Nvidia can't settle for being second best. As always it's good for the consumer.
    0
  • IzzyCraft
    Yes, interesting, but Intel already makes like 50% of every GPU. I'd rather not see them take more market share and push Nvidia and AMD out, although I doubt it unless they can make a real performer, which I have no doubt on paper they can, but with drivers etc. I doubt it.
    6
  • Anonymous
    I wonder if their aim is to compete for the gamer market and run high-end games?
    0
  • Alien_959
    Very interesting, finally some more information about Intel's upcoming "GPU".
    But as I said before, if the drivers aren't good, even the best hardware design is for nothing. I hope Intel invests more on the software side of things, and it will be nice to have a third player.
    0
  • crisisavatar
    Cool, I'll wait for Windows 7 for my next build and hope to see some DirectX 11 and OpenGL 3 support by then.
    0
  • Stardude82
    Maybe there is more than a little commonality with the Atom CPUs: in-order execution, Hyper-Threading, low power/small footprint.

    Does the dual-core NV330 have the same sort of ring architecture?
    0
  • Anonymous
    "Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and is built into the Larrabee processors."

    Just thought I'd point out that, with the current AMD vs. Intel fight, if Intel takes away the x86 license, AMD will take its multithreading and HT tech back, leaving Intel without a CPU and a useless GPU.
    -10
  • liemfukliang
    Drivers. If Intel makes drivers as bad as Intel Extreme, then even if Intel can make a faster and cheaper GPU, it will be useless.
    2
  • IzzyCraft
    Hope for an Omega Drivers equivalent lol?
    3
  • phantom93
    Damn, I hoped there would be some pictures :(. Looks interesting; I didn't read the full article, but I hope it is cheaper so some of my friends with regular desktops can join in some Original Hardcore PC Gaming XD.
    1
  • Slobogob
    I was quite surprised by the quality of this article and am quite eager to see the follow-up.
    9
  • JeanLuc
    Well, I am looking forward to Larrabee, but I'll keep my optimism under wraps until I start seeing some screenshots of Larrabee in action playing real games, i.e. not Intel demos.

    I wonder just how compatible larrabee is going to be with older games?
    1
  • tipoo
    Great article! Keep ones like this coming!
    3
  • tipoo
    Quoting IzzyCraft: "Hope for an Omega Drivers equivalent lol?"

    That would be FANTASTIC! Maybe the same people who make the Omega drivers could make alternate Larrabee drivers? We all know Intel sucks balls at drivers.
    -2
  • armistitiu
    So this is Intel's approach to a GPU: put lots of simple x86 cores in it, add SMT and vector operations, and hope they do the job of a GPU. IMHO Larrabee will be a complete failure as a GPU, but as a highly parallel x86 CPU this thing could screw AMD's FireStream and Nvidia's CUDA (OpenCL too), because it's x86 and programming is pretty popular for this kind of architecture.
    7
  • wicko
    Quoting IzzyCraft: "Yes interesting, but intel already makes like 50% of every gpu i rather not see them take more market share and push nvidia and amd out although i doubt it unless they can make a real performer, which i have no doubt on paper they can but with drivers etc i doubt it."

    Yeah, but that 50% includes all the integrated cards that no consumer even realizes they're buying most of the time, not discrete cards. I'd like to see a bit more competition on the discrete side.
    0
  • B-Unit
    Quoting wtfnl: ""Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and is built into the Larrabee processors." just thought i'd point out that with the current amd vs intel fight..if intel takes away the x86 licence amd will take its multithreading and ht tech back leaving intel without a cpu and a useless gpu"

    Umm, what makes you think that AMD pioneered multithreading? And Intel doesn't use HyperTransport, so they can't take it away.
    2
  • justaguy
    Now we know what they're trying to do with it. There's still no indication whether it will work or not.

    I really don't see the 1st gen being successful: it's not like AMD and Nvidia are goofing around waiting for Intel to join up and show them a real GPU. Although I haven't seen any numbers on this, I'm thinking Larry's going to have a pretty big die to fit all those mini-cores, so it had better perform, because it will cost a decent sum.
    1
  • crockdaddy
    I would mention "but will it play Crysis?" but I am not sure how funny that is anymore.
    8
  • Pei-chen
    Can't wait for Larrabee; hopefully a single Larrabee can have the performance of a 295. Nvidia and ATI are slacking, as they know they can fix prices and stop coming out with better GPUs, just more cards with the same old GPUs.
    -4