
In Detail: The Scalar Unit And SMT

Larrabee: Intel's New GPU

Now let’s look at the cores in detail. As we said, they’re based on the Pentium’s design, though Intel has made some significant modifications. The legacy of the P54C is most obvious in the scalar unit, which uses the Pentium’s dual-issue superscalar pipeline with two execution pipes, U and V.

The U pipe can execute all scalar x86 instructions, while the V pipe is limited to a fairly complete subset (excluding, for example, complex arithmetic instructions like multiplication and division). Intel has made several changes to the Pentium core. First, the engineers added 64-bit support, along with several instructions for controlling the level-two cache. These instructions are especially important for streaming-type applications, which don’t follow the principle of temporal locality found in traditional applications. That is, once a piece of data has been processed, it is certain not to be used again within a short period of time.
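To make the U/V split concrete, here is a minimal sketch of in-order dual issue. It is a toy model, not Intel's actual pairing logic: the `Instr` fields, the one-cycle-per-issue timing, and the simplified pairing rule (the second instruction must be V-capable and must not depend on the first) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str         # register written (hypothetical register names)
    srcs: tuple       # registers read
    v_capable: bool   # True if the simpler V pipe can execute it


def count_issue_cycles(program):
    """Count issue cycles for a toy in-order, dual-issue (U + V) front end."""
    cycles = 0
    i = 0
    while i < len(program):
        first = program[i]      # always goes to the U pipe
        cycles += 1
        i += 1
        if i < len(program):
            second = program[i]
            depends = first.dest in second.srcs or first.dest == second.dest
            if second.v_capable and not depends:
                i += 1          # second instruction pairs into V this cycle
    return cycles


program = [
    Instr("add", "r1", ("r2", "r3"), True),
    Instr("sub", "r4", ("r5", "r6"), True),    # independent: pairs with add
    Instr("mul", "r7", ("r1", "r4"), False),   # complex: U pipe only
    Instr("add", "r8", ("r7", "r1"), True),    # needs r7 from mul: no pairing
    Instr("mov", "r9", ("r2",), True),         # independent: pairs with add
]

print(count_issue_cycles(program))  # 3 cycles for 5 instructions
```

Because the core is in-order, a dependent or U-only instruction in the second slot simply waits for the next cycle; nothing is reordered to fill the empty V slot.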

This behavior tends to prove disastrous with the LRU (least recently used) replacement policy that caches typically use, which ends up discarding important data to make room for data that will be used only once. Aware of this problem, Larrabee’s engineers added instructions for marking cache lines as low priority, indicating that the data in them can be replaced as soon as it has been accessed. In this way, Intel has combined the best of both worlds: scratchpad-type (buffer memory) operation and the transparency of a standard cache, with a mechanism for keeping the caches of the different cores coherent.
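The effect of streaming data on an LRU cache, and of a low-priority hint, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `HintedLRUCache` class, its `low_priority` flag, the four-line capacity, and the workload are hypothetical, and the model is fully associative, which a real L2 cache is not. It does not reproduce Larrabee's actual instructions.

```python
from collections import OrderedDict


class HintedLRUCache:
    """Toy fully associative LRU cache with an 'evict me first' hint."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # front = next victim, back = most recent
        self.hits = 0
        self.misses = 0

    def access(self, addr, low_priority=False):
        if addr in self.lines:
            self.hits += 1
            self.lines.pop(addr)
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # evict the front line
        self.lines[addr] = None                  # insert as most recently used
        if low_priority:
            # Hinted line: move it to the front so it is replaced next.
            self.lines.move_to_end(addr, last=False)


def run_workload(use_hint, rounds=10):
    """Each round reuses three hot lines, then streams two one-shot lines."""
    cache = HintedLRUCache(capacity=4)
    stream_addr = 1000
    for _ in range(rounds):
        for hot in ("h0", "h1", "h2"):
            cache.access(hot)
        for _ in range(2):
            cache.access(stream_addr, low_priority=use_hint)
            stream_addr += 1
    return cache.hits, cache.misses


print(run_workload(use_hint=False))  # (0, 50): streaming evicts the hot lines
print(run_workload(use_hint=True))   # (27, 23): hot lines survive every round
```

In the plain LRU run, the one-shot lines keep pushing the reused lines out just before they are needed again, so nothing ever hits. With the hint, each streamed line only ever displaces the previous streamed line.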

Another change was the addition of Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and it is built into the Larrabee processors, where its importance is increased by the in-order nature of their cores. Modern CPUs are capable of reordering instructions to maximize use of their execution units, which the Larrabee cores can’t do. Consequently, certain sequences of code can make very little use of the core’s resources, but by interleaving several threads, it’s possible to increase that use at a lower cost than out-of-order execution hardware. If instruction one of thread A blocks execution of instruction two, the core simply switches threads and executes instruction one of thread B.
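A rough way to see why interleaving helps an in-order core is to simulate it. The sketch below is a deliberately simplified model, not a description of Larrabee's real scheduler: it assumes one instruction issued per cycle, round-robin selection among ready threads, and a made-up workload in which each list entry is the number of cycles a thread must wait after that instruction before its next one can issue.

```python
def simulate(threads):
    """Return (total_cycles, instructions_issued) for a toy in-order core.

    threads: list of lists; each inner list holds one stall count per
    instruction (extra cycles the thread is blocked after issuing it).
    """
    n = len(threads)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    issued = 0
    cycle = 0
    last = -1             # thread that issued most recently
    while any(pc[t] < len(threads[t]) for t in range(n)):
        # Round-robin: start looking at the thread after the last one used.
        for offset in range(1, n + 1):
            t = (last + offset) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][pc[t]]
                pc[t] += 1
                ready_at[t] = cycle + 1 + stall
                issued += 1
                last = t
                break
        cycle += 1        # if no thread was ready, this cycle is wasted
    return cycle, issued


work = [3, 0, 3, 0]       # hypothetical: every other instruction stalls 3 cycles

print(simulate([work]))       # (10, 4): one thread keeps the core 40% busy
print(simulate([work] * 4))   # (16, 16): four threads fill every cycle
```

With a single thread, the core sits idle during each three-cycle stall. With four threads, there is always another thread ready to issue while the others wait, so in this toy workload no cycle is wasted.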

The engineers have enabled execution of four threads per core, each with its own set of registers. Running four threads also helps hide the latency of accesses to the level-one cache. So that sharing among four threads doesn’t reduce the efficiency of the L1 instruction and data caches, their size was increased from 8 KB each on the Pentium to 32 KB each on the Larrabee cores.

  • 0 Hide
    thepinkpanther , March 23, 2009 6:35 AM
    very interesting, i know nvidia cant settle for being the second best. As always its good for the consumer.
  • 6 Hide
    IzzyCraft , March 23, 2009 6:49 AM
    Yes interesting, but intel already makes like 50% of every gpu i rather not see them take more market share and push nvidia and amd out although i doubt it unless they can make a real performer, which i have no doubt on paper they can but with drivers etc i doubt it.
  • 0 Hide
    Anonymous , March 23, 2009 6:50 AM
    I wonder if their aim is to compete to appeal to the gamer market to run high end games?
  • 0 Hide
    Alien_959 , March 23, 2009 8:12 AM
    Very interesting, finally some more information about Intel upcoming "GPU".
    But as I sad before here if the drivers aren't good, even the best hardware design is for nothing. I hope Intel invests more on to the software side of things and will be nice to have a third player.
  • 0 Hide
    crisisavatar , March 23, 2009 8:28 AM
    cool ill wait for windows 7 for my next build and hope to see some directx 11 and openGL3 support by then.
  • 0 Hide
    Stardude82 , March 23, 2009 8:32 AM
    Maybe there is more than a little commonality with the Atom CPUs: in-order execution, hyper threading, low power/small foot print.

    Does the duo-core NV330 have the same sort of ring architecture?
  • 2 Hide
    liemfukliang , March 23, 2009 10:27 AM
    Driver. If Intel made driver as bad as Intel Extreme than event if Intel can make faster and cheaper GPU it will be useless.
  • 3 Hide
    IzzyCraft , March 23, 2009 10:44 AM
    Hope for an Omega Drivers equivalent lol?
  • 1 Hide
    phantom93 , March 23, 2009 11:16 AM
    Damn, hoped there would be some pictures :( . Looks interesting, I didn't read the full article but I hope it is cheaper so some of my friends with reg desktps can join in some Orginal Hardcore PC Gaming XD.
  • 9 Hide
    Slobogob , March 23, 2009 11:51 AM
    I was quite suprised by the quality of this article and am quite eager to see the follow up.
  • 1 Hide
    JeanLuc , March 23, 2009 12:26 PM
    Well I am looking forward to Larrabee but I'll keep my optimisim under wraps until I start seeing some screenshots of Larabee in action playing real games i.e. not Intel demo's.

    I wonder just how compatible larrabee is going to be with older games?
  • 3 Hide
    tipoo , March 23, 2009 12:46 PM
    Great article! Keep ones like this coming!
  • -2 Hide
    tipoo , March 23, 2009 12:48 PM
    IzzyCraft wrote: Hope for an Omega Drivers equivalent lol?

    That would be FANTASTIC! Maybe the same people who make the Omega drivers could make alternate Larrabee drivers? We all know Intel sucks balls at drivers.
  • 7 Hide
    armistitiu , March 23, 2009 12:49 PM
    So this is Intel's approach to a GPU... we put lots of simple x86 cores in it , add SMT and vector operations and hope that they would do the job of a GPU. IMHO Larrabee will be a complete failure as GPU but as an x86 CPU that is highly parallel this thing could screw AMD's FireStream and NVIDIA's CUDA (OPENCL too) beacause it's x86 and the programming is pretty popular for this kind of architecture.
  • 0 Hide
    wicko , March 23, 2009 1:18 PM
    IzzyCraft wrote: Yes interesting, but intel already makes like 50% of every gpu i rather not see them take more market share and push nvidia and amd out although i doubt it unless they can make a real performer, which i have no doubt on paper they can but with drivers etc i doubt it.

    Yeah but that 50% includes all the integrated cards that no consumer even realizes they're buying most of the time.. but not in discrete cards. I'd like to see a bit more competition on the discrete side.
  • 2 Hide
    B-Unit , March 23, 2009 1:26 PM
    wtfnl wrote: "Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and is built into the Larrabee processors." just thought i'd point out that with the current amd vs intel fight..if intel takes away the x86 licence amd will take its multithreading and ht tech back leaving intel without a cpu and a useless gpu

    Umm, what makes you think that AMD pioneered multi-threading? And Intel doesnt use HyperTransport, so they cant take it away.
  • 1 Hide
    justaguy , March 23, 2009 2:02 PM
    Now we know what they're trying to do with it. There's still no indication if it will work or not.

    I really don't see the 1st gen. being successful-it's not like AMD and nVidia are goofing around waiting for Intel to join up and show them a real GPU. Although there's no numbers on this that I've seen, I'm thinking Larry's going to have a pretty big die size to fit all those mini-cores so it better perform, because it will cost a decent sum.
  • 8 Hide
    crockdaddy , March 23, 2009 2:09 PM
    I would mention ... "but will it play crysis" but I am not sure how funny that is anymore.
  • -4 Hide
    Pei-chen , March 23, 2009 2:12 PM
    Can't wait for Larrabee; hopefully a single Larrabee can have the performance of 295. Nvidia and ATI are slacking as they know they can price fixing and stop coming out with better GPU, just more cards with the same old GPU.