Larrabee: Intel's New GPU

In Detail: The Scalar Unit And SMT

Now let’s look at the cores in detail. As we said, they’re based on the Pentium’s design, though Intel has made some significant modifications. The legacy of the P54C is undeniable in the scalar unit, which uses the Pentium’s dual-issue superscalar execution pipeline with two pipes, U and V.

The first can execute all scalar x86 instructions, while the second is limited to a fairly complete subset (excluding, for example, complex arithmetic and logical instructions like multiplication and division). Intel has nonetheless changed the Pentium core in several ways. First, the engineers added 64-bit support, along with several instructions for controlling the level two cache. These instructions are especially important for streaming-type applications, which don’t follow the principle of temporal locality found in traditional applications. That is, once an operation has been executed on a piece of data, that data is certain not to be used again within a short period of time.

This behavior tends to prove disastrous with the LRU (least recently used) replacement algorithm that caches use, which ends up discarding important data in order to cache data that will be used only once. Aware of this problem, Larrabee’s engineers added instructions for marking cache lines as low priority, indicating that the data in them can be replaced as soon as it has been accessed. In this way, Intel has combined the best of both worlds: scratchpad-type (buffer memory) operation and the transparency of a standard cache, with a mechanism for coherence among the caches of the different cores.
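To make the cache-pollution problem concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Larrabee’s actual mechanism: the `HintedLRUCache` class, its `low_priority` flag, and the four-line capacity are all hypothetical names and values chosen for illustration. It reuses a small working set while streaming through data that is touched only once, with and without the hint.

```python
from collections import OrderedDict


class HintedLRUCache:
    """Toy fully associative cache: LRU replacement plus an 'evict next' hint."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # first key = next victim, last key = most recent

    def access(self, addr, low_priority=False):
        """Touch one line; return True on a hit, False on a miss."""
        hit = addr in self.lines
        if hit:
            del self.lines[addr]
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the current victim
        self.lines[addr] = None  # (re)insert as most recently used
        if low_priority:
            # Hypothetical hint: park the line at the victim end right away.
            self.lines.move_to_end(addr, last=False)
        return hit


def hot_hits(use_hint, iterations=100):
    """Reuse a 3-line working set while streaming 2 fresh lines per iteration."""
    cache = HintedLRUCache(capacity=4)
    hits = 0
    for i in range(iterations):
        for addr in ("a", "b", "c"):
            hits += cache.access(addr)
        for j in range(2):
            cache.access(("stream", 2 * i + j), low_priority=use_hint)
    return hits


print(hot_hits(use_hint=False))  # plain LRU
print(hot_hits(use_hint=True))   # streaming lines marked low priority
```

In this toy setup, plain LRU scores 0 hits on the working set because the streaming lines keep pushing it out, while the hinted run scores 297 hits out of 300 accesses: everything after the first, unavoidable pass.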

Another change consisted of adding Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and is built into the Larrabee processors, where its importance is increased by the in-order nature of their cores. Modern CPUs can reorder the execution of instructions to maximize use of their execution units, which the Larrabee cores can’t do. Consequently, certain sequences of code make very little use of the available resources, but by interleaving several threads, it’s possible to increase that use at low cost. If instruction two of thread A is blocked waiting on instruction one, the core simply switches threads and executes an instruction from thread B.

The engineers have enabled execution of four threads per core, each with its own set of registers. Using four threads also helps hide the access latency of the level one cache. In order not to diminish the efficiency of the L1 instruction and data caches, which are now shared among those threads, their size was increased from 8 KB each on the Pentium to 32 KB each for the Larrabee cores.
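The benefit of interleaving threads on an in-order core can be sketched with a small Python simulation. This is a simplified model, not a description of Larrabee’s real scheduler: the `simulate` function, the round-robin policy, one instruction issued per cycle, and the stall figures (a six-cycle stall after every fourth instruction) are all assumptions chosen for illustration.

```python
def simulate(num_threads, instrs_per_thread=100, stall_every=4, stall_cycles=6):
    """Return the fraction of cycles in which the core issues an instruction.

    Model: one instruction issued per cycle, in order within each thread.
    After every `stall_every` instructions a thread stalls for `stall_cycles`
    cycles (standing in for a cache miss or a dependency). When a thread is
    stalled, the core tries the next thread in round-robin order.
    """
    remaining = [instrs_per_thread] * num_threads
    issued = [0] * num_threads
    ready_at = [0] * num_threads  # first cycle at which each thread may issue
    cycle = 0
    busy = 0
    start = 0  # thread to try first on the next cycle
    while any(remaining):
        for k in range(num_threads):
            t = (start + k) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued[t] += 1
                busy += 1
                if issued[t] % stall_every == 0:
                    ready_at[t] = cycle + 1 + stall_cycles
                start = (t + 1) % num_threads
                break
        cycle += 1  # an idle cycle if no thread was ready
    return busy / cycle


for n in (1, 2, 4):
    print(n, "thread(s):", round(simulate(n), 2))
```

With these made-up parameters, a single thread keeps the core busy for well under half of the cycles, while four threads keep it busy for more than 80% of them. The exact figures depend entirely on the assumed stall pattern.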

  • thepinkpanther
    very interesting, i know nvidia cant settle for being the second best. As always its good for the consumer.
  • IzzyCraft
    Yes interesting, but intel already makes like 50% of every gpu i rather not see them take more market share and push nvidia and amd out although i doubt it unless they can make a real performer, which i have no doubt on paper they can but with drivers etc i doubt it.
  • I wonder if their aim is to compete to appeal to the gamer market to run high end games?
  • Alien_959
    Very interesting, finally some more information about Intel upcoming "GPU".
    But as I sad before here if the drivers aren't good, even the best hardware design is for nothing. I hope Intel invests more on to the software side of things and will be nice to have a third player.
  • crisisavatar
    cool ill wait for windows 7 for my next build and hope to see some directx 11 and openGL3 support by then.
  • Stardude82
    Maybe there is more than a little commonality with the Atom CPUs: in-order execution, hyper threading, low power/small foot print.

    Does the duo-core NV330 have the same sort of ring architecture?
  • "Simultaneous Multithreading (SMT). This technology has just made a comeback in Intel architectures with the Core i7, and is built into the Larrabee processors."

    just thought i'd point out that with the current amd vs intel fight..if intel takes away the x86 licence amd will take its multithreading and ht tech back leaving intel without a cpu and a useless gpu
  • liemfukliang
    Driver. If Intel made driver as bad as Intel Extreme than event if Intel can make faster and cheaper GPU it will be useless.
  • IzzyCraft
    Hope for an Omega Drivers equivalent lol?
  • phantom93
    Damn, hoped there would be some pictures :(. Looks interesting, I didn't read the full article but I hope it is cheaper so some of my friends with reg desktps can join in some Orginal Hardcore PC Gaming XD.