Core i7-980X: Do You Want Six Cores Or 12 Threads?

Intel first used Hyper-Threading when it introduced the Pentium 4 “Northwood” processor at 3.06 GHz and the Xeon MP “Foster” series in 2002. The proprietary technology's main purpose is to improve processor utilization through increased parallelization. With the latest Core i7-980X and its six physical cores, Hyper-Threading yields 12 logical processors on desktop PCs.

This raises the question: how much of the software that you run truly takes advantage of eight or more threads? Is Hyper-Threading good or bad for power efficiency? And wouldn’t it make more sense to stay with six physical cores, rather than risking performance hits caused by less-heavily-threaded applications unnecessarily distributing workloads to logical units?

Intel’s Gulftown implements Hyper-Threading to provide 12 virtual processing cores. Serious performance benefits can only be found in a few specific applications.

Hyper-Threading History

Hyper-Threading was introduced almost out of necessity. Because the Pentium 4 processor employed a rather long instruction pipeline, it was imperative to ramp up operating clocks as quickly as possible and keep the pipeline busy. Therefore, Intel duplicated the units that store the architectural state, allowing a Hyper-Threaded core to appear as two logical processors to the operating system. The scheduler could dispatch two threads or processes simultaneously, and if Intel’s branch prediction worked well, it would ensure that instructions got loaded and executed efficiently.
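
To the operating system, each Hyper-Threaded core simply appears as two schedulable processors, exactly as described above. As a quick illustration, the two counts can be compared programmatically; this is a minimal sketch assuming the third-party psutil package is installed (os.cpu_count() alone only reports logical processors):

```python
import psutil  # third-party: pip install psutil

logical = psutil.cpu_count(logical=True)    # what the OS scheduler sees
physical = psutil.cpu_count(logical=False)  # actual cores on the die

print(f"logical processors: {logical}")
print(f"physical cores:     {physical}")
# On a Core i7-980X with Hyper-Threading enabled, this prints 12 and 6.
```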

The benefits for the Pentium 4 were mainly increased system responsiveness on single-core systems and small performance gains in threaded applications. However, this applied to the desktop space. In servers, where parallel processing is key, Hyper-Threading showed more impact. Naturally, this was a reflection of the software industry at the time. Applications written for desktop users weren't threaded yet, since the hardware enabling this usage wasn't around. Initially, Hyper-Threading got a bad rap because it failed to improve performance in titles that ran in a single thread.

… and the Return

With the arrival of the Core 2 processor, Hyper-Threading disappeared. But Intel decided to resurrect it with the Nehalem micro-architecture, which is the basis for all Core i7, i5, and i3 CPUs available today—including the just-released six-core Core i7-980X.

The situation is much different today than when Hyper-Threading made its first rounds. For starters, software developers are much more in tune with the hardware ecosystem, so it's uncommon to find a popular title that can benefit from parallelism and isn't threaded. Beyond that, AMD currently can't apply pressure to Intel in the performance segment, and Hyper-Threading has turned into a value-added feature and series differentiator, rather than a must-have innovation. With six physical cores, does Hyper-Threading really make sense?

We decided to look at the quad-core Core i7-975 Extreme Edition (Bloomfield) alongside the new six-core Core i7-980X (Gulftown) and compare performance, as well as power efficiency, using our updated platform benchmark suite.

Comments
  • 3 Hide
    shin0bi272 , March 22, 2010 6:18 AM
    One of the issues I have tried to get the guys at dbpoweramp to see is that, at least with HT, you have the opportunity to work on twice the number of tracks (12 in the case of the 980X). Even if it doesn't actually finish much faster, it's still working on all 12 at once. So far, they have not adjusted their converter to support HT on multi-core CPUs, though.
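
For illustration, spreading tracks across every logical processor is straightforward with a worker pool. This is a minimal sketch, not dbpoweramp's actual code; convert_track and the file names are hypothetical stand-ins:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def convert_track(path: str) -> str:
    # Hypothetical stand-in for a CPU-bound audio conversion step.
    return path

if __name__ == "__main__":
    tracks = [f"track{n:02d}.wav" for n in range(1, 13)]  # 12 hypothetical files

    # One worker per logical processor: 12 on a Core i7-980X with HT enabled.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        for done in pool.map(convert_track, tracks):
            print("finished", done)
```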
  • -4 Hide
    Lutfij , March 22, 2010 6:42 AM
    here we go AGAIN!

    Then Intel will wipe HT off the rest of its new processor lineups; I'm beginning to see this feature as a limited edition. Back in 2004, I couldn't get my hands on one since it was darn expensive, coupled with a board that had HT support.
  • -5 Hide
    p1n3apqlexpr3ss , March 22, 2010 6:48 AM
    So using this theory of how HT isn't really that useful... the i3s are nothing better than a better-architected version of the C2D e8xxxs?
  • 11 Hide
    uh_no , March 22, 2010 6:58 AM
    p1n3apqlexpr3ss: So using this theory of how HT isn't really that useful... the i3s are nothing better than a better-architected version of the C2D e8xxxs?

    Or you could ignore the microarchitectural differences...
  • 27 Hide
    ta152h , March 22, 2010 7:33 AM
    Quote "The basic idea behind an instruction pipeline is to structure processing into independent steps, and putting more steps into a pipeline translates into higher execution throughput, especially at high clock speeds."

    That's really quite convoluted, and not even accurate. Apparently, the author of this doesn't really understand pipelining.

    Back in the bad old days of the 386, only one instruction was worked on at a time. There were separate parts of the 386, so it's not entirely true, but basically one instruction was being worked and after it was done, the next was started. Now, let's say this instruction took three clock cycles to perform, that's pretty much all you could do during those clock cycles (again, there was some parallelization on the 386, like memory pre-fetching, but I'm simplifying to illustrate the point).

    The 486 was a scalar processor, meaning it had pipelines. Now, let's say we have four stages in our pipeline. The first instruction starts on clock cycle one, and goes into stage one. Clock cycle two sees the first instruction go to stage two, and the next instruction go to stage one. The next cycle sees them move on down. The benefit is, without mispredictions or stalls, you can push out an instruction per cycle. You're parallelizing your execution since more than one instruction is being worked on at the same time in a different stage of the pipeline. Of course, they added more parallelization with more than one pipeline, but that's a different technology called "super-scalar".

    The other remark, which is just worded badly is " and putting more steps into a pipeline translates into higher execution throughput, especially at high clock speeds."

    The extra steps mean less work is done per cycle, so each cycle can take less time, meaning higher clock speeds. It's not "especially" at high clock speeds, the high clock speeds are why super-pipelined processors can execute quickly, as they would otherwise be slower since they have greater branch mispredict penalties.

    Having said that, the Pentium 4 was poor at Hyper-Threading, vis-a-vis the Nehalem for a different reason. The trace-cache was quite small, and there was only one decoder in front of it. Even on one thread, the cache misses on this processor crippled the performance of it, as it was only running as a scalar processor far too often. Add in Hyper-Threading, and you lower the cache hit rate even worse, and you're still limited by the one decoder in front of the trace-cache, so you do have the potential of lowering performance in some situations.

    The Nehalem doesn't have a trace-cache (although there is some loop caching after the decoders which was added to save power), and has far greater decoder capabilities because of it. It is also wider, so shows greater benefits since it's architecture is better suited for it.
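
The cycle-by-cycle walkthrough above is easy to make concrete. Below is a toy model (our illustration, not the commenter's code) of a four-stage in-order pipeline; it prints which instruction sits in which stage on each cycle, showing that once the pipeline fills, one instruction completes per cycle:

```python
# A toy four-stage, in-order pipeline. Instruction i (1-based) occupies
# stage s (0-based) on cycle i + s, so once the pipeline is full,
# one instruction completes every cycle.
STAGES = ["Fetch", "Decode", "Execute", "Writeback"]

def simulate(num_instructions: int) -> None:
    depth = len(STAGES)
    for cycle in range(1, num_instructions + depth):
        busy = []
        for stage_idx, stage in enumerate(STAGES):
            instr = cycle - stage_idx  # which instruction is in this stage
            if 1 <= instr <= num_instructions:
                busy.append(f"{stage}: I{instr}")
        print(f"cycle {cycle:2d} | " + " | ".join(busy))

simulate(6)  # six instructions retire in 9 cycles, not 6 x 4 = 24
```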
  • 2 Hide
    shreeharsha , March 22, 2010 9:26 AM
    What about use in gaming? Is HT useless for gaming?
  • 10 Hide
    kokin , March 22, 2010 9:44 AM
    shreeharsha: What about use in gaming? Is HT useless for gaming?

    It's like you never even bothered to read any of the HT-related articles Tom's has published... Most games don't even make use of quad cores, so expect most of them to also not make use of 8-12 logical cores. If you plan to play FSX or GTA4, you'll probably see a small benefit, as these two games rely heavily on the CPU, but HT is no game changer by any means. This is why Tom's always recommends the Intel i5-750/AMD Phenom II 955/965 and other CPUs below that.

    If your main purpose is gaming, stick with an AMD CPU. Otherwise, for any other type of work+gaming, go for the Intel CPUs.
  • 7 Hide
    Tomtompiper , March 22, 2010 10:16 AM
    Any chance of doing some Linux tests to see if there are any benefits to running HT?
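
On Linux there is at least a quick way to see how HT pairs logical CPUs before running any benchmarks, since the kernel exposes core topology through sysfs. A minimal sketch, assuming a Linux system with the standard /sys/devices/system/cpu topology files:

```python
import glob

# Each logical CPU lists the siblings sharing its physical core; with HT
# enabled, pairs such as "0,6" appear (the exact format varies by kernel).
paths = sorted(glob.glob(
    "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"))
for path in paths:
    cpu = path.split("/")[5]  # e.g. "cpu0"
    with open(path) as f:
        print(cpu, "shares a core with logical CPUs:", f.read().strip())
```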
  • 1 Hide
    NucDsgr , March 22, 2010 10:32 AM
    Next time a hyperthreading comparison article is written, make sure you include the memory requirements for the application with and without hyperthreading. Hyperthreading requires more memory to an extent that is surprising.
  • 0 Hide
    amnotanoobie , March 22, 2010 10:38 AM
    kokin: This is why Tom's always recommends the Intel i5-750/AMD Phenom II 955/965 and any other CPUs below that. If your main purpose is gaming, stick with an AMD CPU. Otherwise, for any other type of work+gaming, go for the Intel CPUs.

    Plus (if you're gaming), you're probably better off saving the cash for a better video card.

    Even a lightly OCed 920 is overkill for a lot of games today.

    NucDsgr: Next time a hyperthreading comparison article is written, make sure you include the memory requirements for the application with and without hyperthreading. Hyperthreading requires more memory to an extent that is surprising.

    Why would HT require more memory when it is less than a real, full-fledged core?
  • -1 Hide
    nekoangel , March 22, 2010 10:52 AM
    Not much of a fan of hyper-threading yet on the industry side, though multi-core, virtualization, and cloud computing are just plain awesome; so much feasibility has been unlocked by multiple cores. I can see hyper-threading still waiting for both the OS and applications. Apparently, programming for more than one core is quite the challenge, and making use of multiple cores with hyper-threading an even greater one, though I look forward to condensing server farms even more in the future.
  • 1 Hide
    Hilarion , March 22, 2010 11:12 AM
    In our data processing and image conversion realm of business with current business software, hyper-threading has proved to be a hard and heavy hit on performance. We have realized greater speed by shutting off the hyper-threading though I had to prove it to our system admin by having him run benches on our systems with and without hyper-threading. None of our data acquisition/conversion machines are currently utilizing hyper-threading for our business applications.

    Software in the business world has a long way to go yet to catch up with our hardware.

    The only place I am aware that we are using hyper-threading at all is in our virtualized servers whose performance is out of my purview.
  • 0 Hide
    JohnnyLucky , March 22, 2010 11:16 AM
    Will software developers be able to keep up with Intel?
  • 2 Hide
    neiroatopelcc , March 22, 2010 11:47 AM
    JohnnyLucky: Will software developers be able to keep up with Intel?

    In games, this is only going to happen if the engine designers (id, Valve, etc.) figure out ways to put as much of their graphics engine into new threads as possible. The base graphics engine cannot be threaded if Direct3D is being used, which pretty much limits the performance gains from having more threads. AI, physics, data acquisition, and interface-related things can still be moved to other threads, but if the graphics engine is bottlenecked by insufficient core speed, it won't help. *

    nekoangel: Not much of a fan of hyper-threading yet on the industry side, though multi-core, virtualization, and cloud computing are just plain awesome...

    It is quite beneficial on terminal servers and in virtualization environments. It won't make a difference on file servers and database servers with limited memory, but it does work. Also remember that any pre-2008 server only uses one core for networking, so network-intensive stuff won't benefit at all - but that won't benefit from real cores either.

    NucDsgr: Next time a hyperthreading comparison article is written, make sure you include the memory requirements for the application with and without hyperthreading. Hyperthreading requires more memory to an extent that is surprising.

    Really? I haven't noticed any difference in memory usage with it enabled or not. I usually have about 2-3GB of memory that Windows uses as a file cache because it has no other use for it.

    shreeharsha: What about use in gaming? Is HT useless for gaming?

    I've tested several games, running each at native resolution on a 22" display with resmon and taskman open on a secondary monitor (a small script after this comment automates the same per-core check). I don't know how, but apparently Windows 7 knows which cores are real and which are not. For instance, Dragon Age uses only four cores - and somehow Windows makes sure it uses the real ones, i.e. you won't see core #0 and #1 being loaded at the same time unless you start something else in addition to Dragon Age.

    Most games I've seen so far don't use more than one or two cores. Some newer ones seem to use three or four, but I've yet to see a game that uses all eight.

    * I don't know if the problem has been solved with the new DirectX versions; I'm guessing it hasn't.
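
For anyone wanting to reproduce that per-core observation without watching Task Manager, the same check can be scripted. A minimal sketch, assuming the third-party psutil package; run it on a second monitor while a game is active:

```python
import psutil  # third-party: pip install psutil

# Sample per-core utilization once a second, ten times. On an HT system,
# logical CPUs come in sibling pairs, so a lightly threaded game should
# leave roughly half of them near-idle.
for _ in range(10):
    loads = psutil.cpu_percent(interval=1.0, percpu=True)
    print(" ".join(f"cpu{i}:{load:5.1f}%" for i, load in enumerate(loads)))
```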
  • 2 Hide
    soky602 , March 22, 2010 12:50 PM
    hoof_hearted: But can it play Crysis?

    Why can't people save this overused line for new GPUs? Obviously a multi-core CPU can run it, but you're not going to see anything without a GPU.
  • 0 Hide
    eaclou , March 22, 2010 1:01 PM
    Thanks for including Cinebench results - I'd love to see this included in future CPU articles, and maybe GPU for OpenGL performance.
  • 3 Hide
    Anonymous , March 22, 2010 1:08 PM
    All these benchmarks take only one running application into account. What if you're zipping something, converting a video, and doing some Photoshop work while listening to music, all at the same time? Then what!
  • 6 Hide
    tipoo , March 22, 2010 1:38 PM
    Did you guys see the leaked pricing for AMD's Phenom II X6 on Techreport? Apparently it's $200, or $300 for the BE. That's compared to over $1,000 for this. Intel may have the performance crown, but that price jump is MASSIVE!