Intel Core i7 (Nehalem): Architecture By AMD?

Page 6 of 13:

SMT Implementation

Still, the impact of SMT on performance is positive most of the time and the cost in terms of resources is still very limited, which explains why the technology is making a comeback. But programmers will have to pay attention because with Nehalem, all threads are not created equal. To help solve this puzzle, Intel provides a way of precisely determining the exact topology of the processor (the number of physical and logical processors), and programmers can then use the operating system affinity mechanism to assign each thread to a processor. This kind of thing shouldn’t be a problem for game programmers, who are already in the habit of working that way because of the way the Xenon processor (the one used in the Xbox 360) works. But unlike consoles, where programmers have very low-level access, on a PC the operating system’s thread scheduler will always have the last word.

Since SMT puts a heavier load on the out-of-order execution engine, Intel has increased the size of certain internal buffers to avoid turning them into bottlenecks. So the reorder buffer, which keeps track of all the instructions being executed in order to reorder them, has increased from 96 entries on the Core 2 to 128 entries on Nehalem. In practice, since this buffer is partitioned statically to keep any one thread from monopolizing all the resources, its size is reduced to 64 entries for each thread with SMT. Obviously, in cases where a single thread is executed, it has access to all the entries, which should mean that there won’t be any specific situations where Nehalem turns out to have worse performance than its predecessor.

The reservation station, which is the unit in charge of assigning instructions to the different execution units, has also increased in size: from 32 to 36 entries. But unlike the reorder buffer, here partitioning is dynamic, so that a thread can take up more or fewer entries as a function of its needs.

Two other buffers have also been resized: the load buffer and the store buffer. The former has 48 entries as opposed to 32 with Conroe, and the latter 32 instead of 20. Here too, partitioning between threads is static.

Another consequence of the return of SMT is that the performance of thread synchronization instructions has improved, according to Intel.

Current page: SMT Implementation

Prev Page The Return Of Hyper-Threading Next Page SSE 4.2 And Power Consumption

TOPICS

30 Comments Comment from the forums

cl_spdhax1

good write-up, cant wait for the new architecture , plus the "older" chips are going to become cheaper/affordable. big plus.
Reply
neiroatopelcc

No explaination as to why you can't use performance modules with higher voltage though.
Reply
neiroatopelcc

AuDioFreaK39TomsHardware is just now getting around to posting this?Not to mention it being almost a direct copy/paste from other articles I've seen written about Nehalem architecture.I regard being late as a quality seal really. No point being first, if your info is only as credible as stuff on inquirer. Better be last, but be sure what you write is correct.

Reply
cangelini

AuDioFreaK39TomsHardware is just now getting around to posting this?Not to mention it being almost a direct copy/paste from other articles I've seen written about Nehalem architecture.
Perhaps, if you count being translated from French.
Reply
randomizer

Yea, 13 pages is quite alot to translate. You could always use google translation if you want it done fast :kaola:
Reply
Duncan NZ

Speaking of french... That link on page 3 goes to a French article that I found fascinating... Would be even better if there was an English version though, cause then I could actually read it. Any chance of that?

Nice article, good depth, well written
Reply
neiroatopelcc

randomizerYea, 13 pages is quite alot to translate. You could always use google translation if you want it done fast I don't know french, so no idea if it actually works. But I've tried from english to germany and danish, and viseversa. Also tried from danish to german, and the result is always the same - it's incomplete, and anything that is slighty technical in nature won't be translated properly. In short - want it done right, do it yourself.
Reply
neiroatopelcc

I don't think cangelini meant to say, that no other english articles on the subject exist.
You claimed the article on toms was a copy paste from another article. He merely stated that the article here was based on a french version.
Reply
enewmen

Good article.
I actually read the whole thing.
I just don't get TLP when RAM is cheap and the Nehalem/Vista can address 128gigs. Anyway, things have changed a lot since running Win NT with 16megs RAM and constant memory swapping.
Reply
cangelini

I can't speak for the author, but I imagine neiro's guess is fairly accurate. Written in French, translated to English, and then edited--I'm fairly confident they're different stories ;)
Reply

Show more comments