Intel Core i7 (Nehalem): Architecture By AMD?

TLB

For many years now, processors have been working not with physical memory addresses, but with virtual addresses. Among other advantages, this approach lets more memory be allocated to a program than the computer actually has, keeping only the data necessary at a given moment in actual physical memory with the rest remaining on the hard disk. This means that for each memory access a virtual address has to be translated into a physical address, and to do that an enormous table is put in charge of keeping track of the correspondences. The problem is that this table gets so large that it can’t be stored on-chip—it’s placed in main memory, and can even be paged (part of the table can be absent from memory and itself kept on the hard disk).

If this translation stage were necessary at each memory access, it would make access much too slow. As a result, engineers returned to the principle of physical addressing by adding a small cache memory directly on the processor that stored the correspondences for a few recently accessed addresses. This cache memory is called a Translation Lookaside Buffer (TLB).Intel has completely revamped the operation of the TLB in their new architecture. Up until now, the Core 2 has used a level 1 TLB that is extremely small (16 entries) but also very fast for loads only, and a larger level 2 TLB (256 entries) that handled loads missed in the level 1 TLB, as well as stores.

Nehalem now has a true two-level TLB: the first level of TLB is shared between data and instructions. The level 1 data TLB now stores 64 entries for small pages (4K) or 32 for large pages (2M/4M), while the level 1 instruction TLB stores 128 entries for small pages (the same as with Core 2) and seven for large pages. The second level is a unified cache that can store up to 512 entries and operates only with small pages. The purpose of this improvement is to increase the performance of applications that use large sets of data. As with the introduction of two-level branch predictors, this is further evidence of the architecture’s server orientation.

Let’s go back to SMT for a moment, since it also has an impact on the TLBs. The level 1 data TLB and the level 2 TLB are shared dynamically between the two threads. Conversely, the level 1 instruction TLB is statically shared for small pages, whereas the one dedicated to large pages is entirely replicated—this is understandable given its small size (seven entries per thread).

  • cl_spdhax1
    good write-up, cant wait for the new architecture , plus the "older" chips are going to become cheaper/affordable. big plus.
    Reply
  • neiroatopelcc
    No explaination as to why you can't use performance modules with higher voltage though.
    Reply
  • neiroatopelcc
    AuDioFreaK39TomsHardware is just now getting around to posting this?Not to mention it being almost a direct copy/paste from other articles I've seen written about Nehalem architecture.I regard being late as a quality seal really. No point being first, if your info is only as credible as stuff on inquirer. Better be last, but be sure what you write is correct.
    Reply
  • cangelini
    AuDioFreaK39TomsHardware is just now getting around to posting this?Not to mention it being almost a direct copy/paste from other articles I've seen written about Nehalem architecture.
    Perhaps, if you count being translated from French.
    Reply
  • randomizer
    Yea, 13 pages is quite alot to translate. You could always use google translation if you want it done fast :kaola:
    Reply
  • Duncan NZ
    Speaking of french... That link on page 3 goes to a French article that I found fascinating... Would be even better if there was an English version though, cause then I could actually read it. Any chance of that?

    Nice article, good depth, well written
    Reply
  • neiroatopelcc
    randomizerYea, 13 pages is quite alot to translate. You could always use google translation if you want it done fast I don't know french, so no idea if it actually works. But I've tried from english to germany and danish, and viseversa. Also tried from danish to german, and the result is always the same - it's incomplete, and anything that is slighty technical in nature won't be translated properly. In short - want it done right, do it yourself.
    Reply
  • neiroatopelcc
    I don't think cangelini meant to say, that no other english articles on the subject exist.
    You claimed the article on toms was a copy paste from another article. He merely stated that the article here was based on a french version.
    Reply
  • enewmen
    Good article.
    I actually read the whole thing.
    I just don't get TLP when RAM is cheap and the Nehalem/Vista can address 128gigs. Anyway, things have changed a lot since running Win NT with 16megs RAM and constant memory swapping.
    Reply
  • cangelini
    I can't speak for the author, but I imagine neiro's guess is fairly accurate. Written in French, translated to English, and then edited--I'm fairly confident they're different stories ;)
    Reply