You know that cache stores copies of data from various main memory addresses. Because the cache cannot hold copies of the data from all the addresses in main memory simultaneously, there has to be a way to know which addresses are currently copied into the cache so that, if we need data from those addresses, it can be read from the cache rather than from the main memory. This function is performed by Tag RAM, which is additional memory in the cache that holds an index of the addresses that are copied into the cache. Each line of cache memory has a corresponding address tag that stores the main memory address of the data currently copied into that particular cache line. If data from a particular main memory address is needed, the cache controller can quickly search the address tags to see whether the requested address is currently being stored in the cache (a hit) or not (a miss). If the data is there, it can be read from the faster cache; if it isn’t, it has to be read from the much slower main memory.
Various ways of organizing or mapping the tags affect how cache works. A cache can be mapped as fully associative, direct-mapped, or set associative.
In a fully associative mapped cache, when a request is made for data from a specific main memory address, the address is compared against all the address tag entries in the cache tag RAM. If the requested main memory address is found in the tag (a hit), the corresponding location in the cache is returned. If the requested address is not found in the address tag entries, a miss occurs, and the data must be retrieved from the main memory address instead of the cache.
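The fully associative lookup described above can be sketched in a few lines. This is a minimal illustration, not a model of any particular processor: the 64-byte line size, the `fully_associative_lookup` name, and the use of `None` for an empty line are all assumptions made for the example. Real hardware compares every tag in parallel, whereas this loop checks them one at a time.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)

def fully_associative_lookup(tags, address):
    """Return the index of the cache line holding `address`, or None on a miss."""
    line_address = address // LINE_SIZE
    # Any line may hold any address, so every tag entry has to be compared.
    for index, tag in enumerate(tags):
        if tag == line_address:
            return index  # hit: the data is in this cache line
    return None  # miss: the data must be fetched from main memory

# Hypothetical tag RAM contents; None marks a line that holds nothing yet.
tags = [0x10, 0x2A, None, 0x07]
hit_line = fully_associative_lookup(tags, 0x2A * LINE_SIZE + 12)
miss_line = fully_associative_lookup(tags, 0x99 * LINE_SIZE)
print(hit_line, miss_line)
```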
In a direct-mapped cache, specific main memory addresses are preassigned to specific line locations in the cache where they will be stored. Therefore, the tag RAM can use fewer bits because when you know which main memory address you want, only one address tag needs to be checked, and each tag needs to store only the possible addresses a given line can contain. This also results in faster operation because only one tag address needs to be checked for a given memory address.
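To make the direct-mapped arrangement concrete, the sketch below splits an address into a tag, a line index, and a byte offset. The line size (64 bytes), the number of lines (256), and the function names are assumptions chosen for the example. Because the line index is implied by the address itself, the tag only needs to hold the remaining upper bits, which is why the tag RAM can be narrower.

```python
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_LINES = 256   # number of lines in the cache (assumed)

def split_address(address):
    """Split an address into (tag, line index, byte offset)."""
    offset = address % LINE_SIZE
    line_address = address // LINE_SIZE
    index = line_address % NUM_LINES   # the one line this address may occupy
    tag = line_address // NUM_LINES    # upper bits stored in the tag RAM
    return tag, index, offset

def direct_mapped_lookup(tag_ram, address):
    """Return True on a hit; only a single tag comparison is needed."""
    tag, index, _ = split_address(address)
    return tag_ram[index] == tag

tag_ram = [None] * NUM_LINES
tag, index, offset = split_address(0x12345)
tag_ram[index] = tag  # pretend the line containing 0x12345 has been loaded

# An address exactly one cache size away maps to the same line with a different tag.
conflicting = 0x12345 + LINE_SIZE * NUM_LINES
print(direct_mapped_lookup(tag_ram, 0x12345), direct_mapped_lookup(tag_ram, conflicting))
```

The second lookup misses even though the rest of the cache is empty, which is the kind of conflict that a set associative design is meant to reduce.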
A set associative cache is a modified direct-mapped cache. A direct-mapped cache has only one set of memory associations, meaning a given memory address can be mapped into (or associated with) only one specific cache line location. A two-way set associative cache has two such sets, so a given memory location can be in either of two locations. A four-way set associative cache can store a given memory address in any of four different cache line locations (or sets). Increasing the set associativity increases the chance of finding a value in the cache; however, each lookup takes a little longer because more tag addresses must be checked when searching for a specific location. In essence, each set in an n-way set associative cache is a subcache that has associations with each main memory address. As the number of subcaches or sets increases, the cache eventually becomes fully associative, a situation in which any memory address can be stored in any cache line location. In this sense, an n-way set associative cache is a compromise between a fully associative cache and a direct-mapped cache.
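The following sketch models an n-way set associative cache as a list of small groups, each able to hold `ways` tags. It is only an illustration under stated assumptions: the class and parameter names are invented for the example, and the least-recently-used replacement policy is an assumption, since the text does not say how a line is chosen for replacement.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy n-way set associative cache that tracks tags only (no data)."""

    def __init__(self, num_groups, ways, line_size=64):
        self.num_groups = num_groups  # how many index positions exist
        self.ways = ways              # candidate locations per address
        self.line_size = line_size
        # One ordered dict per index; the oldest entry is the least recently used.
        self.groups = [OrderedDict() for _ in range(num_groups)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line_address = address // self.line_size
        index = line_address % self.num_groups
        tag = line_address // self.num_groups
        lines = self.groups[index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used tag (assumed policy)
        lines[tag] = True
        return False

# Three addresses that all map to index 0 of a two-way cache.
cache = SetAssociativeCache(num_groups=4, ways=2)
results = [cache.access(a) for a in (0, 256, 0, 512, 0, 256)]
print(results)
```

With `ways=1` the same class behaves like a direct-mapped cache, and with `num_groups=1` it behaves like a fully associative one, which mirrors the point that set associativity sits between the two.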
In general, a direct-mapped cache is the fastest at locating and retrieving data from the cache because it has to look at only one specific tag address for a given memory address. However, it also results in more misses overall than the other designs. A fully associative cache offers the highest hit ratio but is the slowest at locating and retrieving the data because it has many more address tags to check through. An n-way set associative cache is a compromise between optimizing cache speed and hit ratio, but the more associativity there is, the more hardware (tag bits, comparator circuits, and so on) is required, making the cache more expensive. Obviously, cache design is a series of trade-offs, and what works best in one instance might not work best in another. Multitasking environments such as Windows are good examples of environments in which the processor needs to operate on different areas of memory simultaneously and in which an n-way cache can improve performance.
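The hit-ratio side of this trade-off can be illustrated with a small miss counter. Everything here is an assumption made for the example: the `count_misses` helper, the eight-line cache, the least-recently-used replacement, and especially the access pattern, which is deliberately contrived so that two line addresses collide in the direct-mapped case. Real workloads will show smaller and less tidy differences.

```python
from collections import OrderedDict

def count_misses(line_addresses, total_lines, ways):
    """Count misses for a cache of `total_lines` lines with the given associativity."""
    num_groups = total_lines // ways
    groups = [OrderedDict() for _ in range(num_groups)]
    misses = 0
    for line in line_addresses:
        lines = groups[line % num_groups]
        tag = line // num_groups
        if tag in lines:
            lines.move_to_end(tag)     # hit: refresh recency
            continue
        misses += 1
        if len(lines) >= ways:
            lines.popitem(last=False)  # evict least recently used (assumed policy)
        lines[tag] = True
    return misses

# Two line addresses that share a location in an 8-line direct-mapped cache.
trace = [0, 8] * 4
miss_counts = {ways: count_misses(trace, total_lines=8, ways=ways) for ways in (1, 2, 8)}
print(miss_counts)
```

On this trace the direct-mapped configuration (`ways=1`) misses on every access, while the two-way and fully associative configurations miss only on the first touch of each line. The sketch does not model lookup time, which is where the direct-mapped design has its advantage.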
The contents of the cache must always be in sync with the contents of main memory to ensure that the processor is working with current data. For this reason, the internal cache in the 486 family was a write-through cache. Write-through means that when the processor writes information to the cache, that information is automatically written through to main memory as well.
By comparison, Pentium and later chips have an internal write-back cache, which means that both reads and writes are cached, further improving performance.
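The difference between the two write policies can be sketched as follows. This is a simplified model under assumptions: the class names and the `memory_writes` counter are invented for the example, the cache is an unbounded dictionary, and the write-back version uses the conventional approach of marking modified lines as dirty and copying them to memory later (here, on an explicit `flush`).

```python
class WriteThroughCache:
    """Every write updates the cache and main memory immediately."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value  # written through to main memory right away
        self.memory_writes += 1


class WriteBackCache:
    """Writes stay in the cache until the modified lines are flushed."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()            # addresses modified since the last flush
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)       # main memory is updated later

    def flush(self):
        for address in sorted(self.dirty):
            self.memory[address] = self.lines[address]
            self.memory_writes += 1
        self.dirty.clear()


memory_a, memory_b = {}, {}
through = WriteThroughCache(memory_a)
back = WriteBackCache(memory_b)
for value in (1, 2, 3):               # three writes to the same address
    through.write(0x40, value)
    back.write(0x40, value)
back.flush()
print(through.memory_writes, back.memory_writes)
```

Both memories end up holding the same final value, but the write-back model reaches it with a single memory write instead of three.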
Another feature of improved cache designs is that they are nonblocking. This is a technique for reducing or hiding memory delays by exploiting the overlap of processor operations with data accesses. A nonblocking cache enables program execution to proceed concurrently with cache misses as long as certain dependency constraints are observed. In other words, the cache can handle a cache miss much better and enable the processor to continue doing something nondependent on the missing data.
The cache controller built into the processor is also responsible for watching the memory bus when alternate processors or devices, known as bus masters, control the system. This process of watching the bus is referred to as bus snooping. If a bus master device writes to an area of memory that is also currently stored in the processor cache, the cache contents and memory no longer agree. The cache controller then marks this data as invalid and reloads the cache during the next memory access, preserving the integrity of the system.
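The invalidate-and-reload behavior of bus snooping can be sketched like this. The class, the `bus_master_write` helper, and the dictionary used as main memory are hypothetical names introduced for illustration; a real controller works on whole cache lines and uses a more elaborate coherence protocol.

```python
class SnoopingCache:
    """Toy cache that invalidates entries when it observes a write on the bus."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> (value, valid flag)

    def read(self, address):
        entry = self.lines.get(address)
        if entry is not None and entry[1]:
            return entry[0]             # valid cached copy
        value = self.memory[address]    # missing or invalid: reload from memory
        self.lines[address] = (value, True)
        return value

    def snoop_write(self, address):
        """Called when another bus master writes to `address`."""
        if address in self.lines:
            value, _ = self.lines[address]
            self.lines[address] = (value, False)  # mark the stale copy invalid


def bus_master_write(memory, caches, address, value):
    """A bus master writes to memory; every cache on the bus sees the write."""
    memory[address] = value
    for cache in caches:
        cache.snoop_write(address)


memory = {0x100: 1}
cache = SnoopingCache(memory)
first = cache.read(0x100)                        # loads the value into the cache
bus_master_write(memory, [cache], 0x100, 2)      # cached copy is now stale
second = cache.read(0x100)                       # invalid entry forces a reload
print(first, second)
```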
All PC processor designs that support cache memory include a feature known as a translation lookaside buffer (TLB). The TLB is a table inside the processor that stores the physical locations of recently accessed virtual memory addresses. Because a recent translation can be read from the TLB instead of being looked up again in the page tables held in memory, the TLB speeds up the translation of virtual addresses to physical memory addresses, which also shortens the path to main memory when a cache miss occurs.
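As a rough illustration of how a TLB speeds up translation, the sketch below keeps a small table of recent virtual-page-to-physical-frame mappings in front of a slower page table. The 4 KB page size, the four-entry capacity, the least-recently-used replacement, and all names are assumptions for the example, and the page table is just a dictionary standing in for the real in-memory structures.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed)

class TLB:
    """Toy TLB: a small cache of recent virtual page -> physical frame mappings."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # stand-in for the page tables in memory
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.entries:
            self.entries.move_to_end(page)         # recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict oldest entry (assumed policy)
            self.entries[page] = self.page_table[page]  # the slow lookup happens here
        return self.entries[page] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}              # hypothetical mappings
tlb = TLB(page_table)
physical = [tlb.translate(va) for va in (0x0010, 0x0020, 0x1004)]
print(physical, tlb.hits, tlb.misses)
```

The second access falls in the same page as the first, so its translation comes straight from the TLB without consulting the page table.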
As clock speeds increase, cycle time decreases. Newer systems no longer use cache on the motherboard because the faster system memory used in modern systems can keep up with the motherboard speed. Modern processors integrate the L2 cache into the processor die just like the L1 cache, and most recent models include on-die L3 as well. This enables the L2/L3 to run at full-core speed because it is now part of the core.
Shouldn't that be "Foreword?"
Should be 10^2, 10^3, and in the next para, 2^x.
This was very interesting, considering both instructions were supported even by the humble 8086.
These sections seem more or less unchanged, except for the mention of Ivy and Vishera, and I think the CPU-Z screenshots are new as well.
This was very interesting, considering both instructions were supported even by the humble 8086.
https://en.wikipedia.org/wiki/X86-64#Older_implementations
Yet at the very least the 80386 supported them:
http://css.csail.mit.edu/6.858/2011/readings/i386/LAHF.htm
So it appears that it was an early 64-bit CPU issue only.
The Prescott introduced 64-bit to the Intel world, not the Core 2. Kind of common knowledge. The Athlon XP had a 36-bit address bus? I don't remember ever seeing that.
Then we go to the misinformation about the 8086/8088 to 386.
In actuality, there were four modes in the 80386. Real, Virtual 86, Protected 286, and Protected 386. Yup, four. And no, Windows 3.0 was not expected to run on an 8088 or 80286, because it DID use Virtual 86, which those processors could not support. You know, the part where they let you go from one DOS task to another. That was in the hardware. And that hardware started with the 80386.
Moreover, the 80286 did NOT have the same instruction set as the 8086. Only in real mode did it. And why do you suppose it was called real mode? Maybe because the addresses were not virtualized? The 80286, as mentioned above, did have virtual addresses in what was called the 80286 Protected Mode. It not only ran Real Mode apps much faster, but when in Protected Mode was very capable of running multitasking Operating Systems, something that could not be done well on the 8086. It also increased the address bus to 24 bits, albeit still using 64 KB segments.
OS/2 1.x was the best example of an OS using 286 Protected mode, although any software using "Extended Memory" was taking advantage of the greater addressing of the 286, albeit in an inelegant way.
I stopped reading after page three, as it's just discouraging to think people are writing books without being accurate. OK, so we have the author that got it wrong, fair enough, but what about the people who are supposed to error check it? I certainly don't know everything, and I know this stuff, and it's pretty basic. No one caught this? Are you kidding me? The 286 stuff might be a bit far away, but not knowing that x86-64 first appeared in the Prescott line is really difficult to understand, and is very basic. This is made more so because of all the rumors that the processor was made to support it, but Intel was hiding it so as to not undercut the Itanium. In time, it was proven true.
Please, don't spread misinformation. Someone will repeat this stuff, and then someone else will, and it becomes 'fact' despite being wrong. If you publish a book, make a friggin effort! I'm sure I could find errors the rest of the way, but it's just too annoying for me to wade through this rubbish.
By the way, the term CPU bus is an ambiguous one. The CPU has multiple buses, and if you used that term with me, I'd wonder which one you were referring to. Find a more accurate term, like PCI-E bus if that's what you are trying to say.