Like the L1 cache, most L2 caches have a hit ratio in the 90% range; therefore, if you look at the system as a whole, 90% of the time it runs at full speed (233 MHz in this example) by retrieving data out of the L1 cache. Ten percent of the time it slows down to retrieve the data from the L2 cache. When the processor goes to the L2 cache, the data is there 90% of the time; the other 10% of the time it has to go to the slow main memory to get the data because of an L2 cache miss. So, by combining both caches, our sample system runs at full processor speed 90% of the time (233 MHz in this case), at motherboard speed 9% (90% of 10%) of the time (66 MHz in this case), and at RAM speed about 1% (10% of 10%) of the time (16 MHz in this case). You can clearly see the importance of both the L1 and L2 caches; without them the system uses main memory more often, which is significantly slower than the processor.
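The arithmetic behind those percentages can be sketched in a few lines of Python. This is only an illustration of the simplified model used here, assuming 90% hit ratios for both caches; the function name `access_fractions` is made up for this example.

```python
def access_fractions(l1_hit, l2_hit):
    """Return the share of accesses served by L1, L2, and main memory.

    Assumes the L2 is consulted only after an L1 miss, and main memory
    only after an L2 miss, as in the sample system described above.
    """
    l1_share = l1_hit
    l2_share = (1 - l1_hit) * l2_hit          # L1 miss, then L2 hit
    ram_share = (1 - l1_hit) * (1 - l2_hit)   # miss in both caches
    return l1_share, l2_share, ram_share


l1, l2, ram = access_fractions(0.90, 0.90)
print(f"L1 {l1:.0%}, L2 {l2:.0%}, RAM {ram:.0%}")
# prints: L1 90%, L2 9%, RAM 1%
```

The three shares always add up to 100%, so raising either hit ratio directly shrinks the share of accesses that fall through to main memory.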
This brings up other interesting points. If you could spend money doubling the performance of either the main memory (RAM) or the L2 cache, which would you improve? Considering that main memory is used directly only about 1% of the time, if you doubled performance there, you would double the speed of your system only 1% of the time! That doesn’t sound like enough of an improvement to justify much expense. On the other hand, if you doubled L2 cache performance, you would be doubling system performance 9% of the time, which is a much greater improvement overall. I’d much rather improve L2 than RAM performance. The same argument holds true for adding and increasing the size of L3 cache, as many recent processors from AMD and Intel have done.
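To put rough numbers on that trade-off, here is a minimal sketch that compares the two upgrades. It assumes the simplified model used in this section, in which every access costs one cycle at the speed of the level that serves it; real cache and memory latencies span many cycles, so treat the results as relative, not absolute. The helper names `cycle_ns` and `average_access_ns` are hypothetical.

```python
def cycle_ns(mhz):
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / mhz


def average_access_ns(l1_mhz, l2_mhz, ram_mhz, l1_hit=0.90, l2_hit=0.90):
    """Average cycle time per access, weighted by where the data is found."""
    l1_share = l1_hit
    l2_share = (1 - l1_hit) * l2_hit
    ram_share = (1 - l1_hit) * (1 - l2_hit)
    return (l1_share * cycle_ns(l1_mhz)
            + l2_share * cycle_ns(l2_mhz)
            + ram_share * cycle_ns(ram_mhz))


# Sample system from the text: 233 MHz L1, 66 MHz L2, 16 MHz RAM.
baseline = average_access_ns(233, 66, 16)
faster_ram = average_access_ns(233, 66, 32)    # RAM speed doubled
faster_l2 = average_access_ns(233, 132, 16)    # L2 speed doubled

print(f"baseline:   {baseline:.2f} ns")
print(f"double RAM: {faster_ram:.2f} ns")
print(f"double L2:  {faster_l2:.2f} ns")
```

Under these assumptions, doubling the L2 speed lowers the average more than doubling the RAM speed does, which matches the reasoning above.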
The processor and system designers at Intel and AMD know this and have devised methods of improving the performance of L2 cache. In Pentium (P5) class systems, the L2 cache usually was found on the motherboard and had to run at motherboard speed. Intel made the first dramatic improvement, in the Pentium Pro, by migrating the L2 cache from the motherboard directly into the processor and initially running it at the same speed as the main processor. The cache chips were made by Intel and mounted next to the main processor die in a single chip housing. This proved too expensive, so with the Pentium II, Intel began using cache chips from third-party suppliers such as Sony, Toshiba, NEC, and Samsung. Because these were supplied as complete packaged chips and not raw die, Intel mounted them on a circuit board alongside the processor. This is why the Pentium II was designed as a cartridge rather than what looked like a chip.
One problem was the speed of the available third-party cache chips. The fastest ones on the market had cycle times of 3 ns or higher, meaning 333 MHz or less in speed. Because the processor was being driven at speeds above that, in the Pentium II and initial Pentium III processors, Intel had to run the L2 cache at half the processor speed because that is all the commercially available cache memory could handle. AMD followed suit with the Athlon processor, which had to drop L2 cache speed even further in some models, to two-fifths or one-third the main CPU speed, to keep the cache memory within the 333 MHz limit of commercially available chips.
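The relationship between a cache chip's cycle time and the divider it forces on the processor can be sketched as follows. The core clocks in the loop are hypothetical round numbers chosen for illustration, not the specifications of particular models, and the helper names are made up for this example. Exact fractions are used so that one-third of 1,000 MHz compares cleanly against the 3 ns limit.

```python
from fractions import Fraction


def max_clock_mhz(cycle_ns):
    """Highest clock (MHz) a chip with the given cycle time (ns) can sustain."""
    return Fraction(1000) / Fraction(cycle_ns)


def cache_clock_mhz(core_mhz, ratio):
    """Cache clock (MHz) when the cache runs at `ratio` of the core clock."""
    return Fraction(core_mhz) * ratio


limit = max_clock_mhz(3)  # 3 ns chips: 1000/3, or about 333 MHz

# Hypothetical core clocks paired with the dividers mentioned in the text.
examples = [
    (600, Fraction(1, 2)),
    (800, Fraction(2, 5)),
    (1000, Fraction(1, 3)),
    (1000, Fraction(1, 2)),
]
for core_mhz, ratio in examples:
    cache = cache_clock_mhz(core_mhz, ratio)
    print(f"{core_mhz} MHz core x {ratio} = {float(cache):.1f} MHz cache, "
          f"within 3 ns rating: {cache <= limit}")
```

The last case shows why a half-speed cache stops being an option as core clocks climb: the cache would need to run faster than the chips are rated for.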
Then a breakthrough occurred, which first appeared in Celeron processors 300A and above. These had 128 KB of L2 cache, but no external chips were used. Instead, the L2 cache was integrated directly into the processor core, just like the L1. Consequently, both the L1 and L2 caches now ran at full processor speed and, more importantly, scaled up in speed as processor speeds increased. In the newer Pentium III, as well as all the Xeon and Celeron processors, the L2 cache runs at full processor core speed, which means there is no waiting or slowing down after an L1 cache miss. AMD also achieved full-core speed on-die cache in its later Athlon and Duron chips. Using on-die cache improves performance dramatically because, during the 9% of the time the system uses the L2, it now remains at full speed instead of slowing down to one-half or less of the processor speed or, even worse, slowing down to motherboard speed as in Socket 7 designs. Another benefit of on-die L2 cache is cost, which is lower because fewer parts are involved.
L3 on-die caches offer the same benefits for those times when the L1 and L2 caches do not contain the desired data. And, because L3 cache is much larger than L2 cache (6 MB in AMD Phenom II and 12 MB in Core i7 Extreme Edition), the odds of all three cache levels not containing the desired information are lower than on processors that have only L1 and L2 cache.
Let’s revisit the restaurant analogy using a 3.6 GHz processor. You would now be taking a bite just over every quarter second (3.6 GHz = 0.28 ns cycling). The L1 cache would also be running at that speed, so you could eat anything on your table at that same rate (the table = L1 cache).
The real jump in speed comes when you want something that isn’t already on the table (L1 cache miss), in which case the waiter reaches over to the cart (which is now directly adjacent to the table) and nine out of 10 times is able to find the food you want in just over one-quarter second (L2 speed = 3.6 GHz or 0.28 ns cycling). In this system, you would run at 3.6 GHz 99% of the time (L1 and L2 hit ratios combined) and slow down to RAM speed (wait for the kitchen) only 1% of the time, as before. With faster memory running at 800 MHz (1.25 ns), you would have to wait only 1.25 seconds for the food to come from the kitchen. If only restaurant performance would increase at the same rate processor performance has!
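As a rough check on the analogy, the following sketch converts the clock speeds to cycle times and compares the original sample system with the 3.6 GHz one. It keeps the same simplifying assumption as the analogy, that each access costs one cycle at the speed of whichever level supplies the data; the names `cycle_ns` and `average_cycle_ns` are hypothetical.

```python
def cycle_ns(mhz):
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / mhz


def average_cycle_ns(levels):
    """Weighted average cycle time over (share of accesses, MHz) pairs."""
    return sum(share * cycle_ns(mhz) for share, mhz in levels)


# Original sample system: L1 at 233 MHz, L2 at 66 MHz, RAM at 16 MHz.
old_system = [(0.90, 233), (0.09, 66), (0.01, 16)]
# Newer system: L1 and L2 both at 3.6 GHz (99% combined), RAM at 800 MHz.
new_system = [(0.99, 3600), (0.01, 800)]

print(f"3.6 GHz cycle: {cycle_ns(3600):.2f} ns")   # about 0.28 ns
print(f"800 MHz cycle: {cycle_ns(800):.2f} ns")    # 1.25 ns
print(f"old average:   {average_cycle_ns(old_system):.2f} ns")
print(f"new average:   {average_cycle_ns(new_system):.2f} ns")
```

In the analogy's terms, each nanosecond here stands for one second at the restaurant.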
Shouldn't that be "Foreword?"
Should be 10^2, 10^3, and in the next para, 2^x.
This was very interesting, considering both instructions were supported even by the humble 8086.
These sections seem more or less unchanged, except for the mention of Ivy and Vishera, and I think the CPU-Z screenshots are new as well.
This was very interesting, considering both instructions were supported even by the humble 8086.
https://en.wikipedia.org/wiki/X86-64#Older_implementations
Yet at the very least the 80386 supported them:
http://css.csail.mit.edu/6.858/2011/readings/i386/LAHF.htm
So it appears that it was an early-64 bit CPU issue only.
The Prescott introduced 64-bit to the Intel world, not the Core 2. Kind of common knowledge. The Athlon XP had a 36-bit address bus? I don't remember ever seeing that.
Then we go to the misinformation about the 8086/8088 to 386.
In actuality, there were four modes in the 80386. Real, Virtual 86, Protected 286, and Protected 386. Yup, four. And no, Windows 3.0 was not expected to run on an 8088 or 80286, because it DID use Virtual 86, which those processors could not support. You know, the part where they let you go from one DOS task to another. That was in the hardware. And that hardware started with the 80386.
Moreover, the 80286 did NOT have the same instruction set as the 8086. Only in real mode did it. And why do you suppose it was called real mode? Maybe because the addresses were not virtualized? The 80286, as mentioned above, did have virtual addresses in what was called the 80286 Protected Mode. It not only ran Real Mode apps much faster, but when in Protected Mode was very capable of running multitasking operating systems, something that could not be done well on the 8086. It also increased the address bus to 24 bits, albeit still using 64 KB segments.
OS/2 1.x was the best example of an OS using 286 Protected mode, although any software using "Extended Memory" was taking advantage of the greater addressing of the 286, albeit in an inelegant way.
I stopped reading after page three, as it's just discouraging to think people are writing books without being accurate. OK, so we have the author that got it wrong, fair enough, but what about the people who are supposed to error check it? I certainly don't know everything, and I know this stuff, and it's pretty basic. No one caught this? Are you kidding me? The 286 stuff might be a bit far away, but not knowing that x86-64 first appeared in the Prescott line is really difficult to understand, and is very basic. This is made more so because of all the rumors that the processor was made to support it, but Intel was hiding it so as to not undercut the Itanium. In time, it was proven true.
Please, don't spread misinformation. Someone will repeat this stuff, and then someone else will, and it becomes 'fact' despite being wrong. If you publish a book, make a friggin effort! I'm sure I could find errors the rest of the way, but it's just too annoying for me to wade through this rubbish.
By the way, the term CPU bus is an ambiguous one. The CPU has multiple buses, and if you used that term with me, I'd wonder which one you were referring to. Find a more accurate term, like PCI-E bus if that's what you are trying to say.