As processor core speeds increased, memory speeds could not keep up. How could you run a processor faster than the memory that feeds it without performance suffering terribly? The answer was cache. In its simplest terms, cache memory is a high-speed memory buffer that temporarily stores data the processor needs, allowing the processor to retrieve that data faster than if it came from main memory. But a cache has one additional feature that a simple buffer lacks, and that is intelligence. A cache is a buffer with a brain.
A buffer simply holds whatever data passes through it, usually on a first-in, first-out (FIFO) or first-in, last-out (FILO) basis. A cache, on the other hand, holds the data the processor is most likely to need before it is actually needed. This enables the processor to keep working at or close to full speed without having to wait for the data to be retrieved from slower main memory. Cache memory is usually made up of static RAM (SRAM) integrated into the processor die, although older systems also used cache chips installed on the motherboard.
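To make the buffer-versus-cache distinction concrete, here is a minimal sketch in Python. It swaps in least-recently-used (LRU) replacement as one simple example of "intelligence"; real processor caches use hardware variants of such policies plus prefetching, and the class and function names here (`SimpleStore`, `count_hits`) are hypothetical. The same store behaves like a plain FIFO buffer when `refresh_on_hit` is off and like an LRU cache when it is on.

```python
from collections import OrderedDict


class SimpleStore:
    """Holds up to `capacity` addresses and counts hits and misses.

    With refresh_on_hit=False, entries leave strictly in arrival order (FIFO),
    like a plain buffer. With refresh_on_hit=True, a hit moves the entry to the
    back of the eviction queue, giving least-recently-used (LRU) replacement.
    """

    def __init__(self, capacity, refresh_on_hit):
        self.capacity = capacity
        self.refresh_on_hit = refresh_on_hit
        self.entries = OrderedDict()  # front of the dict is evicted first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.entries:
            self.hits += 1
            if self.refresh_on_hit:
                self.entries.move_to_end(address)  # recently used: keep it longer
            return True
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the entry at the front
        self.entries[address] = True
        return False


def count_hits(trace, capacity, refresh_on_hit):
    """Replay a sequence of addresses and return how many were already held."""
    store = SimpleStore(capacity, refresh_on_hit)
    for address in trace:
        store.access(address)
    return store.hits


if __name__ == "__main__":
    # Address 0 is "hot": the program keeps coming back to it.
    trace = [0, 1, 0, 2, 0, 1, 0, 2]
    print("FIFO buffer hits:", count_hits(trace, 2, refresh_on_hit=False))
    print("LRU cache hits:  ", count_hits(trace, 2, refresh_on_hit=True))
```

With this toy trace and room for only two entries, the FIFO version scores 2 hits and the LRU version 3, because the LRU store keeps the frequently reused address 0 resident instead of evicting it just because it arrived first.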
Recent low-cost processor designs typically include two levels of processor/memory cache: Level 1 (L1) and Level 2 (L2). Mid-range and high-end designs also have Level 3 cache. These caches and their functioning are described in the following sections.
Use the popular CPU-Z utility discussed earlier in this chapter to determine the types and sizes of cache memory in your computer’s CPUs.
Internal Level 1 Cache
All modern processors starting with the 486 family include an integrated L1 cache and controller. The integrated L1 cache size varies from processor to processor, starting at 8 KB for the original 486DX and now up to 128 KB or more in the latest processors.
Note: Multi-core processors include a separate L1 cache for each processor core. Also, L1 cache is divided into separate instruction and data caches, usually of equal size.
To understand the importance of cache, you need to know the relative speeds of processors and memory. The problem is that processor speed usually is expressed in MHz or GHz (millions or billions of cycles per second), whereas memory speed has often been expressed in nanoseconds (billionths of a second) per cycle. Most newer types of memory express speed either in MHz or as bandwidth (throughput) in megabytes per second (MB/s).
Both are really time- or frequency-based measurements, and one converts to the other by taking the reciprocal: cycle time in nanoseconds equals 1,000 divided by the frequency in MHz. A 233 MHz processor therefore has a cycle time of about 4.3 ns, which means you would need roughly 4 ns memory to keep pace with it. Also, note that the motherboard of a 233 MHz system typically runs at 66 MHz, which corresponds to 15 ns per cycle and requires 15 ns memory to keep pace. Finally, note that 60 ns main memory (common on many Pentium-class systems) equates to a clock speed of approximately 16 MHz. So, a typical Pentium 233 system has a processor running at 233 MHz (4.3 ns per cycle), a motherboard running at 66 MHz (15 ns per cycle), and main memory running at 16 MHz (60 ns per cycle). This might seem like a rather dated example, but in a moment you will see that these figures make it easy for me to explain how cache memory works.
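The conversions in the Pentium 233 example are just reciprocals, which a couple of lines of Python can check. This is a small sketch; the helper names are hypothetical, and it assumes the nominal "66 MHz" bus actually runs at 66.67 MHz, which is why it comes out at 15 ns.

```python
def cycle_time_ns(freq_mhz):
    """Length of one clock cycle, in nanoseconds, for a clock in MHz."""
    return 1000.0 / freq_mhz


def equivalent_mhz(cycle_ns):
    """Clock rate, in MHz, whose cycle lasts cycle_ns nanoseconds."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    # Assumption: the nominal "66 MHz" Pentium bus is really 66.67 MHz.
    for label, mhz in [("Pentium 233 core", 233), ("Motherboard bus", 66.67)]:
        print(f"{label}: {mhz} MHz -> {cycle_time_ns(mhz):.1f} ns per cycle")
    print(f"60 ns main memory -> {equivalent_mhz(60):.1f} MHz")
```

Run as a script, it reports about 4.3 ns for the core, 15.0 ns for the bus, and 16.7 MHz for 60 ns memory, matching the rounded figures in the text.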
Because L1 cache is always built into the processor die, it runs at the full-core speed of the processor. By full-core speed, I mean this cache runs at the higher, clock-multiplied internal processor speed rather than the external motherboard speed. This cache is essentially an area of fast memory built into the processor that holds some of the current working set of code and data. Cache memory can be accessed with no wait states because it runs at the same speed as the processor core.
Cache memory reduces a traditional system bottleneck because system RAM is almost always much slower than the CPU, and the gap between memory and CPU speed has become especially large in recent systems. The cache keeps the processor from having to wait for code and data from much slower main memory, thus improving performance. Without the L1 cache, a processor would frequently be forced to wait until system memory caught up.
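Using the Pentium 233 figures, you can estimate how many processor cycles go by during a single main-memory access. This Python sketch is a deliberate simplification: it ignores bus and memory-controller overhead and any work the processor might overlap with the access, and `cycles_per_access` is a hypothetical helper name.

```python
import math


def cycles_per_access(access_ns, cpu_mhz):
    """Processor clock cycles that elapse during one memory access (rounded up).

    Simplification: ignores bus/controller overhead and any overlapping work.
    """
    cpu_cycle_ns = 1000.0 / cpu_mhz
    return math.ceil(access_ns / cpu_cycle_ns)


if __name__ == "__main__":
    # Figures from the Pentium 233 example: 233 MHz core, 60 ns main memory.
    print(cycles_per_access(60, 233))  # about 14 core cycles per access
```

Roughly 14 core cycles pass for every 60 ns access, which is the waiting an on-die cache running at core speed avoids whenever the data is already there.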
Cache is even more important in modern processors because it is often the only memory in the entire system that can truly keep up with the chip. Most modern processors are clock multiplied, which means they run at a speed that is a multiple of the clock speed of the motherboard into which they are plugged. The only types of memory that can match the full speed of the processor are the L1, L2, and L3 caches built into the processor die.
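Clock multiplication is simple arithmetic: core speed equals bus speed times the multiplier. This Python sketch uses exact fractions so the numbers stay clean. It assumes the nominal "66 MHz" bus is 200/3 MHz (66.67 MHz) and a 3.5x multiplier, which together yield the 233 MHz core in the running example; the function names are hypothetical.

```python
from fractions import Fraction


def core_mhz(bus_mhz, multiplier):
    """Internal core clock produced by multiplying the motherboard clock."""
    return bus_mhz * multiplier


def multiplier_for(core, bus):
    """Clock multiplier needed to reach a given core speed from a given bus speed."""
    return core / bus


if __name__ == "__main__":
    bus = Fraction(200, 3)       # assumed: nominal "66 MHz" bus is 66.67 MHz
    multiplier = Fraction(7, 2)  # assumed: 3.5x multiplier
    core = core_mhz(bus, multiplier)
    print(f"core clock: {float(core):.2f} MHz")
    print(f"multiplier: {float(multiplier_for(core, bus))}")
```

The core comes out at 233.33 MHz, while anything on the motherboard side, including main memory, is still tied to the much slower 66.67 MHz bus.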
If the data that the processor wants is already in L1 cache, the CPU does not have to wait. If the data is not in the cache, the CPU must fetch it from the Level 2 or Level 3 cache or (in less sophisticated system designs) from the system bus—meaning main memory directly.
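The lookup order described above (L1, then L2, then L3, then main memory) can be sketched in Python as follows. This is a simplified model under stated assumptions: levels are checked one after another, every level that missed is filled on the way back, and set associativity, line size, and eviction are ignored. The latency numbers are hypothetical, chosen only to show the ordering, and the names `CacheLevel` and `load` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    name: str
    latency_cycles: int                         # hypothetical cost of checking this level
    contents: set = field(default_factory=set)  # addresses currently held


def load(address, levels, memory_latency_cycles):
    """Check each cache level in order; fall back to main memory on a full miss.

    Returns (where the data was found, total cycles spent). Every level that
    missed is filled on the way back, so a repeat access hits in L1.
    """
    total = 0
    missed = []
    for level in levels:
        total += level.latency_cycles
        if address in level.contents:
            source = level.name
            break
        missed.append(level)
    else:
        # No level held the address: go out to main memory.
        total += memory_latency_cycles
        source = "main memory"
    for level in missed:
        level.contents.add(address)
    return source, total


if __name__ == "__main__":
    # Hypothetical latencies, not measured values for any real processor.
    hierarchy = [CacheLevel("L1", 4), CacheLevel("L2", 12), CacheLevel("L3", 40)]
    print(load(0x1000, hierarchy, memory_latency_cycles=200))  # ('main memory', 256)
    print(load(0x1000, hierarchy, memory_latency_cycles=200))  # ('L1', 4)
```

The first access pays for every level plus main memory; the repeat access is served from L1 at a fraction of the cost, which is the whole point of keeping the working set in cache.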