Every modern processor comes with a dedicated cache that holds processor instructions and data meant for almost immediate use. This is referred to as the first-level cache, or L1, and it first appeared on the 486DX processor. Recently, AMD processors standardized on 64KB each of instruction and data L1 per core, while Intel processors use 32KB each for their dedicated data and instruction L1 caches.
The first-level caches from Intel were introduced on the 486DX and are still an integral part of its microprocessors today.
The second-level cache (L2) has been available on all processors since the Pentium III, although the first in-package implementation arrived with the Pentium Pro (on a separate die inside the package rather than on the processor die). Today’s processors offer up to 6MB of L2 cache on-die. This is the amount you’ll find being shared between the two cores on Intel’s Core 2 Duo, for example. Typical L2 cache configurations offer 512KB or 1MB of cache per core. Processors with less L2 cache are often found in lower-end products. Here is an overview of early L2 cache configurations:
The Pentium Pro had its L2 cache inside the processor package. The Pentium III and Athlon generations that followed implemented L2 cache through surface-mounted SRAM chips, which were common at that time (1998, 1999).
The introduction of 180nm manufacturing processes allowed manufacturers to finally integrate L2 caches within the processor die.
The first dual-core processors simply utilized existing designs and duplicated them. AMD did this on one die and added the memory controller and a crossbar switch, while Intel simply placed two single-core dies into a processor package to create its first dual-core.
The first cache that was shared between two cores was the Core 2 Duo's L2. AMD labored away and created its Phenom quad-core from scratch, while Intel decided once again to pair two dies—this time two Core 2 dual-cores—in an effort to create economical quad-cores.
Third-level cache has existed since the early days of Alpha’s 21164 (96KB, released in 1995) or IBM’s Power 4 (256KB, 2001). However, it wasn’t until the advent of Intel’s Itanium 2 and the Pentium 4 Extreme (Gallatin), both in 2003, and the Xeon MP (2006) that L3 caches were used on x86 and related architectures.
The first implementations represented just an additional level, while recent architectures provide the L3 cache as a large and shared data buffer on multi-core processors. The high associativity of these caches underlines this: it’s preferable to search a little longer inside the cache memory than to have several cores trigger slow memory accesses. AMD was first to introduce L3 cache on a desktop product, namely the Phenom family. The 65nm Phenom X4 offered 2MB of shared L3 cache, while the current 45nm Phenom II X4 comes with 6MB of shared L3. Intel’s Core i7 and i5 both feature 8MB of L3 cache.
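To make the associativity trade-off more concrete, here is a minimal sketch of a set-associative cache with least-recently-used replacement. Everything in it is an assumption chosen for illustration: the function name `count_misses`, the toy size of eight 64-byte lines, and the access trace do not describe any real processor. It only shows how two blocks that collide in a direct-mapped cache can coexist once each set has more than one way.

```python
from collections import OrderedDict

def count_misses(addresses, num_lines, ways, line_size=64):
    """Simulate an LRU set-associative cache and return the miss count.

    Hypothetical toy model: num_lines is the total number of cache lines,
    ways is the associativity (1 means direct-mapped).
    """
    num_sets = num_lines // ways
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size      # which memory block is touched
        index = block % num_sets       # which set the block maps to
        tag = block // num_sets        # identifies the block within the set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses

# Two addresses that map to the same set, touched alternately
# (for example by two cores sharing the cache).
trace = [0, 512] * 4

direct_mapped = count_misses(trace, num_lines=8, ways=1)
two_way = count_misses(trace, num_lines=8, ways=2)
print(direct_mapped, two_way)
```

With the same total capacity, the direct-mapped configuration misses on every access in this trace, while the two-way configuration misses only on the first touch of each block. Each lookup has to compare more tags, which is the "search a little longer" cost mentioned above.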
The latest quad-core processors come with dedicated L1 and L2 caches for each core and a larger, shared L3 cache available to all cores. The shared L3 also lets the cores exchange data they might be working on in parallel.
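The hierarchy described here can be sketched as a toy lookup model. This is only an illustration under simplifying assumptions: the class name `ToyHierarchy` is hypothetical, the caches have unlimited capacity, every level is filled on a miss, and writes and coherence traffic are ignored. Real processors differ in all of these respects, including how the L3 is filled.

```python
class ToyHierarchy:
    """Per-core L1 and L2 caches plus one L3 shared by all cores (toy model)."""

    def __init__(self, num_cores):
        self.l1 = [set() for _ in range(num_cores)]  # private to each core
        self.l2 = [set() for _ in range(num_cores)]  # private to each core
        self.l3 = set()                              # shared by all cores

    def access(self, core, block):
        """Return the level that served the block for the given core."""
        if block in self.l1[core]:
            return "L1"
        if block in self.l2[core]:
            level = "L2"
        elif block in self.l3:
            level = "L3"
        else:
            level = "memory"
            self.l3.add(block)        # assumed fill policy: keep a copy in L3
        # Assumed fill policy: the requesting core keeps copies in L2 and L1.
        self.l2[core].add(block)
        self.l1[core].add(block)
        return level

cpu = ToyHierarchy(num_cores=4)
first = cpu.access(core=0, block=42)   # nobody has the block yet
second = cpu.access(core=1, block=42)  # another core asks for the same block
third = cpu.access(core=1, block=42)   # the same core asks again
print(first, second, third)
```

In this model the first request goes all the way to memory, the second core finds the block in the shared L3 instead of repeating the memory access, and a repeated request is served from that core's own L1.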