AMD's Athlon 64 Has Arrived: the Athlon 64 FX and Athlon 64 (and Intel's P4 Extreme)

Athlon XP-64 Core: 95 Percent Athlon, Continued

The real innovations are in the details. At the heart of the CPU is a crossbar switch (XBAR), which manages the data streams between the memory controller, the CPU core and the three HyperTransport ports. Unlike the Athlon 64, which is meant only for single-processor operation, the Opteron has controller logic that allows multi-processor operation. Thus, when used in a server, up to eight Opterons can work together without a Northbridge. Furthermore, an SSE2-compatible unit has been added, which has twice as many registers (16) as the Intel P4. Fundamental changes are to be found at the instruction processing level: the Translation Look-aside Buffers (TLBs) have been reworked for larger workloads (1000 entries max.). Basically, the more entries in the TLB, the less frequently the translation tables in system memory have to be accessed when a virtual address is translated into a physical address.
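The effect of TLB size can be illustrated with a small simulation. What follows is only a toy sketch, not a model of the actual Hammer TLB: it assumes 4 KiB pages, a fully associative buffer with least-recently-used replacement, and a made-up page table. The names `TinyTLB` and `count_walks` are invented for this illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class TinyTLB:
    """Toy fully associative TLB with least-recently-used replacement."""

    def __init__(self, entries):
        self.entries = entries
        self.cache = OrderedDict()  # virtual page number -> physical frame number
        self.table_walks = 0        # lookups that had to go to the page table

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.cache.move_to_end(vpn)         # hit: mark as most recently used
        else:
            self.table_walks += 1               # miss: slow lookup in system memory
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
            self.cache[vpn] = page_table[vpn]
        return self.cache[vpn] * PAGE_SIZE + offset


def count_walks(tlb_entries, pages_touched=64, rounds=10):
    """Sweep over the same pages repeatedly and count page-table lookups."""
    page_table = {vpn: vpn + 1000 for vpn in range(pages_touched)}  # made-up mapping
    tlb = TinyTLB(tlb_entries)
    for _ in range(rounds):
        for vpn in range(pages_touched):
            tlb.translate(vpn * PAGE_SIZE, page_table)
    return tlb.table_walks


print(count_walks(32))  # working set larger than the TLB
print(count_walks(64))  # working set fits into the TLB
```

With these assumptions, a 32-entry buffer has to consult the page table on every one of the 640 accesses, while a 64-entry buffer only does so the first time each of the 64 pages is touched.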

The fundamental structure of the Hammer is not much different from that of the Athlon: the three integer and three floating-point units remain unchanged, as do the three x86 decoders. The caches now have ECC circuitry. The crucial changes are in the details.

| CPU core | Hammer | Barton | Thoroughbred "B" |
| --- | --- | --- | --- |
| Wafer area (200 mm diameter) | 31416 mm² | 31416 mm² | 31416 mm² |
| Die area | 193 mm² | 101 mm² | 84 mm² |
| Process technology | 0.13 µm | 0.13 µm | 0.13 µm |
| Waste | 18 percent | 18 percent | 18 percent |
| Theoretical maximum yield | 122 pcs/wafer | 255 pcs/wafer | 306 pcs/wafer |
| Yield at 60% rate | 73 pcs/wafer | 153 pcs/wafer | 183 pcs/wafer |

| CPU core | Thoroughbred "A" | Palomino | Thunderbird |
| --- | --- | --- | --- |
| Wafer area (200 mm diameter) | 31416 mm² | 31416 mm² | 31416 mm² |
| Die area | 80 mm² | 128 mm² | 128 mm² |
| Process technology | 0.13 µm | 0.18 µm | 0.18 µm |
| Waste | 18 percent | 18 percent | 18 percent |
| Theoretical maximum yield | 322 pcs/wafer | 201 pcs/wafer | 201 pcs/wafer |
| Yield at 60% rate | 193 pcs/wafer | 120 pcs/wafer | 120 pcs/wafer |
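The yield figures can be roughly reproduced with simple arithmetic. The sketch below assumes that the theoretical maximum is the usable wafer area (wafer area minus 18 percent waste) divided by the die area, rounded down, and that the 60 percent figure is again rounded down. That is our reading of the table rather than a documented formula, and the function names are made up for the example.

```python
WAFER_AREA_MM2 = 31416  # 200 mm wafer, as listed in the table
WASTE_PERCENT = 18      # share of the wafer assumed to be unusable
YIELD_PERCENT = 60      # share of the dies assumed to work


def dies_per_wafer(die_area_mm2):
    """Theoretical maximum: usable area divided by die area, rounded down."""
    usable_times_100 = WAFER_AREA_MM2 * (100 - WASTE_PERCENT)
    return usable_times_100 // (100 * die_area_mm2)


def good_dies(die_area_mm2):
    """Working dies per wafer at the assumed yield rate, rounded down."""
    return dies_per_wafer(die_area_mm2) * YIELD_PERCENT // 100


for name, area in [("Barton", 101), ('Thoroughbred "B"', 84),
                   ('Thoroughbred "A"', 80), ("Palomino", 128)]:
    print(name, dies_per_wafer(area), good_dies(area))
```

Under these assumptions the Barton, Thoroughbred and Palomino rows come out exactly as listed. For the 193 mm² Hammer die, the same formula gives 133 rather than the 122 in the table, so that row presumably rests on a different assumption.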

Ultimately, that saves time, so instructions take less time to execute. Compared to the Thoroughbred and Barton cores, the TLBs work with lower latency, which, in turn, leads to an increase in speed. The branch prediction was also reworked: the History Counter now stores up to 16,000 entries (Athlon XP: 4,000). In order to accommodate higher clock speeds, AMD extended the pipeline of the Hammer to 12 stages. The old Athlon has only 10 stages, while the current Intel P4 (and Xeon) uses 20 stages. This allows the execution units to be supplied more quickly with consecutive instructions, and does away with wait states.
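Why a larger history table helps can be shown with a simplified model. The sketch below is not AMD's actual predictor: it assumes a plain table of 2-bit saturating counters indexed by the branch address modulo the number of entries, and uses the article's rounded entry counts. The class `CounterTable`, the function `run` and the two branch addresses are made up for the illustration.

```python
class CounterTable:
    """Toy branch predictor: one 2-bit saturating counter per table entry."""

    def __init__(self, entries):
        self.entries = entries
        self.counters = [1] * entries  # range 0..3, start at "weakly not taken"
        self.mispredictions = 0

    def execute(self, branch_addr, taken):
        slot = branch_addr % self.entries  # assumed indexing scheme
        predicted_taken = self.counters[slot] >= 2
        if predicted_taken != taken:
            self.mispredictions += 1
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        if taken:
            self.counters[slot] = min(3, self.counters[slot] + 1)
        else:
            self.counters[slot] = max(0, self.counters[slot] - 1)


def run(entries, rounds=1000):
    """Alternate between two branches that behave in opposite ways."""
    table = CounterTable(entries)
    for _ in range(rounds):
        table.execute(0, True)      # hypothetical branch A: always taken
        table.execute(4000, False)  # hypothetical branch B: never taken
    return table.mispredictions


print(run(4000))   # both branches share one counter
print(run(16000))  # each branch gets its own counter
```

In the 4,000-entry table the two branches land on the same counter and keep overwriting each other's history, so all 2,000 predictions are wrong. In the 16,000-entry table they no longer collide and only the very first prediction misses. The deeper the pipeline, the more work is thrown away on each such miss.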

A significant new feature: the extended 64-bit register.