Duel of the Titans: Opteron vs. Xeon

Details On The Opteron Core: An Enhanced Athlon, Continued

The really cool stuff is in the details of the chip. At the heart of the CPU is a crossbar switch (XBAR), which manages the data streams between the memory controller, the CPU core and the three HyperTransport ports. Unlike the Athlon 64, which is only meant for single-processor operation, the Opteron has controller logic that allows multi-processor operation. Thus, when used in a server, up to eight Opterons can work together without a Northbridge.

Furthermore, an SSE2-compatible unit has been added, which has twice as many registers (16) as the Intel P4. Fundamental changes are to be found at the instruction-processing level: the Translation Look-aside Buffers (TLB) have been reworked for larger workloads (1000 entries max.). Basically, the more entries in the TLB, the less frequently the translation tables in system memory have to be accessed when translating a virtual address into a physical address.
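To make the TLB argument concrete, here is a minimal sketch of a toy TLB. It assumes a fully associative buffer with least-recently-used replacement, which is an illustration only and not a description of the Opteron's actual TLB organization. The function name `count_tlb_misses` and the access pattern are made up for this example.

```python
from collections import OrderedDict

def count_tlb_misses(page_accesses, tlb_entries):
    """Count misses in a toy, fully associative TLB with LRU replacement.

    Every miss stands for one lookup of the translation tables in memory.
    This is an illustrative model, not the Opteron's real TLB design.
    """
    tlb = OrderedDict()  # virtual page number -> placeholder physical frame
    misses = 0
    for page in page_accesses:
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: translation table access needed
            tlb[page] = page             # cache the (pretend) translation
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses

# Hypothetical workload: sweep over 64 pages, ten times in a row.
accesses = list(range(64)) * 10

misses_small = count_tlb_misses(accesses, tlb_entries=32)
misses_large = count_tlb_misses(accesses, tlb_entries=128)
print(misses_small, misses_large)
```

In this toy model, the 32-entry TLB misses on every single access because the working set never fits, while the 128-entry TLB only misses the first time each page is touched.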

The fundamental structure of the Opteron is not much different from the Athlon: the 3 ALUs and 3 AGUs are now capable of 64-bit-wide arithmetic. The caches now have ECC circuitry. Many more changes can be found in the details.

| CPU core | Hammer | Barton | Thoroughbred "B" |
|---|---|---|---|
| Wafer surface (200 mm diameter) | 31416 mm² | 31416 mm² | 31416 mm² |
| Die surface | 193 mm² | 101 mm² | 84 mm² |
| Process technology | 0.13 µm | 0.13 µm | 0.13 µm |
| Waste (approx.) | 18% | 18% | 18% |
| Yield (theoretical) | 148 units/wafer | 255 units/wafer | 306 units/wafer |
| Yield (at 60% yield rate) | 89 units/wafer | 153 units/wafer | 183 units/wafer |

| CPU core | Thoroughbred "A" | Palomino | Thunderbird |
|---|---|---|---|
| Wafer surface (200 mm diameter) | 31416 mm² | 31416 mm² | 31416 mm² |
| Die surface | 80 mm² | 128 mm² | 128 mm² |
| Process technology | 0.13 µm | 0.18 µm | 0.18 µm |
| Waste (approx.) | 18% | 18% | 18% |
| Yield (theoretical) | 322 units/wafer | 201 units/wafer | 201 units/wafer |
| Yield (at 60% yield rate) | 193 units/wafer | 120 units/wafer | 120 units/wafer |

Compared to the Thoroughbred and Barton cores, the TLBs work with lower latency, which in turn leads to an increase in speed. The branch prediction was also reworked: the History Counter now stores up to 16,000 entries (Athlon XP: 4,000).

In order to accommodate the 64-bit instruction set, AMD extended the pipeline of the Opteron to 12 stages. The Athlon XP has only 10 stages, while the high-speed architecture of the Intel P4 (and Xeon) uses 20 stages.