AMD is using the same die for its desktop, 1-2P server, and 1-4P server processors, code-named Zambezi, Valencia, and Interlagos, respectively. Of course, that’s not a new approach; both AMD and Intel regularly leverage similar silicon in multiple markets.
This first incarnation of AMD’s newest micro-architecture includes four Bulldozer modules, for a total of eight cores. You can pretty easily do the math from our architecture breakdown to figure out the SoC’s vitals: 128 KB of total L1 data cache (16 KB per core x 8), 256 KB total L1 instruction cache (64 KB shared per module x 4), and 8 MB of L2 cache (2 MB per module x 4).
Also on-die is 8 MB of shared L3 cache, split into four 2 MB slices. Now, you don’t typically see a 1:1 ratio between L2 and L3. But AMD says it settled on this cache structure because its performance modeling showed this configuration delivered the best results. Moreover, AMD says that, while the L3 has a modest effect on desktop performance, it’s more impactful in the server space.
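The per-die totals follow directly from the per-unit figures. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted above; the constant and function names are our own.

```python
# Per-unit cache figures for one eight-core Bulldozer die, as quoted in the text.
MODULES = 4
CORES_PER_MODULE = 2
L1D_KB_PER_CORE = 16      # each integer core gets its own L1 data cache
L1I_KB_PER_MODULE = 64    # the L1 instruction cache is shared by a module's two cores
L2_KB_PER_MODULE = 2048   # 2 MB of L2 per module
L3_KB_PER_SLICE = 2048    # the shared L3 is built from 2 MB slices...
L3_SLICES = 4             # ...four of them


def die_cache_totals():
    """Return the per-die core count and cache totals in KB."""
    cores = MODULES * CORES_PER_MODULE
    return {
        "cores": cores,
        "l1d_kb": cores * L1D_KB_PER_CORE,
        "l1i_kb": MODULES * L1I_KB_PER_MODULE,
        "l2_kb": MODULES * L2_KB_PER_MODULE,
        "l3_kb": L3_SLICES * L3_KB_PER_SLICE,
    }


print(die_cache_totals())
# {'cores': 8, 'l1d_kb': 128, 'l1i_kb': 256, 'l2_kb': 8192, 'l3_kb': 8192}
```

The last two entries make the unusual 1:1 ratio between L2 and L3 explicit.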
If you go back one generation, Phenom II had 512 KB of L2 cache per core and a shared 6 MB L3. Sandy Bridge sports 256 KB per core and up to 8 MB of shared L3. That pyramid-shaped hierarchy, with each level larger than the one above it, is ideal for feeding each level as quickly as possible. So, despite AMD’s assurances that Bulldozer is organized correctly, 8 MB of total L2 and 8 MB of L3 still don’t fit the mold we’re taught to expect.
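To put the pyramid argument in numbers, the following sketch compares total L2 against shared L3 for each design. The core counts for the older chips are our assumptions, not figures from this page: a six-core Phenom II X6 and a quad-core Sandy Bridge with the full 8 MB L3.

```python
# Total L2 across all cores vs. shared L3, in MB.
# Assumed configurations: six-core Phenom II X6, quad-core Sandy Bridge with 8 MB L3.
CHIPS = {
    "Zambezi die": (4 * 2.0, 8.0),              # 4 modules x 2 MB L2, 8 MB L3
    "Phenom II X6": (6 * 0.5, 6.0),             # 6 cores x 512 KB L2, 6 MB L3
    "Sandy Bridge quad-core": (4 * 0.25, 8.0),  # 4 cores x 256 KB L2, 8 MB L3
}


def l2_to_l3_ratio(l2_mb, l3_mb):
    """A ratio well below 1.0 is the classic pyramid; 1.0 means equal-sized levels."""
    return l2_mb / l3_mb


for name, (l2, l3) in CHIPS.items():
    print(f"{name}: L2 {l2:g} MB, L3 {l3:g} MB, L2/L3 = {l2_to_l3_ratio(l2, l3):.3f}")
# Zambezi die: L2 8 MB, L3 8 MB, L2/L3 = 1.000
# Phenom II X6: L2 3 MB, L3 6 MB, L2/L3 = 0.500
# Sandy Bridge quad-core: L2 1 MB, L3 8 MB, L2/L3 = 0.125
```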
One key difference is that Bulldozer’s L3 cache is exclusive (just like Phenom II’s), whereas Sandy Bridge’s is inclusive. That means Bulldozer doesn’t need to hold the same data in L2 and L3 at the same time, so together its 8 MB of L2 and 8 MB of L3 can store up to 16 MB of unique data. The trade-off is that anything sitting in L3 is simply further away from the cores.
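Why exclusivity matters for capacity is easiest to see in a toy model. The sketch below is a simplified illustration, not AMD’s or Intel’s actual replacement policy: both levels use plain LRU, sizes are counted in cache lines, dirty data is not tracked, and the 4:4 sizing only mirrors Bulldozer’s 1:1 L2-to-L3 ratio. In the exclusive version, a line evicted from L2 drops into L3 and an L3 hit moves the line back up. In the inclusive version, every L2 line also occupies L3.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """L2 plus a victim-style L3: each line lives in at most one of the two levels."""

    def __init__(self, l2_lines, l3_lines):
        self.l2, self.l3 = OrderedDict(), OrderedDict()   # insertion order = LRU order
        self.n2, self.n3 = l2_lines, l3_lines

    def _fill_l2(self, addr):
        if len(self.l2) >= self.n2:
            victim, _ = self.l2.popitem(last=False)   # L2's LRU line is evicted...
            if len(self.l3) >= self.n3:
                self.l3.popitem(last=False)           # (L3's LRU line leaves the hierarchy)
            self.l3[victim] = True                    # ...and moves into L3
        self.l2[addr] = True

    def access(self, addr):
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "L2"
        if addr in self.l3:
            del self.l3[addr]      # an L3 hit moves the line up; no copy stays behind
            self._fill_l2(addr)
            return "L3"
        self._fill_l2(addr)        # a miss fills L2 only, not L3
        return "DRAM"

    def unique_lines(self):
        return len(self.l2) + len(self.l3)


class InclusiveHierarchy:
    """Every line in L2 is also kept in L3; L3 evictions back-invalidate L2."""

    def __init__(self, l2_lines, l3_lines):
        self.l2, self.l3 = OrderedDict(), OrderedDict()
        self.n2, self.n3 = l2_lines, l3_lines

    def _fill_l2(self, addr):
        if len(self.l2) >= self.n2:
            self.l2.popitem(last=False)   # a copy remains in L3, so just drop it
        self.l2[addr] = True

    def access(self, addr):
        if addr in self.l2:
            self.l2.move_to_end(addr)
            self.l3.move_to_end(addr)     # inclusion guarantees the line is in L3 too
            return "L2"
        if addr in self.l3:
            self.l3.move_to_end(addr)
            self._fill_l2(addr)
            return "L3"
        if len(self.l3) >= self.n3:
            victim, _ = self.l3.popitem(last=False)
            self.l2.pop(victim, None)     # back-invalidate to keep L2 a subset of L3
        self.l3[addr] = True
        self._fill_l2(addr)
        return "DRAM"

    def unique_lines(self):
        return len(self.l3)               # L2 only holds duplicates of L3 lines


# Same 1:1 sizing as Bulldozer's 8 MB L2 / 8 MB L3, scaled down to 4 lines each.
for cls in (ExclusiveHierarchy, InclusiveHierarchy):
    h = cls(l2_lines=4, l3_lines=4)
    for addr in range(8):                 # stream eight distinct lines
        h.access(addr)
    print(cls.__name__, h.unique_lines(), h.access(0))
# ExclusiveHierarchy 8 L3
# InclusiveHierarchy 4 DRAM
```

After eight distinct lines stream through, the exclusive hierarchy still holds all of them, so re-reading the first line hits in L3. The inclusive one holds only four, because its L3 duplicates L2, and the same re-read goes all the way to memory.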
Sandwiched between two of the 2 MB L3 cache slices is Bulldozer’s integrated northbridge, responsible for managing communication between the L3 cache, both 72-bit DDR3 memory channels, and as many as four 16-bit HyperTransport links. On the desktop, that northbridge runs at up to 2.2 GHz. The server parts will also sport 2.0 and 2.2 GHz northbridges.
As you can see in the block diagram, the northbridge’s system request queue and crossbar are tasked with taking transactions from a Bulldozer module, checking the L3 cache, directing them out to the memory controller, and then sending data back to the module that requested it. In addition to those subsystems, the northbridge also handles transactions from the chipset and other sockets (in multi-processor configurations).
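The request path described above can be sketched as a toy model. This is a hypothetical simplification: `ToyNorthbridge`, `Request`, and the dictionary-backed L3 and DRAM are our own illustration, and the model ignores coherence probes, chipset and inter-socket traffic, cache allocation policy, and timing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    module: int    # which Bulldozer module (0-3) issued the request
    address: int   # physical address of the requested cache line


class ToyNorthbridge:
    """Hypothetical model: SRQ -> crossbar -> L3 check -> memory controller -> reply."""

    def __init__(self, l3, dram):
        self.srq = deque()   # system request queue, served FIFO in this sketch
        self.l3 = l3         # dict address -> data: lines held in the shared L3
        self.dram = dram     # dict address -> data: lines behind the memory controller

    def submit(self, req):
        self.srq.append(req)

    def drain(self):
        """Serve every queued request; return (module, address, source, data) replies."""
        replies = []
        while self.srq:
            req = self.srq.popleft()
            if req.address in self.l3:          # the crossbar checks the L3 first
                source, data = "L3", self.l3[req.address]
            else:                               # otherwise go out to DRAM
                source, data = "DRAM", self.dram[req.address]
            # The reply is routed back to the module that asked; L3 contents are
            # left untouched because allocation policy is out of scope here.
            replies.append((req.module, req.address, source, data))
        return replies


nb = ToyNorthbridge(l3={0x40: "a"}, dram={0x40: "a", 0x80: "b"})
nb.submit(Request(module=0, address=0x40))
nb.submit(Request(module=3, address=0x80))
print(nb.drain())   # [(0, 64, 'L3', 'a'), (3, 128, 'DRAM', 'b')]
```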
All of the desktop processors based on the Bulldozer architecture are built on Zambezi, and AMD makes these chips compatible with Socket AM3+. Again, AMD isn’t supporting AM3-based platforms. AM3+ adds CPU voltage loadline support, higher current to enable a faster HyperTransport link, and increased current to support higher-speed memory.
To that point, whereas previous-generation desktop processors officially topped out at DDR3-1333, Zambezi supports two channels of DDR3-1866. The HyperTransport link also enjoys a speed-up, from 2 GHz (4 GT/s) up to 2.6 GHz (5.2 GT/s).
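The bandwidth gains come down to simple arithmetic. This is a minimal sketch assuming peak theoretical figures, the nominal DDR3 rating as the transfer rate, a 64-bit data path per channel, and the 16-bit HyperTransport links described earlier; real-world throughput is lower.

```python
def ddr3_peak_gbs(mt_per_s, channels, data_bytes=8):
    """Peak DRAM bandwidth in GB/s: transfers/s x 8-byte data path x channels.

    Only 64 of each channel's 72 bits carry data; the other 8 are for ECC.
    """
    return mt_per_s * data_bytes * channels / 1000


def ht_peak_gbs_per_direction(clock_ghz, width_bits=16):
    """HyperTransport moves data on both clock edges, so GT/s = 2 x clock (GHz)."""
    gt_per_s = 2 * clock_ghz
    return gt_per_s * width_bits / 8   # bytes per transfer = link width / 8


print(f"DDR3-1333, 2 channels: {ddr3_peak_gbs(1333, 2):.1f} GB/s")   # 21.3
print(f"DDR3-1866, 2 channels: {ddr3_peak_gbs(1866, 2):.1f} GB/s")   # 29.9
print(f"HT at 2.0 GHz: {ht_peak_gbs_per_direction(2.0):.1f} GB/s per direction")  # 8.0
print(f"HT at 2.6 GHz: {ht_peak_gbs_per_direction(2.6):.1f} GB/s per direction")  # 10.4
```

The same HyperTransport function applies to any link speed; only the clock changes.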
The Valencia part is designed for one- and two-processor servers. Of course, it’s the exact same piece of silicon as Zambezi. Only its infrastructure is different.
AMD designed Valencia as a drop-in replacement for the 45 nm Lisbon-based Opteron 4000-series processors, so existing Socket C32-based motherboards will accommodate the new chips after a BIOS update.
Valencia supports the same two-channel memory controller as Zambezi, but because of its enterprise-oriented emphasis, it takes unregistered DIMMs, registered DIMMs, and load-reduced DIMMs running at up to DDR3-1600. Its three HyperTransport links operate at a snappier 6.4 GT/s to better address traffic between sockets.
Interlagos is usually described as a 16-core processor, and it does have 16 cores, but they’re spread across two dies in a multi-chip module designed for one- to four-socket servers. The very same eight-core SoC we’ve been talking about is attached to a second, slave die via HyperTransport, leaving four externally facing HyperTransport links running at up to 6.4 GT/s for devices like chipsets.
Combined, the two dies offer up to 16 MB of L2 and 16 MB of L3 cache, plus four total memory channels that take unregistered, registered, and load-reduced DIMMs operating at up to DDR3-1866.
It’s a drop-in replacement for the 45 nm Magny-Cours Opteron 6000-series processors. So, existing Socket G34-based motherboards should have no trouble recognizing Interlagos after a BIOS update.