AMD's 4x4 Platform & Athlon 64 FX-70 - Brute Force Quad Cores

Who Needs Two Processors And Four Cores Anyway?

AMD targets the top-of-the-line enthusiast and the real power user. No, not just "real"; AMD means "real real". The buzzword for this mega platform concept is "megatasking", which AMD defines as follows:

  • Running multiple, heavily multi-threaded, processor-intensive applications simultaneously
  • Running digital media applications
  • Playing future multi-core games

The AMD Advantage

Most of the tasks described above will run properly on a powerful dual core processor today, but not necessarily in the future. While the general rule for multi-threaded environments is "the more cores the better", an increasing core count - four or more cores per processor - will create tremendous demand for processor interface bandwidth. Large amounts of data will need to be written to and fetched from main memory as quickly as possible.

Intel will upgrade its 266 MHz quad-pumped FSB1066 processor Front Side Bus to 333 MHz (FSB1333) early next year with its Bearlake chipset (X38 for enthusiasts and P35 for upper mainstream). This will provide a faster data path for dual and quad core CPUs to communicate with each other and with the main memory and motherboard core logic.
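To put rough numbers on that upgrade, peak Front Side Bus bandwidth can be estimated from the base clock, the four transfers per clock implied by "quad-pumped", and the width of the bus. The sketch below assumes a 64-bit data bus, a figure not stated above, and the function name is purely illustrative. The results are theoretical peaks, not measured throughput.

```python
def fsb_peak_bandwidth_gbs(base_clock_mhz, transfers_per_clock=4, bus_width_bits=64):
    """Theoretical peak FSB bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=4 models a quad-pumped bus.
    bus_width_bits=64 is an assumption, not a figure from the article.
    """
    bytes_per_transfer = bus_width_bits // 8
    return base_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# The nominal "266 MHz" and "333 MHz" clocks are 266.67 and 333.33 MHz.
fsb1066 = fsb_peak_bandwidth_gbs(266.67)
fsb1333 = fsb_peak_bandwidth_gbs(333.33)

print(f"FSB1066: {fsb1066:.2f} GB/s")   # about 8.53 GB/s
print(f"FSB1333: {fsb1333:.2f} GB/s")   # about 10.67 GB/s
print(f"Increase: {fsb1333 / fsb1066 - 1:.0%}")
```

Under these assumptions the step to FSB1333 is worth about 25% more peak bandwidth, and that bandwidth is still shared by every core behind the bus.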

However, AMD has always had an advantage when it comes to memory efficiency, because each Athlon 64 processor has its own integrated, high-speed, low-latency memory controller. The first Athlon 64 processors supported DDR400 (Socket 754) or dual channel DDR400 memory (Socket 939). The current generation of Athlon 64 and Athlon 64 X2 processors on Socket AM2 runs dual channel DDR2-800 memory. A CPU core can fetch requested data directly, without the detour through a memory controller that is part of the motherboard core logic.
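The memory configurations mentioned above can be compared with the same kind of back-of-the-envelope arithmetic. This is a minimal sketch that assumes 64-bit memory channels (not stated in the text) and treats the DDR400 and DDR2-800 names as 400 and 800 million transfers per second. The helper name is hypothetical, and the figures are theoretical peaks only.

```python
def dram_peak_bandwidth_gbs(mega_transfers_per_s, channels=1, channel_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    channel_width_bits=64 is an assumption, not a figure from the article.
    """
    bytes_per_transfer = channel_width_bits // 8
    return mega_transfers_per_s * 1e6 * channels * bytes_per_transfer / 1e9


configs = {
    "Socket 754, single channel DDR400": dram_peak_bandwidth_gbs(400, channels=1),
    "Socket 939, dual channel DDR400": dram_peak_bandwidth_gbs(400, channels=2),
    "Socket AM2, dual channel DDR2-800": dram_peak_bandwidth_gbs(800, channels=2),
}

for name, gbs in configs.items():
    print(f"{name}: {gbs:.1f} GB/s")
```

With these assumptions the three generations come out at roughly 3.2, 6.4 and 12.8 GB/s per processor, available to the cores without a trip across the motherboard core logic.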

Why is this important? Well, today at least, it clearly doesn't seem to be. This is mostly because Intel has the 65 nm manufacturing process advantage, which allows the company to fit larger L2 caches into the processor. Large and efficient caches allow for adequate performance, because they're capable of compensating for a memory subsystem that might not be state of the art. Efficient and shared caches (used by two or more cores at the same time) also help to reduce memory access.

As soon as Microsoft Vista arrives, things will change a bit. Windows XP is unable to tell physical processors apart, and it will distribute the processing of threads to all CPU cores as they are available. Whether the cores are located on one, two or multiple CPUs is irrelevant here, which can cause inter-processor thread-switching. This should be avoided, as it might entail the relocation of thread data as well - isn't that a nice new bottleneck!

Threads that were processed on one physical core might be handled by a different unit the next time they are executed. If you now think of an Intel quad core created by putting two dual core processors into one physical processor package, you will realize that the Front Side Bus is the only path for inter-processor communication and main memory access.

In a worst case Windows XP scenario, processing unit A has to wait until unit B completes memory access. Then A accesses the memory to fetch data, which it stores in its L2 cache to provide it locally for processing. If, however, Windows assigns the thread to CPU B, it will have to fetch the current data from A's L2 cache, causing additional Front Side Bus traffic. For coherency and performance reasons, data cannot be fetched from the main memory again at this point, since it was already processed. In the end, all of the elements involved are slowed down by this maneuver.

Windows Vista Ultimate Edition will be able to tell processors or nodes apart from simple processing cores. This allows the operating system to assign threads in a more resource-efficient manner: one large task can be executed exclusively on CPU A, while another huge workload runs autonomously on CPU B. Inter-processor task switching is eliminated due to the enhanced hardware awareness of Vista Ultimate, and performance will scale much better with increased core count per processor.
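The difference between the two scheduling behaviors described here can be illustrated with a toy model. The sketch below is not how Windows XP or Vista actually schedule threads. It assumes two sockets with two cores each, ignores core contention entirely, and simply counts how often a thread lands on a different socket than it ran on last time. All names are hypothetical.

```python
import random

# Toy topology: 2 sockets x 2 cores, each core identified as (socket, core).
CORES = [(socket, core) for socket in range(2) for core in range(2)]


def count_cross_socket_migrations(num_threads, num_slices, node_aware, seed=0):
    """Count how often a thread resumes on a different socket than before.

    node_aware=False: any core may be picked in each time slice (topology-blind).
    node_aware=True:  each thread keeps a fixed "home" socket.
    """
    rng = random.Random(seed)
    last_socket = {}
    migrations = 0
    for _ in range(num_slices):
        for thread in range(num_threads):
            if node_aware:
                socket = thread % 2          # fixed home socket per thread
            else:
                socket, _core = rng.choice(CORES)
            if thread in last_socket and last_socket[thread] != socket:
                migrations += 1              # thread data would have to follow
            last_socket[thread] = socket
    return migrations


blind = count_cross_socket_migrations(4, 1000, node_aware=False)
aware = count_cross_socket_migrations(4, 1000, node_aware=True)
print(f"topology-blind: {blind} cross-socket migrations")
print(f"node-aware:     {aware} cross-socket migrations")
```

In this model the node-aware policy never moves a thread between sockets, while the topology-blind policy does so in about half of all time slices. Each such move stands for the kind of cache-to-cache transfer described in the worst case scenario above.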

Now, let's go back to the beginning: why is this so important? Remember that an Athlon 64 processor has its own memory controller, and thus a higher memory bandwidth per processing core. As soon as intelligence is added to the multi-threading/multi-tasking game, AMD's Quad FX platform will be able to show some serious muscle that it cannot show in today's operating environments. Again, though, you have to bear in mind that we're talking about serious workloads that exceed what you and I typically do with our PCs.
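The "bandwidth per core" argument can be made concrete with a rough comparison, under several loudly stated assumptions that are not taken from the text: each Quad FX socket is assumed to drive dual channel DDR2-800 like the Socket AM2 parts, buses and memory channels are assumed to be 64 bits wide, and the Intel quad core is assumed to share a single FSB1066. Only theoretical peaks are compared, and remote memory access between the two AMD sockets is ignored.

```python
def peak_gbs(mega_transfers_per_s, lanes=1, width_bits=64):
    """Theoretical peak bandwidth in GB/s; width_bits=64 is an assumption."""
    return mega_transfers_per_s * 1e6 * lanes * (width_bits // 8) / 1e9


# Assumed: one dual channel DDR2-800 controller per socket, two cores per socket.
amd_per_socket = peak_gbs(800, lanes=2)
amd_per_core = amd_per_socket / 2

# Assumed: one shared FSB1066 (about 1066.67 MT/s) feeding four cores.
intel_shared_fsb = peak_gbs(1066.67)
intel_per_core = intel_shared_fsb / 4

print(f"Two dual core sockets: {amd_per_core:.2f} GB/s per core")
print(f"Quad core on FSB1066:  {intel_per_core:.2f} GB/s per core")
```

Under these assumptions the two-socket design has about three times the peak memory bandwidth per core, which is the headroom that only pays off once the operating system keeps threads and their data on the same processor.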