Designed primarily for server and workstation applications, dual Xeon systems have largely led a niche existence. Their high price made them unattractive for standard users, and they also required expensive memory modules, special power supplies and big, ugly cases. Now, however, the situation has changed considerably.
When we compare, for example, the price of a 3.2 GHz Pentium 4 Extreme Edition with that of two 2.8 GHz Xeons, the latter option turns out to be much less expensive. A Pentium 4 Extreme Edition costs $950, while two 2.8 GHz Xeons can be had for $760. Applications that explicitly support dual-processor environments usually run much faster with two CPUs than with one.
Also, a lot has happened in the area of memory technology. Thanks to the introduction of the AMD Athlon FX, Registered DDR memory has clearly become cheaper; even many no-name manufacturers have switched over to it. Two 512 MB modules, for example, can already be had for $250. In addition, there are finally motherboards for the Xeon's Socket 604 that can operate with unbuffered memory, provided they are based on the E7505 chipset from Intel. Until now, this market segment was dominated by space-hogging WTX boards, but many manufacturers now also offer such boards in the usual ATX format, and a dual Socket 604 board fits without any problems into a conventional desktop tower. The prices for such motherboards start at around $260. Given the price situation, the dual-capable E7505/Placer chipset is an obvious choice for the Xeon.

Cinema 4D with scene renderings
Even when HyperThreading is taken into consideration, a PC equipped with two processors offers big everyday advantages for certain users. Software for graphics rendering, video and audio encoding, and the simultaneous operation of two or more calculation-intensive applications all profit from impressive increases in performance. In the area of graphics rendering, there is dual-capable software such as 3D Studio MAX, Cinema 4D and Lightwave; in video encoding, there are, for example, MainConcept Encoder, Pinnacle Studio 9 and Flask Mpeg.
In addition to multiprocessor software usage, the user's work environment is also slowly changing. Because graphics cards often have two monitor outputs, and monitors are relatively inexpensive, many users already use two displays. Ambitious home users can tell you a thing or two about that: whoever wants to encode a video and start a game at the same time will immediately experience the limits of a single-processor system. An intelligently configured dual platform reacts differently.
Here, we analyze Intel's dual-processor capable E7505/Placer chipset and offer tips for memory usage. In a subsequent article, using a self-programmed tool, we will show that increases in performance can be achieved with certain applications, as long as certain threads are not scheduled by the operating system but are manually assigned to a CPU. In connection with that, we have also completed a comparison test of E7505 motherboards, which will be posted soon on the website.
The Intel E7505 chipset, code-named "Placer," is manufactured in a 180-nanometer process and is designed for two processors. It uses the same FC-BGA package as the 875/Canterwood and therefore has the same number of solder balls: 1,005.
At 143 mm², the silicon die is considerably larger than that of the 875, which requires only 100 mm². The HUB 2.0 interface and the memory controller account for the larger surface area. This also raises the price for motherboard manufacturers: the E7505 costs $100 per unit in quantities of 1,000, twice as much as the 875.

E7505 chipset as a block diagram

The E7505 Northbridge from Intel
The E7505 Northbridge (a.k.a. Memory Controller Hub, abbreviated as MCH) is typically paired with the ICH4 Southbridge and the P64H2 bridge. The ICH4 is connected via the HUB 1.5 interface, which is clocked at 66 MHz. Over its 8 bit wide bus, this interface can transfer data to the Northbridge at a maximum of 266 MB per second.
The P64H2 bridge, on the other hand, operates at 133 MHz according to the HUB 2.0 protocol. This speed can accommodate data transfer rates of up to 1 GB per second over a 16 bit wide bus. Furthermore, the E7505 has a dual memory interface as well as an AGP 8x interface.

The ICH4 Southbridge from Intel
The ICH4 (82801DB) Southbridge, based on a 250-nm process, offers connectivity for six USB 2.0 ports, four ATA100 drives, a 100 MBit LAN chip and an AC'97 audio codec, as well as support for a maximum of six PCI master devices, which share a bus bandwidth of 133 MB per second.

Intel's P64H2 Southbridge
Things work differently with the P64H2 (82870P2) bridge, which was designed for the fast PCI 64 and PCI-X interfaces. The PCI 64 interface corresponds to PCI version 2.3. Both operate in 64 bit mode. All E7505 motherboards in the WTX format provide slots for a maximum of three PCI 64 cards and one PCI-X card. PCI 64 operates at either 33 MHz or 66 MHz, resulting in transfer rates of 266 MB/sec or 533 MB/sec. PCI-X, in comparison, operates at 66 MHz, 100 MHz or 133 MHz and reaches data transfer rates of between 533 MB/sec and 1066 MB/sec.

Block diagram of the E7505 chipset with P64H2 Southbridge
| Standard | Bus width | Clock | Transfer rate (bi-directional) |
|---|---|---|---|
| PCI 2.3 | 32 Bit | 33 MHz | 133 MB/sec |
| PCI 2.3 | 32 Bit | 66 MHz | 266 MB/sec |
| PCI 64 | 64 Bit | 33 MHz | 266 MB/sec |
| PCI 64 | 64 Bit | 66 MHz | 533 MB/sec |
| PCI-X 1.0 | 64 Bit | 66 MHz | 533 MB/sec |
| PCI-X 1.0 | 64 Bit | 100 MHz | 800 MB/sec |
| PCI-X 1.0 | 64 Bit | 133 MHz | 1066 MB/sec |
| PCI-X 2.0 (DDR) | 64 Bit | 133 MHz | 2132 MB/sec |
| PCI-X 2.0 (QDR) | 64 Bit | 133 MHz | 4264 MB/sec |
| PCI-Express | 1 Lane, 8 Bit | 2.5 GHz | 512 MB/sec |
| PCI-Express | 2 Lanes, 8 Bit | 2.5 GHz | 1 GB/sec |
| PCI-Express | 4 Lanes, 8 Bit | 2.5 GHz | 2 GB/sec |
| PCI-Express | 8 Lanes, 8 Bit | 2.5 GHz | 4 GB/sec |
| PCI-Express | 16 Lanes, 8 Bit | 2.5 GHz | 8 GB/sec |
The HUB 2.0 connection offers a maximum data transfer rate of 1 GB per second between the P64H2 bridge and the Northbridge. The following slot combinations can be served by one P64H2 chip without exceeding that limit:
- 1x PCI-X 133 MHz = 1066 MB/sec
- 1x PCI-X 100 MHz = 800 MB/sec
- 2x PCI-X 66 MHz = 1066 MB/sec
- 2x PCI 64 66 MHz = 1066 MB/sec
- 3x PCI 64 33 MHz = 798 MB/sec
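The per-slot rates in the table and the combinations above follow from multiplying the bus width by the clock rate. The following is a minimal sketch of that arithmetic, assuming the nominal PCI clocks of 33.33, 66.67, 100 and 133.33 MHz and truncating the results the way the article does. The function and constant names are illustrative only.

```python
# Peak rate of a parallel PCI bus: (bus width in bytes) x (clock in MHz).
# Assumption: nominal PCI clocks; results are truncated like the article's figures.

HUB_2_LIMIT_MB_S = 1066  # Hub 2.0 link between the P64H2 and the Northbridge

# Map the rounded clock labels used in the article to the nominal clocks.
NOMINAL_CLOCKS_MHZ = {33: 100 / 3, 66: 200 / 3, 100: 100.0, 133: 400 / 3}


def bus_rate_mb_s(width_bits: int, clock_mhz: float) -> int:
    """Return the peak rate of one bus in MB/s, truncated to an integer."""
    return int(width_bits / 8 * clock_mhz)


def combined_load_mb_s(slots: int, width_bits: int, clock_mhz: float) -> int:
    """Return the total peak load of several identical, fully busy slots."""
    return slots * bus_rate_mb_s(width_bits, clock_mhz)


combinations = [
    (1, "PCI-X", 133),
    (1, "PCI-X", 100),
    (2, "PCI-X", 66),
    (2, "PCI 64", 66),
    (3, "PCI 64", 33),
]

for slots, label, clock in combinations:
    load = combined_load_mb_s(slots, 64, NOMINAL_CLOCKS_MHZ[clock])
    fits = load <= HUB_2_LIMIT_MB_S
    print(f"{slots}x {label} {clock} MHz: {load} MB/s, within hub limit: {fits}")
```

These are theoretical peaks; real cards rarely saturate their slot, which is why the combinations listed are upper bounds rather than typical loads.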
Standard cards do not exhaust data transfer rates of 1,066 MB per second. Only high-performance products, such as Ultra320 SCSI cards (320 MB/s) or 10 Gbit LAN chips (max. 1250 MB per second), would be sensible candidates.
Many PCI cards work not only in conventional PCI 2.3 slots but also in a PCI 64 slot. Examples are network cards, RAID controllers and even 56K modems. To prevent incorrect installation, they have an additional notch in the contact strip.

A 56K modem for a 64 bit slot
The 56K modem operates here with conventional 33 MHz in 32 bit mode.

A Promise SATA controller for a 64 bit slot
This RAID controller can even handle 66 MHz in 32 bit mode.
| Chipset | I860 | I875P | E7205 | E7505 |
|---|---|---|---|---|
| MCH | 82860 | 82875P | E7205 | E7505 |
| Codename | Colusa | Canterwood | Granite Bay | Placer |
| Developed for | Xeon DP | Pentium 4 | Pentium 4 | Xeon DP |
| Hyper Threading Support | Yes | Yes | Yes | Yes |
| Number of supported CPUs | 1-2 | 1 | 1 | 1-2 |
| FSB | 100 MHz | 133/200 MHz | 100/133 MHz | 100/133 MHz |
| Memory modules | 4 RIMMs (8 with MRH-R) | 4 DIMMs | 4 DIMMs | 4 DIMMs |
| Channels | Single-Channel | Dual-Channel | Dual-Channel | Dual-Channel |
| Memory type | PC800/600 RDRAM | DDR266/333/400 | DDR200/266 | DDR266 |
| Max. Memory | 4 GB (with 2 Repeaters) | 4 GB | 4 GB | 16 GB |
| Number of Rows | 32 | 8 | 4 | 6 |
| Mbit Support | 288/256, 144/128 | 128/256/512 | 128/256/512 | 128/256/512, 1024 |
| ECC | Yes | Yes | Yes | Yes |
| Graphic Interface | ||||
| AGP | 2x/4x (1.5V) | 4x/8x (1.5V) | 1x/2x/4x (1.5V), 4x/8x (0.8V) | 1x/2x/4x (1.5V), 4x/8x (0.8V) |
| I/O HUB | ||||
| Southbridges | ICH2 (82801BA) | ICH5 (82801EB), ICH5R (82801ER) | ICH4 (82801DB) | ICH4 (82801DB) |
| PCI-Standard | 2.2 | 2.3 | 2.2 | 2.2 |
| PCI Master Slots (max) | 6 | 6 | 6 | 6 |
| IDE | ATA 33/66/100 | ATA 33/66/100 | ATA 33/66/100 | ATA 33/66/100 |
| SATA Support | No | 2 | No | No |
| USB Ports | 4x USB 1.1 | 8x USB 2.0 | 6x USB 2.0 | 6x USB 1.1, USB 2.0 (P64H2) |
| LAN | Yes | CSA 266 MHz | Yes | Integrated 10/100 Mbit |
| AC'97 | Audio/Modem | AC'97 2.3 | Yes | AC'97 2.3 |
| Manageability | ||||
| I/O Management | SMBus/GPIO | SMBus 2.0/GPIO | SMBus/GPIO | SMBus 2.0/GPIO |
| I/O HUB (Expansion) | ||||
| PCI Controller | P64H | n/a | n/a | P64H2 |
| PCI Support | PCI 64 (2x 66 MHz) or PCI 33 (4x 33 MHz) | n/a | n/a | 2x 64 Bit PCI/PCI-X; PCI max 66 MHz, PCI-X max 133 MHz |
| PCI Master | 6 | n/a | n/a | 3 |
Chipset Price
| Chipset | Codename | Price per 1000 |
|---|---|---|
| E7505 | Placer | $100 |
| E7501 | Plumas 533 | $92 |
| E7500 | Plumas | $92 |
| E7205 | Granite Bay | $57 |
| I875P | Canterwood | $50 |
| I865PE | Springdale | $28 |
Because the Intel E7505 chipset always runs the main memory synchronously with the processor bus (1:1), only DDR266 memory is suitable for such a platform. As with the 875, the chipset has a dual-channel memory controller, with which it can attain a theoretical memory bandwidth of up to 4.2 GB per second at 133 MHz. For comparison: the 875 chipset manages 6.4 GB per second thanks to its higher clock speed of 200 MHz. On this platform, however, data integrity plays a bigger role than raw speed, which is why Intel integrates the ECC (Error Checking and Correction) option.
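The two bandwidth figures can be reproduced with a short calculation. This is a sketch under the assumption of standard 64 bit wide DDR channels and two transfers per clock; the function name is illustrative only.

```python
def memory_bandwidth_gb_s(clock_mhz: float, channels: int = 2,
                          bus_bits: int = 64,
                          transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1000 MB).

    Assumes DDR memory (two transfers per clock) on 64 bit wide channels.
    """
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * transfers_per_clock * bytes_per_transfer * channels / 1000


e7505_ddr266 = memory_bandwidth_gb_s(400 / 3)  # 133.33 MHz, dual channel
i875_ddr400 = memory_bandwidth_gb_s(200.0)     # 200 MHz, dual channel

print(f"E7505 with DDR266: {e7505_ddr266:.2f} GB/s")
print(f"875 with DDR400:   {i875_ddr400:.2f} GB/s")
```

The article rounds the E7505 result down to 4.2 GB per second.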
Upgrade: ECC Requires An Additional Chip Per Row
Like the 875 chipset, the E7505 manages 8 rows. Reminder: a memory module has either one row (single-sided) or two rows (double-sided). The following table provides a sample calculation for the maximum memory configuration of each size (without ECC):
| Memory configuration | Modules | Typical structure (non-ECC) |
|---|---|---|
| 1 GB | 2 | 4 Rows x 8 Chips x 256 MBit = 8,192 MBit |
| 2 GB | 4 | 8 Rows x 8 Chips x 256 MBit = 16,384 MBit |
| 4 GB | 4 | 8 Rows x 8 Chips x 512 MBit = 32,768 MBit |
| 8 GB | 4 | 8 Rows x 16 Chips x 512 MBit = 65,536 MBit |
| 16 GB | 4 | 8 Rows x 16 Chips x 1 GBit = 131,072 MBit |
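The capacity column can be checked by multiplying rows, chips per row and chip density. A small sketch of that calculation follows; the function names are illustrative, and 1 GB is taken as 8,192 MBit.

```python
def capacity_mbit(rows: int, chips_per_row: int, chip_density_mbit: int) -> int:
    """Total non-ECC capacity in MBit for a given module structure."""
    return rows * chips_per_row * chip_density_mbit


def capacity_gb(rows: int, chips_per_row: int, chip_density_mbit: int) -> float:
    """Same capacity expressed in GB (8 bit per byte, 1024 MB per GB)."""
    return capacity_mbit(rows, chips_per_row, chip_density_mbit) / 8 / 1024


# (rows, chips per row, chip density in MBit), taken from the table
structures = [
    (4, 8, 256),
    (8, 8, 256),
    (8, 8, 512),
    (8, 16, 512),
    (8, 16, 1024),
]

for rows, chips, density in structures:
    mbit = capacity_mbit(rows, chips, density)
    gb = capacity_gb(rows, chips, density)
    print(f"{rows} rows x {chips} chips x {density} MBit = {mbit} MBit ({gb:g} GB)")
```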
However, users who want to play it safe and use modules with ECC should note that an additional chip has to be added per row (two for rows with 16 chips). This chip is responsible only for the checksums and does not have any influence on the maximum memory capacity.
| Number of possible chips without ECC | Number of possible chips with ECC |
|---|---|
| 8 | 9 |
| 16 | 18 |
Memory from Corsair with CL 2.0-3-2-6 timings
Registered and ECC memory from Mushkin with CL 2.0-3-2 timings
Registered and ECC memory from Legacy Electronics with CL 2.5 timings
DDR333 Registered and ECC memory from Infineon with CL 2.5 timings
To give a worst-case example: modules totaling 16 GB of ECC system memory can consist of 144 chips, an enormous burden for the Memory Controller Hub! However, only 128 of these chips store actual data, while the rest hold the ECC checksums.
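The chip counts in this worst case can be worked out as follows. The sketch assumes one ECC chip for every eight data chips in a row, which is inferred from the 8-to-9 and 16-to-18 figures in the table above; the function name is illustrative only.

```python
def chip_count(rows: int, data_chips_per_row: int, ecc: bool = False) -> int:
    """Total number of memory chips the controller has to drive.

    Assumption: with ECC, one extra chip is added per eight data chips,
    so a row of 8 becomes 9 and a row of 16 becomes 18.
    """
    ecc_chips_per_row = data_chips_per_row // 8 if ecc else 0
    return rows * (data_chips_per_row + ecc_chips_per_row)


total_with_ecc = chip_count(rows=8, data_chips_per_row=16, ecc=True)
data_only = chip_count(rows=8, data_chips_per_row=16, ecc=False)

print(f"Chips with ECC:  {total_with_ecc}")
print(f"Data chips:      {data_only}")
print(f"ECC-only chips:  {total_with_ecc - data_only}")
```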
Registered Versus Unbuffered Memory
Conventional desktop memory is unbuffered. The alternative is registered memory (previously referred to as buffered memory). The more chips a memory controller has to drive, the less clean the data signals become.
And now the trick: a small register chip is placed between the memory controller and the memory chips of each row, so that the controller sees only a single load per row instead of every individual chip. This improves signal quality and data integrity, but it comes at the cost of speed, because the register chip adds a short delay to the electrical signals.

In Task Manager, two real and two virtual processors are shown.
As a rule, every Socket 604 Xeon CPU supports HyperThreading technology. The E7505 can operate two processors simultaneously, each with HyperThreading enabled. As a result, the operating system sees four processors (two physical and two virtual). Nevertheless, the chipset has only one CPU interface, which means that both processors have to share one bus. At a speed of 133 MHz (533 MHz QDR), this bus provides a bandwidth of 4.2 GB per second. In a worst-case scenario, each of the four logical CPUs therefore receives a data flow of only about 1 GB per second. However, this could have negative effects only with some OpenGL applications.

On the left, the Xeon and on the right, the P4 Northwood from Intel
The Intel Xeon (code name "Prestonia") is based on the same core as the Pentium 4 "Northwood". The latter operates with an FSB of 200 MHz (800 MHz QDR) and delivers 6.4 GB per second. In order to balance out the Xeon's up to 34% lower bandwidth, Intel also offers models with 1 or 2 MB L3 cache, beginning with the 2.4 GHz versions.
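The bus figures quoted for the Xeon and the Pentium 4 can be reproduced with a short calculation. This is a sketch assuming a 64 bit wide, quad-pumped front side bus and an even split of the shared Xeon bus across four logical CPUs; the names are illustrative only.

```python
def fsb_bandwidth_gb_s(clock_mhz: float, transfers_per_clock: int = 4,
                       bus_bits: int = 64) -> float:
    """Theoretical FSB bandwidth in GB/s for a quad-pumped (QDR) bus."""
    return clock_mhz * transfers_per_clock * (bus_bits / 8) / 1000


xeon_bus = fsb_bandwidth_gb_s(400 / 3)   # 133 MHz, shared by both Xeons
p4_bus = fsb_bandwidth_gb_s(200.0)       # 200 MHz, single Pentium 4

# Worst case: four logical CPUs (2 physical x 2 HyperThreading) share one bus.
per_logical_cpu = xeon_bus / 4

# Relative bandwidth deficit of the Xeon bus compared with the Pentium 4 bus.
deficit_percent = (1 - xeon_bus / p4_bus) * 100

print(f"Xeon FSB:        {xeon_bus:.2f} GB/s")
print(f"Pentium 4 FSB:   {p4_bus:.2f} GB/s")
print(f"Per logical CPU: {per_logical_cpu:.2f} GB/s")
print(f"Deficit:         {deficit_percent:.1f}%")
```

With unrounded clocks the deficit comes out at about one third; the article's 34% results from using the rounded figures of 4.2 and 6.4 GB per second.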
Prices For Current Xeon Processors
| Processor (Socket 604) | Codename | FSB | L2 Cache | L3 Cache | Price per 1000 |
|---|---|---|---|---|---|
| Xeon 2.0 GHz | Prestonia | 133 MHz | 512 kB | n/a | $198 |
| Xeon 2.4 GHz | Prestonia | 133 MHz | 512 kB | n/a | $209 |
| Xeon 2.66 GHz | Prestonia | 133 MHz | 512 kB | n/a | $256 |
| Xeon 2.8 GHz | Prestonia | 133 MHz | 512 kB | n/a | $316 |
| Xeon 3.06 GHz | Prestonia | 133 MHz | 512 kB | n/a | $455 |
| Xeon 2.4 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $316 |
| Xeon 2.8 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $455 |
| Xeon 3.06 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $690 |
| Xeon 3.2 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $581 |
| Xeon 3.2 GHz | Prestonia 2M | 133 MHz | 512 kB | 2048 kB | $1043 |
An analysis of availability and prices shows that 2.66 GHz models provide the best price-performance ratio. Intel's next step is to increase the FSB to 200 MHz (800 MHz QDR). Once again, this will mean new chipsets.

On the left, the big WTX format and on the right, the ATX format
Compared to standard ATX boards, Xeon workstation boards carry considerably more components, for example PCI 64/PCI-X interfaces, two Southbridges, LAN chips, voltage regulators, a second CPU socket or an additional SCSI controller. To accommodate them, larger boards in the WTX standard are required. At 33 x 33.5 cm, these have a roughly 48% larger surface area than ATX boards (30.5 x 24.5 cm). Boards with a WTX form factor do not fit in a conventional home PC case. The manufacturers MSI and Tyan also offer ATX-format motherboards that omit additional components such as the P64H2 bridge and LAN. Installing these in a conventional tower is not a problem.
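For reference, the size difference can be worked out directly from the board dimensions quoted above. A minimal sketch, with illustrative names:

```python
def area_cm2(width_cm: float, depth_cm: float) -> float:
    """Surface area of a rectangular board in square centimeters."""
    return width_cm * depth_cm


wtx_area = area_cm2(33.0, 33.5)
atx_area = area_cm2(30.5, 24.5)

# How much larger the WTX board is, relative to the ATX board.
wtx_larger_percent = (wtx_area / atx_area - 1) * 100

print(f"WTX: {wtx_area:.2f} cm2, ATX: {atx_area:.2f} cm2")
print(f"WTX is {wtx_larger_percent:.1f}% larger than ATX")
```

Measured against ATX, the WTX board comes out a little under one and a half times the size.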
The Right Power Supply: ATX Or EPS12V

Fully stocked Xeon system
Because we are talking here about a dual-CPU platform, the processors' power dissipation is also doubled. The fastest Xeon models, with a Prestonia 2M core and 3.2 GHz clock speed, have a maximum power dissipation of 184 watts as a pair. Added to that are board components (an average of 50 watts), a high-performance graphics card with 70 watts, and a large memory configuration. All together, the system quickly uses up 350 watts.
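The power budget above can be tallied in a few lines. This sketch uses only the wattages quoted in the text; the share left for memory and other components is simply the remainder up to 350 watts, which is a derived value and not a figure the article states.

```python
# Peak power figures quoted in the text, in watts.
COMPONENTS_W = {
    "two Xeon 3.2 GHz (Prestonia 2M)": 184,
    "motherboard components (average)": 50,
    "high-performance graphics card": 70,
}

SYSTEM_TOTAL_W = 350  # overall figure given in the text


def remaining_budget_w(components: dict, total_w: int) -> int:
    """Watts left over for components without an explicit figure."""
    return total_w - sum(components.values())


listed_w = sum(COMPONENTS_W.values())
remainder_w = remaining_budget_w(COMPONENTS_W, SYSTEM_TOTAL_W)

print(f"Listed components: {listed_w} W")
print(f"Left for memory and other parts: {remainder_w} W")
```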

A 20-pin plug supplies the motherboard with power
This overloads the standard ATX power connection to the motherboard. As a result, boards in WTX format use another power supply standard, which goes by the name of EPS12V. It provides connectors with more power and ground wires, as well as wider plugs, in order to distribute the load better. As with the ATX form factor, the power pins are gold-plated in order to attain a lower contact resistance and thereby improve signal quality.

An adapter cable from a Tagan power supply (TG480-U01)
With more than 350 watts, today's ATX power supplies deliver sufficient power to supply dual systems in ATX format as well. There are now power supplies on the market that support both the ATX and EPS12V standards with the aid of a special adapter cable. This eliminates the need to replace the power supply later, and it saves the additional expense when changing systems. Many motherboards are capable of operating with both power supply standards.

On the left, a 24-pin WTX plug and on the right, a 20-pin ATX plug

On the left an 8-pin WTX plug and on the right, a 4-pin PWR plug
The "20/24P" marking on the large power connector indicates that it can operate with the 24-pin WTX plug as well as with the 20-pin ATX plug. The same applies to the "12V-8/4P" marking on the small AUX connector: it supports the 8-pin as well as the 4-pin plug. In each case, the four additional leads are redundant voltage pins for load sharing.

The various allocations of ATX and WTX plugs
With a power supply conforming to the EPS12V standard, additional +12 V, +3.3 V, +5 V and ground leads are connected to the board.
AGP: Support For All Cards
The E7505 Northbridge offers support for AGP graphics cards, and most motherboards have an AGP "Pro" slot. With the Pro version, the card is supplied with additional power through extra pins.
| Data Rate | AGP 3.0 signaling | 1.5 V signaling | 3.3 V signaling |
|---|---|---|---|
| PCI-66 | Yes | Yes | No |
| 1x AGP | No | Yes | No |
| 2x AGP | No | Yes | No |
| 4x AGP | Yes | Yes | No |
| 8x AGP | Yes | No | No |
Support for the AGP 3.0 standard is also offered, so all current graphics cards on the market can be used without any problem.
| Intel Processors (Socket 604) | |
|---|---|
| 133 MHz FSB (DUAL DDR266) | Intel Xeon 3.06 GHz (3066 MHz, 12-8/512/1024 kB) |
| Intel Processors (Socket 478) | |
| 133 MHz FSB (DUAL DDR266) | Pentium 4 3.06 GHz (3066 MHz, 12-8/512 kB) |
| 200 MHz FSB (DUAL DDR400) | Pentium 4 3.2E GHz (3200 MHz, 12-8/1024 kB) |
| 200 MHz FSB (DUAL DDR400) | Pentium 4EE 3.2 GHz (3400 MHz, 12-8/512/2048 kB) |
| Memory | |
| DDR400 (200 MHz) | 2 x 512 MB / 5ns / 64 Bit (Corsair), CMX512-3200LL (CL 2.0-3-2-6) |
| DDR400 (200 MHz) | 2 x 512 MB / 5ns / 64 Bit (Mushkin) REG ECC, MS64D64020U-5 (CL 2.0-3-2-6) |
| Common Hardware | |
| Sound Card | Terratec Aureon 7.1 Space; 96.00 kHz sample rate |
| Graphics Card | Asus A9800XT/TVD, Rev. 1.01; GPU: ATI Radeon 9800XT, 412 MHz chip clock; Memory: 256 MB DDR-SDRAM, 365 MHz clock |
| Hard Drive | FastTrak S150 TX2plus (BIOS: 1.00.0.30); 2 x SATA Maxtor 6Y080M0 (RAID 0), 80 GB / 8 MB Cache / 7200 rpm |
| DVD/CD-ROM | MSI MS-8216 16x DVD |
| Software | |
| Chipset | Chipset Installation Utility Ver. 5.1.1.1002; IAA RAID Edition 3.5.3 |
| Graphics | ATI Catalyst XP 4.3 (Driver 6.14.10.6430) |
| Promise RAID | 1.00.0.37 |
| DirectX | Version : 9b |
| OS | Windows XP, Build 2600 SP1 (English) |
| OpenGL | |
|---|---|
| Quake III Team Arena | Version 1.32; 1024x768 - 32 bit; Timedemo1 / demo thg3 "custom timedemo"; Graphics detail = Normal |
| DirectX 9a | |
| 3DMark 2003 | Version 3.4.0; Graphics and CPU Default Benchmark; 1024 x 768 - 32 bit |
| Video | |
| Mainconcept MPEG Encoder | Version 1.4.1; 1.2 GB DV to MPEG II (720x576, Audio) converting |
| Pinnacle Studio 9 | Version: 9.0.0; Rendering - DVD Compatible, no Audio |
| Windows Media Encoder 9 | Version: 9.00.00.2980; 436 MB AVI file conversion to WMV; Windows Media Server (Streaming) |
| Microsoft Movie Maker | Version 2.0.3312.0; 416 MB DV to WMV |
| TMPGEnc Plus | Version 2.521; 1.2 GB DV to MPEG I (720x576, Audio) converting |
| Audio | |
| magix mp3 maker 2004 | Version 4.11 Build 19593 |
| Syntrillium Cool Edit Pro | Version 2.1; Amplitude Normalizing, 2.6 GB Wave audio file |
| Applications | |
| 3D Studio Max 6.0 | Rendering Single, 1024x768 |
| Newtek Lightwave | Version 7.5c - Build 572; Render First Frame = 1; Render Last Frame = 60; Render Frame Step = 1; Rendering Bench "variation.lws"; Show Rendering in Progress = 320x240; Ray Trace Shadows, Reflection, Refraction, Transparency = on; Multithreading = 8 Threads |
| Maxon Cinema 4D XL 8 | Version 8.503; Rendering in 1024x768, "ship_dirt" |
| Microsoft Visual Studio .NET | Version 2003 (Enterprise Architect); Visual C++: compiling Emule 0.42b |
| LIUtilities WinBackup | Version 1.84; 650 MB Wave file; Encryption: 256 Bit DES, Password "test" |
| Synthetic | |
| PCMark 2004 Pro | Build 1.1.0; CPU and Memory Tests |
| SiSoftware Sandra 2004 | Version 2004.10.9.89; CPU Test: CPU Multimedia / CPU Arithmetic; Memory Test: Memory Bandwidth Benchmark |
The following benchmarks show the differences in performance between a dual platform and a "normal" Pentium 4 in single-processor operation.
OpenGL

DirectX 9a







Audio







Synthetic





Conclusion
Applications already optimized for HyperThreading see performance gains from the use of two physical CPUs. In view of system costs, it is therefore worthwhile for users to go with a Dual Xeon as their next system if most of their time is spent rendering or encoding.
In the subsequent articles, we will show how the various E7505 motherboards measure up in a head-to-head comparison. We will also soon publish an article on how to increase performance by using a self-programmed tool, which can assign tasks to certain CPUs. This tool takes away automatic task assignments from the operating system and forces an application to run on a manually-specified CPU.
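To illustrate what manual CPU assignment means in practice, here is a minimal sketch. It is not the tool announced above, which runs on the Windows XP test platform; it uses Python's `os.sched_setaffinity`, a call that is only available on Linux, and the helper name `pin_current_process` is hypothetical.

```python
import os


def pin_current_process(cpu: int):
    """Restrict the calling process to one logical CPU.

    Returns the resulting set of allowed CPUs, or None on platforms
    where os.sched_setaffinity does not exist (for example Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    print("CPUs allowed before pinning:", allowed)
    # Pin to the first CPU the scheduler currently allows for this process.
    print("CPUs allowed after pinning: ", pin_current_process(allowed[0]))
else:
    print("CPU affinity control is not available through os on this platform.")
```

Once pinned, the operating system's scheduler no longer migrates the process between CPUs, which is the behavior the text attributes to the self-programmed tool.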