Two Xeon CPUs Are Better Than One Intel P4 Extreme Platform
1. A Dual Vs. Single Processor Price Comparison

Designed primarily for server and workstation applications, dual Xeon systems have largely led a niche existence. Their high price made them unattractive for standard users, and they also required expensive memory modules, special power supplies and big, ugly cases. Now, however, the situation has changed considerably.

When we compare, for example, the price of a 3.2 GHz Pentium 4 Extreme with that of two 2.8 GHz Xeons, the latter option turns out to be considerably less expensive: a Pentium 4 Extreme costs $950, while two 2.8 GHz Xeons can be had for $760. Applications that explicitly support dual-processor environments usually run much faster with two CPUs than with one.

A lot has also happened in the area of memory technology. Thanks to the introduction of the AMD Athlon FX, Registered DDR memory has clearly become cheaper; even many no-name manufacturers have switched over to it. Two 512 MB modules, for example, can already be had for $250. In addition, there are now finally Socket 604 Xeon motherboards that can operate with unbuffered memory, provided they are based on Intel's E7505 chipset. Until now, this market segment was dominated by space-hogging WTX boards, but many manufacturers now also offer such boards in the usual ATX format, so a dual Socket 604 board fits into a conventional desktop tower without any problems. Prices for such motherboards start at around $260. Given these prices, the dual-capable E7505/Placer chipset is an obvious choice for the Xeon.

Cinema 4D with scene renderings

Even when HyperThreading is taken into consideration, a PC equipped with dual processors offers big everyday advantages for certain users. Software for graphics rendering and for video and audio encoding, as well as the simultaneous operation of two or more calculation-intensive applications, profits from impressive increases in performance. In the area of graphics rendering, there is dual-capable software such as 3D Studio MAX, Cinema 4D and Lightwave; in video encoding, there are, for example, MainConcept Encoder, Pinnacle Studio 9 and Flask Mpeg.

In addition to multiprocessor software usage, the user's work environment is also slowly changing. Because graphics cards often have two display outputs, and monitors are relatively inexpensive, many users already use two displays. Ambitious home users can tell you a thing or two about that: whoever wants to encode a video and start a game at the same time will immediately experience the limits of a single-processor system. An intelligently configured dual platform reacts differently.

Here, we analyze Intel's dual-processor capable E7505/Placer chipset and offer tips for memory usage. In a subsequent article, using a self-programmed tool, we will show that increases in performance can be achieved with certain applications, as long as certain threads are not managed by an operating system but are manually assigned to a CPU. In connection with that, we have also completed a comparison test of E7505 motherboards, which will be posted soon on the website.

2. E7505/Placer Chipset Technology In Detail

The Intel E7505 chipset, code-named "Placer," is manufactured on a 180-nanometer process and is designed for two processors. The chipset has the same FC-BGA package as the 875/Canterwood and therefore also has the same number of solder balls: 1,005.

At 143 mm², the silicon die is considerably bigger than that of the 875, which requires only 100 mm². The HUB 2.0 interface and the memory controller account for the larger surface area. The larger die is also reflected in the price that motherboard manufacturers pay: the E7505 costs $100 per unit in quantities of 1,000, twice as much as the 875.


E7505 chipset as a block diagram

The E7505 Northbridge from Intel

The E7505 Northbridge (a.k.a. Memory Controller Hub, abbreviated as MCH) is typically bundled with the ICH4 and P64H2 Southbridges. The ICH4 is connected via the HUB 1.5 interface, which is clocked at 66 MHz. This interface can transfer data to the Northbridge at a maximum speed of 266 MB per second over an 8 bit wide bus.

3. E7505/Placer Chipset Technology In Detail, Continued

The P64H2 bridge, on the other hand, operates at 133 MHz according to the HUB 2.0 protocol. This speed can accommodate data transfer rates of up to 1 GB per second over a 16 bit wide bus. Furthermore, the E7505 has a dual memory interface as well as an AGP 8x interface.

The ICH4 Southbridge from Intel

The ICH4 (82801DB) Southbridge, based on a 250-nm process, offers connectivity for six USB 2.0 ports, four ATA100 drives, a 100 Mbit LAN chip and an AC'97 audio codec, as well as support for a maximum of six PCI master devices, each with 133 MB per second bandwidth.

Intel's P64H2 Southbridge

Things work differently with the P64H2 (82870P2) bridge, which was designed for the fast PCI 64 and PCI-X interfaces. The PCI 64 interface corresponds to Version 2.3. Both operate in 64 bit mode. All WTX-format motherboards with an E7505 chipset offer slots for a maximum of three PCI 64 cards and one PCI-X card. PCI 64 operates at either 33 MHz or 66 MHz, resulting in transfer rates of 266 MB/sec and 533 MB/sec, respectively. PCI-X, in comparison, operates at 66 MHz, 100 MHz or 133 MHz and reaches data transfer rates of between 533 MB/sec and 1066 MB/sec.

Block diagram of the E7505 chipset with P64H2 Southbridge
4. Data Transfer Rates Depending On PCI Standard

| Standard | Bus width | Clock | Transfer rate (bi-directional) |
|---|---|---|---|
| PCI 2.3 | 32 Bit | 33 MHz | 133 MB/sec |
| PCI 2.3 | 32 Bit | 66 MHz | 266 MB/sec |
| PCI 64 | 64 Bit | 33 MHz | 266 MB/sec |
| PCI 64 | 64 Bit | 66 MHz | 533 MB/sec |
| PCI-X 1.0 | 64 Bit | 66 MHz | 533 MB/sec |
| PCI-X 1.0 | 64 Bit | 100 MHz | 800 MB/sec |
| PCI-X 1.0 | 64 Bit | 133 MHz | 1066 MB/sec |
| PCI-X 2.0 (DDR) | 64 Bit | 133 MHz | 2132 MB/sec |
| PCI-X 2.0 (QDR) | 64 Bit | 133 MHz | 4264 MB/sec |
| PCI-Express, 1 lane | 8 Bit | 2.5 GHz | 512 MB/sec |
| PCI-Express, 2 lanes | 8 Bit | 2.5 GHz | 1 GB/sec |
| PCI-Express, 4 lanes | 8 Bit | 2.5 GHz | 2 GB/sec |
| PCI-Express, 8 lanes | 8 Bit | 2.5 GHz | 4 GB/sec |
| PCI-Express, 16 lanes | 8 Bit | 2.5 GHz | 8 GB/sec |

The HUB 2.0 connection offers a maximum data transfer rate of 1 GB per second between Southbridge and Northbridge. This results in the following combinations that load the interface of one P64H2 chip to the maximum:

  1. 1x PCI-X 133 MHz = 1066 MB/sec
  2. 1x PCI-X 100 MHz = 800 MB/sec
  3. 2x PCI-X 66 MHz = 1066 MB/sec
  4. 2x PCI 64 66 MHz = 1066 MB/sec
  5. 3x PCI 64 33 MHz = 798 MB/sec

Standard cards do not exhaust a data transfer rate of 1,066 MB per second. Only high-performance products, such as Ultra320 SCSI cards (320 MB/s) or 10 Gbit LAN chips (max. 1,250 MB per second), would be sensible candidates.
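The slot combinations above follow from simple arithmetic: bus width in bytes times clock rate, summed over the slots and compared with the hub link. The following is a minimal sketch of that calculation, not anything taken from Intel documentation; the function names and the 1,066 MB/sec hub limit constant are illustrative assumptions based on the figures quoted in this article. Per-slot values are truncated the same way the table does, which is why three 33 MHz PCI 64 slots come out at 798 rather than 800 MB/sec.

```python
# Assumed limit of the HUB 2.0 link between P64H2 and MCH, as quoted in the article.
HUB_2_0_LIMIT_MB_S = 1066

# Nominal PCI clocks are rounded names for multiples of 33.33 MHz.
PCI_CLOCKS_MHZ = {33: 100 / 3, 66: 200 / 3, 100: 100.0, 133: 400 / 3}


def slot_bandwidth(width_bits, nominal_mhz):
    """Peak MB/sec of a single slot, truncated like the article's table."""
    return int(width_bits / 8 * PCI_CLOCKS_MHZ[nominal_mhz])


def hub_load(slots):
    """Sum the peak bandwidth of (count, width_bits, nominal_mhz) slot groups."""
    return sum(count * slot_bandwidth(width, mhz) for count, width, mhz in slots)


combinations = {
    "1x PCI-X 133 MHz": [(1, 64, 133)],
    "1x PCI-X 100 MHz": [(1, 64, 100)],
    "2x PCI-X 66 MHz": [(2, 64, 66)],
    "2x PCI 64 66 MHz": [(2, 64, 66)],
    "3x PCI 64 33 MHz": [(3, 64, 33)],
}

for name, slots in combinations.items():
    load = hub_load(slots)
    verdict = "fits" if load <= HUB_2_0_LIMIT_MB_S else "exceeds the hub link"
    print(f"{name}: {load} MB/sec ({verdict})")
```

None of the listed combinations exceeds the hub link, which is presumably why these are the configurations board makers can offer on a single P64H2.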

Many PCI cards can work not only in conventional PCI 2.3 slots but also in a PCI 64 slot. Examples are network cards, RAID controllers and even 56K modems. In order to avoid incorrect installation, they have an additional notch in the contact edge.

A 56K modem for a 64 bit slot

The 56K modem operates here with conventional 33 MHz in 32 bit mode.

A Promise SATA controller for a 64 bit slot

This RAID controller can handle even 66 MHz in 32 bit mode.

5. Comparison Of Current Workstation Chipsets From Intel

| Chipset | I860 | I875P | E7205 | E7505 |
|---|---|---|---|---|
| MCH | 82860 | 82875P | E7205 | E7505 |
| Codename | Colusa | Canterwood | Granite Bay | Placer |
| Developed for | Xeon DP | Pentium 4 | Pentium 4 | Xeon DP |
| Hyper Threading Support | Yes | Yes | Yes | Yes |
| Number of supported CPUs | 1-2 | 1 | 1 | 1-2 |
| FSB | 100 MHz | 133/200 MHz | 100/133 MHz | 100/133 MHz |
| Memory modules | 4 RIMMs (8 with MRH-R) | 4 DIMMs | 4 DIMMs | 4 DIMMs |
| Channels | Single-Channel | Dual-Channel | Dual-Channel | Dual-Channel |
| Memory type | PC800/600 RDRAM | DDR266/333/400 | DDR200/266 | DDR266 |
| Max. Memory | 4 GB (with 2 Repeaters) | 4 GB | 4 GB | 16 GB |
| Number of Rows | 32 | 8 | 4 | 6 |
| Mbit Support | 288/256, 144/128 | 128/256/512 | 128/256/512 | 128/256/512, 1024 |
| ECC | Yes | Yes | Yes | Yes |
| Graphic Interface | | | | |
| AGP | 2x/4x (1.5V) | 4x/8x (1.5V) | 1x/2x/4x (1.5V), 4x/8x (0.8V) | 1x/2x/4x (1.5V), 4x/8x (0.8V) |
| I/O HUB | | | | |
| Southbridges | ICH2 (82801BA) | ICH5 (82801EB), ICH5R (82801ER) | ICH4 (82801DB) | ICH4 (82801DB) |
| PCI-Standard | 2.2 | 2.3 | 2.2 | 2.2 |
| PCI Master Slots (max) | 6 | 6 | 6 | 6 |
| IDE | ATA 33/66/100 | ATA 33/66/100 | ATA 33/66/100 | ATA 33/66/100 |
| SATA Support | No | 2 | No | No |
| USB Ports | 4x USB 1.1 | 8x USB 2.0 | 6x USB 2.0 | 6x USB 1.1, USB 2.0 (P64H2) |
| LAN | Yes | CSA 266 MHz | Yes | Integrated 10/100 Mbit |
| AC'97 | Audio/Modem | AC'97 2.3 | Yes | AC'97 2.3 |
| Manageability | | | | |
| I/O Management | SMBus/GPIO | SMBus 2.0/GPIO | SMBus/GPIO | SMBus 2.0/GPIO |
| I/O HUB (Expansion) | | | | |
| PCI Controller | P64H | n/a | n/a | P64H2 |
| PCI Support | PCI 64 (2x 66 MHz) or PCI 33 (4x 33 MHz) | n/a | n/a | 2x 64 Bit PCI/PCI-X; PCI max 66 MHz; PCI-X max 133 MHz |
| PCI Master | 6 | n/a | n/a | 3 |

Chipset Price

| Chipset | Codename | Price per 1000 |
|---|---|---|
| E7505 | Placer | $100 |
| E7501 | Plumas 533 | $92 |
| E7500 | Plumas | $92 |
| E7205 | Granite Bay | $57 |
| I875P | Canterwood | $50 |
| I865PE | Springdale | $28 |

6. Memory Of Up To 16 GB

Because the Intel E7505 chipset always synchronizes the processor data bus with the main memory (1:1), only DDR266 memory is suitable for such a platform. As with the 875, the chipset has a dual-channel memory controller, with which it can attain a theoretical memory bandwidth of up to 4.2 GB per second at 133 MHz. For comparison: the 875 chipset manages 6.4 GB per second thanks to its higher speed of 200 MHz. On this platform, however, system security plays a bigger role than raw speed, and that is why Intel integrates the ECC (Error Checking and Correction) option.
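The two bandwidth figures can be reproduced from the memory clock. The sketch below assumes the standard DDR parameters (a 64 bit data bus per channel and two transfers per clock cycle); the function name is illustrative only.

```python
def ddr_bandwidth_mb_s(memory_clock_mhz, channels=2, bus_bits=64):
    """Theoretical peak bandwidth in MB/sec for DDR memory.

    DDR moves data on both clock edges, hence the factor of 2.
    """
    bytes_per_transfer = bus_bits / 8
    return channels * bytes_per_transfer * 2 * memory_clock_mhz


# DDR266 runs at a nominal 133.33 MHz, DDR400 at 200 MHz.
e7505 = ddr_bandwidth_mb_s(400 / 3)
i875 = ddr_bandwidth_mb_s(200)

print(f"E7505, dual DDR266: {e7505 / 1000:.2f} GB/sec")
print(f"875,   dual DDR400: {i875 / 1000:.2f} GB/sec")
```

The E7505 result lands just under 4.3 GB per second, which the article quotes as 4.2 GB per second.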

Upgrade: ECC Requires An Additional Chip Per Row

Like the 875 chipset, the E7505 manages 8 rows (also called pages). As a reminder, one memory module has either one row (single page) or two rows (double page). The following table provides a sample calculation for the respective maximum memory configuration of the platform (without ECC):

| Memory expansion | Modules | Typical structure (non-ECC) |
|---|---|---|
| 1 GB | 2 | 4 Rows x 8 Chips x 256 MBit = 8,192 MBit |
| 2 GB | 4 | 8 Rows x 8 Chips x 256 MBit = 16,384 MBit |
| 4 GB | 4 | 8 Rows x 8 Chips x 512 MBit = 32,768 MBit |
| 8 GB | 4 | 8 Rows x 16 Chips x 512 MBit = 65,536 MBit |
| 16 GB | 4 | 8 Rows x 16 Chips x 1 GBit = 131,072 MBit |
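
The sample calculation multiplies the number of rows, the chips per row and the density of each chip. As a rough sketch (the function names are illustrative, and 1 GB is taken as 1,024 MB), the conversion from MBit to GB looks like this:

```python
def capacity_mbit(rows, chips_per_row, chip_density_mbit):
    """Total capacity in MBit for a non-ECC memory configuration."""
    return rows * chips_per_row * chip_density_mbit


def mbit_to_gb(mbit):
    """Convert MBit to GB: 8 bits per byte, 1,024 MB per GB."""
    return mbit / 8 / 1024


# (rows, chips per row, chip density in MBit), as in the table
configurations = [
    (4, 8, 256),
    (8, 8, 256),
    (8, 8, 512),
    (8, 16, 512),
    (8, 16, 1024),
]

for rows, chips, density in configurations:
    total = capacity_mbit(rows, chips, density)
    print(f"{rows} rows x {chips} chips x {density} MBit = "
          f"{total:,} MBit = {mbit_to_gb(total):g} GB")
```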

However, users who want to play it safe and use modules with ECC should note that an additional chip has to be added per row for every eight memory chips. This chip is merely responsible for the checksums and does not have any influence on the maximum memory configuration.

| Number of possible chips without ECC | Number of possible chips with ECC |
|---|---|
| 8 | 9 |
| 16 | 18 |

Memory from Corsair with CL 2.0-3-2-6 timings

Registered and ECC memory from Mushkin with CL 2.0-3-2 timings

Registered and ECC memory from Legacy Electronics with CL 2.5 timings

DDR333 Registered and ECC memory from Infineon with CL 2.5 timings

To give you a worst-case example: 16 GB of ECC system memory can consist of 144 chips, an enormous burden for the Memory Controller Hub! However, only 128 of these chips are used for actual memory functions, while the rest are used for administrative tasks.
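
The 144-chip figure can be checked with a short calculation. This sketch assumes one ECC chip for every eight data chips in a row, which is what the 8-to-9 and 16-to-18 figures above imply; the function name is made up for illustration.

```python
def ecc_chip_counts(rows, data_chips_per_row):
    """Return (total, data, ecc) chip counts for an ECC configuration.

    Assumes one extra ECC chip per eight data chips in each row.
    """
    ecc_chips_per_row = data_chips_per_row // 8
    data = rows * data_chips_per_row
    ecc = rows * ecc_chips_per_row
    return data + ecc, data, ecc


# Worst case from the article: 8 rows with 16 data chips each (16 GB).
total, data, ecc = ecc_chip_counts(rows=8, data_chips_per_row=16)
print(f"{total} chips in total: {data} for data, {ecc} for ECC")
```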

Registered Versus Unbuffered Memory

Classical memory is always available in unbuffered versions. What’s new is registered memory (previously referred to as buffered memory). The more chips a memory controller has to manage, the less clear the data signals will be.

And now the trick: if a small register chip is placed in front of the individual memory chips, each row/page appears to the memory controller as if it consisted of only one chip. This improves signal quality and data security, but it comes at the cost of speed, because the register chip causes a short delay in the electrical signals.

7. Dual Xeon Double Offer

In Task Manager, two real and two virtual processors are shown.

As a rule, every Socket 604 Xeon CPU supports HyperThreading technology. The E7505 can operate two processors simultaneously, with HyperThreading enabled on both. As a result, four processors (two physical and two virtual) are available to the operating system. Nevertheless, the chipset has only one CPU interface, which means that both processors have to share one bus. At a speed of 133 MHz (533 MHz QDR), this yields a bandwidth of 4.2 GB per second. In a worst-case scenario, each virtual CPU receives a data stream of only about 1 GB per second. However, this is likely to have negative effects only with some OpenGL applications.

On the left, the Xeon and on the right, the P4 Northwood from Intel

The Intel Xeon (code name "Prestonia") is based on the same core as the Pentium 4 "Northwood". The latter operates with an FSB of 200 MHz (800 MHz QDR) and delivers 6.4 GB per second. In order to compensate for the Xeon's up to 34% lower bandwidth, Intel also offers models with 1 or 2 MB of L3 cache, beginning with the 2.4 GHz versions.
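
Both the 34% figure and the roughly 1 GB per second per virtual CPU follow directly from the bandwidth numbers quoted in this article. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Bandwidth figures as quoted in the article, in GB/sec.
XEON_FSB_GB_S = 4.2   # 133 MHz FSB (533 MHz QDR), shared by both CPUs
P4_FSB_GB_S = 6.4     # 200 MHz FSB (800 MHz QDR)

# Two physical CPUs with HyperThreading appear as four logical CPUs.
LOGICAL_CPUS = 4

deficit_percent = (1 - XEON_FSB_GB_S / P4_FSB_GB_S) * 100
share_per_logical_cpu = XEON_FSB_GB_S / LOGICAL_CPUS

print(f"Xeon bus bandwidth deficit: {deficit_percent:.1f}%")
print(f"Worst-case share per logical CPU: {share_per_logical_cpu:.2f} GB/sec")
```

Note that the 34% comes from the rounded 4.2 GB per second figure; with the unrounded 133.33 MHz clock the gap is closer to one third.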

Prices For Current Xeon Processors

Intel Xeon Processor (Socket 604)

| Processor | Codename | FSB | L2 Cache | L3 Cache | Price per 1000 |
|---|---|---|---|---|---|
| Xeon 2.0 GHz | Prestonia | 133 MHz | 512 kB | n/a | $198 |
| Xeon 2.4 GHz | Prestonia | 133 MHz | 512 kB | n/a | $209 |
| Xeon 2.66 GHz | Prestonia | 133 MHz | 512 kB | n/a | $256 |
| Xeon 2.8 GHz | Prestonia | 133 MHz | 512 kB | n/a | $316 |
| Xeon 3.06 GHz | Prestonia | 133 MHz | 512 kB | n/a | $455 |
| Xeon 2.4 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $316 |
| Xeon 2.8 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $455 |
| Xeon 3.06 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $690 |
| Xeon 3.2 GHz | Prestonia | 133 MHz | 512 kB | 1024 kB | $581 |
| Xeon 3.2 GHz | Prestonia 2M | 133 MHz | 512 kB | 2048 kB | $1043 |

An analysis of availability and prices shows that 2.66 GHz models provide the best price-performance ratio. Intel's next step is to increase the FSB to 200 MHz (800 MHz QDR). Once again, this will mean new chipsets.

8. The Right Motherboard Format: ATX Or WTX

On the left, the big WTX format and on the right, the ATX format

Compared to standard ATX boards, Xeon workstation boards carry considerably more components, including, for example, PCI 64/PCI-X interfaces, two Southbridges, LAN chips, voltage regulators, a second CPU socket or an additional SCSI controller. In order to accommodate the higher number of components, larger boards in the WTX standard are required. Measuring 33 x 33.5 cm, these have a surface area roughly 48% larger than that of ATX boards (30.5 x 24.5 cm). Boards with a WTX form factor do not fit in a conventional home PC case. However, the manufacturers MSI and Tyan also offer motherboards in ATX format that do without additional components such as the P64H2 bridge and LAN. Installing these in a conventional tower is not a problem.

The Right Power Supply: ATX Or EPS12V

Fully stocked Xeon system

Because this is a dual-CPU platform, the processors' power dissipation is also doubled. The fastest Xeon models, with the Prestonia 2M core at 3.2 GHz, have a maximum power dissipation of 184 watts as a pair. Added to that are the board components (an average of 50 watts), a high-performance graphics card with 70 watts, and a large memory configuration. All together, the system quickly uses up 350 watts.
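
As a rough power budget, the figures named above can simply be added up. The sketch below uses only the numbers from this article; treating the remainder as the share left for memory, drives and other devices is an inference, since the article does not break that part down.

```python
# Power figures from the article, in watts.
XEON_PAIR_W = 184      # two 3.2 GHz Xeons (Prestonia 2M), maximum
BOARD_W = 50           # board components, average
GRAPHICS_W = 70        # high-performance graphics card
SYSTEM_TOTAL_W = 350   # total the article says is quickly reached

itemized_w = XEON_PAIR_W + BOARD_W + GRAPHICS_W
remainder_w = SYSTEM_TOTAL_W - itemized_w  # assumed: memory, drives, other devices

print(f"Itemized components: {itemized_w} W")
print(f"Left for memory and other devices: {remainder_w} W")
```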

A 20-pin plug supplies the motherboard with power

Such a load overloads the standard power connection to the motherboard. As a result, boards in WTX format use another power supply standard, which goes by the name of EPS12V. Its connectors have more power and ground wires, as well as wider plugs, in order to distribute the load better. As with the ATX form factor, the power pins are gold-plated in order to attain a lower resistance and thereby improve the quality of the signals.

An adapter cable from a Tagan power supply (TG480-U01)

9. The Right Power Supply: ATX Or EPS12V, Continued

With more than 350 watts, today's ATX power supplies deliver enough power to supply dual systems in ATX format as well. There are now also power supplies on the market that support both the ATX and EPS12V standards with the aid of a special adapter cable. This eliminates the need to replace the power supply later and saves additional expense when changing systems. Many motherboards are capable of operating with both power supply standards.

On the left, a 24-pin WTX plug and on the right, a 20-pin ATX plug

On the left an 8-pin WTX plug and on the right, a 4-pin PWR plug

The "20/24P" marking on the large power connector indicates that it can operate with the 24-pin WTX plug as well as with the 20-pin ATX plug. The same applies to the "12V-8/4P" marking on the small AUX connector: it supports the 8-pin as well as the 4-pin connection. Each of the four missing leads is a redundant voltage pin for load sharing.

The various allocations of ATX and WTX plugs

With a power supply that follows the EPS12V standard, additional +12 V, +3.3 V, +5 V and ground leads are connected to the board.

AGP: Support For All Cards

The E7505 Northbridge offers support for AGP graphics cards, and most motherboards provide an AGP Pro slot. With the Pro version, the card is supplied with power through additional pins.

| Data Rate | AGP 3.0 signaling | 1.5 V signaling | 3.3 V signaling |
|---|---|---|---|
| PCI-66 | Yes | Yes | No |
| 1x AGP | No | Yes | No |
| 2x AGP | No | Yes | No |
| 4x AGP | Yes | Yes | No |
| 8x AGP | Yes | No | No |

Support for the 3.0 standard is also offered, and all graphics cards available on the market can be used without any problem.

10. Test Configuration

| Category | Item | Details |
|---|---|---|
| Intel Processors (Socket 604) | 133 MHz FSB (Dual DDR266) | Intel Xeon 3.06 GHz (3066 MHz, 12-8/512/1024 kB) |
| Intel Processors (Socket 478) | 133 MHz FSB (Dual DDR266) | Pentium 4 3.06 GHz (3066 MHz, 12-8/512 kB) |
| | 200 MHz FSB (Dual DDR400) | Pentium 4 3.2E GHz (3200 MHz, 12-8/1024 kB) |
| | 200 MHz FSB (Dual DDR400) | Pentium 4EE 3.2 GHz (3400 MHz, 12-8/512/2048 kB) |
| Memory | DDR400 (200 MHz) | 2 x 512 MB / 5ns / 64 Bit (Corsair), CMX512-3200LL (CL 2.0-3-2-6) |
| | DDR400 (200 MHz) | 2 x 512 MB / 5ns / 64 Bit (Mushkin) REG ECC, MS64D64020U-5 (CL 2.0-3-2-6) |
| Common Hardware | Sound Card | Terratec Aureon 7.1 Space, 96.00 kHz sample rate |
| | Graphics Card | Asus A9800XT/TVD, Rev. 1.01; GPU: ATI Radeon 9800XT, 412 MHz Chip Clock; Memory: 256 MB DDR-SDRAM, 365 MHz Chip Clock |
| | Hard Drive | FastTrak S150 TX2plus (BIOS: 1.00.0.30); 2 x SATA Maxtor 6Y080M0 (RAID 0); 80 GB / 8 MB Cache / 7200 rpm |
| | DVD/CD-ROM | MSI MS-8216 16x DVD |
| Software | Chipset | Chipset Installation Utility Ver. 5.1.1.1002; IAA RAID Edition 3.5.3 |
| | Graphics | ATI Catalyst XP 4.3 (Driver 6.14.10.6430) |
| | Promise RAID | 1.00.0.37 |
| | DirectX | Version: 9b |
| | OS | Windows XP, Build 2600 SP1 (English) |

11. Benchmarks And Settings

| Category | Benchmark | Version and settings |
|---|---|---|
| OpenGL | Quake III Team Arena | Version 1.32; 1024x768 - 32 bit; Timedemo1 / demo thg3 ("custom timedemo"); Graphics detail = Normal |
| DirectX 9a | 3DMark 2003 | Version 3.4.0; Graphics and CPU Default Benchmark; 1024x768 - 32 bit |
| Video | Mainconcept MPEG Encoder | Version 1.4.1; 1.2 GB DV to MPEG II (720x576, Audio) converting |
| | Pinnacle Studio 9 | Version: 9.0.0; Rendering - DVD Compatible; no Audio |
| | Windows Media Encoder 9 | Version: 9.00.00.2980; 436 MB AVI file conversion to WMV; Windows Media Server (Streaming) |
| | Microsoft Movie Maker | Version 2.0.3312.0; 416 MB DV to WMV |
| | TMPGEnc Plus | Version 2.521; 1.2 GB DV to MPEG I (720x576, Audio) converting |
| Audio | magix mp3 maker 2004 | Version 4.11 Build 19593 |
| | Syntrillium Cool Edit Pro | Version 2.1; Amplitude Normalizing; 2.6 GB Wave Audio file |
| Applications | 3D Studio Max 6.0 | Rendering Single, 1024x768 |
| | Newtek Lightwave | Version 7.5c - Build 572; Render First Frame = 1; Render Last Frame = 60; Render Frame Step = 1; Rendering Bench "variation.lws"; Show Rendering in Progress = 320x240; Ray Trace Shadows, Reflection, Refraction, Transparency = on; Multithreading = 8 Threads |
| | Maxon Cinema 4D XL 8 | Version 8.503; Rendering in 1024x768, "ship_dirt" |
| | Microsoft Visual Studio .NET | Version 2003 (Enterprise Architect); Visual C++: compiling Emule 0.42b |
| | LIUtilities WinBackup | Version 1.84; 650 MB Wave file; Encryption: 256 Bit DES, Password "test" |
| Synthetic | PCMark 2004 Pro | Build 1.1.0; CPU and Memory Tests |
| | SiSoftware Sandra 2004 | Version 2004.10.9.89; CPU Test: CPU Multimedia / CPU Arithmetic; Memory Test: Memory Bandwidth Benchmark |

12. Benchmark Results

In the following benchmarks, the differences in performance can be seen between a dual platform and a "normal" Pentium 4 in single operation.

OpenGL

DirectX 9a

13. Video

14. Video, Continued

Audio

15. Applications

16. Applications, Continued

Synthetic

17. Synthetic, Continued

Conclusion

Applications already optimized for HyperThreading see performance gains from the use of two physical CPUs. In view of system costs, it is therefore worthwhile for users to go with a Dual Xeon as their next system if most of their time is spent rendering or encoding.

In the subsequent articles, we will show how the various E7505 motherboards measure up in a head-to-head comparison. We will also soon publish an article on how to increase performance by using a self-programmed tool, which can assign tasks to certain CPUs. This tool takes away automatic task assignments from the operating system and forces an application to run on a manually-specified CPU.
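
The tool itself is not shown in this article, so the following is only a hypothetical sketch of the general technique of pinning a process to one CPU, not the authors' implementation. It uses Python's `os.sched_setaffinity`, which is available on Linux only; on the Windows XP test platform used here, the equivalent would go through Win32 calls such as `SetProcessAffinityMask` or `SetThreadAffinityMask`. The function name `pin_current_process` is made up for illustration.

```python
import os


def pin_current_process(cpu_index):
    """Restrict the calling process to a single logical CPU.

    Returns the resulting affinity set. Linux only: os.sched_setaffinity
    does not exist on Windows or macOS.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity via os is only available on Linux")
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    if cpu_index not in allowed:
        raise ValueError(f"CPU {cpu_index} is not available to this process")
    os.sched_setaffinity(0, {cpu_index})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pin to the lowest-numbered CPU this process is allowed to use.
    first_cpu = min(os.sched_getaffinity(0))
    print("Now restricted to CPUs:", pin_current_process(first_cpu))
```

Once pinned, the operating system's scheduler no longer moves the process between processors, which is the behavior the article describes for its tool.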