Great Woodcrest, Socket F, S940, Dempsey Review

September 16, 2006 11:47:38 PM

The review is extremely comprehensive, to say the least.

http://tweakers.net/reviews/646/1

Here they compared the 2.67GHz Xeon 5150 to the 2.4GHz Opteron 2216. While the 5150 does have a clock speed advantage, it is actually cheaper: the 5150 is listed at $690 while the 2216 is officially priced at $698, so from a price perspective the comparison is fair. In any case, the 5150's performance lead over the 2216 averaged 30%, which is far more than its 267MHz (roughly 11%) clock speed advantage would justify.
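To put the clock argument in numbers: a 2.67GHz vs 2.4GHz gap is only about 11%, so a 30% average lead implies roughly 17% more work per clock. Below is a quick sketch of that arithmetic using only the figures quoted above. It assumes performance would scale linearly with clock, which is optimistic, and the 30% is an average across benchmarks, so treat the result as a rough normalization.

```python
# Figures from the review as quoted above.
clock_5150_ghz = 2.67
clock_2216_ghz = 2.40
price_5150_usd = 690
price_2216_usd = 698
avg_perf_lead = 1.30  # the 5150 scores ~1.30x the 2216 on average

# Clock advantage of the 5150 (about 11%).
clock_ratio = clock_5150_ghz / clock_2216_ghz
# Lead remaining after dividing out clock speed (assumes linear clock scaling).
per_clock_lead = avg_perf_lead / clock_ratio
# Performance per dollar of the 5150 relative to the 2216.
perf_per_dollar_ratio = (avg_perf_lead / price_5150_usd) / (1.0 / price_2216_usd)

print(f"clock advantage: {clock_ratio - 1:.1%}")
print(f"per-clock lead:  {per_clock_lead - 1:.1%}")
print(f"perf per dollar: {perf_per_dollar_ratio:.2f}x")
```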

The caveat is that the Socket F system had only 4GB of RAM versus 7GB on Woodcrest. Still, I want to point out that Woodcrest's lead was a significant 30%, so simply adding RAM and matching clock speeds wouldn't close that gap. Also, Woodcrest was using DDR2 533 instead of DDR2 667 memory, in a non-standard configuration of 6x1GB DIMMs and 2x512MB DIMMs. Given the lower memory bandwidth and the unbalanced memory configuration, Woodcrest still has room to grow.
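For a sense of the bandwidth Woodcrest gave up, here is a sketch of the theoretical peak of one 64-bit DDR2 channel at each speed grade. It ignores FB-DIMM protocol overhead and the chipset's channel count, so it only compares the two speed grades, not the real systems.

```python
def ddr2_peak_gbs(data_rate_mts: float, bus_bytes: int = 8) -> float:
    """Theoretical peak of one 64-bit DDR2 channel in GB/s (10^9 bytes/s)."""
    return data_rate_mts * 1e6 * bus_bytes / 1e9

ddr2_533 = ddr2_peak_gbs(533.33)
ddr2_667 = ddr2_peak_gbs(666.67)
print(f"DDR2-533: {ddr2_533:.2f} GB/s per channel")
print(f"DDR2-667: {ddr2_667:.2f} GB/s per channel")
print(f"DDR2-667 advantage: {ddr2_667 / ddr2_533 - 1:.0%}")

# The Woodcrest system's mismatched DIMMs: 6 x 1GB plus 2 x 512MB.
dimms_gb = [1, 1, 1, 1, 1, 1, 0.5, 0.5]
print(f"total: {sum(dimms_gb)} GB over {len(dimms_gb)} DIMMs")
```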

What I think is more important is on pages 14 and 15. The results show that Socket F performs 6% worse than a comparable S940 configuration. Again, the Socket F system was limited to 4GB compared to 8GB on the S940, but I think that even with matched RAM, Socket F would still be slightly slower on average. As I've mentioned before, the higher-latency DDR2 667 memory would hamper K8 performance, especially considering that AM2 needs DDR2 800 with 4-4-4 timings or better to perform at its best.
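The timing argument is easier to see in nanoseconds: CAS latency in ns is the CL cycle count divided by the I/O clock, which is half the data rate. In the sketch below the CL values are hypothetical examples, not the timings used in the review. CAS is also only one part of total memory latency on K8, alongside the integrated controller and any HyperTransport hops.

```python
def cas_latency_ns(data_rate_mts: float, cl_cycles: int) -> float:
    """First-word CAS latency in ns: CL cycles at the I/O clock (half the data rate)."""
    io_clock_mhz = data_rate_mts / 2
    return cl_cycles / io_clock_mhz * 1000

# Hypothetical timings for illustration only; the review does not list them.
configs = {
    "DDR2-667 CL5": (666.67, 5),
    "DDR2-800 CL4": (800.0, 4),
    "DDR2-800 CL5": (800.0, 5),
}
for name, (rate, cl) in configs.items():
    print(f"{name}: {cas_latency_ns(rate, cl):.1f} ns")
```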

On page 15 they also show scaling graphs. This is where AMD's architecture shows its benefits. Woodcrest's scaling from 1 core to 2 and from 2 to 4 is lower than Socket F's. The reason Woodcrest still holds its lead is that each core is simply so much more powerful. In some benchmarks, 2 Woodcrest cores beat all 4 Opteron cores. It's also interesting that no one reaches the 80% scaling I'd consider practically "perfect" (true linear scaling would be a 100% increase). The best result is Opteron going from 1 to 2 cores, a 62% increase. Even Opteron could only achieve 44% scaling from 2 cores to 4.
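One way to read the Opteron numbers is with Amdahl's law, which is my choice of model here, not something the review uses. Fitting a serial fraction f to the 62% gain from 1 to 2 cores and extrapolating predicts about a 45% gain from 2 to 4 cores, close to the measured 44%. The model lumps every non-scaling effect (serialization, memory contention, or the network limit suggested later in the thread) into f.

```python
def fit_amdahl_serial_fraction(speedup_1_to_2: float) -> float:
    """Solve Amdahl's law S(n) = 1 / (f + (1 - f) / n) for f, given S(2)."""
    # 1/S = f + (1 - f)/2 = (1 + f)/2  =>  f = 2/S - 1
    return 2.0 / speedup_1_to_2 - 1.0

def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n cores for serial fraction f."""
    return 1.0 / (f + (1.0 - f) / n)

s2 = 1.62          # Opteron, 1 -> 2 cores (+62%, from the review)
s4_over_s2 = 1.44  # Opteron, 2 -> 4 cores (+44%, from the review)

measured_1_to_4 = s2 * s4_over_s2
print(f"measured 1->4 speedup: {measured_1_to_4:.2f}x "
      f"({measured_1_to_4 / 4:.0%} parallel efficiency)")

f = fit_amdahl_serial_fraction(s2)
predicted = amdahl_speedup(f, 4) / amdahl_speedup(f, 2)
print(f"fitted serial fraction: {f:.2f}; predicted 2->4 gain: {predicted - 1:.0%}")
```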


September 17, 2006 12:29:12 AM

It would be neato to see a Tulsa in there as well. Nice showing by Woodcrest though, that's for sure.
September 17, 2006 6:10:45 AM

Quote:

On page 15 they also show scaling graphs. This is where AMD's architecture shows its benefits. Woodcrest's scaling from 1 core to 2 and from 2 to 4 is lower than Socket F's. The reason Woodcrest still holds its lead is that each core is simply so much more powerful. In some benchmarks, 2 Woodcrest cores beat all 4 Opteron cores. It's also interesting that no one reaches the 80% scaling I'd consider practically "perfect" (true linear scaling would be a 100% increase). The best result is Opteron going from 1 to 2 cores, a 62% increase. Even Opteron could only achieve 44% scaling from 2 cores to 4.


It is not unlikely that other factors affect scaling here. These are database tests; if I understood the benchmark description correctly, they could be network limited. So maybe the 4-core Woodcrest setup is so fast that the network interface can't keep pace :)
September 18, 2006 1:09:22 AM

Quote:

On page 15 they also show scaling graphs. This is where AMD's architecture shows its benefits. Woodcrest's scaling from 1 core to 2 and from 2 to 4 is lower than Socket F's. The reason Woodcrest still holds its lead is that each core is simply so much more powerful. In some benchmarks, 2 Woodcrest cores beat all 4 Opteron cores. It's also interesting that no one reaches the 80% scaling I'd consider practically "perfect" (true linear scaling would be a 100% increase). The best result is Opteron going from 1 to 2 cores, a 62% increase. Even Opteron could only achieve 44% scaling from 2 cores to 4.


It is not unlikely that other factors affect scaling here. These are database tests; if I understood the benchmark description correctly, they could be network limited. So maybe the 4-core Woodcrest setup is so fast that the network interface can't keep pace :)

Regarding network bandwidth limitations, it seems that Sun's Niagara II (SPARC T2) won't suffer from them anytime soon, since its crossbar can deliver a massive 268.8GB/s (shared among eight cores, a measly 33.6GB/s per core), although only within a single MPU:

Quote:
All together, the crossbar supports 8 data destinations (the SPARC cores) and 9 data sources (8 L2 cache banks, and I/O). Using the rumored 1.4GHz clock speed, that suggests 268.8GB/s of crossbar bandwidth. This is backed by an impressive 42.7GB/s (FBD-667) of memory bandwidth.
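The quoted figures are easy to sanity-check: 268.8GB/s at 1.4GHz is 192 bytes per cycle, or 24 bytes per cycle (33.6GB/s) for each of the eight core ports. Likewise, 42.7GB/s matches eight FB-DIMM channels reading at 667MT/s. In the sketch below, the eight-channel count is an assumption taken from the "4 dual channel FB-DIMM controllers" mentioned later in the thread, and the 8-bytes-per-transfer read path is also assumed.

```python
# Figures from the quoted article.
clock_ghz = 1.4        # rumored T2 clock
cores = 8              # crossbar destinations (SPARC cores)
crossbar_gbs = 268.8   # total crossbar bandwidth

# GB/s divided by GHz gives bytes per clock cycle.
bytes_per_cycle_total = crossbar_gbs / clock_ghz
bytes_per_cycle_core = bytes_per_cycle_total / cores
per_core_gbs = crossbar_gbs / cores

# Assumption: 8 FB-DIMM channels (4 dual-channel controllers), each reading
# 8 bytes per transfer at 667 MT/s.
channels = 8
fbd_read_gbs = channels * 666.67e6 * 8 / 1e9

print(f"crossbar: {bytes_per_cycle_total:.0f} B/cycle total, "
      f"{bytes_per_cycle_core:.0f} B/cycle and {per_core_gbs:.1f} GB/s per core")
print(f"memory:   {fbd_read_gbs:.1f} GB/s read bandwidth")
```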


Actually, this looks like a very interesting chip, although it is aimed solely at TLP-intensive server work (maybe Sun's Rock will address the workstation space as well). Aside from (an estimated) doubling of the performance of Sun's previous T1, the T2 has an impressive set of features, namely its architectural simplicity (owing, of course, to Sun's server-optimized HW/SW) and its simple but (apparently) very efficient memory hierarchy.

While it lacks all the bells & whistles of its contenders (IBM, AMD, Intel, ...), its "simplicity" is almost disarming:

Quote:
One of the biggest improvements in Niagara II was the enhanced floating point support. As a general rule of thumb, performance critical floating point applications are rich in ILP, which would make Niagara II a less than ideal processor. However, some workloads simply require a massive amount of bandwidth, and Niagara II is fairly impressive in that regard. Moreover, perhaps this will push Sun into researching techniques to convert ILP into TLP. Certainly, it should be easy to distribute loop iterations (with no carried dependencies) between different threads. More robust techniques along these lines could turn Niagara II into a very attractive HPC system and help the industry as a whole, although the financial merit of such an idea is unclear.


http://www.realworldtech.com/includes/templates/articles.cfm?ArticleID=RWT090406012516&mode=print
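The quote's point about distributing loop iterations with no carried dependencies across threads can be shown in a few lines: split the iteration space into contiguous chunks and hand each chunk to a thread. This is a generic Python sketch of the idea (the function names are hypothetical), not anything Sun ships. In CPython the GIL means it illustrates the decomposition rather than delivering a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(data, start, stop, factor):
    """Handle iterations [start, stop). No iteration reads another's result
    (no loop-carried dependency), so chunks may run in any order."""
    return [data[i] * factor for i in range(start, stop)]

def parallel_scale(data, factor, num_threads=8):
    """Split the iteration space into contiguous chunks, one task per chunk."""
    n = len(data)
    if n == 0:
        return []
    chunk = -(-n // num_threads)  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    result = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in submission order, so chunks reassemble in order.
        for part in pool.map(lambda b: scale_chunk(data, b[0], b[1], factor), bounds):
            result.extend(part)
    return result

# Usage: gives the same answer as the sequential loop [x * 2 for x in data].
print(parallel_scale(list(range(10)), 2, num_threads=3))
```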

(The abstract from HOT CHIPS 18):

http://www.hotchips.org/hc18/program/tutorials.htm

Just a curiosity regarding the current multi-core trend: Intel's Justin Rattner on Multi-Core & RMS (Recognition, Mining & Synthesis), which I believe is pertinent to any outlook on the multi-core landscape:

http://www.hotchips.org/hc18/docs/keynote1_hc18.pdf


Cheers!
September 18, 2006 3:21:08 AM

Quote:
Regarding network bandwidth limitations, it seems that Sun's Niagara II (SPARC T2) won't suffer from them anytime soon, since its crossbar can deliver a massive 268.8GB/s (shared among eight cores, a measly 33.6GB/s per core), although only within a single MPU:

I think the 268.8GB/s figure is the bandwidth the crossbar must support to avoid bottlenecking the system, rather than a confirmed throughput. Still, it's impressive. I particularly like the 4 dual-channel FB-DIMM controllers, each supporting 2 L2 cache banks. Their implementation of multithreading within each core is also interesting, with its combination of dedicated and shared execution units (two integer pipelines per core, each serving a group of threads, alongside shared units such as the FPU).
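To illustrate how 4 dual-channel controllers, each serving 2 of the 8 L2 banks, spread traffic, here is a toy address router. The interleaving (consecutive 64-byte lines rotating across the 8 banks) is a hypothetical mapping chosen for illustration; the quoted article does not give the real T2 bank-selection bits.

```python
LINE_BYTES = 64    # assumed cache-line size
NUM_L2_BANKS = 8   # 8 L2 banks, per the quoted crossbar description
BANKS_PER_MC = 2   # each dual-channel FB-DIMM controller serves 2 banks

def route(addr: int):
    """Hypothetical mapping: the line address modulo 8 picks the L2 bank,
    and each pair of adjacent banks shares one memory controller."""
    bank = (addr // LINE_BYTES) % NUM_L2_BANKS
    controller = bank // BANKS_PER_MC
    return bank, controller

for addr in range(0, 8 * LINE_BYTES, LINE_BYTES):
    bank, mc = route(addr)
    print(f"addr {addr:#06x} -> L2 bank {bank}, controller {mc}")
```

With this kind of fine-grained interleave, a sequential stream touches every bank and every controller in turn, which is the point of splitting the L2 and memory into many narrow slices.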

It seems that the T2 design is also the basis for Intel's Kiefer, since the design philosophy is very similar: multiple L2 caches and multiple narrow (single- or dual-channel rather than quad or hexa) FB-DIMM memory controllers, all shared by multiple cores through some form of crossbar or ring-bus system. Kiefer's 8 nodes with 4 "cores" each may also be reminiscent of T2's 8 cores with their two-pipeline parallelism. I've mentioned it before, but a flexible Kiefer design would have each "core" in a node devoted to specific functions, while the 4 cores in a node could be combined through some form of "multiplexing" so that each node could operate as a full core as we currently know it. Kind of like how each of T2's cores can process simple integer operations almost as 2 separate cores but comes together for complex integer or FP work.
September 18, 2006 7:06:20 AM

Quote:

On page 15 they also show scaling graphs. This is where AMD's architecture shows its benefits. Woodcrest's scaling from 1 core to 2 and from 2 to 4 is lower than Socket F's. The reason Woodcrest still holds its lead is that each core is simply so much more powerful. In some benchmarks, 2 Woodcrest cores beat all 4 Opteron cores. It's also interesting that no one reaches the 80% scaling I'd consider practically "perfect" (true linear scaling would be a 100% increase). The best result is Opteron going from 1 to 2 cores, a 62% increase. Even Opteron could only achieve 44% scaling from 2 cores to 4.


It is not unlikely that other factors affect scaling here. These are database tests; if I understood the benchmark description correctly, they could be network limited. So maybe the 4-core Woodcrest setup is so fast that the network interface can't keep pace :)

Regarding network bandwidth limitations, it seems that Sun's Niagara II (SPARC T2) won't suffer from them anytime soon, since its crossbar can deliver a massive 268.8GB/s (shared among eight cores, a measly 33.6GB/s per core), although only within a single MPU:


Oops, no, crossbar traffic is not what I'm talking about. Just a plain Ethernet NIC.

These database tests were set up so that client machines send queries to the web servers. The clients are connected via Ethernet, and the web servers are connected via Ethernet to the database server (the system whose CPUs are being tested):

"The database runs on a dedicated system and gets its instructions from a load-balanced cluster of web servers. This separation of data and web apps is based on a classic 'two-tier' pattern."

Maybe 1 Gigabit Ethernet is simply the limiting factor there.
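To check whether a single 1 Gigabit Ethernet link could cap the database server, compare the link's usable payload rate with the query rate times the average result size. In this sketch the result sizes are made up, and the ~94% payload efficiency is an assumed figure for TCP over standard 1500-byte frames. If the benchmark's measured query rate approaches the bound for its real result size, the network rather than the CPUs would explain the flattening.

```python
LINK_BITS_PER_S = 1_000_000_000   # 1 Gigabit Ethernet line rate
PAYLOAD_EFFICIENCY = 0.94         # assumed: rough TCP/IP + Ethernet framing overhead

def max_queries_per_s(avg_result_bytes: int) -> float:
    """Upper bound on queries/s if every result crosses one 1 GbE link."""
    usable_bytes_per_s = LINK_BITS_PER_S / 8 * PAYLOAD_EFFICIENCY
    return usable_bytes_per_s / avg_result_bytes

# Hypothetical average result sizes; the review does not report them.
for size in (2_000, 8_000, 32_000):
    print(f"{size:>6} B/result -> at most {max_queries_per_s(size):,.0f} queries/s")
```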
September 18, 2006 11:37:48 AM

Quote:
Oops, no, crossbar traffic is not what I'm talking about. Just a plain Ethernet NIC.

These database tests were set up so that client machines send queries to the web servers. The clients are connected via Ethernet, and the web servers are connected via Ethernet to the database server (the system whose CPUs are being tested):

"The database runs on a dedicated system and gets its instructions from a load-balanced cluster of web servers. This separation of data and web apps is based on a classic 'two-tier' pattern."

Maybe 1 Gigabit Ethernet is simply the limiting factor there.


Yes, I should have said interconnect bandwidth instead of network bandwidth (although the latter can also apply to some kinds of chip interconnects...); but even if you only consider Ethernet, the T2 is no dwarf at all:

Quote:
The I/O devices are all capable of DMA, but the crossbar is equipped with a port for the cores to read from I/O devices. Niagara II implements two built in 10/1 Gigabit Ethernet ports with packet classification and filtering and a x8 PCI Express port, presumably to be used for storage. By integrating the I/O devices on-die, Niagara II will save a fair amount of power, money and design complexity, compared to systems that use multi chip solutions. Handling 20 gigabits/s of Ethernet traffic is rather remarkable, as a single 10GBE port will overwhelm modern MPUs that do not use TCP/IP coprocessor or offload engines. This is another feat that is only possible because Sun owns the entire stack; hopefully the appropriate hooks are all in place, so that Linux will be able to achieve the same performance. If Sun's implementation works well, it will set the bar for other processors from server rivals Intel, AMD and IBM.
:wink:
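The claim that a single 10GbE port will overwhelm an MPU is clearer as a per-packet time budget. At line rate with minimum-size frames, a 10GbE port delivers about 14.88 million frames per second, i.e. one every 67.2ns, and two ports double that. The sketch below uses standard Ethernet framing overheads: an 8-byte preamble/SFD and a 12-byte interframe gap.

```python
PREAMBLE_SFD = 8     # bytes sent before each frame
INTERFRAME_GAP = 12  # minimum idle bytes between frames

def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Max frames per second on an Ethernet link for a given frame size
    (frame_bytes includes the header and FCS)."""
    wire_bits = (frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP) * 8
    return link_bps / wire_bits

for frame in (64, 1518):
    pps = line_rate_pps(10e9, frame)
    ns_per_frame = 1e9 / pps
    print(f"{frame:>4}-byte frames: {pps / 1e6:.2f} Mpps, {ns_per_frame:.1f} ns per frame")
```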


Cheers!