Interlagos and Valencia Discussion

A little bit of technical background here, courtesy of Chris Angelini:

http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-10.html

Big hello to Sylvie B at EE Times, whose fine stories many of us once enjoyed at "The Inq", with a decent dash of humour, I might add!!

http://www.eetimes.com/electronics-news/4230565/AMD-s-Interlagos-and-Valencia-finally-emerge

This is a sticky for you to post and discuss benchmarks and the architecture of the new Interlagos and Valencia server CPUs released today, based on the Bulldozer modular design.

I have put it under the server subsection here.

Enjoy.

http://www.amd.com/us/aboutamd/newsroom/Pages/newsroom.aspx



:)
 
IIRC Sylvie was pretty hot, although I don't see her pic at EE Times.

Anyway, the article states "AMD predicts the updated Opteron 6276 will have 84 percent higher performance than rival Intel’s Xeon processor Model X5670, while the new line of processors will also purportedly deliver increased scalability for Virtualization with up to 73 percent more memory bandwidth and half the power per core than Intel's lowest power per core server processor, the L5630, at just 4.375W per core."
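Per-core power is just TDP divided by core count. A quick sketch, assuming the Xeon L5630's published 40 W TDP and 4 cores (those two figures are not in the article), shows how the claimed 4.375 W per core compares and what total TDP it would imply if spread over a 16-core part:

```python
def watts_per_core(tdp_watts: float, cores: int) -> float:
    """Average power budget per core: TDP divided by core count."""
    return tdp_watts / cores

# Assumption: Xeon L5630 = 40 W TDP, 4 cores (not stated in the article).
xeon_l5630 = watts_per_core(40.0, 4)       # 10.0 W per core

amd_claim = 4.375                          # W per core, as quoted from the article
implied_tdp_16_core = amd_claim * 16       # total TDP if that figure covers 16 cores
ratio = amd_claim / xeon_l5630             # fraction of the Xeon's per-core power

print(f"L5630: {xeon_l5630:.3f} W/core, AMD claim: {amd_claim} W/core")
print(f"AMD/Intel per-core ratio: {ratio:.4f}, implied 16-core TDP: {implied_tdp_16_core} W")
```

On these assumptions the ratio comes out just under one half, which is consistent with the "half the power per core" wording.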

However according to Johan de Gelas' Interlagos review at Anandtech http://www.anandtech.com/show/5058/amds-opteron-interlagos-6200:

The specifications (16 threads, 32MB of cache) and AMD's promises that Interlagos would outperform Magny-cours by a large margin created the impression that the Interlagos Opteron would give the current top Xeons a hard time. However, the newest Opteron cannot reach higher clock speeds than the current Opteron (6276 at 2.3GHz), and AMD positions the Opteron 6276 2.3GHz as an alternative to the Xeon E5649 at 2.53GHz. As the latter has a lower TDP, it is clear that the newest Opteron has to outperform this Xeon by a decent margin. In fact most server buyers expect a price/performance bonus from AMD, so the Opteron 6276 needs to perform roughly at the level of the X5650 to gain the interest of IT customers.

Judging from the current positioning, the high-end is a lost cause for now. First, AMD needs a 140W TDP chip to compete with the slower parts of Intel's high-end armada. Second, Sandy Bridge EP is coming out in the next quarter--we've already seen the desktop Sandy Bridge-E launch, and adding two more cores (four more threads) for the server version will only increase the performance potential. The Sandy Bridge cores have proven to be faster than Westmere cores, and the new Xeon E5 will have eight of them. Clock speeds will be a bit lower (2.0-2.5GHz), but we can safely assume that the new Xeon E5 will outperform its older brother by a noticeable margin and make it even harder for the new Opteron to compete in the higher end of the 2P market.

In the benchmarks, the 2-socket Xeon X5670 server (12 cores/24 threads at 2.93GHz) beat the 2-socket Opteron 6276 (32 cores/32 threads at 2.3GHz) in just about every benchmark, sometimes by as much as 3X, as in the MySQL response benchmark, and with significantly lower power consumption to boot.
 

BaronMatrix

Splendid
Dec 14, 2005
Actually, Interlagos supports 16 DIMMs at 1.25V (a first), for 512GB per 4-socket system. Dell just released the new C6145, which has dual 4P nodes in a 2U config with 1TB of RAM.

Dell, HP and Acer show up in Munich


I posted some SPEC scores over in CPUs and have recreated them here.

SPECint Rate, 4-socket systems

CPU             Threads  Cores  Sockets  Clock     Base  Peak
Xeon E7-8830    64       32     4        2133MHz   737   783
Opteron 6276    64       64     4        2300MHz   835   959
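For a rough feel of per-box versus per-core throughput, here is a small sketch using only the scores in the table (it ignores turbo, memory configuration and compiler differences, which is a simplifying assumption):

```python
# SPECint Rate scores from the table above.
systems = {
    "Xeon E7-8830": {"cores": 32, "base": 737, "peak": 783},
    "Opteron 6276": {"cores": 64, "base": 835, "peak": 959},
}

def lead(reference: float, other: float) -> float:
    """Fractional advantage of `other` over `reference`."""
    return other / reference - 1

for name, s in systems.items():
    print(f"{name}: {s['base'] / s['cores']:.2f} base score per core")

xeon, amd = systems["Xeon E7-8830"], systems["Opteron 6276"]
base_lead = lead(xeon["base"], amd["base"])
peak_lead = lead(xeon["peak"], amd["peak"])
print(f"Opteron box lead: {base_lead:.1%} base, {peak_lead:.1%} peak")
```

The Opteron box wins on total throughput, but with twice the cores, so each Interlagos core delivers a bit over half the score of an E7-8830 core.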


This is a comparison of Magny-Cours (MC) and Interlagos. It shows that more sockets do help at the same core count: the MC system has 2X the sockets (the 6276 is a 16-core part, so 32 cores means two of them) and gets about 15% more performance. I'm waiting to see a 12-core comparison with the same number of sockets.

CPU             Threads  Cores  Sockets  Clock     Base  Peak
Opteron 6276    32       32     2        2300MHz   419   480
Opteron 6134    32       32     4        2300MHz   485   545
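The "about 15%" can be checked directly from the two Opteron SPECint Rate results; a minimal sketch:

```python
def uplift(old: float, new: float) -> float:
    """Fractional gain of `new` over `old`."""
    return new / old - 1

# SPECint Rate scores from the post: (base, peak)
opteron_6276 = (419, 480)
opteron_6134 = (485, 545)

base_gain = uplift(opteron_6276[0], opteron_6134[0])
peak_gain = uplift(opteron_6276[1], opteron_6134[1])
print(f"Magny-Cours ahead by {base_gain:.1%} (base) and {peak_gain:.1%} (peak)")
```

So roughly 16% on base and 13.5% on peak, which averages out to the quoted figure.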


AnandTech has a VM-type comparison, but as usual they find tweaks after declaring results and gloss over the real advantages, such as the Opteron server costing 50% less, or the fact that Intel gets most of the software optimization while AMD is behind the curve there.

At any rate, if a B3 server revision is coming, it should improve on these numbers.
 
According to the Interlagos thread over on AMDZone, it looks like the 6276 doesn't fare so well in SAP either, so those with SQL database, VM virtualization and SAP workloads, a large part of the market, won't be too impressed. As Baron mentioned, it's cheap, at least for the hardware. I'd like to see some TCO comparisons though.

 

BaronMatrix

Splendid
Dec 14, 2005



And as I also said, these SAP/Oracle/MySQL, etc. benchmarks need people who really know how to set them up for Enterprise use. AnandTech was given tweaks by someone and also admitted that BD is ahead of the software. That's why I think Project Win should focus on software optimization across all platforms. Fusion is doing well, but FMAC should work for INT (XOP).

We'll definitely see enhancements as we go. It'll be interesting to see what a C2 revision would bring (C2 was the launch revision of Deneb). They usually drop a new revision every quarter, so if B3 launches in Q1, there could be a C3 before Piledriver. We can already expect power tweaks and, more than likely, some minor changes to the microcode and maybe the branch predictor. They could also use some work on L1/L2 bandwidth, as the latencies aren't too bad.
 


IIRC that's what S/A and some other sites mentioned: Project Win is merely internal justification of the layoffs to the remaining AMD employees (i.e., cheerleading and propaganda).

IOW, it's not "Win" as in Windows 8, but akin to "Win" as opposed to "Lose (some people)".
 

Chad Boga

Distinguished
Dec 30, 2009
From Ars Technica, a discussion and analysis of various server benchmarks.

AMD's Bulldozer server benchmarks are here, and they're a catastrophe.
[...]
Some commentators have even suggested that Bulldozer was, first and foremost, a server processor; relatively weak desktop performance was to be expected, but it would all come good in the server room.

Unfortunately for AMD, it looks as though the decisions that hurt Bulldozer on the desktop continue to hurt it in the server room. Although the server benchmarks don't show the same regressions as were found on the desktop, they do little to justify the design of the new architecture.
[...]
After the poor desktop performance, the possibility still existed that the Bulldozer architecture would start to make sense once we could see the server performance. Now the benchmarks have arrived, AMD's perseverance with Bulldozer is bordering on the incomprehensible. There's just no upside to the decisions AMD has made. All of which raises a question: why did AMD go this route? The company must have known about the weak single-threaded performance and the detrimental effect this would have in real-world applications long before the product actually shipped, so why stick with it? Perhaps AMD's anticipation of high clock speeds caused the company to stick with the design, and there's still a possibility that it might one day attain those clock speeds—but we've seen AMD's arch-competitor, Intel, make a similar gamble with the Pentium 4, and for Intel, it never really paid off.

AMD is boasting that Opteron 6200 is the "first and only" 16-core x86 processor on the market. Not only is this not really true (equating threads and cores is playing fast and loose with the truth), it just doesn't matter. In its effort to add all those "cores," performance has been severely compromised. AMD faces an uphill struggle just to compete with its own old chips—let alone with Intel.
 

BaronMatrix

Splendid
Dec 14, 2005



I read that review, and it could have been a little more impartial. Straight reporting wouldn't include some of that wording, as I mentioned there. AnandTech's own testing showed that Server 2008 recognized Interlagos and lowered its idle power to the same level as the 6-core/12-thread chips. I trust the SPEC suite precisely because I see too much variation in other testing methods and workloads, especially for servers. AnandTech previously did instruction-mix analyses and actual comparative reviews, but now it's "throw the AMD system in and hope it works", or at least that's how it appears.


To continue from your salary post:



I guess you'll enjoy searching through all of the forums for proof of this (1%). At most, I would say I underestimated how much optimization would help, or how much the lack of it would hurt. It's a bold new architecture that has legs. AMD will surely increase efficiency now that they can pay for faster process optimization (where they used to have to pay for the whole fab).
There do seem to be issues with certain aspects of GF's 32nm process (I'd say teething pains with GPUs on SOI HKMG), and there are rumors that they won't ramp 28nm until Q2 2012, which is a slight problem, though even improving clock speed on the current Bobcat APU will hold the low-cost 11.6-13" market.

I don't remember mentioning my salary as anything other than proof that I must know something.

You all can drive a person to it, and could be purposeful in your machinations (insert maniacal laugh).

At any rate, just by using the (EASY) ability to set thread affinity before launching a game, you can gain up to 30%, from what I've seen. As new revisions are produced, usually every quarter or so, power will be brought under control and clocks can go up. Everyone is always keen to talk about Deneb/Thuban, which both launched at C2, not at the B2 of FX/Interlagos. We could say that it was released too soon, or released at a point where the "8/16-core part" can flex its muscles. When 8 threads are used, most benches are faster than on the 990X and 2600K.
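For anyone wanting to try the affinity trick, here is a minimal Python sketch. It assumes the common approach of running one thread per Bulldozer module, so the two cores of a module don't contend for their shared front end and FPU, and it assumes the OS numbers module siblings adjacently (0-1, 2-3, ...). The helper names are hypothetical; os.sched_setaffinity is Linux-only, so the sketch simply reports False elsewhere.

```python
import os

def one_core_per_module(cpus, cores_per_module=2):
    """Keep the first logical CPU of each module.

    Assumption: sibling cores of a module are numbered adjacently (0-1, 2-3, ...).
    """
    return {c for c in cpus if c % cores_per_module == 0}

def affinity_mask(cpus):
    """Bit mask with bit N set for CPU N (the hex form Windows 'start /affinity' takes)."""
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return mask

def pin_to(cpus):
    """Pin the current process (Linux only); returns False if unsupported or no overlap."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    allowed = os.sched_getaffinity(0) & set(cpus)  # respect any existing CPU restrictions
    if not allowed:
        return False
    os.sched_setaffinity(0, allowed)
    return True

if __name__ == "__main__":
    n = os.cpu_count() or 1
    target = one_core_per_module(range(n))
    print(f"target CPUs {sorted(target)}, mask {affinity_mask(target):#x}")
    print("pinned:", pin_to(target))
```

On Windows 7 the same mask can be passed from a command prompt, e.g. "start /affinity 55 game.exe", where hex 55 selects CPUs 0, 2, 4 and 6 and game.exe is a placeholder. How much it gains depends on the game and on the OS scheduler.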


GF perhaps dropped the ball in seeking additional clients, fabs and processes, but that doesn't take away from what is a great architecture, especially for servers. AMD even improved their INT performance, and the additional cores put most "heavy" loads out of reach of Magny-Cours and in some cases even of the E7 Xeons. We'll see how it shakes out over the coming months, with them pushing hard for Trinity and perhaps a B3 revision.

They have to push and pay or whatever, because only IBM has enough SOI fab space for AMD's CPU needs.


And they do have a worthwhile path to FMA support in 2012, with OpenCL able to use vector math (with, I believe, an OpenGL layer). I am of the opinion that a 16-core BD will be as fast as a low-end GPU in FMAC ops.
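To put a number on the FMAC side, here is a back-of-the-envelope theoretical peak for an Opteron 6276. It assumes two 128-bit FMAC pipes per module, each fused multiply-add counted as 2 flops, 8 modules for the 16 cores, and the 2.3GHz base clock with turbo ignored. It is a ceiling that real code rarely reaches, not a benchmark, and it makes no claim about any particular GPU.

```python
def peak_gflops(modules: int, ghz: float, flops_per_cycle_per_module: int) -> float:
    """Theoretical peak = modules x flops per cycle per module x clock (GHz)."""
    return modules * flops_per_cycle_per_module * ghz

# Assumptions: two 128-bit FMAC pipes per module; one FMA = 2 flops.
FMAC_PIPES = 2
SP_LANES = 128 // 32   # 4 single-precision lanes per 128-bit pipe
DP_LANES = 128 // 64   # 2 double-precision lanes per 128-bit pipe
SP_FLOPS = FMAC_PIPES * SP_LANES * 2   # 16 SP flops per cycle per module
DP_FLOPS = FMAC_PIPES * DP_LANES * 2   # 8 DP flops per cycle per module

# Opteron 6276: 16 cores = 8 modules at 2.3 GHz base clock.
sp = peak_gflops(8, 2.3, SP_FLOPS)
dp = peak_gflops(8, 2.3, DP_FLOPS)
print(f"peak SP {sp:.1f} GFLOPS, peak DP {dp:.1f} GFLOPS per socket")
```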

Project Win should be getting that support in time for Haswell, which should get FMA3 support; AMD will have FMA3 in Piledriver (PD) in addition to FMA4. Their AES numbers are in line, and XOP is as fast as AVX.

[Chart: x264 video encoding, pass 2, AVX]


From Tom's

In its own comparisons against an Intel Xeon X5670-based system, the Opteron 6276 scored an 84 percent higher performance in Linpack. In Stream, the Opteron 6276 had 73 percent more memory bandwidth over the Xeon X5670.

And from AMD's launch material.

[Chart from AMD launch material: four-socket server memory bandwidth]
 

Chad Boga

Distinguished
Dec 30, 2009
Why going for Moar Cores over IPC isn't without its downsides.


New SQL Server 2012 per core licensing – Thank you Microsoft

http://sqlblog.com/blogs/joe_chang/archive/2011/11/16/new-sql-server-2012-per-core-licensing-thank-you-microsoft.aspx

Many of us have probably seen the new SQL Server 2012 per core licensing, with Enterprise Edition at $6,874 per core superseding the $27,495 per socket of SQL Server 2008 R2 (discounted to $19,188 for 4-way and $23,370 for 2-way in TPC benchmark reports) with Software Assurance at $6,874 per processor. Datacenter was $57,498 per processor, so the new per-core licensing puts 2012 EE on par with 2008R2 DC, at 8 cores per socket.

This is a significant increase for EE licensing on Intel Xeon 5600 6-core systems (6x$6,874 = $41,244 per socket) and a huge increase for Xeon E7 10-core systems, now $68,740 per socket. I do not intend to discuss justification of the new model. I will say that SQL Server licensing had gotten out of balance with the growing performance capability of server systems over time. So perhaps the more correct perspective is that SQL Server had become underpriced in recent years. (Consider that there was a 30%+ increase in the hardware cost structure in the transition from Core 2 architecture systems to Nehalem systems for both 2-way and 4-way to accommodate the vastly increased memory and IO channels.)
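The per-socket figures follow directly from the $6,874 per-core price. A small sketch using only the prices quoted above, extended to the 16-core Opteron 6200 that the next paragraph refers to; it deliberately ignores any discounts, minimum-license rules or special treatment Microsoft may apply, so treat it as list-price arithmetic only:

```python
EE_2012_PER_CORE = 6874     # SQL Server 2012 Enterprise, USD per core (quoted above)
EE_2008R2_PER_CPU = 27495   # SQL Server 2008 R2 Enterprise, USD per socket
DC_2008R2_PER_CPU = 57498   # SQL Server 2008 R2 Datacenter, USD per processor

def ee_2012_per_socket(cores_per_socket: int) -> int:
    """List-price EE 2012 licensing for one socket: cores x per-core price."""
    return cores_per_socket * EE_2012_PER_CORE

configs = [
    ("Xeon 5600, 6-core", 6),
    ("8-core", 8),
    ("Xeon E7, 10-core", 10),
    ("Opteron 6200, 16-core", 16),
]
for label, cores in configs:
    cost = ee_2012_per_socket(cores)
    print(f"{label:24s} ${cost:>8,}  ({cost / EE_2008R2_PER_CPU:.2f}x the 2008 R2 EE socket price)")
```

At 16 cores per socket, the 2012 EE list price works out to roughly four times the old per-socket price, which is the arithmetic behind the "kills the new AMD Bulldozer 16-core processor" remark.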

Previously, I had discussed that the default choice for SQL Server used to be a 4-way system. In the really old days, server sizing and capacity planning was an important job category. From 1995/6 on, the better strategy for most people was to buy the 4-way Intel standard high-volume platform rather than risk the temperamental nature of big-iron NUMA systems (and even worse, the consultant to get SQL Server to run correctly by steering the execution plan around operations that were broken on NUMA). With the compute, memory and IO capabilities of Intel Xeon 5500 (Nehalem-EP), the 2-way became the better default system choice from mid-2009 on.

By “default choice”, I mean in the absence of detailed technical sizing analysis. I am not suggesting that ignorance is good policy (in addition to bliss), but rather the cost of knowledge was typically more than the value of said knowledge. Recall that in the past, there were companies that made load testing tools. I think they are mostly gone now. An unrestricted license for the load test product might be $100K. The effort to build scripts might equal or exceed that. All to find out whether a $25K or $50K server is the correct choice?

So now there will also be a huge incentive on software licensing to step down from a 4-way 10-core system with 40 cores total to a 2-way system with perhaps 8-12 cores total (going forward, this cost structure essentially kills the new AMD Bulldozer 16-core processor, which had just recently achieved price performance competitiveness with the Intel 6-core Westmere-EP in 2-way systems).

In the world of database performance consulting, for several years I had been advocating a careful balance between performance tuning effort (billed at consultant rates) with hardware. The price difference between a fully configured 2-way and 4-way system might be $25,000. For a two-node cluster, this is $50K difference in hardware, with perhaps another $50K in SQL Server licensing cost, with consideration that blindly stepping up to bigger hardware does not necessarily improve the critical aspect of performance proportionately, sometimes not at all, and may even have negative impact.

With performance tuning, it is frequently possible to achieve significant performance gains in the first few weeks. But after that, additional gains become either progressively smaller, limited in scope, or involve major re-architecture. In the long ago past, when hardware was so very expensive, not to mention the hard upper limits on performance, it was not uncommon for a consultant to get a long term contract to do performance work exclusively.

More recently, performance consulting work tended to be shorter-term. Just clean up the low-hanging fruit, and crush moderate inefficiencies with cheap powerful hardware. While this is perfectly viable work, it also precludes the justification for the deep skills necessary to resolve complex problems, which also calls into question the need to endure an intolerably arrogant, exorbitantly expensive consultant.

It had gotten to the point that I had given thought to retiring, and go fishing in some remote corner of the world. But now with the new SQL Server per core licensing, Microsoft has restored the indispensable (though still intolerable) status to arrogant, exorbitantly expensive, performance consultant. So, thank you Microsoft.
 

earl45

Distinguished
Nov 10, 2009



Nice read, Chad!
 

Chad Boga

Distinguished
Dec 30, 2009
With AMD's understandable reluctance to post non-rate SPEC scores, Intel were kind enough to submit them on their behalf.

If AMD think they can get better or "more reflective" scores, they are of course free to submit their own SPEC results, but I doubt they will.


http://www.spec.org/cpu2006/results/res2011q4/

SPECint_base2006 / SPECfp_base2006 (autoparallel=yes)

CPU (base/turbo clock)     SPECint_base2006   SPECfp_base2006
i7-2700K (3.5/3.9 GHz)     45.5               56.1
FX-8150 (3.6/4.2 GHz)      20.8               25.7
X6-1100T (3.3/3.7 GHz)     25.0               32.2
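Expressed as ratios, these three results show how the FX-8150 compares both with its own predecessor and with the i7. A minimal sketch using only the scores listed (Intel-submitted runs, per the post):

```python
# (SPECint_base2006, SPECfp_base2006), autoparallel=yes, from the post
scores = {
    "i7-2700K": (45.5, 56.1),
    "FX-8150": (20.8, 25.7),
    "X6-1100T": (25.0, 32.2),
}

def relative(a: str, b: str) -> tuple:
    """Scores of CPU `a` as a fraction of CPU `b`, as (int, fp)."""
    return tuple(x / y for x, y in zip(scores[a], scores[b]))

fx_vs_x6 = relative("FX-8150", "X6-1100T")
fx_vs_i7 = relative("FX-8150", "i7-2700K")
print(f"FX-8150 vs X6-1100T: int {fx_vs_x6[0]:.1%}, fp {fx_vs_x6[1]:.1%}")
print(f"FX-8150 vs i7-2700K: int {fx_vs_i7[0]:.1%}, fp {fx_vs_i7[1]:.1%}")
```

On these submissions the FX-8150 lands at roughly 80-83% of the Phenom II X6 and under half of the i7-2700K.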
 
We had an earlier post in another thread about socket compatibility, so I found the answer here:

http://www.insidehw.com/reviews/cpu/6666-amd-bulldozer-radical-changes?start=3

An interesting fact is that on server platforms, Bulldozer remains fully compatible with C32 and G34 CPU sockets. Valencia and Interlagos (codenames for Bulldozer-based server CPUs) can perform flawlessly on existing San Marino (C32) and Maranello (G34) platforms. The most important difference between these is that the G34 socket supports CPUs in MCM packaging, such as Interlagos, which contains two monolithic octa-core CPUs! A server platform such as this can take up to 64 cores on a single motherboard. In terms of percentages, that’s 60% more than anything Intel can offer with their Xeon Nehalem EX CPUs (series E7000). Furthermore, the G34 platform supports up to four memory channels per physical socket, i.e. 16 memory channels on four sockets.
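The core and channel counts in that quote are simple multiplication. A short sketch, assuming the largest Xeon E7 part has 10 cores per socket (that figure is not in the quote, but it is what the 60% number implies):

```python
SOCKETS = 4

# From the quote: each G34 Interlagos package holds two 8-core dies.
interlagos_cores = SOCKETS * 2 * 8

# Assumption: top Xeon E7 part = 10 cores per socket.
xeon_e7_cores = SOCKETS * 10

advantage = interlagos_cores / xeon_e7_cores - 1

# From the quote: four memory channels per G34 socket.
memory_channels = SOCKETS * 4

print(f"{interlagos_cores} vs {xeon_e7_cores} cores: {advantage:.0%} more")
print(f"{memory_channels} memory channels across {SOCKETS} sockets")
```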

 

BaronMatrix

Splendid
Dec 14, 2005




AMD publishes both rate and single-threaded SPEC results, for server parts.
 

BaronMatrix

Splendid
Dec 14, 2005



It seems like you were just looking for a problem for AMD. This licensing model is only for the Enterprise Edition, where cost is almost NEVER the deciding factor. Enterprise takes over from Datacenter, which means TBs of space, which means "whatever it takes." Also, companies that already have EE don't have to increase their costs; they just upgrade.

Here is the full PDF


It doesn't hurt existing customers, and it does sell in 2-core packs, so 8 licenses are four $6000 packs, or around $25K, but ONLY with the EE edition for NEW purchases. BI and Standard still use Server + CAL.