
Interlagos and Valencia Discussion

Last response: in Components
November 14, 2011 9:28:01 AM

A little bit of technical background here, courtesy of Chris Angelini:

http://www.tomshardware.com/reviews/fx-8150-zambezi-bul...

Big hello to Sylvie B at eetimes ... whose fine stories many of us once enjoyed at "The Inq" ... with a decent dash of humour I might add!!

http://www.eetimes.com/electronics-news/4230565/AMD-s-I...

This is a sticky for you to post and discuss benchmarks and the architecture for the new Interlagos and Valencia server CPUs released today, based on the Bulldozer modular design.

I have put it under the server subsection here.

Enjoy.

http://www.amd.com/us/aboutamd/newsroom/Pages/newsroom....



:) 
November 14, 2011 3:52:16 PM

Bit off topic, but the idea of a 16-core single processor is awesome; no matter how badly it performs I still want one in my computer :) 
November 15, 2011 4:34:47 AM

AMD will also support up to 12 DIMMs per CPU, for up to 384GB of memory per socket.
November 15, 2011 4:37:17 AM

The new Opterons will have up to four memory channels with up to 1600 MHz memory.
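Taken at face value, four DDR3-1600 channels put the theoretical peak at roughly 51 GB/s per socket. A quick back-of-the-envelope check (assuming standard 64-bit-wide channels; the channel count and speed are from the post above, not an official spec sheet):

```python
# Theoretical peak memory bandwidth for the quoted Opteron config:
# 4 channels of DDR3-1600, each channel 64 bits (8 bytes) wide.
channels = 4
transfers_per_sec = 1600e6   # DDR3-1600 = 1600 MT/s
bytes_per_transfer = 8       # 64-bit channel

peak_gb_s = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(peak_gb_s)  # 51.2 (GB/s, theoretical peak per socket)
```

Real-world STREAM numbers will of course land well below that peak.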
November 16, 2011 3:31:52 PM

IIRC Silvie was pretty hot, although I don't see her pic at EETimes..

Anyway, the article states "AMD predicts the updated Opteron 6276 will have 84 percent higher performance than rival Intel’s Xeon processor Model X5670, while the new line of processors will also purportedly deliver increased scalability for Virtualization with up to 73 percent more memory bandwidth and half the power per core than Intel's lowest power per core server processor, the L5630, at just 4.375W per core."

However according to Johan de Gelas' Interlagos review at Anandtech http://www.anandtech.com/show/5058/amds-opteron-interla...:

Quote:
The specifications (16 threads, 32MB of cache) and AMD's promises that Interlagos would outperform Magny-cours by a large margin created the impression that the Interlagos Opteron would give the current top Xeons a hard time. However, the newest Opteron cannot reach higher clock speeds than the current Opteron (6276 at 2.3GHz), and AMD positions the Opteron 6276 2.3GHz as an alternative to the Xeon E5649 at 2.53GHz. As the latter has a lower TDP, it is clear that the newest Opteron has to outperform this Xeon by a decent margin. In fact most server buyers expect a price/performance bonus from AMD, so the Opteron 6276 needs to perform roughly at the level of the X5650 to gain the interest of IT customers.

Judging from the current positioning, the high-end is a lost cause for now. First, AMD needs a 140W TDP chip to compete with the slower parts of Intel's high-end armada. Second, Sandy Bridge EP is coming out in the next quarter--we've already seen the desktop Sandy Bridge-E launch, and adding two more cores (four more threads) for the server version will only increase the performance potential. The Sandy Bridge cores have proven to be faster than Westmere cores, and the new Xeon E5 will have eight of them. Clock speeds will be a bit lower (2.0-2.5GHz), but we can safely assume that the new Xeon E5 will outperform its older brother by a noticeable margin and make it even harder for the new Opteron to compete in the higher end of the 2P market.


In the benchmarks, the 2-socket Xeon X5670 server (12 cores/24 threads at 2.93GHz) beat the 2-socket Opteron 6276 (32 cores/32 threads at 2.3GHz) in just about every benchmark, sometimes by as much as 3X as in the MySQL response bench, and with significantly lower power consumption to boot.
November 16, 2011 7:10:08 PM

Actually, Interlagos supports 16 DIMMs at 1.25V (a first) for 512GB per 4-socket system. Dell just released the new C6145, which has dual 4P in a 2U config with 1TB RAM.

Dell, HP and Acer show up in Munich


I posted some SPEC scores over in CPUs and will recreate them here.

SPECint Rate

CPU: E7-8830
Threads: 64
Cores: 32
Sockets: 4
Speed: 2133MHz
Base: 737
Peak: 783

CPU: Opteron 6276
Threads: 64
Cores: 64
Sockets: 4
Speed: 2300MHz
Base: 835
Peak: 959


This is a comparison of MC and Interlagos. It shows that more sockets do help at the same core count. The MC setup has 4X the sockets and gets about 15% more performance. I'm waiting to see a 12-core comparison with the same number of sockets.

CPU: Opteron 6276
Threads: 32
Cores: 32
Sockets: 1
Speed: 2300MHz
Base: 419
Peak: 480

CPU: Opteron 6134
Threads: 32
Cores: 32
Sockets: 4
Speed: 2300MHz
Base: 485
Peak: 545
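As a quick sanity check on the "about 15%" figure, dividing the posted base scores:

```python
# Posted SPECint_rate base scores from the listings above.
interlagos_base = 419   # Opteron 6276, fewer sockets
mc_4p_base = 485        # Opteron 6134, 4 sockets, same 32-core total

gain = (mc_4p_base - interlagos_base) / interlagos_base
print(round(gain * 100, 1))  # 15.8 (% higher base rate for the 4-socket box)
```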


Anand has a VM-type comparison but, as usual, they find tweaks after they declare results and gloss over the real advantages, such as the Opteron server costing 50% less, or the fact that Intel gets the majority of software optimization while AMD is behind the curve there.

At any rate, if there is a B3 server rev coming it should even improve upon these numbers.
November 18, 2011 11:31:52 AM

According to the Interlagos thread over on AMDZone, it looks like the 6276 doesn't fare so well in SAP either, so those with SQL database, VM virtualization and SAP workloads - a large part of the market - won't be too impressed. As Baron mentioned, it's cheap, however, at least for the hardware. I'd like to see some TCO comparisons though.

November 20, 2011 1:00:53 AM

I'd like to see the clockspeed raised up a bit.
November 20, 2011 5:06:40 AM

fazers_on_stun said:
According to the Interlagos thread over on AMDZone, it looks like the 6276 doesn't fare so well in SAP either, so those with SQL database, VM virtualization and SAP workloads - a large part of the market - won't be too impressed. As Baron mentioned, it's cheap, however, at least for the hardware. I'd like to see some TCO comparisons though.



And as I said also, these SAP/Oracle/MySQL etc. benchmarks need people who really know how to set them up for enterprise use. Anand was given tweaks by someone and also admitted that BD is ahead of the software. That's why I think Project Win should include software optimizations on all platforms. Fusion is doing well, but FMAC should work for INT (XOP).

We'll definitely see enhancements as we go. It'll be interesting to see what C2 would bring (C2 was the launch rev of Deneb). They usually drop a new rev every quarter, so if B3 launches in Q1 there could be a C3 before Piledriver. We can already assume power tweaks and more than likely some minor changes to microcode and maybe the branch predictor. They could also use some work on L1/L2 bandwidth, as latencies aren't too bad.
November 20, 2011 8:54:07 AM

I thought Project Win was all about leveraging higher profits by reducing the size of the salaried workforce and relying more on hiring for short-term, project-based staffing requirements?

That's the way I read it?

Correct me if I am wrong.

November 20, 2011 3:47:49 PM

Reynod said:
I thought Project Win was all about leveraging higher profits by reducing the size of the salaried workforce and relying more on hiring for short-term, project-based staffing requirements?

That's the way I read it?

Correct me if I am wrong.


IIRC that's what S/A and some other sites mentioned - Project Win is merely internally justifying the layoffs to the remaining AMD employees (i.e., cheerleading & propaganda).

IOW, it's not "Win" as in Windows 8, but akin to "Win" as opposed to "Lose (some people)".
November 20, 2011 9:34:04 PM

Yes ... it's what they once called "right sizing" instead of "down sizing".

Amounts to the same thing.

November 22, 2011 10:24:04 AM

From Ars Technica, a discussion and analysis of various server benchmarks.

Quote:
AMD's Bulldozer server benchmarks are here, and they're a catastrophe.
[...]
Some commentators have even suggested that Bulldozer was, first and foremost, a server processor; relatively weak desktop performance was to be expected, but it would all come good in the server room.

Unfortunately for AMD, it looks as though the decisions that hurt Bulldozer on the desktop continue to hurt it in the server room. Although the server benchmarks don't show the same regressions as were found on the desktop, they do little to justify the design of the new architecture.
[...]
After the poor desktop performance, the possibility still existed that the Bulldozer architecture would start to make sense once we could see the server performance. Now the benchmarks have arrived, AMD's perseverance with Bulldozer is bordering on the incomprehensible. There's just no upside to the decisions AMD has made. All of which raises a question: why did AMD go this route? The company must have known about the weak single-threaded performance and the detrimental effect this would have in real-world applications long before the product actually shipped, so why stick with it? Perhaps AMD's anticipation of high clock speeds caused the company to stick with the design, and there's still a possibility that it might one day attain those clock speeds—but we've seen AMD's arch-competitor, Intel, make a similar gamble with the Pentium 4, and for Intel, it never really paid off.

AMD is boasting that Opteron 6200 is the "first and only" 16-core x86 processor on the market. Not only is this not really true (equating threads and cores is playing fast and loose with the truth), it just doesn't matter. In its effort to add all those "cores," performance has been severely compromised. AMD faces an uphill struggle just to compete with its own old chips—let alone with Intel.
November 22, 2011 9:29:54 PM


Thanks Chad.

That was a good read.
November 25, 2011 12:52:26 PM

Well, that's definitely not good for AMD. Oh wait, when Windows Server 8 comes out the performance will be better.
November 26, 2011 3:45:41 PM

earl45 said:
Well, that's definitely not good for AMD. Oh wait, when Windows Server 8 comes out the performance will be better.



I read that review and it could have been a little more impartial; the reporting doesn't include some of that wording, as I mentioned there. Anand's own testing showed that Server 2008 recognized Interlagos and lowered its idle power to the same level as 6C/12T chips. I believe in the SPEC suite simply because I see too much variation in testing methods and workloads - especially for server. Anand previously did instruction-mix analyses and actual comparative reviews, but now it's throw the AMD system in and hope it works - or at least that's how it appears.


To continue from your salary post:



I guess you'll enjoy searching through all of the forums for proof of this (1%). At most I would say I underestimated how much optimization would help, or how much the lack of it would hurt. It's a bold new arch that has legs. AMD will surely increase efficiency as they can now pay for faster process optimization (where they used to have to pay for the whole fab).
There do seem to be issues with certain aspects of GF's 32nm process (I'd say teething pains with GPUs on SOI HKMG), and there are rumors that they won't ramp 28nm until Q2'12, which is a slight problem, though even improving clock speed on the current Bobcat APU will keep the low-cost 11.6-13" market.

I don't remember mentioning my salary as anything other than proof that I must know something.

You can all drive a person to it - and could be purposeful in your machinations (insert maniacal laugh).

At any rate, with even the (easy) ability to set thread affinity quickly before a game, you gain up to 30% from what I've seen. As new revs are produced, usually every quarter or so, power will be handled and clocks can go up. Everyone is always keen to talk about Deneb/Thuban, which both released at rev C2, not the B2 of FX/Interlagos. We could say that it was released too soon, or released at a point where the "8/16 core part" can flex its muscles. When 8 threads are used, most benches are faster than the 990X and 2600K.
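On the thread-affinity point, here is a minimal sketch of picking one logical CPU per Bulldozer module. It assumes the common enumeration where logical CPUs 2n and 2n+1 share a module; verify against your own topology (e.g. /proc/cpuinfo) before relying on it.

```python
def one_core_per_module(n_logical_cpus):
    """Pick one logical CPU per module, assuming pairs (2n, 2n+1)
    share a module -- an assumption, not a guaranteed enumeration."""
    return {cpu for cpu in range(n_logical_cpus) if cpu % 2 == 0}

affinity = one_core_per_module(8)   # e.g. FX-8150: 8 logical CPUs, 4 modules
print(sorted(affinity))             # [0, 2, 4, 6]

# On Linux the set could then be applied to the current process before
# launching the game: import os; os.sched_setaffinity(0, affinity)
```

Pinning one thread per module keeps each thread's shared front end and FPU to itself, which is where the reported gains come from.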


GF perhaps dropped the ball in seeking additional clients, fabs and processes, but that doesn't take away from what is a great architecture - especially for server. AMD even improved their INT perf, and the additional cores put most "heavy" loads out of reach of Magny-Cours and in some cases even E7 Xeons. We'll see how it shakes out over the coming months, with them pushing hard for Trinity and perhaps a B3 rev.

They have to push and pay or whatever, because only IBM has enough SOI fab space for AMD's CPU needs.


And they do have a worthwhile path to FMA support in 2012, with OpenCL able to use vector math - with, I believe, an OpenGL layer. I am of the opinion that a 16-core BD will be as fast as a low-end GPU in FMAC ops.

Project Win should be getting that support in time for Haswell - which should get FMA3 support - AMD will have FMA3 in PD in addition to FMA4. Their AES numbers are in line and XOP is as fast as AVX.



From Tom's

Quote:
In its own comparisons against an Intel Xeon X5670-based system, the Opteron 6276 scored an 84 percent higher performance in Linpack. In Stream, the Opteron 6276 had 73 percent more memory bandwidth over the Xeon X5670.


And from AMD's launch material.

November 30, 2011 2:41:28 PM

I always loved conflicting benchies! :D 
December 3, 2011 10:26:05 PM

Why "Moar Cores" over IPC isn't without its downsides.


New SQL Server 2012 per core licensing – Thank you Microsoft

http://sqlblog.com/blogs/joe_chang/archive/2011/11/16/n...

Many of us have probably seen the new SQL Server 2012 per core licensing, with Enterprise Edition at $6,874 per core superseding the $27,495 per socket of SQL Server 2008 R2 (discounted to $19,188 for 4-way and $23,370 for 2-way in TPC benchmark reports) with Software Assurance at $6,874 per processor. Datacenter was $57,498 per processor, so the new per-core licensing puts 2012 EE on par with 2008 R2 DC at 8 cores per socket.

This is a significant increase for EE licensing on Intel Xeon 5600 6-core systems (6 x $6,874 = $41,244 per socket) and a huge increase for Xeon E7 10-core systems, now $68,740 per socket. I do not intend to discuss justification of the new model. I will say that SQL Server licensing had gotten out of balance with the growing performance capability of server systems over time. So perhaps the more correct perspective is that SQL Server had become underpriced in recent years. (Consider that there was a 30%+ increase in the hardware cost structure in the transition from Core 2 architecture systems to Nehalem systems for both 2-way and 4-way, to accommodate the vastly increased memory and IO channels.)

Previously, I had discussed that the default choice for SQL Server used to be a 4-way system. In the really old days, server sizing and capacity planning was an important job category. From 1995/6 on, the better strategy for most people was to buy the 4-way Intel standard high-volume platform rather than risk the temperamental nature of big-iron NUMA systems (and even worse, the consultant to get SQL Server to run correctly by steering the execution plan around operations that were broken on NUMA). With the compute, memory and IO capabilities of Intel Xeon 5500 (Nehalem-EP), the 2-way became the better default system choice from mid-2009 on.

By “default choice”, I mean in the absence of detailed technical sizing analysis. I am not suggesting that ignorance is good policy (in addition to bliss), but rather the cost of knowledge was typically more than the value of said knowledge. Recall that in the past, there were companies that made load testing tools. I think they are mostly gone now. An unrestricted license for the load test product might be $100K. The effort to build scripts might equal or exceed that. All to find out whether a $25K or $50K server is the correct choice?

So now there will also be a huge incentive on software licensing to step down from a 4-way 10-core system with 40 cores total to a 2-way system with perhaps 8-12 cores total (going forward, this cost structure essentially kills the new AMD Bulldozer 16-core processor, which had just recently achieved price performance competitiveness with the Intel 6-core Westmere-EP in 2-way systems).

In the world of database performance consulting, for several years I had been advocating a careful balance between performance tuning effort (billed at consultant rates) with hardware. The price difference between a fully configured 2-way and 4-way system might be $25,000. For a two-node cluster, this is $50K difference in hardware, with perhaps another $50K in SQL Server licensing cost, with consideration that blindly stepping up to bigger hardware does not necessarily improve the critical aspect of performance proportionately, sometimes not at all, and may even have negative impact.

With performance tuning, it is frequently possible to achieve significant performance gains in the first few weeks. But after that, additional gains become either progressively smaller, limited in scope, or involve major re-architecture. In the long-ago past, when hardware was so very expensive, not to mention the hard upper limits on performance, it was not uncommon for a consultant to get a long-term contract to do performance work exclusively.

More recently, performance consulting work tended to be shorter-term. Just clean up the low-hanging fruit, and crush moderate inefficiencies with cheap, powerful hardware. While this is perfectly viable work, it also precludes the justification for the deep skills necessary to resolve complex problems, which also calls into question the need to endure an intolerably arrogant, exorbitantly expensive consultant.

It had gotten to the point that I had given thought to retiring, and go fishing in some remote corner of the world. But now with the new SQL Server per core licensing, Microsoft has restored the indispensable (though still intolerable) status to arrogant, exorbitantly expensive, performance consultant. So, thank you Microsoft.
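The quote's per-socket arithmetic is easy to rework, and extending it to a 16-core Interlagos socket (that extension is mine, using the quoted $6,874-per-core figure) shows why the new model hits Bulldozer hardest:

```python
# SQL Server 2012 EE per-core price, as quoted above.
PER_CORE = 6874

def ee_2012_per_socket(cores_per_socket):
    # Per-socket licensing cost under the new per-core model.
    return PER_CORE * cores_per_socket

print(ee_2012_per_socket(6))   # 41244  (Xeon 5600, 6 cores)
print(ee_2012_per_socket(10))  # 68740  (Xeon E7, 10 cores)
print(ee_2012_per_socket(16))  # 109984 (Opteron 6200, 16 cores)
```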
December 4, 2011 11:11:27 AM

Wow ... Microsoft just finished AMD's server strategy off in one hit.

Can't see them gaining any more traction in the server world when the OS costs are stacked like that.

Pity JF-AMD isn't around to comment.

More bad news for AMD.
December 4, 2011 12:43:26 PM

Chad Boga said:
Why "Moar Cores" over IPC isn't without its downsides.


New SQL Server 2012 per core licensing – Thank you Microsoft

http://sqlblog.com/blogs/joe_chang/archive/2011/11/16/n...

[...]


Thanks for the knowledge you just dropped on everyone, very enlightening, and again thanks.
The only question I have for you: where are JDJohn and BaronMatrix to explain how great this is for AMD? lol




Nice Read Chad!
December 7, 2011 8:40:15 PM

With AMD's understandable reluctance to post non-rate Spec scores, Intel were kind enough to submit them on their behalf.

If AMD think they can get better or "more reflective" scores, they are of course free to submit their own Spec submissions, but I doubt they will.


http://www.spec.org/cpu2006/results/res2011q4/

SPECint_base2006/SPECfp_base2006 (autoparallel=yes)

i7-2700k (3.5/3.9 GHz) 45.5 / 56.1
FX-8150 (3.6/4.2 GHz) 20.8 / 25.7
X6-1100T (3.3/3.7 GHz) 25.0 / 32.2
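Taking those base scores at face value (and noting the thread's later caveats about compiler flags), the single-thread gap works out to roughly 2.2x:

```python
# Posted SPECint_base2006 / SPECfp_base2006 scores from the table above.
i7_2700k = (45.5, 56.1)
fx_8150 = (20.8, 25.7)

int_ratio = i7_2700k[0] / fx_8150[0]
fp_ratio = i7_2700k[1] / fx_8150[1]
print(round(int_ratio, 2), round(fp_ratio, 2))  # 2.19 2.18
```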
December 17, 2011 5:25:46 AM

We had an earlier post in another thread about socket compatibility, so I found the answer here:

http://www.insidehw.com/reviews/cpu/6666-amd-bulldozer-...

An interesting fact is that on server platforms, Bulldozer remains fully compatible with C32 and G34 CPU sockets. Valencia and Interlagos (codenames for Bulldozer-based server CPUs) can perform flawlessly on existing San Marino (C32) and Maranello (G34) platforms. The most important difference between these is that the G34 socket supports CPUs in MCM packaging, such as Interlagos, which contains two monolithic octa-core CPUs! A server platform such as this can take up to 64 cores on a single motherboard. In terms of percentages, that’s 60% more than anything Intel can offer with their Xeon Nehalem EX CPUs (series E7000). Furthermore, the G34 platform supports up to four memory channels per physical socket, i.e. 16 memory channels on four sockets.

December 21, 2011 5:29:08 PM

Chad Boga said:
With AMD's understandable reluctance to post non-rate Spec scores, Intel were kind enough to submit them on their behalf.

If AMD think they can get better or "more reflective" scores, they are of course free to submit their own Spec submissions, but I doubt they will.


http://www.spec.org/cpu2006/results/res2011q4/

SPECint_base2006/SPECfp_base2006 (autoparallel=yes)

i7-2700k (3.5/3.9 GHz) 45.5 / 56.1
FX-8150 (3.6/4.2 GHz) 20.8 / 25.7
X6-1100T (3.3/3.7 GHz) 25.0 / 32.2




AMD publishes SPEC scores for both Rate and Single runs - for server parts.
December 21, 2011 5:42:09 PM

Chad Boga said:
Why "Moar Cores" over IPC isn't without its downsides.


New SQL Server 2012 per core licensing – Thank you Microsoft

http://sqlblog.com/blogs/joe_chang/archive/2011/11/16/n...

[...]



It seems like you were just looking for a problem for AMD. This licensing model is only for the Enterprise Edition, where cost is almost NEVER the qualifier. Enterprise picks up for Datacenter, which means TBs of space, which means "whatever it takes." Also, the companies who already have EE don't have to increase their costs; they just upgrade.

Here is the full PDF


It doesn't hurt existing customers, and it does sell in 2-core packs, so 8 licenses are four $6,000 packs - or around $25K - but ONLY with the EE edition for NEW purchases. BI and Standard still use server + CAL licensing.
December 22, 2011 9:25:01 AM

Thanks for clearing that up Baron.
January 11, 2012 12:13:40 AM

Anand's benches favor Intel ... who would have thought it.

Anyhow, I'd have to see what compile options were used, and do a code analysis, first. It's why we do our own benching with in-house tools; it's entirely too easy to *tweak* a bench to favor one product over another.

We were looking at BD when we were considering going from SPARC to x86, but decided to stay with the T2/T3 architecture for now.

Also, stop thinking about server benches the same way as desktop ones. It's not about a single big number in a single instance of a program. Even a suite that *cough* "simulates" a server environment isn't good enough. You need to actually set up a database server (Oracle in our case), set up the web front end (BEA WLS) with all the components and connectors, configure a few Java application servers and connect them to your primary J2SE instance, and deploy a bunch of webapps to these webservers.

Then you benchmark the whole suite. Some applications favor certain architectures over others, especially if the developers spent some time hand coding optimizations into certain functions.

I've personally seen this make a difference, specifically when we were looking at IBM POWER vs Sun SPARC a while back. The IBM box posted higher virtualized transaction numbers, but when we did a suite test, the Sun box got more total transactions. This was due to Sun's CMT architecture handling thread switching better than IBM's, something that would never have been seen in a virtualized test but exists in the real world.

I'm not a fanboy, but I absolutely hate seeing people pull numbers out of a report, or an article based on a review of a report, and abuse it.
January 11, 2012 11:20:29 AM

In regards to per-core licensing, I will say this: Microsoft could have ridden the money train VERY early on by licensing Windows by the core, like some other products [IE: Anything by National Instruments] did. They didn't, and I give MSFT a LOT of credit for that.

That being said, pure money grab by MSFT on the server side. Can't blame them though.
January 11, 2012 9:22:22 PM

I wish AMD would just quietly admit to the OS that the module concept is really just an optimised version of Hyper-Threading. It would save it a bag of hurt.

Where I think AMD will succeed is the world of HPC applications dominated by Linux. There we can live with not paying anything for server software, mitigating Intel's per-core advantage.
January 18, 2012 8:55:04 PM

gamerk316 said:
In regards to per-core licensing, I will say this: Microsoft could have ridden the money train VERY early on by licensing Windows by the core, like some other products [IE: Anything by National Instruments] did. They didn't, and I give MSFT a LOT of credit for that.

That being said, pure money grab by MSFT on the server side. Can't blame them though.


I don't think Microsoft could afford to do it. Most of the per-core licensed software (i.e. most VMware, Oracle, and IBM stuff) also has Linux counterparts, which would probably have pushed a lot of companies to seriously consider Linux to keep overhead down. For a lot of these companies, the software running on top of the OS is more important, and they don't have much of a choice about running anything else.

edit: I did hear something about SQL Server 2012 having some type of core-based licensing, but I'm not sure.
January 18, 2012 9:00:59 PM

amdfangirl said:
Where I think AMD will succeed is the world of HPC applications dominated by Linux. There we can live with not paying anything for server software, mitigating Intel's per-core advantage.



Yes, but most of those applications running on top of Linux are per-core licensed; for example, Oracle DB Enterprise is. For people running these types of applications, more cores aren't always better. The reason a lot of these companies run Linux is performance, overhead cost, and flexibility for specific needs (e.g. ZFS).

At the end of the day, to me at least, it seems like AMD is going the way of NetBurst: except instead of clock speed, they are pushing core count rather than making each core efficient per clock.
January 20, 2012 6:32:09 PM

That is a really low blow from M$FT. Lucky for you, Chad! Good post btw.
February 23, 2012 2:18:36 AM

Reynod said:
We had an earlier post in another thread about socket compatibility, so I found the answer here:

http://www.insidehw.com/reviews/cpu/6666-amd-bulldozer-...

An interesting fact is that on server platforms, Bulldozer remains fully compatible with C32 and G34 CPU sockets. Valencia and Interlagos (codenames for Bulldozer-based server CPUs) can perform flawlessly on existing San Marino (C32) and Maranello (G34) platforms. The most important difference between these is that the G34 socket supports CPUs in MCM packaging, such as Interlagos, which contains two monolithic octa-core CPUs! A server platform such as this can take up to 64 cores on a single motherboard. In terms of percentages, that’s 60% more than anything Intel can offer with their Xeon Nehalem EX CPUs (series E7000). Furthermore, the G34 platform supports up to four memory channels per physical socket, i.e. 16 memory channels on four sockets.


IIRC the LGA1567 Xeons can be run in up to 8-way operation without any additional "glue" chips, like the previous Opteron 800/8000 series CPUs. That gives you 64 cores and 32 memory channels using Nehalem-EXes and up to 80 cores/32 memory channels with Westmere-EXes. The Opteron 6000s only support up to 4P operation. Probably the most important distinctions are that you can put 64 Opteron cores on a 4P board in a 1U server and buy CPUs + board for $3000-5000, while 8P Xeon setups are 4U+ only due to the CPU daughter cards and each 8-way-capable Xeon MP by itself costs about what four 6272s and a 4P board cost.
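The core and memory-channel totals above work out as follows (quick arithmetic from the per-socket figures in the post; a sketch, not vendor specs):

```python
def platform_totals(sockets, cores_per_socket, channels_per_socket):
    """Total cores and memory channels for an N-socket platform."""
    return sockets * cores_per_socket, sockets * channels_per_socket

# 8-way Nehalem-EX: up to 8 cores, 4 memory channels per socket
print(platform_totals(8, 8, 4))
# 8-way Westmere-EX: up to 10 cores per socket
print(platform_totals(8, 10, 4))
# 4P Opteron 6200 (Interlagos): 16 cores, 4 channels per socket
print(platform_totals(4, 16, 4))
```

So the 4P Opteron matches the 8P Nehalem-EX on core count (64) with half the memory channels, which is where the density and price arguments above come from.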
March 6, 2012 5:52:52 PM

http://www.anandtech.com/show/5553/the-xeon-e52600-dual...

Quote:
Conclusions

Our conclusion about the Xeon E5-2690 2.9 GHz is short and simple: it is the fastest server CPU you can get in a reasonably priced server and it blows the competition and the previous Xeon generation away. If performance is your first and foremost priority, this is the CPU to get. It consumes a lot of power if you push it to its limits, but make no mistake: this beast sips little energy when running at low and medium loads. The price tag is the only real disadvantage. In many cases this pricetag will be dwarfed by other IT costs. It is simply a top notch processor, no doubt about it.

For those who are more price sensitive, the Xeon E5-2630 costs less than the Opteron 6276 and performs (very likely) better in every real world situation we could test.

And what about the Opteron? Unless the actual Xeon-E5 servers are much more expensive than expected, it looks like it will be hard to recommend the current Opteron 6200. However if Xeon E5 servers end up being quite a bit more expensive than similar Xeon 5600 servers, the Opteron 6200 might still have a chance as a low end virtualization server. After all, quite a few virtualization servers are bottlenecked by memory capacity and not by raw processing power. The Opteron can then leverage the fact that it can offer the same memory capacity at a lower price point.

The Opteron might also have a role in the low-end, price-sensitive HPC market, where it still performs very well. It won't have much of a chance in the high-end clustered one, as Intel has the faster and more power-efficient PCIe interface.


And it looks like Johan added an HPC test to his benchmark suite:

Quote:
This is one of the few benchmarks (besides SAP) where the Opteron 6276 outperforms the older Opteron 6174 by a tangible margin (about 17% faster) and is significantly faster than the Xeon 5600, by 29% to be more precise. However, the direct competitor of the 6276, the Xeon E5-2630, will do a bit better (see the E5-2660 6C score). When you are aiming for the best performance, it is impossible to beat the best Xeons: the Xeon E5-2660 offers 20% better performance, the 2690 is 31% faster. It is interesting to note that LS-Dyna does not scale well with clockspeed: the 32% higher clockspeed of the Xeon E5-2690 results in only a 14% speed increase.

A few other interesting things to note: we saw only a very small performance increase (+5%) due to Hyperthreading. Memory bandwidth does not seem to be critical either, as performance increased by only 6% when we replaced DDR3-1333 with DDR3-1600. If LS-Dyna was bottlenecked severely by the memory speed we should have seen a performance increase close to 20% (1600 vs 1333).

CMT boosted the Opteron 6276's performance by up to 33%, which seems weird at first since LS-DYNA is a typical floating point intensive application. As the shared floating point "outsources" load and stores to the integer cores, the most logical explanation is that LS-DYNA is limited by the load/store bandwidth. This is in sharp contrast with for example 3DS Max where the additional overhead of 16 extra threads slowed the shared FP down instead of speeding it up.

Also, both CPUs seem to have made good use of their turbo capabilities. The AMD Opteron was running at 2.6 GHz most of the time, the Xeon 2690 at 3.3 GHz and the Xeon 2660 at 2.6 GHz.

The three vehicle collision test does not change the benchmarking picture, it confirms our early findings. The Opteron Interlagos does well, but the Xeon E5 is the new HPC champion.
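Those scaling figures are easy to sanity-check: dividing the observed gain by the resource increase gives a rough scaling efficiency (my arithmetic from the quoted numbers, not part of the review):

```python
def scaling_efficiency(resource_gain, observed_gain):
    """Fraction of a resource increase that showed up as performance.
    Both arguments are fractional gains, e.g. 0.32 for +32%."""
    return observed_gain / resource_gain

# Clock: the Xeon E5-2690 is clocked 32% higher, but is only 14% faster
print(f"clock scaling: {scaling_efficiency(0.32, 0.14):.0%}")
# Memory: DDR3-1600 is ~20% more bandwidth than DDR3-1333, for +6% speed
print(f"memory scaling: {scaling_efficiency(0.20, 0.06):.0%}")
```

Both ratios land well under 50%, supporting the quote's point that LS-DYNA is limited by neither clock speed nor raw memory bandwidth.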
March 10, 2012 12:45:38 AM



At least quote Tom's own Xeon E5 article. ;) 

Intel has won this round pretty clearly, even if you use the best-case scenario of a lot of benchmarks on Linux using GCC instead of Intel's crippling-anything-that's-not-GenuineIntel compiler on Windows. Anand is a big-time Intel shill and misses no opportunity to prop up Intel or trash AMD in a sensationalist manner. I read some of their articles, but it's like watching MSNBC: there is occasionally some good info in there, you just have to dig it out of the pile of spin. However, Tom's, which is much more fair and actually tried to give AMD a fair shot, still arrived at the same conclusion.

Intel knows it too, since they have actually *raised* prices on CPUs. The top Xeon DP E5 costs over $2000, compared to around $1500 in the past several generations. AMD really does need to get back in the fight, as a decent server CPU generally makes a decent workstation CPU and vice-versa. Tom's correctly said that AMD has a decent platform with the AMD 890FX-based SR5690/SP5100 chipset. Their problem is that once you get beyond the severely kneecapped "basic" Intel SKUs, Intel pretty well dominates the market and can charge whatever they want. That's why we are seeing >$2000 DP CPUs again.

Bulldozer isn't a bad design on paper and is more of an execution problem than a design problem, or so it appears. Fix the caches and the FPU scheduler in Piledriver and AMD ought to become at least decently competitive again. Lord knows we don't want to go back to the mid-90s, with Intel charging around six grand in today's dollars for their top-bin CPUs. (Those of you old enough remember what the top Pentiums and P2s cost? The top original PII cost over $2000! :o  )

I appreciate AMD's willingness to give customers a fair shake: not changing sockets like a teenage girl changes her wardrobe, not severely crippling chips that aren't considered "high end," and not charging outlandish prices for multi-socket-capable CPUs. But it becomes pretty darned hard to buy AMD CPUs if they are quite a bit behind Intel's. Here's to Piledriver being at least an Istanbul on Linux to Intel's Nehalem, instead of a B2 Barcelona on Windows...
March 13, 2012 3:27:53 PM

MU_Engineer said:
At least quote Tom's own Xeon E5 article. ;) 

Intel has won this round pretty clearly, even if you use the best-case scenario of a lot of benchmarks on Linux using GCC instead of Intel's crippling-anything-that's-not-GenuineIntel compiler on Windows. Anand is a big-time Intel shill and misses no opportunity to prop up Intel or trash AMD in a sensationalist manner. I read some of their articles, but it's like watching MSNBC: there is occasionally some good info in there, you just have to dig it out of the pile of spin.


IIRC AT was first with their review, which is why I quoted them.

Sorry MU, but that just strikes me as yet more AMDZone propaganda. It was Johan de Gelas' article - not Anand Lal Shimpi's - and he is pretty well respected most places, with the exception of AMDZ. I have read some server threads over there where Johan attempted to explain his testing methodology and got nothing but insults and flames, and little substantive or constructive feedback for the most part. Even the supposedly knowledgeable mods there couldn't manage to articulate any real flaws, and just spewed crap instead. Most of their objections and rhetoric stem from a few years back, when Johan allegedly over-mentioned the Xeon competition during an Opteron review, IIRC.

Quote:
Bulldozer isn't a bad design on paper and is more of an execution problem than a design problem, or so it appears. Fix the caches and the FPU scheduler in Piledriver and AMD ought to become at least decently competitive again.


Maybe. But then AMD had something like 5 years to work on BD from when it first appeared on their roadmap. The common perception after BD appeared and the benchies were disappointing was that BD was optimized for server workloads, so Interlagos should really shine. However, those Interlagos reviews were also a pretty mixed bag; AMD had to compete on price and not much else. Now that the E5s have a similarly priced and better-performing model, I guess AMD will have to drop prices once the HPC and server market has finished upgrading from Magny-Cours.

Quote:
Here's to Piledriver being at least an Istanbul on Linux to Intel's Nehalem, instead of a B2 Barcelona on Windows...


Yes, competition is always good, as it drives R&D and thus improvements. However, I just wonder how much attention AMD is going to pay to servers if their revenues there remain low and marketshare stays under 5%. With Read's statements amounting to "let's move on from competing with Intel," it looks like AMD is perhaps prioritizing new markets where Intel doesn't compete much.
May 30, 2012 3:05:11 PM

Not to be an advocate of conspiracy theories, but they didn't publish the internal numbers for the AMD bench tool and didn't delve deeper into the "bad" numbers - which is where we really wanted them to delve deeper, actually, lol.

Not a bad analysis, to be honest, but I feel they missed some critical points. Also, they never talk about compiling code to take advantage of the new instructions in BD.

I'm eager for the continuation of their "fourth" finding.

Cheers!
July 16, 2012 7:59:48 PM

Reynod said:
A little bit of technical background here courtest of Chris Angelini:
:) 



Awesome... does anyone know some lines of code, or any running app, that applies recurrent and parallel computing in a supercomputing environment?
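Not supercomputer scale, but the basic data-parallel pattern HPC apps use can be sketched in a few lines of Python (the workload here is a stand-in; real codes would use MPI or OpenMP to fan work out across the 16 Interlagos cores):

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum of squares over [lo, hi) -- one worker's share of the job."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_squares(n, workers=4):
    """Split the range into chunks, map them to worker processes,
    and reduce the partial results: the basic map/reduce pattern."""
    step = n // workers
    chunks = [(k * step, n if k == workers - 1 else (k + 1) * step)
              for k in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # Same answer as a serial loop, computed across processes
    print(parallel_sum_squares(1_000_000))
```

Recurrent computations (where each step depends on the last) parallelize far less cleanly than this embarrassingly parallel reduction, which is why core count alone doesn't guarantee HPC throughput.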

July 18, 2012 10:52:50 AM

This topic has been unstickied from the top of the forum by REYNOD