Octo-Core Xeon (Nehalem-EX, Beckton) Info?

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
I'm currently gathering info on the new Xeon "Nehalem-EX", aka Beckton. I'm a physicist and hardware enthusiast, and I have acted as a kind of "IT consultant" on many occasions. I know that a few professors at our institute are seriously interested in buying these 32-core or 64-core systems, which is why I started doing research about it. Any further info you guys have would be greatly appreciated!

Currently, we know that the next Xeon MPs will be eight- (or octo-) core, featuring 2.3 billion transistors, 24MB of cache, and Hyper-Threading, which lets each physical core work on two threads at once. Anandtech has talked about Nehalem-EX here, and the fast, serial point-to-point links between the processors will make a huge difference. For instance, the 4-socket platform will look like this:

4socket.jpg


Each processor is connected to all other processors with a dedicated link in this configuration. But more importantly, the QPI links will make 8-socket systems possible without 3rd-party chipsets, as shown in:

8socket.jpg


Alternatively, you can imagine this configuration more easily in terms of CPU connections if you think of it as a cube:

nehalemexideal.jpg
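
If it helps, here's a tiny sketch of that cube view (my own illustration of the topology the picture suggests, not anything from Intel's documentation): give each socket a 3-bit ID and link it to the three sockets whose IDs differ in exactly one bit, i.e. the adjacent corners of the cube.

```python
# Hypothetical cube view of an 8-socket box: socket IDs are 3-bit numbers,
# and each socket links to the three sockets whose ID differs in one bit.
sockets = range(8)

links = {
    s: [s ^ (1 << bit) for bit in range(3)]  # flip one bit -> one neighbor
    for s in sockets
}

for s, neighbors in links.items():
    print(f"socket {s:03b} <-> {[format(n, '03b') for n in neighbors]}")
```

In that picture every socket has three CPU-to-CPU links, any two sockets are at most 3 hops apart, and most pairs are only 1 or 2 hops away.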


Now, in principle, HyperTransport also makes this possible with Opterons, but not many consumer-level implementations exist. Tyan has implemented an 8-socket system, but its CPU connection topology was less than ideal (check this out: it even supports six-core CPUs, for an 8S6C = 48-core system! It's the Thunder n4250QE S4985-SI). The usual suspects, like HP and Sun, might have 8-socket systems available, but I wanted to know if there was a way to build a custom 8S8C system at the consumer level.

Tyan's Opteron solution, while not ideal, is very interesting. It isn't really too friendly, though: the way I see it, you pretty much have to buy their chassis as well, because the CPU expansion board needs to be mechanically supported. This kind of makes me wonder why no manufacturer has tried running HyperTransport/QPI links through EMI-shielded cables (like SGI's NUMAlink cables): ideally, you could simply build two 4S systems on specially designed mobos, put them side by side, and connect the two motherboards with <30cm (1 foot) cables.

I wonder if there will be 8S motherboard combos available? I prefer doing custom builds, and the big system integrators don't really have the best prices, in my experience! I'd love to know what kind of products Supermicro and Tyan will come up with... What about launch dates? Last I heard was end of 2009, but real availability only in 2010...
 

Raviolissimo

Distinguished
Apr 29, 2006
357
0
18,780


if history is any indication, no, there won't be any 8-socket systems at the consumer level where we can just buy a motherboard from an online retailer.

but technology is always changing, who knows.

i've been using PC's since 1984 & using them for finite element analysis & CAD since 1988, to explain my background.
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
Raviolissimo: Well, at least a solution currently exists for 8S6C Opterons, so it wouldn't be unprecedented. But I do understand what you mean... Hopefully, things will change.

BTW, I just found another old, very old, 8-socket "maybe-consumer-space" option from Iwill:
http://www.amdboard.com/iwill_h8502_8way.html

In any case, in the next few months we should see more information coming from some server motherboard manufacturers about the 4S systems... Personally, I'm keeping my fingers crossed! :D
 
Today (July 13) AMD announced a 40W Istanbul HE (2GHz?), but I believe it's just for 2P boards. For 8P they announced a 75W 2.6GHz part and an SE Istanbul at 2.8GHz and 105W.

A year or so ago you could find the S4985 with riser & 4P M4985 board for around $1,200 if you looked hard enough. Shanghai rolled out, prices jumped 40% and they started getting scarce. If you can find the S4985 'package' I'll betcha the price will have doubled from a year ago when the Istanbul BIOS pops for legacy boards.

With Fiorano (Is it still called that? :lol: ) coming soon no one wants to be stuck with an inventory of Socket F mobos with NF chipsets.
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
wisecracker: Currently, the option I would take if I were to assemble a many-core heavy hitter is exactly that 8P Tyan combo. But I'm waiting for the Nehalem-EX launch, as it should be well out of reach of the Istanbul rigs performance-wise.

In any case, a few details are missing from that S4985+M4985 combo: does the CPU expansion board screw into something inside Tyan's typical chassis? I'm guessing yes, which would make it completely impossible to go with any other chassis... The manual only explains how to attach the expansion board to the motherboard, not much else. I also strongly suspect that you need a specific heatsink, or at the very least low-profile heatsinks only.

Edit: Just found that Tyan has 5 different pics of the 8P barebone on their website. Kind of scary, take a look here.

That's why I said I was kind of hoping that, with it becoming easier to go 8P with Xeons, more solutions would present themselves for the Intel CPUs too. I guess I'm still daydreaming about the day when you can just kind of "SLI" a couple of 4P mobos and get an 8P system...

I really like that AMD is pressing the 4S niche (where they still have a solid lead... for now) by launching the six-cores so early. This can be greatly beneficial to consumers, and hopefully it will put some pressure on Intel. It would really be great if Nehalem-EX were launched soon.
 
I wouldn't doubt that an 8-core Xeon multi-CPU system would perform like crazy.

And it's interesting to see what QPI can do beyond speeding data between CPUs and between CPU and memory.

Intel is really going to change the server game. In fact, for once they are going to give AMD a big run for their money beyond the 2P server market.

I hope AMD is buckled up and bracing for impact.
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
AMD bracing for impact = AMD releasing six-core CPUs early, I think.

In any case, these 4P-8P monster boxes should be equipped with good storage systems, because otherwise, the hard drives will bottleneck performance considerably, depending on application. I mean, with 32 CPUs you could have many simultaneous applications/users accessing the storage, which could be a problem if you had only one or two hard drives.

Ideally, you could add a few SSDs to this system too (maybe Intel's new 34nm drives). It would make for one hell of a system. This kind of shows the momentum Intel has gathered... there are quite a few products coming from them lately.
 

Helloworld_98

Distinguished
Feb 9, 2009
3,371
0
20,790
I'm surprised you haven't heard of the 12-cores from AMD yet. They use the new G34 socket, though, so you won't get much info on the motherboards and CPUs until they come out in 2010/11. More likely 2010, since they're just two 32nm hexa-cores on one CPU.
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
Actually, I know of Magny-Cours, yes. I find the "long" socket very funny to look at. Take a look at the pictures from these links:

http://gigglehd.com/zbxe/hdnews/2251595
http://www.dvhardware.net/article35861.html

The G34 socket will feature 4 HT links (not 3), like the 4 QPI links on Nehalem-EX, and both will have quad-channel DDR3 memory controllers. But I'm thinking the MCM-based Magny-Cours will have a somewhat hard time competing with the monolithic Nehalem-EX, unless AMD improves the fundamental building block: each of the cores.

In a sense, I'm thinking that Magny-Cours vs. Beckton might be kind of like Nehalem-EP vs. Istanbul. In many (but not all) situations, Nehalem-EP would actually be preferred even though AMD has 50% more cores, but it's not a clear win for anybody. You have to ask yourself: would you rather have 6 cores with performance A or 4 cores with performance 1.5A (same total throughput, since 6×A = 4×1.5A)? The answer is that it's probably better to have fewer, faster cores, because even single-threaded applications will benefit.
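
Just to put that intuition into numbers (a toy sketch with made-up figures, not a benchmark): under Amdahl's law the two designs only tie if the workload is perfectly parallel; as soon as part of it is serial, the fewer-but-faster configuration pulls ahead.

```python
# Toy comparison: same aggregate throughput, different per-core speed,
# evaluated with Amdahl's law. The parallel fractions are illustrative.

def amdahl_speedup(parallel_fraction, cores):
    """Speedup over a single baseline core for a partially parallel workload."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / cores)

designs = {"6 cores @ 1.0x per core": (6, 1.0),
           "4 cores @ 1.5x per core": (4, 1.5)}

for p in (1.0, 0.9, 0.5):  # fraction of the work that actually parallelizes
    print(f"parallel fraction = {p:.1f}")
    for name, (cores, per_core) in designs.items():
        effective = per_core * amdahl_speedup(p, cores)
        print(f"  {name}: {effective:.2f}x a single slow core")
```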

Add to that that Nehalem-EX will come out considerably sooner than Magny-Cours, and Intel has quite a product on their hands. I'm interested in Nehalem-EX because its launch falls within the usual time of year when a few professors here buy new number-crunchers.

In any case, once Nehalem-EX gets out, AMD will probably try to push Magny-Cours out the door ASAP.
 

stasdm

Distinguished
Feb 10, 2009
53
0
18,630
There is no sense in having so many cores in a single system - the productivity of each added core diminishes in geometric progression (just because the main core has to handle the use of all the others). About 64 cores is the maximum usable in non-server applications, and a 2 x 24-32 core system will usually be more productive.
 

Main core?

What makes you think there is a main core? This would be used for scientific or extreme high-end workstation apps, and with each socket having its own memory controller plus the 8-socket interconnectivity, there shouldn't be any trouble scaling beautifully.

As for the comment about HD bottlenecks, you'd want to load this kind of setup with RAM - all the computations should be done from RAM to prevent performance loss. IIRC, these CPUs have quad-channel RAM, and if you assume 3 DIMMs per channel, that's 12 RAM slots per socket, or 96 total. This allows for 192GB with 2GB DIMMs or 384GB with 4GB DIMMs. If you assume 4 DIMMs per channel, that bumps the numbers up to 256 and 512GB respectively for 2GB and 4GB DIMMs. That should allow pretty significant programs to remain almost entirely in RAM.
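
For what it's worth, here's that same arithmetic as a quick sketch (assuming an 8-socket box with quad-channel memory and 3 or 4 DIMMs per channel, as above):

```python
# Total RAM capacity for an 8-socket, quad-channel system (assumed config).
sockets = 8
channels_per_socket = 4

for dimms_per_channel in (3, 4):
    slots = sockets * channels_per_socket * dimms_per_channel
    for dimm_gb in (2, 4):
        print(f"{dimms_per_channel} DIMMs/channel, {dimm_gb}GB DIMMs: "
              f"{slots} slots, {slots * dimm_gb}GB total")
```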
 

stasdm

Distinguished
Feb 10, 2009
53
0
18,630


There is always a main process that conducts the whole "orchestra". The more cores are involved, the more of this resource conducting has to be done. You can see it clearly in GPU work: since only one basic core can conduct the whole picture computation, each added GPU contributes only a fraction of its predecessor's "force". That is why nVidia does not support more than 3 cards in SLI (the impact of the fourth is less than 10% of the first). AMD does not support more than 4-way CrossFire for the same reason (AMD GPUs have a different architecture, so up to 4 makes sense). On the other hand, using an "external" load conductor (Lucid Hydra) lets you utilise up to 90% of each added GPU's productivity (which is why Intel is so interested in it).
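
To put a rough number on that intuition (my own toy model with made-up coefficients, not measured data), a scaling formula in the spirit of Gunther's Universal Scalability Law shows how per-core gains shrink once coordination costs grow with the number of workers:

```python
# Toy scaling model: part of the work is serialized (contention) and workers
# also spend time coordinating pairwise (coherency). Coefficients are made up.

def speedup(n, contention=0.05, coherency=0.01):
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} cores -> {speedup(n):5.2f}x")
```

With those particular coefficients the curve peaks somewhere around 8-16 cores and then falls off; how closely a real 32- or 64-core box follows that shape depends entirely on the workload.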

The same goes for most scientific and other heavy applications (if they can be split into parallel parts). A relatively moderate-power main system can conduct several computing nodes at a more general level than thread management, and get much better results.

Mind that there are now RDMA drivers for IB and even direct PCIe links - 8GB/s of raw data each way (even more with bonded 12x (3 x 4x) IB), with about 2/3 of that in pure data.

As for the HD bottleneck, I would agree that it is not a problem. Even with SAS 3G controllers and modern SSDs it is possible to get over 4GB/s from software RAID over hardware RAID on an IB-attached iSER storage subsystem (or subsystems). New 6G controllers allow even higher total throughput (close to 5GB/s). In combination with huge amounts of on-board RAM, the I/O bottleneck is out of consideration.

To my mind the best arrangement is a main system with a 3-way 8-core Nehalem and three 5520 chips, plus several (less connected, but more powerful) 4-way 8-core Nehalem nodes with one or two 5520s each.


 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
There is always a main process that conducts the whole "orchestra". The more cores are involved, the more of this resource conducting has to be done.
Depends on workload. For our workload, 32 cores would still make sense.

We always have to remember that computing is a means to an end. Depending on what you want to do, 32 cores may or may not be reasonable. There's no way to look at all workloads from a "one size fits all" perspective; there's no such thing. You can't really compare video card workloads to every HPC workload out there; they're different.

The guy interested in 32-core systems here at my institute typically works on solid-state problems involving lattices, where each point in the lattice has an associated numerical problem that needs to be solved; each lattice point then interacts with other points, but the inter-thread communication is minimal at best. He can really use 1600% CPU on a 2S Nehalem Xeon system, with no connectivity bottlenecks. I think each of his numerical problems for lattice points, as complicated as they may be, only spits out a few numbers to be used later.
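
In spirit, that kind of embarrassingly parallel lattice workload looks something like this toy sketch (a hypothetical stand-in problem, not his actual code): each lattice point is an independent solve that returns only a few numbers, so the cores barely need to talk to each other.

```python
from multiprocessing import Pool
import math

def solve_lattice_point(index):
    """Stand-in for an expensive per-point numerical solve; returns a few numbers."""
    x = 0.0
    for k in range(1, 200000):
        x += math.sin(index + k) / k
    return index, x

if __name__ == "__main__":
    lattice_points = range(1024)
    with Pool() as pool:              # one worker process per core by default
        results = pool.map(solve_lattice_point, lattice_points)
    print(f"solved {len(results)} lattice points")
```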

Bottom line, we can't really say "X cores is enough to reach connectivity saturation". What we can say is "X cores is enough to reach connectivity saturation for workload Y"...
 

stasdm

Distinguished
Feb 10, 2009
53
0
18,630


From the n"point of view" of the main process thay are absolutly similar - management of resourses. Same tasks may have longer durations, so, need less management, but really they take quite a shore time each.


This only says you would not need high-speed data connections between cluster nodes.

Yes, it is possible to build even a 128-way Nehalem computer - see the link below for a possible solution - but will it be practical?

http://hardwareforall.com/index.html?WinLIKE_Deep=%22var%20j=new%20WinLIKE.window(%27%27,235,100,740,1000,10);j.Nam=%27main%27;j.Ski=%27zero%27;j.Adr=%27pno/ws6.html%27;WinLIKE.addwindow(j,true);%22