SuperMicro X10DRG-Q vs Asus Z10PE-D8 WS

JavierVerdugo

Commendable
May 2, 2016
1
0
1,510
Hello,

I'm building a computer for 3D processing, fluid simulation, particle systems (all from CPU) and 3D GPU render, so I am choosing a dual Xeon build with four GTX 980 Ti cards.

I wonder which motherboard would fit my build better, the ASUS Z10PE-D8 WS or the Supermicro X10DRG-Q, and why.

If I am not wrong, the Z10PE has more USB ports, M.2 connectivity, SATA Express, and a maximum of 512GB of RAM (I don't think I even need 128GB right now). My impression is that this motherboard gives me more expansion options in case I need them.

On the other hand, Supermicro has better stability and a longer lifetime (and the 2TB of RAM is something I will never need).

I don't care about the price difference, but functionality is very important to me. I really appreciate the extra connections on the ASUS, but if Supermicro gives me much more stability I will choose it.

Am I wrong? Thanks in advance for your advice.

Cheers!
 


JavierVerdugo,

The most important difference between the ASUS Z10PE-D8 WS and the Supermicro X10DRG-Q is the spacing of the x16 PCIe GPU slots. The Supermicro has spacing for four double-height GPUs, including an area at the lower end that allows the fourth GPU in that position. This spacing makes the Supermicro board a proprietary format.

By contrast, the ASUS is rated for four GPUs, but all the intervening slots will be covered, so there is no slot left for a RAID controller or PCIe SSD. I actually think the board might only accommodate three GPUs, as the fourth would conflict with standing capacitors at the lower end of the slot array. This would be the deciding factor for me, as the likely GPUs will be double height.

In Passmark benchmarks, the ASUS WS seems to produce high CPU scores, as does the Intel S2600, which I think is the best-built board ever. I have the most trust in Supermicro's reliability, as workstation and server motherboards are their speciality. In summary: ASUS for performance, Supermicro for slot spacing and reliability.

If you would care to mention the CPU you have in mind and your budget, I would be pleased to give some suggestions for a system. Here is a system for scientific use: Matlab and simulation. I am not advising a GTX for scientific work because of its lower double-precision performance. This one is based on using an NVIDIA Maximus configuration with a Quadro K4200 and three (used) Tesla K20 GPU coprocessors. Last week I visited a linear accelerator facility whose computational system comprised 11 dual-Xeon nodes, each with four K20s in parallel.

BambiBoom CalcuCannon <Matalabacompurendersimulicious iWork TurboSignature Extreme ScienceStuffer 9900 ®©$$™®£™©™ _ 5.3.16

CPU: 2X Intel Xeon Processor E5-2640 v4 (10-core @2.40 / 3.4 GHz, 25M Cache) > $1,840 ($920 each)(Superbiiz)

CPU Coolers: 2X Supermicro SNK-P0048AP4 CPU Heatsink For LGA2011 >$64 ($32 each)

Motherboard: Supermicro X10DRG-Q (4X PCIe x16 GPU slots) > $499 (Superbiiz)(This motherboard has four GPU slots)(Review of this motherboard)

Memory: 128GB (8 X 16GB) Crucial DDR4-2133 16GB/2Gx72 ECC/REG CL15 Server Memory > $800 (Superbiiz)

GPU 1: NVIDIA Quadro K4200 (4GB) Part No. VCQK4200-PB > used about $550

GPU 2,3,4: TESLA K20 GPU ACCELERATOR > $2,700 (used about $900 each)

Drive 1: Intel 750 Series AIC 400GB PCI-Express 3.0 x4 MLC PCIe Internal Solid State Drive (SSD) SSDPEDMW400G4X1 > $350

Drive 2, 3: 2X Seagate Constellation ES.3 ST3000NM0023 3TB 7200RPM SAS 6.0 Gb/s 128MB Enterprise Hard Drive (3.5 inch) > $370 ($185 each)

CASE: Supermicro SuperChassis CSE-747TQ-R1620B 1620W 4U Rackmount/Tower Server Chassis (Dark Gray) > $950

Operating System: Microsoft Windows 7 Professional SP1 64-bit English (1-Pack), OEM > $139.

____________________________________

TOTAL = about $8,262
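For anyone adapting this list, a quick tally in Python makes it easy to swap parts and re-check the total. This is only a sketch: the labels are shortened for illustration, and the prices are the extended prices listed above.

```python
# Extended prices (USD) copied from the parts list above.
# The short labels are mine, for illustration only.
parts = {
    "2x Xeon E5-2640 v4": 1840,
    "2x Supermicro heatsink": 64,
    "Supermicro X10DRG-Q": 499,
    "128GB DDR4-2133 ECC": 800,
    "Quadro K4200 (used)": 550,
    "3x Tesla K20 (used)": 2700,
    "Intel 750 400GB": 350,
    "2x Seagate ES.3 3TB": 370,
    "Supermicro CSE-747TQ chassis": 950,
    "Windows 7 Pro OEM": 139,
}

total = sum(parts.values())

for name, price in parts.items():
    print(f"{name:<30} ${price:>6,}")
print(f"{'TOTAL':<30} ${total:>6,}")
```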

Notes:

1. The 4X GPU configuration is unconventional: The Quadro K4200 is single height but the Tesla K20's are double- height cards. However the GPU slots are double spaced on the Supermicro X10DRG-Q motherboard.

2. The Xeon E5-2640 v4 has not been benchmarked extensively, but performance should be very good. There are two systems using the E5-2640 v4 on Passmark; the CPU score for a single CPU is 15776 and for a dual configuration 25080. That would place it at No. 8 in the Passmark dual-CPU list. A total of 20 cores / 40 threads and 8,832 CUDA cores (1344 + 7488) provides a lot of calculation power. The 3.4GHz turbo speed should give sufficient single-threaded capability, in combination with the Quadro K4200, for quite demanding visualizations. So in addition to Matlab and Mathematica, performance in simulation animation, ArcGIS, 3D structural design / analysis, particle and thermal simulation, and visualizations of these should be very good.

3. The Quadro K4200 was chosen because the GPUs in an NVIDIA Maximus configuration (Quadro + Tesla) have to use the same processor series; in this case all have to be Kepler.

4. Benchmarks of GPUs (OS: Windows, API: OpenCL):

Test                         NVIDIA Quadro K4200        NVIDIA Tesla K20
Face Detection               27.889 mPixels/s           32.104 mPixels/s
TV-L1 Optical Flow           9.092 mPixels/s            11.228 mPixels/s
Ocean Surface Simulation     1032.353 Frames/s          1427.912 Frames/s
Particle Simulation - 64k    363.404 mInteraction/s     377.433 mInteraction/s
T-Rex                        2.597 Frames/s             4.129 Frames/s
Video Composition            38.83 Frames/s             64.479 Frames/s
Bitcoin Mining               66.493 mHash/s             179.058 mHash/s

______________________________________________________________

5. The disk system is somewhat generic. It may be useful to put a hardware RAID controller in the fifth PCIe x16 slot (wired as x8) and add a RAID 5 array for storage. That would, however, leave a single PCIe SSD in the remaining x8 slot (wired as x4). My inclination is to have a cache drive for fast swaps to RAM and for saves, but the cache drive could just as well be a SATA SSD with good performance.

This is a complicated configuration that needs a bit more detailed study, but it uses the latest Xeon E5-2600 v4 series. I think it has very good cost / performance potential along with excellent system stability at high performance. As a new, proprietary system, I estimate the cost would be about $16,000+, as three new K20s alone would be over $9,000.
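To put the GPU benchmark figures quoted in the notes above in perspective, here is a small sketch that turns them into Tesla K20 over Quadro K4200 ratios. The numbers are the ones quoted in this post; nothing here is a new measurement.

```python
# (Quadro K4200, Tesla K20) OpenCL scores as quoted in the notes above.
scores = {
    "Face Detection (mPixels/s)": (27.889, 32.104),
    "TV-L1 Optical Flow (mPixels/s)": (9.092, 11.228),
    "Ocean Surface Simulation (Frames/s)": (1032.353, 1427.912),
    "Particle Simulation - 64k (mInteraction/s)": (363.404, 377.433),
    "T-Rex (Frames/s)": (2.597, 4.129),
    "Video Composition (Frames/s)": (38.83, 64.479),
    "Bitcoin Mining (mHash/s)": (66.493, 179.058),
}

# Ratio > 1 means the Tesla K20 scored higher on that test.
ratios = {name: k20 / k4200 for name, (k4200, k20) in scores.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<44} K20 / K4200 = {ratio:.2f}x")
```

On these figures the K20 leads in every test, but the gap ranges from roughly 4% in the particle simulation test to about 2.7x in the hashing test, so the benefit depends heavily on the workload.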

Cheers,

BambiBoom

1. HP z420 (2015) > Xeon E5-1660 v2 (6-core @ 3.7 / 4.0GHz) > 32GB DDR3 1866 ECC RAM > Quadro K4200 (4GB) > Intel 730 480GB (9SSDSC2BP480G4R5) > Western Digital Black WD1003FZEX 1TB> M-Audio 192 sound card > 600W PSU> > Windows 7 Professional 64-bit > Logitech z2300 speakers > 2X Dell Ultrasharp U2715H (2560 X 1440)>
[ Passmark Rating = 5064 > CPU= 13989 / 2D= 819 / 3D= 4596 / Mem= 2772 / Disk= 4555] [Cinebench R15 > CPU = 1014 OpenGL= 126.59 FPS] 7.8.15

2. Dell Precision T5500 (2011) (Revised) > 2X Xeon X5680 (6 -core @ 3.33 / 3.6GHz), 48GB DDR3 1333 ECC Reg. > Quadro K2200 (4GB ) > PERC H310 / Samsung 840 250GB / WD RE4 Enterprise 1TB > M-Audio 192 sound card > Logitech z313 > 875W PSU > Windows 7 Professional 64> HP 2711x (27", 1920 X 1080)
[ Passmark system rating = 3844 / CPU = 15047 / 2D= 662 / 3D= 3550 / Mem= 1785 / Disk= 2649] (12.30.15)

 

kanewolf

Titan
Moderator
@bambiboom, for an HPC task like this the E5-2667 v4 (8 core 3.2GHz) is the better choice in most cases. It is significantly more expensive, but it is what I would recommend. There are too many poorly written programs that have trouble scaling to 8 threads (let alone 20). Clock speed will almost ALWAYS beat threading in HPC processing. The E5-2643 v4 might even be a better choice. Again, 12 high-clocked threads may get more work done than your 20 lower-clocked threads. The only way to find out is to benchmark the application and see how wide it will scale. Buying some time on an Amazon compute node might be a wise investment to determine the scalability.
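The clock-versus-cores trade-off can be roughed out with Amdahl's law before any hardware is bought. The sketch below is a toy model, not a benchmark: it assumes throughput is proportional to base clock times the Amdahl speedup, and it ignores turbo, memory bandwidth, and per-core differences. The 2.4GHz figure is from the build above; the 3.4GHz base clock for the E5-2643 v4 comes from Intel's published specification, not from this thread.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(base_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Toy model (assumption): work rate = base clock x Amdahl speedup."""
    return base_ghz * amdahl_speedup(parallel_fraction, cores)


# (base clock in GHz, total cores across both sockets)
configs = {
    "2x E5-2640 v4, 20 cores @ 2.4GHz": (2.4, 20),
    "2x E5-2643 v4, 12 cores @ 3.4GHz": (3.4, 12),
}

for p in (0.50, 0.90, 0.95, 0.99):
    results = {
        name: relative_throughput(ghz, cores, p)
        for name, (ghz, cores) in configs.items()
    }
    best = max(results, key=results.get)
    print(f"parallel fraction {p:.2f}: best = {best}")
```

Under these assumptions the 20-core pair only pulls ahead once about 97% of the runtime is parallel, which is why measuring the real application matters more than the model.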
 
Solution


kanewolf,

If visualization is the priority, as in games, 3D CAD, or animation, higher single-threaded performance from a CPU with fewer but faster cores might be preferable. But scientific computing, and rendering, these days is GPGPU and runs on configurations that use the CPU plus coprocessors such as Tesla or Xeon Phi in parallel. The calculation density of CPU cores + GPU cores can't be remotely matched by fewer CPU cores at a higher clock speed.

Have a look in the previous post at the Tesla K20's ratings for particle simulation, one of the intended uses of the proposed system. Passmark CPU ratings are empirically weighted to include single-threaded performance, but in general they reflect the calculation density. A Xeon E5-1620 (4-core @ 3.6 / 3.8GHz) has a high single-threaded rating of 1930 and an average CPU mark of 9097. An E5-2683 v3 (14-core @ 2.0 / 3.0GHz) has a single-threaded rating of 1684, but its CPU mark is 17986, or 22063 for a pair: it's getting through many more cycles per second. I would use the E5-1620 for CAD and the E5-2683 v3 for the uses mentioned in the original post. Actually, the 1684 single-threaded rating of the E5-2683 v3 is quite good anyway. I've done a lot of complex 3D modeling on Xeon X5680s, which have a single-threaded rating of 1465.
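The trade-off in those Passmark numbers is easier to see as ratios. A minimal sketch, using only the scores quoted in the paragraph above:

```python
# Passmark scores as quoted in the paragraph above.
e5_1620 = {"single": 1930, "multi": 9097}
e5_2683_v3 = {"single": 1684, "multi": 17986, "multi_pair": 22063}

# How the 14-core part compares with the 4-core part.
single_ratio = e5_2683_v3["single"] / e5_1620["single"]
multi_ratio = e5_2683_v3["multi"] / e5_1620["multi"]

# How much the CPU mark grows when a second E5-2683 v3 is added.
pair_gain = e5_2683_v3["multi_pair"] / e5_2683_v3["multi"]

print(f"single-thread ratio (E5-2683 v3 / E5-1620): {single_ratio:.2f}")
print(f"CPU mark ratio      (E5-2683 v3 / E5-1620): {multi_ratio:.2f}")
print(f"CPU mark, pair vs single E5-2683 v3:        {pair_gain:.2f}")
```

On the quoted figures, the 14-core part gives up about 13% of single-threaded score for roughly twice the CPU mark, and the second CPU adds about 23% to the CPU mark rather than doubling it.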

I was at a particle research facility a few days ago where I had done a project, and the particle simulations, thermal, and electro-mechanical experiment design / simulations there are run on a set of eleven nodes, each with dual 14-core Xeons (I don't know the model) and four Tesla K20 GPU coprocessors. This facility has links to Oak Ridge, where they have a system called "Titan" that has 18,000+ AMD Opterons running 18,000+ Tesla K20Xs. At one time that was the fastest supercomputer. In the world of very high performance computing, CPUs are shifting towards becoming "GPU / Memory Controllers".

The surprise of that visit was speaking with the head draughtsman who is supervising the construction drawings for a $400M photon beam accelerator. They are using Siemens NX. I assumed the CAD system would be amazingly powerful and was bowled over when I learned what he was using: Dell Precision T3500 4-core but with a Quadro K6000. He said, "Yeah, all you need is a really good graphics card". Here again, the GPU cores are more important than the CPU.

I have a new respect for my $53 T3500.

Cheers,

BambiBoom

Proud owner:

Dell Precision T3500 (2011) (Rev 2) Xeon X5677 4-core @ 3.46 / 3.73GHz > 12GB (6X 2GB) DDR3-1333 ECC > Quadro 4000 (2GB) > PERC 6/i + Seagate 300GB 15K SAS ST3300657SS + WD Black 500GB > 525W PSU> Windows 7 Professional 64-bit > 2X Dell 19" LCD
[Passmark system rating = 2751, CPU = 7236 / 2D= 658 / 3D=2020 / Mem= 1875 / Disk=1221]

But, where can I find a cheap K6000?
 

kanewolf

Titan
Moderator
I don't disagree with ANYTHING you say. But I am a big believer in benchmarking before buying. The OP didn't identify what software is being used, so we can only speculate on its scalability or its sensitivity to clock speed (or memory speed, or ...). Since most individuals don't have the benefit of the loaner equipment that major sellers provide to large companies, the next best option is to purchase time on an Amazon or Google compute node. Not a perfect analog to a standalone box, but the best that an individual can get.
 


kanewolf,

A person requesting information on a system for "3D processing, fluid simulation, particle systems (all from CPU) and 3D GPU render" can be assumed to be using algorithmic programs for that purpose, such as Matlab and possibly ray-tracing renderers, which are fully scalable in parallel and can use every core of any kind.

Certainly, the benchmarks are valuable. The main point of my previous post was to suggest mapping calculation density against single-threaded performance as expressed by comparative benchmarks to adapt to the applications.

Interesting discussion!

Cheers,

BambiBoom
 

kanewolf

Titan
Moderator
I have seen REALLY badly written Matlab code which wouldn't scale. I may be taking the pessimist viewpoint, but I never assume that any code will scale without testing it. Going from 4 threads to 20 threads plus GPGPU is a non-trivial thing even with the benefits of Matlab toolboxes. And Mathworks doesn't give those toolboxes away either :)
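Testing for scaling can be as simple as timing the same job at several thread counts and looking at speedup, efficiency, and the Karp-Flatt serial fraction. The sketch below assumes you already have wall-clock times; the timings in `example_timings` are invented purely for illustration and are not measurements of any real program.

```python
def scaling_report(timings: dict[int, float]) -> list[tuple[int, float, float, float]]:
    """Return (threads, speedup, efficiency, serial_fraction) per thread count.

    timings maps thread count -> wall-clock seconds and must include 1 thread.
    """
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]
        efficiency = speedup / n
        # Karp-Flatt metric: serial fraction implied by the measured speedup.
        serial = (1 / speedup - 1 / n) / (1 - 1 / n) if n > 1 else 0.0
        rows.append((n, speedup, efficiency, serial))
    return rows


# Hypothetical timings in seconds, invented for illustration only.
example_timings = {1: 400.0, 4: 115.0, 8: 70.0, 20: 46.0}

for n, speedup, efficiency, serial in scaling_report(example_timings):
    print(f"{n:>2} threads: speedup {speedup:5.2f}, "
          f"efficiency {efficiency:4.0%}, serial fraction {serial:.3f}")
```

If the serial fraction climbs as threads are added, overhead is growing with the thread count, and extra cores will pay off less than the core count suggests.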
 

chriscambridge

Commendable
Aug 5, 2016
33
0
1,540
Actually, previous answers have got this all wrong.

In terms of 4 GPUs, the Supermicro mobo is one of the only ones that can actually utilize the full PCIE lanes from the Xeons.

If you take a look at the PCIe slot specifications of both boards, this becomes apparent.

For gaming there is perhaps no difference between running a slot at x16 or x8 speed, but for data processing this becomes quite important (depending on the application being used), especially as GPUs become more powerful, such as the 1080 Ti.

Asus:

4 x PCIe 3.0/2.0 x16 (dual x16 or quad x8)

So if you run four GPUs, they only run at x8 speed.

Supermicro:

4 PCI-E 3.0 x16 (double-width) slots
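To put x16 versus x8 in numbers, here is a minimal sketch of the theoretical PCIe 3.0 link bandwidth, based on the standard 8 GT/s per lane and 128b/130b encoding. It gives the peak per direction only; protocol overhead and real application transfer patterns are ignored, so treat the results as upper bounds.

```python
GT_PER_S = 8.0        # PCIe 3.0 raw signalling rate per lane (gigatransfers/s)
ENCODING = 128 / 130  # 128b/130b line encoding efficiency


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    return GT_PER_S * ENCODING / 8 * lanes  # divide by 8: bits -> bytes


for lanes in (4, 8, 16):
    print(f"x{lanes:<2} ~ {pcie3_bandwidth_gbs(lanes):.2f} GB/s per direction")
```

Whether halving the link width actually hurts depends on how much data the application moves between host and GPU, which is the application-dependent part mentioned above.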

--

You should also check the manual for the PCIe-to-CPU lane diagram, as this clearly shows which lanes, and how many, are actually being used or can be used.

Clearly, on a single Xeon you only have 40 PCIe lanes, and therefore the most you could run (without multiplexers or switches, which reduce speed) is 2x x16 and 1x x8; the same thing holds for dual CPUs.

--

This is perhaps why Cambridge University uses the Supermicro board above; so far it is one of the only ones we have found that actually makes full use of the 80 PCIe lanes from both Xeons. Most other boards use some of these lanes for other things, such as PCI, M.2, etc.

If you take a look at our threads you will see we posed just this question about full PCIE slots and lanes for our GPU rack.

 


chriscambridge,

"Actually, previous answers have got this all wrong."

I understand neither the purpose nor the content of this post, nor the timing, nearly one year after the original post.

In my post of 2 May, 2016, I proposed using:

"CPU: 2X Intel Xeon Processor E5-2640 v4 (10-core @2.40 / 3.4 GHz, 25M Cache) > $1,840 ($920 each)(Superbiiz)

Motherboard: Supermicro X10DRG-Q (4X PCIe x16 GPU slots) > $499 (Superbiiz)(This motherboard has four GPU slots)(Review of this motherboard)

Notes:

1. The 4X GPU configuration is unconventional: The Quadro K4200 is single height but the Tesla K20's are double- height cards. However the GPU slots are double spaced on the Supermicro X10DRG-Q motherboard.
"

The pair of Xeon E5-2640 v4 processors provides a total of 80 PCIe lanes, and the Supermicro X10DRG-Q motherboard provides 4X double-height PCIe x16 GPU slots, spaced uniquely such that no other PCIe slots are covered. This means that four x16 GPUs use 64 lanes to run at full capability, leaving the other 16 lanes for peripherals, for example a x8 RAID controller + a x4 PCIe device, with a total of 4 lanes remaining, none shared.
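The lane arithmetic can be written out as a simple budget. This is only a count: it does not model which slot is wired to which CPU, so the board manual's block diagram remains the authority. The device labels are illustrative.

```python
LANES_PER_CPU = 40  # PCIe lanes per Xeon, as stated earlier in the thread
CPU_COUNT = 2

total_lanes = LANES_PER_CPU * CPU_COUNT

# (device, lanes) pairs; labels are illustrative.
devices = [
    ("GPU 1", 16),
    ("GPU 2", 16),
    ("GPU 3", 16),
    ("GPU 4", 16),
    ("RAID controller", 8),
    ("x4 PCIe device", 4),
]

used = sum(lanes for _, lanes in devices)
remaining = total_lanes - used

print(f"total lanes: {total_lanes}")
print(f"used lanes:  {used}")
print(f"remaining:   {remaining}")
```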

Would you restate your point?

Also, at this time of year, this site is strongly recommended: http://www.cambridgestudents.cam.ac.uk/your-course/examinations/all-students-timetable


Cheers,

BambiBoom

ex Pembroke

CAD / 3D Modeling / Graphic Design:

HP z620_2 (2017) > Xeon E5-1680 v2 (8-core@ 4.1GHz) / 64GB DDR3-1866 ECC Reg / Quadro P2000/ HP Z Turbo Drive M.2 256GB + Intel 730 480GB + Seagate Constellation ES.3 1TB / ASUS Essence STX PCIe sound card /825W PSU / Windows 7 Prof. 64-bit > 2X Dell Ultrasharp U2715H (2560 X 1440)
[Passmark Rating = 6166 / CPU rating = 16934 / 2D = 820 / 3D= 8849 / Mem = 2991 / Disk = 13794] 4.24.17 Single Thread Mark = 2252

HP z420 (2015) (Rev 5) > Xeon E5-1660 v2 (6-core @ 3.7 / 4.2GHz) / 32GB DDR3 -1866 ECC RAM / Quadro P2000 (4GB) / HP Z Turbo Drive M.2 256GB AHCI + Intel 730 480GB (9SSDSC2BP480G4R5) + Western Digital Black WD1003FZEX 1TB> Creative SB X-Fi Titanium + Logitech z2300 2.1 speakers > 600W PSU> > Windows 7 Professional 64-bit >> 2X Dell Ultrasharp U2715H (2560 X 1440)
[ Passmark Rating = 5920 > CPU= 15129 / 2D= 855 / 3D= 8945 / Mem= 2906 / Disk= 8576] [6.12.16] Single-Thread Mark = 2322 [4.20.17]

Analysis / Simulation / Rendering:

HP z620 (2012) (Rev 3) 2X Xeon E5-2690 (8-core @ 2.9 / 3.8GHz) / 64GB DDR3-1600 ECC reg) / Quadro K2200 (4GB) + Tesla M2090 (6GB) / HP Z Turbo Drive (256GB) + Samsung 850 Evo 250GB + Seagate Constellation ES.3 (1TB) / Creative Sound Blaster X-Fi Titanium PCIe sound card + Logitech z313 2.1 speakers / 800W / Windows 7 Professional 64-bit > > HP 2711x (27" 1980 X 1080)
[ Passmark System Rating= 5675 / CPU= 22625 / 2D= 815 / 3D = 3580 / Mem = 2522 / Disk = 12640 ] 9.25.16 Single Thread Mark = 1903
[ Cinebench R15: CPU = 2209 cb / Single core 130 cb / OpenGL= 119.23 fps / MP Ratio 16.84x] 10.31.16



 

chriscambridge

Yes, I did my degree in Computer Science 20+ years ago.

The biggest difference between the two mobos, if you are running four GPUs, is the one I mentioned.

The post was for others, as this thread had been marked as solved.