I want to give forewarning that this post will be long. This problem has stumped us for weeks and we could seriously use some help! Thanks in advance for reading.
I work for an engineering company that performs finite element simulations. For the simulations in question we use a commercial code called ABAQUS/Standard (see here). We recently needed a new machine with a very large RAM capacity so that our analyses would fit in-core, so we went with a server. We purchased this machine earlier this month, and the original setup looked like this:
- SuperMicro 2U Server 2022G-URF (specs)
- SuperMicro H8DGU-F Motherboard (specs)
- 2x AMD Opteron 6134 Processors @ 2.3GHz (specs)
- 64GB Kingston DDR3 1333MHz ECC Registered RAM
- 2x Intel 320 120GB SSD drives
- Red Hat Enterprise Linux Server 6.2
- Linux Kernel 2.6.32
Right out of the box, we were shocked at how slowly this machine performed. The machines we've purchased over the past few years have mostly been quad-core desktop units, and they tend to do quite well with our simulations. The last server we bought was about 4 years ago and has roughly the following specs:
- 2x Intel Xeon X5355 @ 2.6GHz (no Turbo Boost)
- 16GB DDR 667 MHz RAM
- Standard SATA spinning disk drives
Much to our surprise, the 4-year-old server, which was performing the analysis out-of-core on old RAM and processors and reading and writing to a spinning disk, was faster than our new machine! The new machine was running the analysis in-core, with nominally slower processors (though not really once AMD Turbo Core kicked in), better RAM, and an SSD. The software we use is fairly read/write intensive, so we expected the faster disk I/O to be a big plus.
Over the subsequent two weeks, here are some of the things we've tried in hopes of speeding up this server:
1) We exchanged the Opteron 6134 processors (2x 8 cores @ 2.3GHz) for the 6272 processors (2x 16 cores @ 2.1GHz) and later for the 6220 (2x 8 cores @ 3.0GHz). None of these changes helped. In fact, going from the 2.3GHz to the 3.0GHz processors did very little to speed up our analyses.
2) Updated to the latest BIOS
3) Tried an array of Linux kernels from 2.6.32 to 2.6.39 and even the latest stable 3.2.1
4) We've had better luck in the past with SUSE Linux, so we switched from Red Hat to SUSE Linux Enterprise Server 11
5) Played with the BIOS settings to death, changing just about every parameter but focusing primarily on the power-saving options.
6) Modified the automatic CPU scaling mode and parameters from within the OS (it uses powernow-k8). We tried 'performance mode', which leaves all of the CPUs at 3.0GHz all the time. If you leave it in its default mode, it varies the CPU speeds between 1.4 and 3.0GHz to save power. I should also note that while running the analysis we typically use only one or two of the 16 cores, so AMD Turbo Core kicks in and ramps us up to nearly 3.6GHz at times.
7) We wanted to eliminate the possibility of a hardware issue. We swapped out the motherboard, RAM and even power supplies (I'll explain why we swapped the power supplies later) and none of that helped.
8) We created a RAID 0 array from our two SSDs to rule out I/O as a potential bottleneck. We clocked this array at nearly 700MB/sec and installed the OS on it, and that did not help either.
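For anyone who wants to double-check item 6 on a similar box, here is a minimal sketch of how the governor and current clock of each core could be read back. It assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` and `scaling_cur_freq`, the latter in kHz); the function name `read_cpufreq` and its `sysfs_root` parameter are just illustrative names, not part of any tool we used.

```python
from pathlib import Path


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {"governor": str or None, "cur_khz": int or None}}.

    Cores without a cpufreq directory (e.g. when no scaling driver is
    loaded) are skipped. sysfs_root is a parameter only so the function
    can be pointed at a test directory.
    """
    result = {}
    for cpu_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*")):
        freq_dir = cpu_dir / "cpufreq"
        if not freq_dir.is_dir():
            continue
        governor_file = freq_dir / "scaling_governor"
        cur_file = freq_dir / "scaling_cur_freq"
        governor = governor_file.read_text().strip() if governor_file.exists() else None
        cur_khz = int(cur_file.read_text().strip()) if cur_file.exists() else None
        result[cpu_dir.name] = {"governor": governor, "cur_khz": cur_khz}
    return result


if __name__ == "__main__":
    # Print one line per core, e.g. to confirm every core reports "performance".
    for name, info in read_cpufreq().items():
        print(name, info["governor"], info["cur_khz"])
```

Running this while the analysis is active should show whether the cores doing the work are actually sitting at the clock speed we expect.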
While playing with the BIOS settings I noticed something quite strange. There is an option to control the maximum fan speed, and the default is a 'Balanced' mode. I ramped up the fan speed to 'High Performance' and then later 'Full Speed'. I have a small benchmark problem, roughly 90 seconds long, that I've used to monitor system performance through all of these changes. With the fans in Balanced mode, the benchmark takes ~92 seconds. In High Performance mode it takes ~94 seconds, and in Full Speed mode it takes ~95 seconds. For some reason, increasing the fan speed caused the benchmark to slow down. As a result, we decided to try swapping out the power supplies. Oddly enough, after we did this and rebooted the machine, the benchmark time dropped by about 11 seconds. So the swap made the benchmark faster in the short term. However, we've been running the full-scale analyses on this machine and on the older server over the past few days, and they are running at very nearly the same speed, so this did not fix the overall issue.
Does anybody have any idea what in the world may be going on with this machine? It doesn't seem possible that it should be only comparable to that older server. Since the I/O is so much quicker on our new server, we are inclined to think that this is a processor issue. Is it possible that the architecture of these AMD Opteron processors is literally THAT poor for our type of work? I would greatly appreciate any input or thoughts. We have invested a lot of money in this machine and would really like to see some return on it at some point. As it is now, this machine is almost useless for our needs!
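Since the fan-speed differences above are only a few seconds, it is worth checking whether they are larger than the run-to-run noise. The following is a rough sketch of how the benchmark could be timed repeatedly and summarized; the helper names `time_command` and `summarize` are made up for illustration, and the command to run is whatever launches the benchmark on your system (passed on the command line here, not hard-coded).

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run cmd (a list of arguments) repeats times; return wall-clock seconds per run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return (mean, sample standard deviation) of a list of timings."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bench.py <benchmark command and its arguments>
    mean, stdev = summarize(time_command(sys.argv[1:]))
    print(f"mean {mean:.1f} s, stdev {stdev:.1f} s")
```

If the standard deviation within one fan setting turns out to be comparable to the 2 to 3 second gaps between settings, the fan-speed effect may simply be noise.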
Thanks in advance for the help.