CPU charts - floating point number crunching

Tom's CPU charts show a number of benchmarks. Which ones are most relevant for a computer that will be used solely for long engineering calculations (floating-point number crunching) with no significant video output?

Thanks,
Don Culp
  1. 3DMark CPU and SiSoftware Sandra.

    These benchmarks measure the computing power of CPUs.

    However, if you're looking for a number-crunching CPU, get an E8xxx or a Q9xxx (get the Q9 if your application makes use of multiple threads). Beyond that, the more you pay, the more you get.
  2. Um ... SPECfp would be ideal.
  3. http://www.tomshardware.com/charts/cpu-charts-2007/sisoftware-sandra-xi,394.html?p=1217%2C1273%2C1271%2C1272%2C1270%2C1269%2C1275%2C1268%2C1267%2C1265%2C1274%2C1266%2C1264%2C1263%2C1262%2C1261%2C1259%2C1260%2C1257%2C1258%2C1256%2C1302%2C1308%2C1301%2C1307%2C1306%2C1253%2C1252%2C1305%2C1250%2C1228%2C1251%2C1304%2C1246%2C1249%2C1248%2C1245%2C1247%2C1303%2C1244%2C1240%2C1241%2C1242%2C1243%2C1239%2C1236%2C1235%2C1300%2C1237%2C1238%2C1255%2C1299%2C1234%2C1233%2C1232%2C1231%2C1230%2C1295%2C1298%2C1254%2C1229%2C1294%2C1297%2C1293%2C1296%2C1292%2C1281%2C1291%2C1280%2C1289%2C1290%2C1279%2C1287%2C1227%2C1288%2C1278%2C1285%2C1286%2C1225%2C1224%2C1226%2C1277%2C1284%2C1283%2C1220%2C1223%2C1222%2C1221%2C1282%2C1276%2C1219%2C1316%2C1317%2C1318%2C1218%2C1313%2C1312%2C1315%2C1314%2C1309%2C1311%2C1310

    This is similar, at least.

    Note that the newer Phenom CPUs are not listed, and in reality these benchmarks may not be that useful, as they are synthetic.

    I would take them with a grain of salt. I imagine that if you're running complex engineering tasks, you would hardly want to waste your time on a single-socket solution when a 4- or 8-socket machine is available.

    Intel CPUs currently do not scale very well beyond two CPUs because the front-side bus simply gets swamped with traffic.

    If you're seriously looking for a small engineering-oriented machine, look at a 4- or 8-socket quad-core Opteron.

    And load the box up with at least 32 GB of RAM.

    Who cares about the video card ...

    Hope this helps.

    Add a couple of SAS/SCSI drives.
  4. dculp said:
    Tom's CPU charts show a number of benchmarks. Which ones are most relevant for a computer that will be used solely for long engineering calculations (floating-point number crunching) with no significant video output?

    Thanks,
    Don Culp


    Reynod gives good advice: SPECfp is a decent measure of FP power. You can also look at ScienceMark, Fluent, or other HPC benchmarks.

    I don't know what kind of system you're planning to build, so here's a general synopsis of the performance of the various offerings from AMD and Intel.

    Definitions:
    1. "NetBurst" architecture: desktop Celerons >2.0 GHz, Pentium 4, Pentium D, any Xeon with no 4-digit model number, Xeon 50x0 series, 70xx series, 71xx series.
    2. "Core" architecture: desktop Celeron 4xx, Celeron Dual Core, Pentium Dual Core, Core 2 Duo, Core 2 Quad, Xeon 3xxx series, 51xx series, 52xx series, 53xx series, 54xx series, 73xx series.
    3. "K8" architecture: socket 754 Semprons, Athlon 64, Athlon 64 X2, Athlon X2, Opteron 1xx/2xx/8xx, Opteron 12xx/22xx/82xx, Turion 64, Turion 64 X2.
    4. "K10" architecture: Phenom X3, Phenom X4, all quad-core Opterons (23xx/83xx).

    1. The NetBurst chips are obsolete but you may still see a few around. They make relatively poor number crunchers due to their very long execution pipeline.

    2. The 65 nm Core CPUs (everything except Core 2 Duo E7xxx/8xxx, Core 2 Quad Q9xxx, Xeon 52xx and 54xx) are generally considered to be similar clock-for-clock to the AMD K10 CPUs in single-socket number-crunching performance, 15-20% better than the AMD K8s, and close to double the NetBurst CPUs. However, dual-socket Xeons do not scale as well as dual-socket Opterons. The scaling largely depends on how much the workload accesses memory (more memory traffic means worse scaling for the Xeons). In some number-crunching apps, two quad-core Xeon 53xx CPUs perform only moderately better than two dual-core Xeon 51xx CPUs or a single Xeon 53xx, due to FSB congestion.

    The 45 nm Core chips (Core 2 Duo E7xxx/8xxx, Core 2 Quad Q9xxx, Xeon 52xx and 54xx) are roughly 5% faster clock-for-clock than their 65 nm siblings. The 45 nm Xeon 52xx and 54xx units on the new Xeon 5400 chipset are known to scale a little better than the 65 nm units due to a faster 1600 MHz FSB and bigger on-chip caches.

    3. Very little testing has been done on the K10 quad-core Opterons as they are just now shipping.

    So to sum it up, if you are looking for the best performance in a single-socket setup, opt for the Core 2 Duo E8xxx series or the Core 2 Quad Q9xxx series. They are a touch faster clock-for-clock than the AMD Phenoms and also have higher clock speeds. However, the Core 2 chips and their motherboards are more expensive than the Phenom X4s. Phenom X4s are also widely available, while the Core 2 Duo E8xxx and Core 2 Quad Q9xxx CPUs are hard to find, as only small quantities are shipping.

    If you are looking for a dual-socket setup, the higher-clocked Xeon 54xx CPUs will be the fastest units, mostly because you can get them in speeds up to 3.4 GHz while the K10 Opterons only go up to 2.3 GHz at present. Be warned, though, that like all Intel 45 nm CPUs these Xeons are shipping in small quantities, and you may not be able to find every model in stock.

    If you are looking at a server using four to eight sockets, Opterons are the only way to go. The Xeons are very much constrained by the FSB and as such do not scale very well with increasing numbers of computational threads.
  5. MUE:

    I'm looking to replace my AMD 5400+ for doing some (BOTE) polymer-folding calculations. Currently I run all night on a 100-element model. The 6400+ does not appear to buy me a 10x improvement. What's my best near-term shot?
  6. nss000 said:
    MUE:

    I'm looking to replace my AMD 5400+ for doing some (BOTE) polymer-folding calculations. Currently I run all night on a 100-element model. The 6400+ does not appear to buy me a 10x improvement. What's my best near-term shot?


    Well, it depends on the characteristics of the program you are using. Google was not helpful in trying to find out what program you are running as all I found when searching for "BOTE+polymer+folding+calculation+program" were ads for small plastic collapsible watercraft. Here's about all I can tell you:

    1. It will be impossible to speed up your program significantly unless it is multithreaded and can spawn many threads. The X2 5400+ is a fairly potent chip, and no currently shipping CPU will buy you more than a 50% improvement even in the absolute best-case scenario, which would be that the computations are almost all vectorized into SSE, compiled with Intel's compiler, and run on a fast Core 2 Duo like the E8500. That's probably not going to be the case, so you're looking at maybe a 20-30% improvement, tops. I wouldn't worry about that if I were you, as several hundred dollars for a new CPU and motherboard isn't worth saving maybe an hour on an overnight run. Of course, you could overclock a Core 2 Duo E8000-series chip to over 4 GHz, which will speed up calculations, but overclocked CPUs can make math errors. I would recommend against overclocking unless you are willing to run your work several times to double-check your results, and that would take even more time than leaving it at stock.

    2. If your program is multithreaded, then you can get a good speedup by throwing a bunch of cores at the problem. I'd get a dual-socket quad-core system, as that gives you eight cores for a couple of grand rather than the nearly ten grand a quad-socket quad-core setup would run. The absolute fastest setup would most likely be a pair of 3.2 GHz Xeon X5482s on a 5400-chipset board with four matched sticks of RAM, but that would be about $3000 just for the two chips ($1280 MSRP each), a decent motherboard (about $500), and the FB-DIMM RAM (figure a few hundred dollars). You would likely be looking at 3-4 times the speed, depending on how much memory bandwidth the program uses. If it were me, I'd look at either a pair of 2.5 GHz Xeon E5420s ($350 each) with a 5200-chipset board (~$400), or a pair of 2.1 GHz Opteron 2352s ($313 each) with a suitable Socket F board ($300-400). The Xeons are probably about as fast per thread as your X2 5400+, but you won't see a 4-fold increase due to scaling issues. The Opterons scale much better than the Xeons do, but a 2.1 GHz K10 quad-core Opteron is slower per thread than your 2.8 GHz K8 5400+. However, when running eight threads, these two systems ought to be in the same ballpark. You should see a speedup of roughly 2.5-3x, and either setup will cost under $2000 with case, PSU, hard drive, etc. You just have to decide if it's worth it.

    3. If your program runs on Linux or another UNIX-type system and is amenable to running on a cluster, consider building a small cluster of a few cheap quad-core machines connected through a gigabit Ethernet switch. Figure that three machines will cost you in the neighborhood of $2000 if you use lower-range quads like Q6600s or Phenom X4s. Only one machine needs a hard drive; the rest can be booted over the network. With enough machines, you can complete the tasks in as short a time as you want. However, there is a learning curve associated with setting up a *nix cluster, and this may not be something you want to undertake.
  7. MUE:
    Sorry for the back-of-the-envelope (BOTE) confusion. It's my own (unthreaded) C code, compiled with GCC on Ubuntu. Output is displayed via gnuplot.
    Thanks for your quick (though GRIM) response. I was concerned that no home-brew system would allow realistic models to be run in semi-real time, and you've confirmed that. Sounds like I should get the fastest quad-core I can afford ... and learn to thread. Thanks again.
  8. All --

    Of the suggested benchmarks, which ones are multithreaded, so that they reflect the participation of all cores?

    Don Culp
  9. dculp said:
    Tom's CPU charts show a number of benchmarks. Which ones are most relevant for a computer that will be used solely for long engineering calculations (floating-point number crunching) with no significant video output?

    Thanks,
    Don Culp



    How much memory are you using for each node? (I assume it's parallel?)
  10. nss000 said:
    MUE:
    Sorry for the back-of-the-envelope (BOTE) confusion. It's my own (unthreaded) C code, compiled with GCC on Ubuntu. Output is displayed via gnuplot.
    Thanks for your quick (though GRIM) response. I was concerned that no home-brew system would allow realistic models to be run in semi-real time, and you've confirmed that. Sounds like I should get the fastest quad-core I can afford ... and learn to thread. Thanks again.


    Yup, I've done some back-of-the-envelope C programming for big number-crunching runs too. My programs were also single-threaded, but the job itself was parallel, so I would just run as many single-threaded jobs as I had free processor cores to get different work done at the same time. I've found that going through my code and hand-optimizing wherever possible can yield decent gains, as can tweaking GCC options. You won't shave off a huge amount of runtime that way unless you really goofed something up to begin with, but it's worth a shot. Good luck.


  11. I am also looking for the "best" CPU for number crunching. However, I am only using integers. Also, my program is linear (serial), so is there any benefit to having many cores? http://www.cpubenchmark.net/high_end_cpus.html ranks the performance of CPUs. How should I interpret their benchmarks for the problem at hand?