HPC Council Offers Free Time on Nvidia Tesla GPUs

Donated by Nvidia, a member of the HPC Advisory Council, the program is designed to help researchers and developers benchmark their software and optimize their code to run on Tesla GPUs. The goal appears to be helping developers finalize their software and confirm that it performs exactly as advertised. The HPC Council told us that the cluster will eventually hold 16 GPUs, four of which are available at this time, and that the amount of GPU time allocated to a user depends on that user's specific needs.

"Researchers need an easy way to benchmark their models on the growing number of GPU-accelerated applications before making a buying decision," said Sumit Gupta, director of Tesla business at Nvidia. "The new Center provides a valuable resource to help developers optimize their codes for GPUs, and ensure that applications will perform precisely as advertised."

Gilad Shainer, chairman of the HPC Advisory Council, noted that HPC systems have been donated by other member companies in the past, including AMD and Intel.

  • d_kuhn
    This is a great idea... I purchased a Supermicro GPU server with four 2090s last year to do precisely this kind of evaluation. Having the ability to run evaluation code online (assuming their implementation doesn't add overhead) would give you a great way to spin up one of these GPU supercomputers without needing to invest the several tens of thousands of dollars required to buy a box.

    The folks at Supermicro and other OEMs who've struggled through Nvidia validation for the 2090 (which has no onboard cooling and therefore requires a very dedicated system design) may not be particularly happy about it, though.
  • wiyosaya
    Buzz at BOINC stats is that some consumer cards, i.e., those in the 400 and 500 series, outperform some of the Teslas. Teslas sound like Nvidia's marketing baby that may or may not perform any better than "consumer" GPUs.

    My bet is that this is another Nvidia marketing thrust. IMHO, testing code like this might just as easily be accomplished on a consumer GPU - or, with this service, one could conceivably test on a consumer GPU in-house and on a Tesla-based HPC at the same time to determine whether there is an advantage to a Tesla setup (a rough sketch of such a side-by-side check appears after this comment).
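
    As a rough illustration of the side-by-side check described above, a small CUDA program along the lines of the sketch below could be compiled once and run both on an in-house consumer card and on a remote Tesla node: it reports each visible GPU's name and memory and times a device-to-device copy as a crude bandwidth estimate. The 256 MiB copy size and the per-device loop are arbitrary illustrative choices, not details of the Council's actual cluster or benchmark setup.

        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            int count = 0;
            cudaGetDeviceCount(&count);
            for (int dev = 0; dev < count; ++dev) {
                cudaSetDevice(dev);
                cudaDeviceProp p;
                cudaGetDeviceProperties(&p, dev);

                // Time a device-to-device copy as a crude sustained-bandwidth estimate.
                const size_t bytes = 256u << 20;   // 256 MiB, an arbitrary choice
                char *src = 0, *dst = 0;
                cudaMalloc((void**)&src, bytes);
                cudaMalloc((void**)&dst, bytes);

                cudaEvent_t start, stop;
                cudaEventCreate(&start);
                cudaEventCreate(&stop);
                cudaEventRecord(start);
                cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
                cudaEventRecord(stop);
                cudaEventSynchronize(stop);
                float ms = 0.0f;
                cudaEventElapsedTime(&ms, start, stop);

                // A device-to-device copy reads and writes every byte, hence the factor of 2.
                double gbps = 2.0 * bytes / (ms * 1e-3) / 1e9;
                printf("%s: %.1f GiB memory, ~%.0f GB/s device-to-device\n",
                       p.name, p.totalGlobalMem / double(1 << 30), gbps);

                cudaEventDestroy(start);
                cudaEventDestroy(stop);
                cudaFree(src);
                cudaFree(dst);
            }
            return 0;
        }

    Built with nvcc, the same executable prints comparable numbers on whatever hardware it lands on, which is about as simple as an apples-to-apples starting point gets.
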
  • trandoanhung1991
    Teslas have a much higher double-precision (DP) rate than regular cards. In addition, they're much more stable, and they're heavily tested and certified for continuous (24/7/365) operation. (A rough way to measure the DP-versus-SP gap is sketched after this comment.)

    Otherwise, they're just consumer cards, really. They both use the same chips, after all.
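
    To put a number on the DP-versus-SP gap mentioned above, a minimal microbenchmark along the lines of the sketch below times the same FMA-heavy kernel in single and then in double precision on whichever GPU it runs on. The kernel body, grid dimensions, and iteration count are arbitrary illustrative choices, and the result is a crude indicator rather than a rigorous peak-FLOPS measurement.

        #include <cstdio>
        #include <cuda_runtime.h>

        // The same dependent multiply-add chain, instantiated for float and double.
        template <typename T>
        __global__ void fma_loop(T *out, int iters) {
            T a = (T)1.000001, b = (T)0.999999, c = (T)0.5;
            for (int i = 0; i < iters; ++i)
                c = a * c + b;                                  // dependent FMAs keep the FPUs busy
            out[blockIdx.x * blockDim.x + threadIdx.x] = c;     // prevent dead-code elimination
        }

        template <typename T>
        float time_kernel(int blocks, int threads, int iters) {
            T *d_out = 0;
            cudaMalloc((void**)&d_out, sizeof(T) * blocks * threads);
            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);
            cudaEventRecord(start);
            fma_loop<T><<<blocks, threads>>>(d_out, iters);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            cudaEventDestroy(start);
            cudaEventDestroy(stop);
            cudaFree(d_out);
            return ms;
        }

        int main() {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, 0);
            const int blocks = 1024, threads = 256, iters = 1 << 16;   // arbitrary sizes
            float sp = time_kernel<float>(blocks, threads, iters);
            float dp = time_kernel<double>(blocks, threads, iters);
            printf("%s: float %.2f ms, double %.2f ms, DP/SP time ratio %.2f\n",
                   prop.name, sp, dp, dp / sp);
            return 0;
        }

    The reported DP/SP time ratio should roughly track how much double-precision throughput the hardware gives up relative to single precision, which is a large part of what the Tesla premium buys.
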
  • d_kuhn
    Also keep in mind that the 2090 was designed to run in a dense multi-card stack within a server's air-management scheme. I think if you could manage airflow on a stack of GTX 590s you'd get great bang for your buck out of them, but they have a quarter of the memory per GPU and 20% lower memory bandwidth, so there would be some challenges involved in getting the most out of them.
  • wiyosaya
    FWIW - "2070s run BOINC WUs slower than a GTX 460." If you look at the poster's world BOINC position, he is the number-one overall BOINC contributor in the world. Unfortunately, his computer stats do not give details on which machines run which GPGPU. Since he has experience running various hardware in a virtual HPC environment, I tend to think that his opinion holds some weight.

    Personally, I think the argument here is similar to that of using a Quadro in a CAD environment as opposed to a consumer card. While I am sure there are scenarios where the pro cards excel, as they do with large CAD models, I would not be surprised if those HPC scenarios are presently a small fraction of all possible scenarios, as they are in CAD.

    Coming from a "pro" imaging software background: the software the company I worked for delivered to all users was the same - various users paid a premium to enable certain features, which is gravy to the manufacturer. As stated above, the silicon is the same; whether the "special treatment" given to the pro market is worth the extra cost is, ultimately, up to the end user and their requirements.

    According to the 2090 spec sheet, it has a passive cooler. Many GTX 590 boards offer active cooling, which lessens the need for airflow management in a 590-based solution.

    I am not sure how many of either one can run in a single box - and I am assuming that one would not run the 590s in SLI, since SLI gives no advantage in an HPC scenario, AFAIK.

    In my opinion, pro cards are not worth the extra expense. However, if you have the budget and your use case scenarios are such that Teslas give a proven advantage that justifies the extra expense, then Teslas would, of course, be the better choice.
  • d_kuhn
    The passive cooling is actually a desired feature for server integration... active cooling on boards disrupts the server's cooling architecture. At best it's something designers need to compensate for, a constraint on their ability to structure airflow as they want; at worst it can do more harm than good in a server implementation.

    There's no doubt that a good chunk of the cost is the 'corporate premium', but a lot of that is actually not 'gravy' but rather the increased cost required to design and test these systems to enterprise standards (and the fact that they need to recoup those costs on a relatively small sales volume). Back in the CAD world, getting certified for the various packages costs money... so if a company wants certified hardware, they're going to have to pay for it. In the GPU-acceleration world, boards like the 2090 are designed for high-end HPC use, which means (in a well-designed computing center) spending as much time as possible near 100% utilization. That's a different world from a consumer card dealing with a couple of hours of gaming a day (and some of them aren't particularly good at even that). Even graphically intense gaming generally sees pretty wide demand variation.
  • PreferLinux
    @wiyosaya: You're forgetting the already-mentioned double-precision performance, which on consumer cards is far lower than on professional cards, and which is far more important than single-precision.

    Also consider that the extra cost of professional cards is probably minimal compared to other expenses.