HPC Systems, a workstation maker from Japan, has announced that its new workstation is powered by AMD's EPYC processor with up to 64 cores as well as up to two Nvidia GeForce RTX 3080/3090 graphics cards. The PAW-300 machine is designed primarily for AI developers, but it can certainly be used for other applications, such as digital content creation once Nvidia releases Studio drivers for its latest GPUs.
The Monster Workstation
The HPC Systems PAW-300 is certainly one of the most powerful workstations available today. In its maximum configuration, the system can be equipped with a 64-core AMD EPYC 7002-series CPU accompanied by up to 512 GB of DDR4-3200 ECC memory. On the graphics side, it takes one or two triple-wide GeForce RTX 3090 cards, which support the NVLink interface for multi-GPU setups, or GeForce RTX 3080 boards, which do not feature NVLink. Based on an image provided by HPC Systems, the PAW-300 uses closed-loop liquid cooling for all of its compute components. As for storage, the computer has one M.2 slot for an SSD and a U.2 port for a workstation-grade drive.
Connectivity is rather extensive: the system has two 10 GbE connectors driven by Intel's X550-AT2 controller, four USB 3.1 Gen 2 ports (two Type-A and two Type-C), two USB 3.0 Type-A ports, a COM port, and audio jacks. It also comes equipped with Aspeed's AST2500 BMC (along with an RJ45 IPMI connector and a D-Sub output) for remote management.
HPC Systems will start sales of its PAW-300 workstations in late October. Pricing has not been announced, but since a 64-core AMD EPYC 7742 processor and two Nvidia GeForce RTX 3090 cards together cost around $7,770, beefy configurations will obviously cost a lot. The default model, featuring a 16-core AMD EPYC 7302P ($919), 64 GB of memory, and two Nvidia GeForce RTX 3080 cards ($1,400), will be priced considerably more affordably.
GeForce RTX 30-Series vs A100?
After Nvidia launches new gaming graphics cards, it usually takes the company some time to release Quadro boards for CAD and DCC professionals, Titan boards for prosumers, and specialized compute models for AI and HPC applications. That is not the case with the Ampere family: the Nvidia A100 was revealed back in May, and PNY released cards for AI and HPC developers just this week.
HPC Systems is essentially proposing people use its GeForce RTX-powered workstation for the same workloads that Nvidia's A100 was designed for. PNY's Nvidia A100 cards are only useful for compute, they're scarcely available at the moment, and they cost around £12,500 in the UK. Which leads to the question: Does it make sense to use GeForce RTX 30-series 'Ampere' graphics cards for AI/ML computing instead of A100?
When it comes to capabilities and actual performance of Nvidia's A100 compared to one or two GeForce RTX 3090 cards, the situation looks quite interesting.
Nvidia's GeForce cards traditionally do not support full-rate FP64, so the A100 is unchallenged for HPC workloads. Furthermore, the GeForce RTX 30-series does not appear to support INT4 and INT8 Tensor Core operations for AI/ML inference, so the A100 has an edge there as well. Finally, the A100 carries 40 GB of HBM2 memory, significantly more than its GeForce RTX 30-series counterparts, which matters for large datasets; yet another win for the A100.
But the GeForce RTX 3080/3090 gaming boards offer considerably higher FP32 performance than the A100: 35.7/29.8 TFLOPS vs. 19.5 TFLOPS. Moreover, these cards deliver rather decent FP16 performance, and two RTX 3090 boards could actually challenge one A100 at a much lower price. Yes, in this particular case the $1,500 RTX 3090 could actually be an amazing bargain.
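To put that price-performance argument in rough numbers, the sketch below computes peak FP32 throughput per thousand dollars for the configurations discussed. The prices are approximations from this article: ~$12,500 for an A100 (treating the ~£12,500 UK price as roughly equivalent in USD, which is an assumption) and $1,500 per RTX 3090. This is a back-of-the-envelope comparison of peak figures, not a benchmark.

```python
# Back-of-the-envelope FP32 price-performance from the figures above.
# Prices are approximations, not quotes: the A100 USD price is assumed
# from the ~£12,500 UK street price.
configs = {
    "1x A100":     (19.5,      12_500),    # (peak FP32 TFLOPS, price in USD)
    "1x RTX 3090": (35.7,       1_500),
    "2x RTX 3090": (2 * 35.7, 2 * 1_500),  # ignoring multi-GPU scaling losses
}

for name, (tflops, price) in configs.items():
    per_kusd = tflops / (price / 1000)
    print(f"{name}: {tflops:.1f} TFLOPS peak FP32, "
          f"{per_kusd:.2f} TFLOPS per $1,000")
```

On peak FP32 alone, a pair of RTX 3090s comes out more than an order of magnitude ahead per dollar, which is the bargain the paragraph above describes; real AI/ML throughput will of course depend on memory capacity, Tensor Core paths, and software.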
Nvidia Ampere Family Peak Performance Comparison

| | A100 | RTX 3090 | RTX 3080 |
| --- | --- | --- | --- |
| FP64 Performance | 9.7 TFLOPS | 558 GFLOPS | 465 GFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | - | - |
| FP32 Performance | 19.5 TFLOPS | 35.7 TFLOPS | 29.8 TFLOPS |
| Tensor Float 32 (TF32) Performance | 156 / 312* TFLOPS | 143 / 285* TFLOPS | 119 / 238* TFLOPS |
| FP16/BFloat16 Performance | 312 / 624* TFLOPS | 143 / 285* TFLOPS | 119 / 238* TFLOPS |
| INT8 Performance | 624 / 1248* TOPS | - | - |
| INT4 Performance | 1248 / 2496* TOPS | - | - |
| Memory Onboard | 40 GB HBM2 | 24 GB GDDR6X | 10 GB GDDR6X |
| Memory Bandwidth | 1.6 TB/s | 936 GB/s | 760 GB/s |

\* Structural sparsity enabled.
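The starred figures are not independent data points: Ampere's structural sparsity feature doubles the dense Tensor Core peak (Nvidia rounds the GeForce figures down slightly). A quick sketch checking that relationship against the table's numbers:

```python
# Dense vs. sparse Tensor Core peaks from the table (TFLOPS or TOPS).
# With structural sparsity enabled, peak throughput doubles; the GeForce
# sparse figures are rounded down by one in Nvidia's published numbers.
dense_sparse = {
    "A100 TF32":     (156, 312),
    "A100 FP16":     (312, 624),
    "A100 INT8":     (624, 1248),
    "A100 INT4":     (1248, 2496),
    "RTX 3090 TF32": (143, 285),
    "RTX 3080 TF32": (119, 238),
}

for name, (dense, sparse) in dense_sparse.items():
    assert sparse in (2 * dense, 2 * dense - 1), name  # rounding tolerance
    print(f"{name}: {dense} -> {sparse} (2x with sparsity)")
```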
Is Two Better Than One?
Whether it makes sense for AI/ML developers to use a couple of GeForce RTX 3090/3080 graphics cards instead of one A100 accelerator depends entirely on the projects they work on. Nvidia's GA100 and GA102 GPUs were designed for completely different workloads, and the former is also optimized for prolonged operation under high loads, so using gaming boards in workstations is not always ideal.
But given that Nvidia has yet to offer Titan and Quadro graphics cards based on the Ampere architecture for professionals, it is inevitable that at least some workstation vendors will offer machines running Nvidia's GeForce RTX 30-series boards.