Fujitsu To Build AI Supercomputer With 24 Nvidia DGX-1 Systems

Fujitsu announced that it will build a supercomputer for Riken, a Japanese AI research center, that comes with 32 Intel Xeon-based Fujitsu servers and 24 Nvidia DGX-1 AI accelerator systems and boasts a peak theoretical performance of 4 petaflops.

Nvidia DGX-1

The DGX-1 is what Nvidia likes to call an “AI supercomputer in a box.” It features eight Tesla P100 GPUs that are optimized for deep learning, and the whole system can cost as much as $129,000. The Elon Musk-backed OpenAI nonprofit was the very first customer to get one.

According to Nvidia, a DGX-1 has the same performance as 250 conventional x86 servers. The key word here is “conventional”: Intel now has its own "Xeon Phi" accelerators for machine learning, which are far more competitive than standard Xeon servers. However, they may still not be a match for Nvidia’s latest GPUs.

Although things could change over the next few years, when we’ll see more FPGAs or ASICs on the market that are more optimized for machine learning, it looks like GPUs are still the most common and effective way to train neural networks right now. Nvidia has also invested heavily in the software ecosystem to make its GPUs that much more appealing for customers who want to train neural networks on its chips.

Riken’s Supercomputer

A performance of 4 petaflops is towards the lower end of the spectrum for today’s supercomputers, which can already reach around 100 petaflops, and will soon reach 300 petaflops. The lower performance target may be the reason why Riken and Fujitsu decided to go with a modular solution based on 24 DGX-1 systems rather than a more customized architecture.
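
As a rough sanity check on the 4-petaflop figure, the sketch below multiplies the number of DGX-1 systems by a per-system peak. The per-system value is an assumption that does not appear in this article: it uses Nvidia's published peak of 170 teraflops of half-precision (FP16) performance per DGX-1. The contribution of the 32 Xeon-based servers is ignored, so this is only an approximation, not Fujitsu's or Riken's own accounting.

```python
# Back-of-the-envelope estimate of the DGX-1 portion of the system's peak.
# Assumption (not stated in the article): 170 TFLOPS peak FP16 per DGX-1,
# as published by Nvidia for the Tesla P100-based DGX-1.
DGX1_PEAK_TFLOPS_FP16 = 170
NUM_DGX1 = 24          # from the article
GPUS_PER_DGX1 = 8      # from the article (eight Tesla P100 GPUs per system)


def aggregate_peak_petaflops(num_systems: int, tflops_per_system: float) -> float:
    """Sum identical per-system peaks and convert teraflops to petaflops."""
    return num_systems * tflops_per_system / 1000


total_pflops = aggregate_peak_petaflops(NUM_DGX1, DGX1_PEAK_TFLOPS_FP16)
total_gpus = NUM_DGX1 * GPUS_PER_DGX1

print(f"{NUM_DGX1} DGX-1 systems, {total_gpus} GPUs: "
      f"~{total_pflops:.2f} petaflops peak (FP16)")
```

Under that assumption, the 24 DGX-1 systems alone account for roughly 4 petaflops, which suggests the headline number is a half-precision peak dominated by the GPUs rather than by the Xeon servers.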

The whole supercomputer will comprise two server architectures: Nvidia’s DGX-1 systems and Fujitsu’s Intel Xeon-based servers (Primergy RX2530 M2). The file system will run on a “high-reliability, high-performance storage system,” which includes six Fujitsu Server Primergy RX2540 M2 PC servers, eight Fujitsu Storage Eternus DX200 S3 storage systems, and one Fujitsu Storage Eternus DX100 S3 storage system to provide the IO processing demanded by deep learning analysis.

According to Nvidia, the DGX-1 systems will offer Riken's supercomputer the following capabilities:

Containerized deep learning frameworks, optimized by NVIDIA for maximum GPU-accelerated deep learning training

Greater performance and multi-GPU scaling with NVIDIA NVLink, accelerating time to discovery

An integrated software and hardware architecture optimized for deep learning

The Riken R&D lab will use the supercomputer and its AI capabilities to find better solutions to social issues. Riken aims to find improvements to healthcare for the elderly, the management of aging infrastructure, and response to natural disasters. The Fujitsu-built supercomputer should go online in April.

Lucian Armasu is a Contributing Writer for Tom's Hardware US. He covers software news and the issues surrounding privacy and security.