D-Wave's Mindbending Quantum Computing
Quantum computing may be the next wave in supercomputing, and a number of companies, including industry behemoths like Intel, Google, and IBM, are drawing the battle lines. Those three industry icons are focusing on using Quantum logic gates, which are more accessible to programmers and mathematicians, but are unfortunately harder to manufacture.
D-Wave's latest quantum processor can search a solution space that is larger than the number of particles in the observable universe--and it accomplishes the feat in 1 microsecond. The company has embraced annealing, which has more robust error correction capabilities. D-Wave announced at the show that its 2000Q system now supports reverse annealing, which can solve some problems up to 150X faster than the standard annealing approach.
D-Wave's quantum chips are built on a standard CMOS process with an overlay of superconducting materials. The company then cools the processor to 400X colder than interstellar space to enable processing. D-Wave's 2,000-Qubit 2000Q supercomputer comes with a hefty $15 million price tag, though the company also offers a cloud-based subscription service.
PEZY's 16,384-thread SC2 Processor
It's hard not to be enamored with PEZY's ultra-dense immersion-cooled supercomputers, but the PEZY-SC2 Many-Core processor is truly a standout achievement.
The 2,048-core chip (MIMD) operates at 1.0 GHz within a 130W TDP envelope. The chip features 56MB of cache memory and supports SMT8, so each core executes eight threads in parallel. That means a single PEZY-SC2 processor brings 16,384 threads to bear. It also features the PCIe Gen 4.0 interface.
Among other highlights, the chip sports a quad-channel memory interface that supports up to 128GB of memory with 153GB/s of bandwidth. That may seem comparatively small, but PEZY's approach relies upon deploying these processors en masse on a spectacular scale.
The 620mm2 die, built on TSMC's 16nm process, offers 8.192 TeraFLOPS of single-precision and 4.1 TeraFLOPS of double-precision compute.
Let's see how PEZY integrates these into a larger supercomputer.
The Pezy-SC2 Brick
PEZY uses its custom -SC2 processors, which serve as accelerators, to create modules which are then assembled into bricks, much like we saw with the company's V100 blades. The PEXY-SC2 brick consists of 32 modules that all have their own allotment of DRAM, and one 16-core Xeon-D processor is deployed per eight PEZY-SC2 processors. The brick also includes slots for four 100Gb/s network interface cards.
Now the completed brick is off to the immersion tank.
PEZY-SC2 Brick Immersion Cooling
A single ZettaScaler immersion tank can house 16 bricks, which works out to 512 PEZY-SC2 processors. That means each tank can bring in 1,048,576 cores and 8,388,608 threads. That's an incredibly mind-bending number, sure, but perhaps the size of the relatively small immersion tank is the most impressive. That's performance density at its finest.
The Gyoukou supercomputer employs 10,000 PEZY-SC2 chips spread over 20 immersion tanks, but the company "only" enabled 1,984 cores per chip, which equates to 19,840,000 cores. Each core wields eight threads, which works out to more than 158 million threads. That system also includes 1,250 sixteen-core Xeon-D processors, adding over 20,000 Intel cores. In aggregate, that's the most cores ever packed into a single supercomputer.
The Gyoukou system is ranked fifth on the Green500 list for power efficiency and fourth on the Top500 list for performance. It delivers up to 19,135 TFLOPS and 14.173 GFLOPS-per-watt.
Incredible numbers aside, the company also has its PEZY-SC3 chips in development, with 8,192 cores and 65,536 threads per chip. That means the core and thread counts of its next-generation ZettaScaler systems will be, well, beyond astounding. We can't wait.
The Vector Processor
NEC is the last company in the world to manufacture Vector processors for supercomputers. NEC's Vector Engine full-height full-length PCIe boards will serve as the key components of the SX-Aurora TSUBASA supercomputer.
NEC claimed that its 16nm FinFET (TSMC) Vector core is the most powerful single core in HPC. It provides up to 307 GigaFLOPS per 1.6 GHz core. Eight cores come together inside the processor, and each core is fed by 150GB/s of memory throughput. That bleeding-edge throughput comes from six HBM2 packages (48GB) that flank the processor. That's a record number of HBM2 packages paired with a single processor. In aggregate, they provide a record of 1.2 TB/s of throughput to the Vector processor.
All told, the Vector Engine delivers 2.45 TeraFLOPS of peak performance and works with the easy-to-use x86/Linux ecosystem. Of course, the larger supercomputer is just as interesting.
The Vector Engine A300-8 Chassis
NEC's SX-Aurora TSUBASA supercomputer employs eight A300-8 chassis (pictured above) that host eight Vector Engines each. Each rack in the system provides up to 157 TeraFLOPS, 76 TB/s of memory bandwidth, and consumes 30KW. NEC's previous-generation SX-ACE model required 10 racks and 170KW of power to achieve the same feat.
What's It Take To Immersion Cool 20 Nvidia Teslas?
We love to watch the pretty bubbles generated by immersion cooling, but there are hidden infrastructure requirements. Allied Control's 8U 2-Phase Immersion Tank, which we'll see in action on the next slide, offers 48kW of cooling capacity, but it requires a sophisticated set of pumps, radiators, and piping to accomplish the feat.
In a data center environment, the large coolers are placed on the outside of the building. In fact, Allied Control powers the world's largest immersion cooling system. The BitFury data center in the Republic of Georgia, Eastern Europe, processes more than five times the workload of the Hong Kong Stock Exchange (HKSE), but accomplishes the feat in 10% of the floor space and uses a mere 7.4% of the exchange's electricity budget for cooling (48MW for HKSE, 1.2MW for BitFury).
So saving money is what it's all about. Let's see the system in action with 20 of Nvidia's finest.
Allied Control Immersion Cooling 20 Tesla GPUs
The Allied Control demo consisted of 20 immersion-cooled Nvidia Tesla GPUs and a dual-Xeon E5 server. The demo system draws 48kW, but the cooling system supports up to 60kW.
The company uses custom ASIC hardware and a special motherboard. The company replaced the standard GPU coolers with a 1mm copper plate that allows it to place the GPUs a mere 2.5mm apart. That's much slimmer than the normal dual-slot coolers we see on this class of GPU.
Allied Control Immersion Cooling Five Nvidia GeForce 1070s
So, we've seen the big version, but what if we could fit it on a desk? The Allied Control demo in Gigabyte's booth consisted of a single-socket Xeon-W system paired with six Nvidia GeForce 1070s. The GeForce GPUs were running under full load, yet stayed below 66C.
Although the immersion cooling tank could fit on the top of the platform, there was some additional equipment underneath. The cabinet was sealed off, though. In either case, we're sure most enthusiasts wouldn't have a problem sacrificing some space for this much bling on their desktop.
Anything You Can Do, OSS Can Do Bigger
One Stop Systems (OSS) holds a special place in our hearts. We put 32 3.2TB Fusion ioMemory SSDs head-to-head against 30 of Intel's NVMe DC P3700 SSDs in the FAS 200 chassis last year. That same core design, albeit with some enhancements in power delivery, can house up to 16 Nvidia Volta V100s in a single chassis. These robust systems are used for mobile data centers, such as for the military, along with supercomputing applications.
Altogether, 16 Volta GPUs wield 336 billion transistors and 13,440 CUDA cores, making this one densely packed solution for heavy parallel compute tasks. If your interests are more focused on heavy storage requirements, the company also offers a wide range of densely-packed PCIe storage enclosures.