US Government's Aurora Supercomputer Delayed Due to Intel’s 7nm Setback

When Intel announced earlier this year that its 7nm process technology would be delayed, the news raised questions about the schedule for Aurora, the first Intel-based exascale supercomputer. There was no clear answer back in July, but an official for the U.S. Department of Energy’s (DoE) Office of Science confirmed this week that the system will be delayed.

As reported by HPCwire, the DoE does not see this as a major problem, noting that the Argonne National Laboratory, Aurora’s operator, has a contingency plan in place.

Aurora Supercomputer Delayed

“Yes, we have indications that the Aurora system will be delayed,” said Barb Helland, associate director for Advanced Scientific Computing Research (ASCR) at the DoE’s Office of Science. Helland added that Argonne is cooperating with Intel to “mitigate the consequences not only to Argonne, but to the Exascale Computing Project and to the nation’s high-performance computing users.”

“It’s not unexpected that when we’re entering into contracts for the most advanced supercomputers in the world, four to five years before they’re deployed, that there will be some schedule delays,” said Helland. “For that reason, we build both cost and schedule contingencies into our project budgets.”

The Aurora supercomputer is built around Intel’s next-generation Xeon processor, codenamed Sapphire Rapids and based on the Golden Cove microarchitecture, as well as the company’s first datacenter GPU, codenamed Ponte Vecchio, which is powered by the Xe high-performance computing (HPC) architecture.

Sapphire Rapids is made using Intel’s 10nm Enhanced SuperFin process technology, which is expected to be on track for mass production in 2021. Meanwhile, Intel’s Xe-HPC Ponte Vecchio GPU is a multi-tile chiplet design that combines a base tile produced on Intel’s 10nm SuperFin fabrication technology, an Xe-Link I/O tile made by a foundry, a Rambo Cache tile fabbed on the 10nm Enhanced SuperFin process, and a Compute Tile that was supposed to use Intel’s 7nm node, which was delayed by about six months. Last month, Intel revealed that the Compute Tile could be made both at an external foundry and internally.

Intel says that it has always envisioned Ponte Vecchio as a multi-chiplet product with tiles coming from various sources. Making a key tile at an external foundry is not a problem per se, but tailoring the design’s thermals, voltages and packaging to other parts will take some time. Intel’s Ponte Vecchio will be used outside of Aurora, so it makes sense for Intel to eventually produce its main Compute Tile at its own fabs, but this means that there will be two versions of the Xe-HPC Ponte Vecchio GPU.

Each Aurora blade features two Intel Xeon Scalable "Sapphire Rapids" processors, as well as six Intel Xe-HPC "Ponte Vecchio" GPUs. That means volume production of Intel's datacenter graphics chips is crucial to enable Aurora.

The First Exascale Supercomputers

So far, the U.S. DoE has revealed three exascale-class supercomputers. Argonne National Laboratory’s Aurora was the first system, announced in March 2019, and is expected to deliver over 1 ExaFLOPS of performance.

Oak Ridge National Lab’s Frontier supercomputer, powered by AMD’s Epyc ‘Milan’ processors and Radeon Instinct MI200 graphics, was unveiled in May 2019 and is on track to deliver 1.5 ExaFLOPS of performance in 2021.

This March, the DoE announced Lawrence Livermore Lab’s El Capitan system that is set to hit 2 ExaFLOPS in 2023 using AMD’s Epyc "Genoa" CPUs and AMD CDNA GPUs.

All three systems use HP Enterprise's Cray EX architecture, so they will have many things in common. Aurora is the only Intel-powered supercomputer out of the three.

However, we don't know when Aurora will arrive, and the supercomputer has already faced a major setback. The system was first announced in 2015 and was described as an Intel Xeon Phi "Knights Hill"-powered 180 PetaFLOPS supercomputer due in 2018. Since Intel cancelled Knights Hill in 2017, the original Aurora project was pushed back too.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

7 Comments from the forums

whatisupthere

Too bad they didn't pick EPYC. Would likely be right on schedule

JarredWaltonGPU

whatisupthere said:
Too bad they didn't pick EPYC. Would likely be right on schedule
Because Ponte Vecchio is the delay, not Sapphire Rapids? The only potential fix that wouldn't have resulted in delays would be using Nvidia A100, or maybe an AMD RDNA2 chip.

Kamen Rider Blade

Please cancel Intel's contract and go with AMD or IBM

Co BIY

At this level the customer (government) has so much power that they have to make a deliberate choice to keep competitive players in the business or they will not be able to have competition for their contracts in the future and maybe even no bidders if a sole source gets damaged by some event.

One solution is a "spread the wealth" system of contracts to multiple vendors. They compete for them but the decision makers make sure everyone wins some.

Jimbojan

Intel's Aurora system is likely 30 - 40% more power efficient than AMD's, thus in a 3-4 months time, Aurora may save the country for the cost of AMD's system ( 2.5 - 3 Million watts power ), it is worth the wait.

JarredWaltonGPU

Jimbojan said:
Intel's Aurora system is likely 30 - 40% more power efficient than AMD's, thus in a 3-4 months time, Aurora may save the country for the cost of AMD's system ( 2.5 - 3 Million watts power ), it is worth the wait.
I'm not sure anyone can say that with any degree of certainty right now. AMD's 7nm Zen 2 parts offer better efficiency and performance per watt than Intel's current parts. Will SuperFin close the gap? Perhaps. Nvidia's A100 parts are high performance but also very high power (up to 400W each), while AMD's alternative GPUs are not yet released in any form. I'm sure one of the designs for a next-gen supercomputer will be more efficient -- either Frontier or Aurora -- but we're probably two years away from knowing which one ends up ranking higher on the Green 500.

Avro Arrow

Co BIY said:
At this level the customer (government) has so much power that they have to make a deliberate choice to keep competitive players in the business or they will not be able to have competition for their contracts in the future and maybe even no bidders if a sole source gets damaged by some event.

One solution is a "spread the wealth" system of contracts to multiple vendors. They compete for them but the decision makers make sure everyone wins some.
I don't think that's true because I haven't seen any Opteron supercomputers since Titan. Also, Intel is in no danger of going out of business. AMD was in danger of going out of business but the US government wasn't falling all over themselves to give design wins to Bulldozer-based Opterons. I think that they only do things like that with defence contractors because that's often their only source of revenue.

Jimbojan said:
Intel's Aurora system is likely 30 - 40% more power efficient than AMD's, thus in a 3-4 months time, Aurora may save the country for the cost of AMD's system ( 2.5 - 3 Million watts power ), it is worth the wait.
I'd love to see proof of this because AMD has been absolutely crushing Intel when it comes to efficiency and performance-per-watt. Even if Intel does come out with something that's more efficient than AMD has NOW, AMD isn't exactly sitting still and their EPYC architecture makes current Xeons look like overpriced little-league CPUs. Intel hasn't exactly been a hotbed of innovation in the last ten years (to put it mildly) and corporate culture doesn't turn on a dime. Remember that Jim Keller left Intel after a very short amount of time and the rumours are that he hated how rigidly monolithic the corporate culture at Intel is.