A cutting-edge, high-performance supercomputer is set to be delivered to the Energy Department’s Los Alamos National Laboratory in early 2023.
“With an innovative balance of memory bandwidth and capacity, this next-generation system will shape our institution’s computing strategy,” the lab’s Director Thom Mason said Monday.
Hewlett Packard Enterprise will provide the supercomputing architecture for the novel system, which will also feature Nvidia’s Arm-based Grace central processing unit. Named for computer-programming pioneer Grace Hopper, the Grace CPU uses energy-efficient Arm cores and was explicitly designed for large-scale high-performance computing (HPC) and artificial intelligence applications. The new chip stems from designs by Arm Holdings, the British semiconductor-licensing giant that Nvidia is moving to acquire. It was unveiled this week at Nvidia’s GTC conference as the company’s first data center CPU.
Energy’s New Mexico-based lab will be the first U.S. customer to receive the advanced processor, and the Swiss National Supercomputing Centre will be the first to receive it abroad.
“LANL has been working towards a strategy for tailoring processors to improve performance and efficiency for its most demanding 3D, multi-physics, multi-link scale, and multi-resolutional problems to move time to solution from about 6 months to 6 days,” the lab’s HPC Division Leader Gary Grider told Nextgov Wednesday. “Current CPU and GPU architectures are very inefficient for this class of problems so attempting to tailor processors to our problems is one option we are pursuing.”
He explained that the entire Arm ecosystem “is all about being able to tailor processors to your needs,” so LANL had been investing in and showing interest in such options. Previously, the lab funded technology company Marvell’s development of the ThunderX line of Arm server-class processors, but according to Grider, Marvell abandoned that activity.
“Working with Nvidia on this new Arm server-class processor was a natural move to make and we have been engaged for well over a year on this effort,” he noted. “We are hoping this is a first step to a longer relationship to generate more powerful and tailorable Arm-based processors with extreme memory bandwidth and features that help address our most demanding workloads.”
LANL is not yet disclosing the price it is paying for the system, Grider said, though he confirmed that a contract through HPE underpins the machine’s purchase, “and it is the basis for some of the collaboration.”
In its press release, the lab said the supercomputer will enable researchers to embark on new discoveries across climate, diseases, materials and nuclear deterrence.
“The machine has three purposes,” Grider added. First, it is meant to be a first-class resource for LANL’s artificial intelligence, machine learning, simulation and other workloads. Second, it is an asset through which scientists in the lab’s weapons program can try out new concepts on the program’s important and emerging workloads. Third, he said, it will be a “resource to learn about and help shape future tailored architectures to move the bar on our most challenging problems reducing solution time from months to days over the next five to six years.”
The entities involved in building the system are engaging in what the lab called a multi-year collaboration focused on codesign, an approach that combines the expertise of “vendors, hardware architects, system software developers, domain scientists, computer scientists, and applied mathematicians” to ensure informed choices are made regarding various hardware and software components.
“It’s codesign at work, but it is what we think of as codesign2,” Grider explained. He noted that most codesign today is dominated by making applications performance-portable so that they can leverage modern processors produced for popular application areas, such as AI and ML.
“We think of codesign2 as doing somewhat the opposite—tailor the architecture more to the problems at-hand, instead of tailoring the applications to match architectures suited for different problems that happen to be popular,” he said. “This machine is the beginning of that strategy.”