Accelerated Box of Flash: Powerful Computing

Image: A working prototype of the Accelerated Box of Flash. Hardware components are all standard for ease of adoption. Accelerators and storage devices are placed in U.2 slots in the front bays, and an internal PCIe (Peripheral Component Interconnect Express) slot is also available for accelerator hardware.

Credit: Los Alamos National Laboratory

Data is an essential part of solving complex scientific questions, in areas ranging from genomics to climatology to the analysis of nuclear reactions. However, an abundance of data is only as good as the ability to efficiently store, access, and manipulate that data. To facilitate discovery on big-data problems, Los Alamos researchers, in collaboration with industry partners, have developed an open storage system acceleration architecture for scientific data analysis, which can provide 10 to 30 times the performance of current systems.

The architecture allows intensive functions to be offloaded to a network-attached, programmable, accelerator-enabled storage device called the Accelerated Box of Flash or simply ABOF. These systems are intended to be a key element of the Laboratory’s future high-performance computing platforms.

“Data science and the data-driven discovery techniques used to analyze that data are growing rapidly,” said Dominic Manno, a researcher in the lab’s high-performance computing division. “Performing the complex analysis to enable scientific discovery requires enormous advances in the performance and efficiency of scientific data storage systems. The ABOF programmable appliance makes it easier for high-performance storage solutions to take advantage of rapid improvements in network and storage device performance, ultimately making more scientific discoveries possible. Placing compute near storage minimizes data movement and improves the efficiency of simulation and data analysis pipelines.”

Accelerates scientific computing

Scalable computing systems are adopting data processing units (DPUs) placed directly in the data path to accelerate intensive functions between processors and storage devices; however, leveraging DPUs in production-grade storage systems for complex HPC simulation and data analysis workloads has proven challenging. Although DPUs have specialized computing capabilities suited to data processing tasks, their integration into HPC systems has not yet realized the full efficiencies available.

The ABOF appliance is the product of a hardware and software co-design of the storage system. It enables easier use of NVIDIA BlueField-2 DPUs and other accelerators to offload intensive operations from host processors without major storage system software changes and allows users to take advantage of these offloads and resulting speedups without any application changes.

The current ABOF implementation applies specialized accelerators to three functional areas critical to storage system operation: compression, erasure coding, and checksums. Each of these functions costs time, money, and energy in storage systems. The appliance uses BlueField-2 DPUs with a 200 Gb/s InfiniBand network. Performance-critical functions of the popular Linux Zettabyte File System (ZFS) are offloaded to ABOF accelerators through a new ZFS interface for accelerators, available on GitHub. The Linux DPU Services Module, also on GitHub, is a Linux kernel module that enables the use of DPUs directly from the kernel, regardless of their location along the data path.
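To make the three offloaded functions concrete, here is a minimal host-side sketch in Python of what compression, erasure coding, and checksumming each do to a block of data. It is an illustration only, not the ABOF, ZFS, or DPU implementation: the function names (write_path, read_path, xor_parity) are hypothetical, CRC-32 and zlib stand in for whatever algorithms the storage system actually uses, and single XOR parity is the simplest possible erasure code rather than the scheme ABOF accelerates.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_parity(chunks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks (single-parity erasure code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def write_path(block: bytes, n_data: int = 4) -> dict:
    """Checksum, compress, and stripe one block with a parity chunk.

    Hypothetical helper: these are the three kinds of work the article
    says are offloaded, done here on the host CPU for illustration.
    """
    checksum = zlib.crc32(block)           # integrity check (stand-in algorithm)
    compressed = zlib.compress(block)      # compression (stand-in algorithm)
    chunk_len = -(-len(compressed) // n_data)  # ceiling division
    padded = compressed.ljust(chunk_len * n_data, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    return {
        "checksum": checksum,
        "length": len(compressed),
        "chunks": chunks,
        "parity": xor_parity(chunks),      # erasure coding (single parity)
    }


def read_path(stored: dict, lost: Optional[int] = None) -> bytes:
    """Rebuild one lost chunk from parity if needed, then decompress and verify."""
    chunks = list(stored["chunks"])
    if lost is not None:
        survivors = [c for i, c in enumerate(chunks) if i != lost]
        chunks[lost] = xor_parity(survivors + [stored["parity"]])
    compressed = b"".join(chunks)[:stored["length"]]
    block = zlib.decompress(compressed)
    if zlib.crc32(block) != stored["checksum"]:
        raise ValueError("checksum mismatch")
    return block


if __name__ == "__main__":
    data = b"scientific data " * 256
    stored = write_path(data)
    # Simulate losing chunk 2 and recovering it from the parity chunk.
    assert read_path(stored, lost=2) == data
    print("block recovered and verified")
```

Every block written or read passes through all three steps, which is why moving them from host processors to dedicated accelerators can free up significant CPU time.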

Released in January, successful demo

The project was successfully demonstrated in-house after the January release of the ABOF appliance hardware and supporting software.

Collaborators included NVIDIA, which built the data processing units and provided a scalable storage fabric; Eideticom, which created the NoLoad computational storage stack used to accelerate data-intensive operations and minimize data movement; Aeon Computing, which designed and integrated each component into a storage enclosure; and SK hynix, which partnered to provide fast storage hardware.

“HPC solves the world’s most complex problems as we enter the era of exascale AI,” said Gilad Shainer, senior vice president of networking at NVIDIA. “NVIDIA’s Accelerated Computing Platform dramatically improves the performance of innovative exploration by pioneers such as Los Alamos National Laboratory, enabling researchers to dramatically accelerate breakthroughs in scientific discovery.”

“The next-generation open storage architecture enables a new level of performance and efficiency through its hardware-software co-design, open standards, and innovative use of technologies such as DPUs, NVMe, and computational storage,” said Stephen Bates, Chief Technology Officer at Eideticom. “Eideticom is proud to work with Los Alamos National Laboratory and other partners to develop the computational storage stack used to show how this architecture can achieve these new levels of performance and efficiency. Effective use of accelerators, combined with innovative software and open standards, is the key to the next generation of data centers.”

“Developing a cutting-edge storage product with an end user has been a very positive experience,” said Doug Johnson, co-founder of Aeon Computing. “Working with technology vendors and the end user collaboratively has enabled rapid iteration and improvement of a new type of storage product that will serve the most important purpose any product can have, accelerating the end-user workflow.”

“SK hynix joined this collaboration building ABOF because we understand the need for a new flash-based system that can speed up data analysis,” said Jin Lim, Vice President of Solution Lab at SK hynix. “Building on this demonstration technology, we are committed to working with collaborative partners to further define the new computational storage device architecture and requirements critical to its best use cases.”

Building on the File System Acceleration Project, the researchers plan to continue integrating a set of common analysis functions into the system. This functionality would allow scientists to analyze the data using existing programming, potentially avoiding the need for additional data movements and supercomputing resources. This functionality would be specialized and tailored to the scientific community – another robust tool for tackling the complex and data-intensive questions underlying the challenges of our world.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of press releases posted on EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.
