Combating cluster sprawl with the open source Omnia platform
Years ago, people in IT shops talked a lot about server sprawl, the result of continuously deploying dedicated servers for single applications. While in many cases these servers were woefully underutilized, IT teams couldn’t easily share their resources with other applications that needed more processing power. Then came server virtualization, which made it easier to share server capacity among multiple applications and helped solve the problem of server sprawl.
Organizations today are grappling with a similar problem, but on a larger scale. This problem is cluster sprawl, which arises from the deployment of high-performance computing (HPC) systems dedicated to particular compute-intensive applications, such as those for data analytics, machine learning and engineering simulations. Many organizations now find themselves with many islands of compute that are underutilized and difficult to manage in a unified way.
The Gabriel Consulting Group recently weighed in on this issue in a report on a new software toolkit designed to make it easier to share and manage HPC resources.
“We find many organizations suffering from ‘cluster sprawl’ with smaller clusters dedicated to just one or a few workloads,” the company notes. “Today, many organizations are buying AI-centric clusters because they believe these applications need dedicated hardware. However, these clusters often end up with lower utilization and become compute silos in the infrastructure. Why not combine these systems into a larger package that can be used by everyone?”
And that brings us to the new Omnia software suite, a toolkit that gives data centers a way to bring all of their compute together into a single, highly usable, and more easily managed resource pool.
Omnia at a glance
Omnia was developed at the Dell Technologies HPC & AI Innovation Lab, in collaboration with Intel and with support from the HPC community. This open source software is designed to automate the provisioning and management of HPC, AI, and data analytics workloads to create a single pool of flexible resources to meet growing and diverse demands.
The Omnia software stack includes an open source set of Ansible playbooks that accelerate the deployment of converged workloads with Kubernetes and Slurm, along with library frameworks, services, and applications. Omnia automatically imprints a software solution onto each server based on the use case (for example, HPC simulations, neural networks for AI, or in-memory graphics processing for data analytics), reducing deployment time from weeks to minutes.
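To make the idea of use-case-driven provisioning concrete, here is a minimal Python sketch. It is not Omnia's actual code (Omnia itself is delivered as Ansible playbooks). The use-case names, the `STACKS` table and the `plan_deployment` function are hypothetical, and the pairing of Slurm with HPC simulation and Kubernetes with AI and analytics is an assumption made purely for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from a server's use case to the scheduler it would
# receive. Illustrative only; not taken from Omnia's playbooks.
STACKS = {
    "hpc_simulation": "slurm",
    "ai_training": "kubernetes",
    "data_analytics": "kubernetes",
}


def plan_deployment(servers):
    """Group server names by the scheduler their declared use case maps to.

    `servers` is a dict of {server_name: use_case}.
    """
    plan = defaultdict(list)
    for name, use_case in servers.items():
        if use_case not in STACKS:
            raise ValueError(f"unknown use case for {name}: {use_case}")
        plan[STACKS[use_case]].append(name)
    # Sort names so the plan is stable regardless of input order.
    return {scheduler: sorted(names) for scheduler, names in plan.items()}


if __name__ == "__main__":
    # Hypothetical inventory of one shared pool of servers.
    inventory = {
        "node01": "hpc_simulation",
        "node02": "ai_training",
        "node03": "data_analytics",
    }
    for scheduler, names in plan_deployment(inventory).items():
        print(scheduler, names)
```

The point of the sketch is only that one shared inventory can drive different software stacks per server, rather than each workload owning its own separately managed cluster.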
Community involvement and contribution are important to the advancement of Omnia. To this end, Arizona State University Research Computing has worked closely with the Dell Technologies HPC & AI Innovation Lab on the development of Omnia to better support mixed workloads, including simulation, high-throughput computing and machine learning.
A third-party review
In its review of the Omnia platform, the Gabriel Consulting Group discussed a wide range of the software's capabilities and benefits. Among them:
Pooling Resources – “Customers using Omnia can quickly deploy HPC or AI clusters ready for users to populate with the application stacks they need,” the company notes. “With Omnia, all of an organization’s compute resources can be brought together to create a single infrastructure pool that can then be split to run workloads.”
Custom Clusters – “Omnia allows customers to dynamically divide large clusters into custom-designed logical clusters for specific workload configurations,” explains Gabriel. “Clusters can be built, destroyed, merged with other hardware resources, quickly and easily.”
Large Numbers of Clusters – “Omnia can be used to create and manage groups of servers ranging from a single system to all systems in the data center,” the company emphasizes. “For us, it’s most valuable when used to deploy a large number of uniquely configured clusters and stacks for many users.”
The bottom line
In the conclusion of its report, Gabriel Consulting Group says that Omnia may well be the answer to the challenges of building and managing increasingly large and complex compute infrastructures and the growing problem of cluster sprawl.
“It ticks all the boxes in terms of rationalization and virtualization of HPC infrastructures,” notes the firm. “This makes the creation and actual use of compute clusters faster and easier, and it reduces the workload for administrators and users. These are all great victories in our book.”
For a more in-depth dive into the Omnia platform, check out Gabriel Consulting Group’s “Omnia Fights Cluster Sprawl” report. And to get started with Omnia, download the software from GitHub at https://github.com/dellhpc/omnia, then join the community that helps guide the design and development of the next generation of open source consolidated cluster deployment tools.
Copyright © 2021 IDG Communications, Inc.