Why all-flash storage solutions are the optimal choice for AI projects and how they address evolving needs in the tech sector. Explore the profound impact of AI on data storage with insights from Alex McMullan, CTO International at Pure Storage.
As a technology with huge but unrealised potential, AI has been on the corporate agenda for a long time. This year, it has undoubtedly gone into overdrive due to Microsoft’s $10 billion investment in OpenAI, together with strategic initiatives by Meta, Google and others in generative AI. Although we’ve seen many advances in AI over the years and arguably just as many false dawns in its widespread adoption, there can be little doubt now that it’s here to stay. As such, now is the time for CTOs and IT teams to consider the implications of the coming AI-driven era.
In terms of its likely impact on the technology sector and society in general, AI can be likened to the introduction of the relational database, which was the spark that ignited a widespread appreciation for large data sets and resonated with both end users and software developers. AI and ML can be viewed in the same terms: they provide a foundation not only for building powerful new applications but also for improving the way we engage with groundbreaking technology and with large and disparate datasets. We’re already seeing how these developments can help us solve complex problems much faster than was previously possible.
Understanding AI data storage challenges
To understand the challenges that AI presents from a data storage perspective, we need to look at its foundations. Any machine learning capability requires a training data set. In the case of generative AI, the data sets need to be very large and complex, including different types of data.
Generative AI relies on complex models, and the algorithms on which it is based can include a very large number of parameters that the model is tasked with learning. The greater the number of features, and the greater the size and variability of the anticipated output, the larger the data batch size and the number of epochs needed in the training runs before inference can begin.
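To give a feel for how batch size and epochs translate into storage load, here is a minimal back-of-the-envelope sketch. It assumes every epoch reads the full training set from storage once, with no caching; the function name and the sample figures in the usage note are illustrative only and do not describe any real model.

```python
import math


def training_io(num_samples, bytes_per_sample, batch_size, epochs):
    """Return (total optimiser steps, total bytes read) for one training run.

    Assumption: each epoch streams the whole dataset from storage exactly once.
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    total_bytes_read = num_samples * bytes_per_sample * epochs
    return total_steps, total_bytes_read
```

For a hypothetical data set of one million samples of 500 KB each, trained with a batch size of 512 for 10 epochs, `training_io(1_000_000, 500_000, 512, 10)` gives 19,540 steps and 5 TB read from storage, ten times the size of the data set itself.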
Generative AI is essentially tasked with making an educated guess or running an extrapolation, regression or classification based on the data set. The more data the model has to work with, the greater the chance of an accurate outcome, that is, of minimising the error/cost function. Over the last few years, AI has steadily driven the size of these datasets upwards. However, the introduction of large language models, upon which ChatGPT and the other generative AI platforms rely, has seen their size and complexity increase by an order of magnitude. This is because the learned knowledge patterns that emerge during the AI model training process need to be stored in memory, which can become a real challenge with larger models.
Checkpointing large and complex models also puts huge pressure on underlying network and storage infrastructure, as the model cannot continue until the internal data has all been saved in the checkpoint; these checkpoints act as restart or recovery points if the job crashes or the error gradient is not improving.
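The blocking behaviour described above can be illustrated with a minimal sketch of a synchronous checkpoint. It assumes the training state fits in a single picklable Python object and is written to a local file system; real training frameworks have their own checkpoint formats, and the function names here are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the training state durably; the caller is blocked until this returns."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # wait until the data has been handed to the device
        os.replace(tmp_path, path)  # atomically swap the new checkpoint into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Restore the most recently saved state, for example after a crashed job."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

A training loop would call `save_checkpoint(state, "model.ckpt")` every so many steps and `load_checkpoint("model.ckpt")` on restart. Because the loop waits on the `fsync` call, the time spent stalled is governed largely by the write performance of the underlying storage.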
Given the connection between data volumes and the accuracy of AI platforms, it follows that organisations investing in AI will want to build their own very large data sets to take advantage of the opportunities that AI affords. Generative AI realises those opportunities by using neural networks to identify the patterns and structures within existing data and create new, proprietary content. Because data volumes are increasing exponentially, it’s more important than ever that organisations use the densest, most efficient data storage possible to limit sprawling data centre footprints and the spiralling power and cooling costs that go with them. This raises another challenge that is beginning to surface as a significant issue: the implications that massively scaled-up storage requirements have for achieving net zero carbon targets by 2030-2040.
AI will impact sustainability commitments because of the extra demands it places on data centres at a time when CO2 footprints and power consumption are already a major issue. This will only increase pressure on organisations, but it can be accommodated and managed by working with the right technology suppliers. The latest GPU servers consume 6-10kW each, and most existing data centres are not designed to deliver more than 15kW per rack, so there is a large and looming challenge for data centre professionals as GPU deployments increase in scale.
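The arithmetic behind that challenge is simple enough to sketch. The snippet below uses only the figures quoted above (6-10kW per GPU server, 15kW per rack) and assumes, for simplicity, that the whole rack budget is available to GPU servers, with nothing reserved for networking or storage.

```python
def servers_per_rack(rack_budget_kw, server_draw_kw):
    """Number of whole servers that fit within a rack's power budget."""
    return int(rack_budget_kw // server_draw_kw)


RACK_BUDGET_KW = 15  # per-rack limit quoted in the text

best_case = servers_per_rack(RACK_BUDGET_KW, 6)    # servers at the low end of the range
worst_case = servers_per_rack(RACK_BUDGET_KW, 10)  # servers at the high end of the range

print(f"GPU servers per 15kW rack: {worst_case} to {best_case}")
```

In other words, a rack that might otherwise hold dozens of conventional servers can power only one or two GPU servers, so any deployment of scale quickly runs into rack count, floor space and power delivery limits.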
Flash optimal for AI
Some technology vendors are already addressing sustainability in their product design. For example, all-flash storage solutions are considerably more efficient than their spinning disk (HDD) counterparts. Some vendors are even going beyond off-the-shelf SSDs, creating their own flash modules that allow all-flash arrays to communicate directly with raw flash storage. This maximises the capabilities of flash and provides better performance, power utilisation and efficiency.
As well as being more sustainable than HDD, flash storage is much better suited to running AI projects. The key to results is connecting AI models or AI-powered applications to data. Doing this successfully requires support for large and varied data types, streaming bandwidth for training jobs, write performance for checkpointing (and checkpoint restores) and random read performance for inference. Crucially, it all needs to be reliable 24×7 and easily accessible across silos and applications. This set of characteristics isn’t possible with HDD-based storage underpinning your operations; all-flash is needed.
Data centres are now facing a secondary but equally important challenge that the continued rise of AI and ML will exacerbate. That is water consumption, which is set to become an even bigger problem — especially when considering the continued rise in global temperatures.
Many data centres utilise evaporative cooling, which works by spraying fine mists of water onto cloth strips, with the ambient heat being absorbed by the water, thus cooling the air around it. It’s a smart idea, but it’s problematic, given the added strain that climate change is placing on water resources — especially in built-up areas.
As a result, this cooling method has fallen out of favour in the past year, resulting in a reliance on more traditional, power-intensive cooling methods like air conditioning. This is another reason to move to all-flash data centres, which consume far less power and don’t have the same intensive cooling requirements as HDD and hybrid.
The road ahead for AI and data storage
As AI and ML continue to evolve rapidly, the focus will increase on data security (to ensure that rogue or adversarial inputs can’t change the output), model repeatability (using techniques like Shapley values to gain a better understanding of how inputs alter the model) and stronger ethics (to ensure this very powerful technology is used to benefit humanity).
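For readers unfamiliar with Shapley values, the idea is to attribute a model's output to its inputs by averaging each input's marginal contribution over every possible ordering of the inputs. The sketch below computes exact Shapley values for a tiny, purely hypothetical scoring function with three inputs; it is an illustration of the technique, not how production explainability tools are implemented, since they rely on approximations for models with many inputs.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_model(present):
    """Hypothetical model output given the set of inputs that are 'switched on'."""
    score = 0.0
    if "a" in present:
        score += 1.0
    if "b" in present:
        score += 2.0
    if "a" in present and "b" in present:
        score += 1.0  # interaction between a and b
    return score  # input "c" never affects the output


attributions = shapley_values(["a", "b", "c"], toy_model)
```

Here the interaction bonus is split evenly between inputs `a` and `b`, input `c` receives no credit, and the attributions sum to the model's output with all inputs present.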
All these worthy goals will increasingly place new demands on data storage. Storage vendors are already factoring this into their product development roadmaps, knowing that CTOs will be looking for secure, high-performance, scalable, efficient storage solutions that help them achieve these goals. The focus should not be entirely on the capabilities of data storage hardware and software, however; the big picture is very big indeed.