By Murat Kilicoglu, Principal, Cota Capital
AI has quickly become an essential part of daily life for many people and is expected to revolutionize industries from healthcare to finance to retail. However, the widespread adoption of AI technologies is driving unprecedented demand for computational power, putting enormous strain on the power grid. In fact, U.S. energy demand is projected to hit record highs this year, partly because of the growing number of data centers fueling the AI boom across the country.
And as AI grows more sophisticated, it requires ever more compute power. A single ChatGPT query, for example, uses nearly 10 times the electricity of a typical internet search. This surge in computing requirements translates directly into higher energy consumption and more heat generated within data centers.
As a result, we are rapidly approaching a critical juncture where the available power infrastructure may no longer be sufficient to support the continued advancement of AI technologies. This looming power constraint threatens to impede progress, potentially stalling innovation and limiting the practical applications of AI across various sectors.
AI is hot (really hot)
Boston Consulting Group predicts that data centers will account for 16% of total U.S. power consumption by 2030, a significant increase from just 2.5% before ChatGPT’s release in 2022. That share would be equivalent to the power used by about two-thirds of all U.S. homes.
A significant portion of this energy goes toward keeping these facilities cool: in today’s data centers, cooling accounts for approximately 40% of electricity usage. Current cooling solutions are very power-intensive, further taxing the U.S.’s aging power grid.
It’s clear that we need a way to cool data centers more efficiently. If we can figure out how to do this, we can realize the full potential of AI without threatening our power supply. Unfortunately, traditional air cooling methods are struggling to keep up with the demands now being placed on data centers.
Today, the most common method of cooling, used by about 90% of data centers, is air cooling. It’s dominant because maintenance is straightforward, and data center personnel are familiar with the process. But while air cooling typically involves low initial costs, it has high operational expenses due to its energy consumption.
And that consumption will continue to increase. Advanced AI chips, for instance, generate significant amounts of heat due to their high power densities and computational demands. Air cooling, with its limited heat-transfer capacity, often fails to dissipate heat effectively, leading to thermal throttling and reduced performance. Air cooling systems also require substantial energy to power fans and air-conditioning units to maintain acceptable temperatures, especially in dense server environments. This, again, results in higher operational costs and increased energy consumption in data centers.
Liquid systems cool data centers better
Given the drawbacks of air cooling, data center operators are now switching gears and starting to examine liquid cooling technologies closely, especially as compute-intensive applications like AI continue to gain traction.
Liquid cooling can help reduce power consumption in several ways. For starters, it offers superior heat-transfer capabilities, allowing for more efficient heat removal from spaces densely packed with servers. This enables data centers to operate high-performance systems at peak efficiency while reducing energy consumption for cooling. Additionally, liquid cooling systems can handle higher heat loads in a smaller footprint, allowing for more compact and space-efficient data center designs.
This is particularly important as the demand for AI and high-performance computing (HPC) continues to grow. Improved thermal management also contributes to extended hardware lifespan and increased reliability, which are crucial factors in maintaining the continuous operation of mission-critical AI and HPC applications.
Air cooling systems, for their part, often require large heatsinks, numerous fans and extensive ductwork, which consume valuable data center space and limit scalability. Liquid cooling systems, by contrast, are more compact and can be integrated into existing infrastructure with a smaller footprint. This makes liquid cooling better suited to the dense and growing requirements of AI hardware.
Liquid cooling options
Liquid cooling technologies are increasingly popular. Market adoption of liquid cooling is currently about 10% but is expected to triple to about 30% by 2028, representing a market opportunity of more than $15 billion.
The most common liquid cooling methods are direct-to-chip cooling and immersion cooling, the latter of which comes in single-phase and two-phase variants. Each has its own advantages and disadvantages, and each suits different applications depending on specific cooling requirements, budget constraints, and operational considerations.
Direct-to-chip cooling involves circulating coolant directly through cold plates attached to individual chips, providing targeted and efficient cooling for large-scale data centers and small-scale applications like gaming PCs. However, this method requires complex installation, regular maintenance, and higher initial setup costs.
Single-phase immersion cooling submerges components in a liquid that remains in its liquid state as it absorbs heat and circulates through a cooling loop. This method offers a simpler tank design, improved reliability, and lower initial costs. However, it requires careful fluid handling and may need more frequent maintenance.
Two-phase immersion cooling takes advantage of the liquid-to-vapor phase change to enhance heat-transfer efficiency. In this method, the liquid absorbs heat, changes to vapor, then condenses back into liquid in a continuous cycle. This approach provides superior cooling efficiency, higher power density, and lower operational costs. The drawbacks include increased system complexity due to gas handling.
Our take on liquid cooling
Traditional air-cooling methods are typically effective for data center racks with power densities up to 10-15 kW per rack. Beyond this threshold, air cooling becomes less efficient and may require supplementary cooling methods like rear-door heat exchangers.
Direct-to-chip liquid cooling is generally the most viable solution when the density exceeds 20 kW per rack, with some advanced direct-to-chip systems going well beyond this. We believe immersion cooling makes the most sense above 40-50 kW per rack, and that two-phase immersion cooling is the most effective cooling technology.
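Read together, these density thresholds work as a rough rule of thumb. The minimal Python sketch below encodes them in a hypothetical helper, `suggest_cooling`, purely for illustration. The cutoffs are assumptions drawn from the ranges above: 15 kW uses the upper end of the 10-15 kW air-cooling range, and 40 kW uses the lower end of the 40-50 kW immersion range. A real deployment decision would also weigh cost, facility constraints and vendor guidance.

```python
# Hypothetical rule-of-thumb helper; thresholds are assumptions taken from
# the ranges discussed in the text, not an industry standard.
AIR_MAX_KW = 15.0        # upper end of the 10-15 kW air-cooling range
DTC_MIN_KW = 20.0        # direct-to-chip "generally most viable" above this
IMMERSION_MIN_KW = 40.0  # lower end of the 40-50 kW immersion range


def suggest_cooling(rack_kw: float) -> str:
    """Return a suggested cooling approach for a given rack power density (kW/rack)."""
    if rack_kw < 0:
        raise ValueError("rack power density must be non-negative")
    if rack_kw > IMMERSION_MIN_KW:
        return "immersion"  # the text favors two-phase immersion at this level
    if rack_kw > DTC_MIN_KW:
        return "direct-to-chip"
    if rack_kw > AIR_MAX_KW:
        # Air alone becomes less efficient; add e.g. rear-door heat exchangers
        return "air + supplementary"
    return "air"


if __name__ == "__main__":
    for kw in (8, 18, 30, 60):
        print(f"{kw} kW/rack -> {suggest_cooling(kw)}")
```

Keeping the cutoffs as named constants makes it easy to shift them, for example to 10 kW or 50 kW, if an operator reads the stated ranges differently.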
Overall, we believe direct-to-chip will continue to be used for high-power-density racks in the short-to-medium term. It’s not the most effective cooling method, but it’s materially better than air cooling. In the long term, however, our take is that two-phase immersion will win the liquid cooling play. It took almost 20 years for direct-to-chip to see significant adoption in data centers, but we believe two-phase immersion will become prevalent faster.
Immersion cooling already has working applications in crypto mining and custom AI solutions, even though it has not yet been deployed in data centers at scale. The main hurdle is developing a rack-level solution that can be deployed at scale. The underlying technology exists, but the design still needs work, and we know companies are currently tackling this.
Tapping into liquid cooling and other modern data center opportunities
As the industry continues to evolve, it’s clear that liquid cooling technologies will play a crucial role in addressing the cooling challenges of modern data centers. By embracing these innovative solutions, companies can not only improve their operational efficiency but also enable the continued growth of compute-intensive solutions like AI.
We are also seeing new opportunities created within data centers at the software layer, particularly in applications that enhance efficiency and management. Exciting developments include advanced AI-driven analytics for predictive maintenance, software-defined networking for optimized data flow, and energy management systems that leverage machine learning to reduce power consumption. As the demand for AI continues to rise, these infrastructure investments, coupled with innovative software solutions, will be critical in supporting the future growth of data centers.
We believe that an ecosystem of companies, including liquid cooling startups, is rapidly innovating to address the infrastructure requirements of the next generation of data centers and is poised for significant growth.