Electronics and fluids don’t generally mix. But teams from different corners of the globe are showing that immersing data-center gear in specialized fluids could be the best way to keep it cool.
Computers may fail if they get too hot, so data centers typically rely on power-hungry fans to keep them cool. More recently, engineers have cooled supercomputers by circulating water through pipes that run near the processors. Liquids are far denser than air, which makes them much more efficient at drawing heat away from electronics. That efficiency matters more and more: a 2023 study found that keeping servers from overheating accounts for 30 to 40 percent of the total energy that data centers consume.
Water cooling has problems of its own, however. The water that carries heat away from the computers is typically piped to cooling towers, where that heat turns a separate supply of water into mist that evaporates into the atmosphere. In 2022, Google’s data centers consumed about 19 billion liters of freshwater for cooling.
Sandia researchers are testing out cooling computers by submerging them entirely in nonconductive oil.
Now, two separate results are putting a different technology on the map: immersion cooling, or dunking entire servers in oil. The oil is nonconductive and noncorrosive, so it can be in direct contact with electronics without short-circuiting or damaging them. The technology could cut energy usage in half, says Oliver Curtis, co-CEO of the immersion-cooled data center company Sustainable Metal Cloud.
“We’ve proven that you can get the same amount of performance, but for half the amount of energy, and if you can do that, it’s our social responsibility to proliferate this technology,” Curtis says.
Dunking an AI Factory
Yesterday, the MLPerf AI training competition announced a new benchmark category: energy consumption. As the name suggests, it measures the energy each submitting machine consumes while performing each of the other benchmark tasks, such as training a large language model or a recommendation engine. The new category had only one submitting organization, Singapore-based Sustainable Metal Cloud (SMC).
SMC was looking to show off the efficiency gains that result from its immersion-based cooling system. The system’s fluid is an oil called polyalphaolefin, which is a commonly used automotive lubricant. The oil is forced slowly through the dunked servers, allowing for efficient heat transfer.
The SMC team has worked out which modifications servers need in order to stay compatible with this cooling method over the long term. Beyond removing the built-in fans, they swap out the thermal interface materials that connect chips to their heat sinks, because some of those materials degrade in the oil. Curtis says the modifications are small but essential to the setup’s functioning.
“What we’ve done there is we’ve created the perfect operating environment for a computer,” Curtis says. “There’s no dust, there’s no movement, no vibration, because there’s no fans. And it’s a perfect operating temperature.”
SMC’s systems, which it calls HyperCubes, consist of 12 or 16 oil tanks, each housing a server. Servers in neighboring tanks are connected by ordinary interconnects that loop out of the oil in one tank and into the adjacent one. Curtis claims that this approach saves 20 to 30 percent of total energy usage at the server level.
In addition, SMC builds a sitewide heat-exchange system for each HyperCube. A traditional data center needs centralized air-conditioning on top of the fans attached directly to its servers; Curtis says the system-level heat exchanger does the air-conditioning’s job more efficiently, supplying a further 20 percent energy reduction.
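Back-of-the-envelope, those two figures roughly account for Curtis’s claim of halving energy use, at least if the facility-level savings apply to whatever energy remains after the server-level savings. The sketch below is illustrative arithmetic only; the way the two reductions combine is an assumption, not SMC’s published accounting.

```python
# Illustrative arithmetic only: combines the savings figures quoted above,
# assuming the facility-level reduction applies to the energy that remains
# after the server-level savings. The combination rule is an assumption,
# not SMC's published accounting.

def combined_savings(server_level: float, facility_level: float) -> float:
    """Fraction of baseline energy saved when the two reductions compound."""
    remaining = (1.0 - server_level) * (1.0 - facility_level)
    return 1.0 - remaining

for server in (0.20, 0.30):  # 20 to 30 percent saved at the server level
    total = combined_savings(server, 0.20)  # plus ~20 percent from the heat exchanger
    print(f"server-level {server:.0%} + facility-level 20% -> {total:.0%} overall")
```

Under those assumptions, the combined reduction lands between 36 and 44 percent of the baseline, in the neighborhood of the “half” Curtis cites; the exact figure depends on what baseline and accounting SMC uses.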
SMC calls its combined HyperCubes and dedicated heat exchangers “AI Factories.” The company deployed its first HyperCube in Tasmania in 2019, and subsequently built and delivered more than 14 others in Australia. In 2022, SMC installed its first AI Factory in Singapore, accessible via the cloud for commercial use in Asia.
| Benchmark | SMC energy (kJ) | SMC time to train (min) | Best time to train (min) |
| --- | --- | --- | --- |
| Natural language processing | 1,793 | 5.39 | 5.31 (Supermicro) |
| Recommender systems | 1,266 | 3.84 | 3.84 (SMC) |
| GPT-3 | 1,676,757 | 56.87 | 50.73 (Nvidia) |
| Image recognition | 7,757 | 2.55 | 2.49 (Oracle) |
| Object detection | 21,493 | 6.31 | 6.08 (Nvidia) |
| Medical imaging | 5,915 | 1.83 | 1.83 (SMC) |
Because SMC was the only company to enter MLPerf’s new energy category, it is hard to validate its exact energy-saving claims. However, the performance of its platform on various benchmarks was on par with comparable competitors—that is, other systems that, like SMC, use Nvidia’s H100 GPUs in the same numbers. And its energy results are now out there as a gauntlet, thrown down for other companies to beat.
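One way to put the table’s figures in more familiar terms is to divide each run’s energy by its training time, which gives the system’s average power draw during that run. The short script below assumes the times are in minutes, as MLPerf training results are conventionally reported; it is an illustrative calculation, not part of SMC’s submission.

```python
# Average power draw implied by the table above. Assumes MLPerf time-to-train
# figures are in minutes; illustrative arithmetic, not part of SMC's submission.

runs = {
    # benchmark: (SMC energy in kJ, SMC time to train in minutes)
    "Natural language processing": (1_793, 5.39),
    "Recommender systems": (1_266, 3.84),
    "GPT-3": (1_676_757, 56.87),
    "Image recognition": (7_757, 2.55),
    "Object detection": (21_493, 6.31),
    "Medical imaging": (5_915, 1.83),
}

for name, (energy_kj, minutes) in runs.items():
    avg_power_kw = energy_kj / (minutes * 60)  # kJ per second is kW
    print(f"{name:28s} ~{avg_power_kw:,.1f} kW average draw")
```

For the GPT-3 run, for instance, 1,676,757 kJ spread over 56.87 minutes works out to roughly 490 kilowatts of average draw.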
Researching Oil for the Chill
Separately, Sandia National Laboratories, in New Mexico, is testing immersion cooling with the aim of providing an independent, publicly available assessment. So far, immersion cooling “has a lot of advantages, and it’s really hard for me to see any disadvantages that would sway me to other technologies,” says Dave Martinez, engineering program project lead for Sandia’s infrastructure computing services.
The liquid Sandia is using comes from Submer Technologies in Barcelona. It’s a synthetic, biodegradable, nontoxic, nonflammable, noncorrosive fluid made from food-grade components. The fluid has one-eighth the electrical conductivity of air and roughly the viscosity of cooking oil, Martinez says.
In tests, Sandia is placing entire computers—server racks and their power cables—in immersion tanks loaded with the fluid. This strategy aims to capture all of the heat the electronics generate to provide even cooling. The coolant gives up its heat to the open air, given the right difference in temperature.
According to Submer, its immersion cooling system is 95 percent more efficient than traditional cooling technologies. Martinez suggests it may cut energy consumption by 70 percent compared with standard methods. In addition, after the coolant absorbs heat, it can be used to warm buildings during winter months, he says.
When it comes to replacing a component—say, a chip on a board—a gantry system above the tank can lift out a server rack. “We just let it drip until there’s no oil left,” Martinez says. “We might have to clean it all up a tiny bit, not a whole lot. It is just one more step than a normal system. But my assumption is that the failure rate of these parts will go down a lot because the cooling is more effective than a fan-based system.”
In partnership with Albuquerque-based data company Adacen, Martinez and his colleagues began testing Submer’s fluid and equipment in May.
“Right now, we’re seeing a lot more pros than cons,” Martinez says. “It’s not just the energy saved, which is pretty tremendous. Without all the fans, there’s virtually no noise, too. You might not even know there’s a data center there.”
Sandia’s tests involve checking temperatures inside and outside the immersion tank, measuring how much energy the cooling requires, tracking the reliability of the hardware, examining whether some coolant flow patterns work better than others, calculating infrastructure costs, and figuring out how best to use fans or water to remove what heat the coolant does release. The lab also plans to overclock the computers to see how much of a performance boost the coolant might allow without damaging the electronics, Martinez says.
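A first-pass version of the flow question Sandia is probing comes from a basic heat balance: the heat a coolant stream carries away equals its mass flow rate times its specific heat times its temperature rise. The sketch below is generic thermal arithmetic with assumed placeholder fluid properties; it is not Sandia’s test procedure, and the numbers are not Submer’s specifications.

```python
# Generic heat-balance estimate: coolant flow needed to carry away a given
# heat load at a chosen temperature rise across the tank.
#   Q = mdot * c_p * dT  ->  mdot = Q / (c_p * dT)
# The fluid properties and heat load below are assumed placeholder values,
# not Submer's published specifications or Sandia's measurements.

HEAT_LOAD_W = 50_000               # assumed: a 50-kW immersed rack
SPECIFIC_HEAT_J_PER_KG_K = 2_000   # assumed: typical order for dielectric oils
DENSITY_KG_PER_M3 = 800            # assumed
TEMP_RISE_K = 10                   # assumed coolant inlet-to-outlet rise

mass_flow_kg_s = HEAT_LOAD_W / (SPECIFIC_HEAT_J_PER_KG_K * TEMP_RISE_K)
volume_flow_l_min = mass_flow_kg_s / DENSITY_KG_PER_M3 * 1_000 * 60

print(f"mass flow   ~{mass_flow_kg_s:.1f} kg/s")
print(f"volume flow ~{volume_flow_l_min:.0f} L/min")
# For these assumed values: about 2.5 kg/s, or roughly 188 L/min.
```

Measuring the tank’s inlet and outlet temperatures, as Sandia is doing, lets the same relation be run in reverse to estimate how much heat the coolant is actually removing.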
Submer notes that its coolant faces potential challenges. For instance, plasticizer compounds in PVC cables may leach into the coolant, potentially leaving the cables stiff and brittle. However, the company says cables with outer sheaths made of materials such as polyurethane resin do not show this problem.
Sandia plans to finish its tests in July and write up its results in August. “Sandia is exploring what our next data center is going to look like,” and immersion cooling could play a part, Martinez says. “Right now this is looking pretty good as a player in our future.”
Charles Q. Choi is a science reporter who contributes regularly to IEEE Spectrum. He has written for Scientific American, The New York Times, Wired, and Science, among others.
Dina Genkina is an associate editor at IEEE Spectrum focused on computing and hardware. She holds a PhD in atomic physics and lives in Brooklyn.