Put to good use
Focusing on the energy impact of training models, however, may be a distraction. Boris Gamazaychikov, who is in charge of AI sustainability at Salesforce, a software company, compares it to trying to estimate the carbon footprint of a flight by including the impact of building the plane itself. Not only is that construction cost tiny compared with the fuel used over a typical lifetime in service, it’s also impossible to calculate the per-passenger impact until the aircraft is finally retired.

Instead, he says, it is best to focus on the energy impact of using AI, a process called inference. Brent Thill, an analyst at Jefferies, a bank, estimates that this stage accounts for 96% of the overall energy consumed in data centres used by the AI industry. Mr Gamazaychikov is trying to put hard numbers on that side of the industry, working with HuggingFace, a firm that hosts AI models, to systematically test the efficiency of hundreds of them. The results show the difficulty of generalising: the difference between the most and least power-hungry models is more than 60,000-fold.

Some of that difference arises from the AI models’ varying purposes. The most efficient model tested, called BERT-tiny, draws just 0.06 watt-hours (Wh) per task—about a second’s worth on an exercise bike—but is useful only for simple text-manipulation tasks. Even the least power-hungry image-generation model tested, by contrast, requires 3,000 times as much electricity to produce a single image.
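The gap between those two figures can be made concrete with simple arithmetic. A minimal sketch, using only the numbers quoted above (the per-image total is derived here, not stated in the text):

```python
# Back-of-envelope comparison of the figures quoted above.
BERT_TINY_WH_PER_TASK = 0.06   # most efficient model tested, per text task
IMAGE_MULTIPLIER = 3_000       # least power-hungry image model, relative cost

wh_per_image = BERT_TINY_WH_PER_TASK * IMAGE_MULTIPLIER
print(f"One image: about {wh_per_image:.0f} Wh")  # about 180 Wh
```

At roughly 180 Wh per image on that basis, a single picture costs about as much energy as an hour of hard pedalling on the exercise bike mentioned above.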

All the same, says Sasha Luccioni of HuggingFace, concrete figures are not always available. Her company could test only the models it could download and run on its own hardware. “OpenAI has not released a single metric about ChatGPT,” Ms Luccioni says, even though such data exist.

Another difficulty in calculating energy use is the fact that AI models are rapidly evolving. The release in December of DeepSeek V3, a top-tier AI model made by a lab spun off from a Chinese hedge fund, initially looked like good news for those concerned about the industry's energy use. A raft of improvements meant that the final training run was more than ten times faster than that of Meta's Llama 3.3 model just a few weeks earlier, with a roughly proportionate reduction in power used. Inference also became less power-hungry.

In January, as the implications of that improvement became clear, the stock prices of chipmakers crashed. But Satya Nadella, the boss of Microsoft, predicted the upset would be brief, citing the Jevons paradox, a 19th-century observation that the rising efficiency of steam engines opened up new economic uses for the technology and thereby raised demand for coal.

For AI, the rebound effect arrived in the form of "reasoning" models, including DeepSeek's follow-up model, R1. If normal chatbots exhibit what Daniel Kahneman, a psychologist and Nobel economics laureate, called "System 1" thinking—prioritising speedy responses—reasoning models display "System 2": structured replies that attempt to break a problem into its constituent parts, solve it with a variety of approaches, and check that the answer is correct before settling on it as the final response.

Training a reasoning model is not much harder than training a normal AI system, especially if you have pre-existing models to learn from. But running it requires significantly more power, since the “reasoning” step, in which the problem is thought through before a final answer is reached, takes longer. The efficiency improvements DeepSeek pioneered in V3 were more than eaten up by the extra thinking time used by R1 a couple of months later.