When it comes to the planet, things are getting dangerously hot, and human consumption of power is a contributing factor. We use so much power that the power grid that supplies us is groaning under the weight. Just as worries about these matters have risen to the public consciousness, so has the unavoidable fact that AI is a ravenous beast that consumes power at jaw-dropping rates.
“Data centres are significant drivers of growth in electricity demand in many regions,” says a report from the International Energy Agency (IEA). “Data centres’ total electricity consumption could reach more than 1,000 TWh in 2026. This demand is roughly equivalent to the electricity consumption of Japan.”
NVIDIA CEO Jensen Huang believes AI will one day solve this problem itself. Further AI development, he said in a news release, will lead to “better energy, better carbon-capture systems, better energy-generating materials — and, ultimately, a grid capable of supporting AI energy demands.”
Until then, however, the familiar economics apply: when demand for a product is high and supply is constrained, prices go up. This situation is straining the operating budgets of companies that rely on AI, and many are looking for ways to use less power.
What are the strategies they're turning toward to lower their energy demand?
Buying Efficient Hardware
When you are shopping for the hardware to run your AI operations, include power usage as a line item in your budget. Power costs are significant for AI, and buying power-efficient hardware will pay dividends in the long run.
High-end processors will often maximize the AI results you get from the power you feed them. “If you have the budget and want to invest in AI long term, choose specialized chips such as the NVIDIA A100 GPUs and Google TPUs,” said Mithilesh Ramaswamy, a senior engineer at Microsoft. “They will outperform general-purpose CPUs.”
In some cases, though, you might also be able to use less expensive hardware to limit the power you throw at your AI operations. “If AI models do not require ultra-high performance, swapping to energy-efficient hardware can have a huge impact on reducing power use,” said Adam Bushell, director of AB Electrical & Communications. “I have seen companies in manufacturing swap their AI-powered monitoring systems to lower-power alternatives without sacrificing accuracy.”
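To put a number on why power belongs on the budget line, a back-of-the-envelope estimate of what a single accelerator costs in electricity takes only a few lines. The wattage, duty cycle and tariff below are illustrative placeholders, not figures from the article:

```python
def annual_energy_cost(watts: float, hours_per_day: float,
                       price_per_kwh: float) -> float:
    """Rough yearly electricity cost for one device, ignoring cooling overhead."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh

# Example (hypothetical numbers): a 400 W GPU running around the clock
# at $0.12/kWh costs roughly $420 a year in electricity alone.
cost = annual_energy_cost(watts=400, hours_per_day=24, price_per_kwh=0.12)
```

Real deployments multiply this by the number of cards and by a data center's PUE (power usage effectiveness) to account for cooling, which is why the cooling section below matters as much as the hardware itself.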
Related Article: Green AI Is a Competitive Advantage — Here’s Why It Matters
Improving Cooling Systems
When you fill a room with high-power computing equipment and ask that system to run nearly constantly, things get hot. Those processors do not like a hot environment, so keeping them cool is a necessary part of your operation. How much air conditioning do you throw at that problem? If you have not optimized your cooling solution, it is likely to be a big part of your power expenditure problem.
“AI-heavy setups produce heat,” said Bushell. “Inefficient cooling burns more energy than people realize.” Your cooling setup is a good place to start if you are trying to lower your power expenditure. “Simple changes in airflow design, liquid cooling or even relocating servers to cooler areas can cut down on energy-hungry air conditioning systems.”
Timing Operations During Low Demand
“One way to cut down on AI energy use is by timing operations to run when electricity demand is lower,” said Bushell. “Many businesses leave AI processes running around the clock.” This is inefficient and, usually, unnecessary. “Timing heavy workloads to off-peak hours reduces strain on the grid and takes advantage of cheaper, cleaner energy,” Bushell continued. “We have done this with clients running machine learning models, scheduling their training overnight when the grid is not overloaded.”
You could take a big-picture look at this, too. If you are planning a large AI project, schedule it for the cooler months so that it won’t compete with people turning on their air conditioners. When it comes to power, lower demand usually means lower prices and less drain on the grid. If you are located in a cold climate, plan heavy AI-use projects for the summer months so they don’t compete with people turning on the heat.
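The overnight scheduling Bushell describes boils down to a time-window check before kicking off a heavy job. The off-peak hours below are placeholders; real windows depend on your utility's tariff schedule:

```python
from datetime import datetime, time

# Hypothetical off-peak window -- check your utility's actual tariff hours.
OFF_PEAK_START = time(22, 0)  # 10 p.m.
OFF_PEAK_END = time(6, 0)     # 6 a.m.

def is_off_peak(now: datetime) -> bool:
    """True if `now` falls in the overnight off-peak window.

    The window wraps past midnight, so it is the union of two ranges:
    [22:00, 24:00) and [00:00, 06:00).
    """
    t = now.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END
```

A job runner would poll this check (or compute the next window start) before launching training, rather than leaving workloads running around the clock.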
Using Prompt Caching
“Prompt caching is my favorite strategy to reduce energy consumption, latency and cost,” said Nathan Brunner, CEO of boterview. To do this, you have to develop a system, or use the one built into the AI you use, to store the prompts you have already answered. If the AI gets the same question again, it returns the answer it already generated instead of recomputing it. This saves a considerable amount of compute and speeds up response time. You are essentially creating a FAQ of previously asked questions for users to tap.
“When the same input is received again, the system can quickly return a cached answer instead of recomputing it from scratch,” said Brunner. “This method works very well when you are using system prompts or if your users frequently input the same prompts.” This won’t help, of course, if your team or customers always ask unique questions.
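A minimal sketch of the idea, with a generic `model_fn` callable standing in for whatever model API you use (an assumption, not a real library call); production caches typically add eviction limits, TTLs, and sometimes embedding-based matching so near-duplicate prompts also hit the cache:

```python
import hashlib

class PromptCache:
    """Serve stored answers for prompts seen before instead of re-running the model."""

    def __init__(self, model_fn):
        self._model_fn = model_fn  # callable: prompt -> answer
        self._cache = {}

    def ask(self, prompt: str) -> str:
        # Normalize lightly so trivial variations ("What is AI?" vs
        # " what is ai?") map to the same cache key.
        key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
        if key not in self._cache:
            # Cache miss: this is the only path that burns compute.
            self._cache[key] = self._model_fn(prompt)
        return self._cache[key]
```

Every cache hit is a model call, and its energy cost, avoided entirely, which is why Brunner pairs the technique with latency and cost savings.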
Capping Power Usage
Capping the power that a GPU is allowed to draw can cut power use considerably, as much as 15% according to some estimates. Many hardware manufacturers offer a way to limit the amount of power a GPU can use. The system then figures out how to do the work you have asked for with less power. Often, this means it will do it more slowly.
While this is effective at saving power, it could slow your work processes, though usually not by much. If you are training a model, a high-energy operation, the job might take a few hours longer, but it can save a significant amount of power.
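On NVIDIA hardware the cap is set with `nvidia-smi -pl`. A small sketch that clamps a requested limit into a card's supported range and builds the command; the wattage bounds here are illustrative, so query your card's real range first with `nvidia-smi -q -d POWER`:

```python
def power_cap_command(gpu_index: int, watts: int,
                      min_watts: int = 100, max_watts: int = 400) -> list[str]:
    """Build the nvidia-smi invocation that caps one GPU's power draw.

    The limit is clamped to the supported range, since the driver
    rejects values outside the card's min/max. Running the returned
    command requires administrator privileges.
    """
    clamped = max(min_watts, min(watts, max_watts))
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(clamped)]
```

You would pass the returned list to `subprocess.run` on the host; building the command separately keeps the clamping logic easy to test without a GPU present.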
Related Article: How to Evaluate and Select the Right AI Foundation Model for Your Business
Using Foundation Models
Foundation models, which are trained once and then reused across many tasks, are a more sustainable option in some instances, though they won’t work for every application. When they are appropriate, they can save considerable power.
IBM’s geospatial foundation model, for example, can be used to track deforestation, detect greenhouse gases or predict crop yields. And it doesn’t require further training. BERT, developed by Google, is a natural language model that is good at things like translation, sentiment analysis and sentence classification. GPT, the foundation of ChatGPT, is another foundation model.
Choosing Small Language Models Over Large Ones
You don’t have to use a large language model (LLM) for every query. LLMs are power hungry, while small language models are more compact and efficient. They are perfect for mobile operations, edge devices and constrained environments. And they draw much less power.
“SLMs (Small Language Models) and LLMs (Large Language Models) primarily differ in size, efficiency and training complexity,” explained Ramaswamy. “LLMs, like GPT-4, have billions to trillions of parameters, requiring massive computational resources. In contrast, SLMs, such as Gemma 2B, are smaller in size (millions to low billions of parameters). While LLMs excel in complex reasoning and open-ended tasks, SLMs are designed for targeted applications, faster processing and lower energy consumption.”
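One way to act on that split is a router that sends routine prompts to a small model and escalates only the hard ones. The keyword heuristic and model names below are invented for illustration; real routers often use a trained classifier or the small model's own confidence score instead:

```python
# Phrases that hint a prompt needs heavyweight reasoning (illustrative only).
COMPLEX_HINTS = ("explain why", "compare", "step by step", "prove", "analyze")

def pick_model(prompt: str) -> str:
    """Route a prompt to a hypothetical small or large model by complexity."""
    p = prompt.lower()
    if len(p.split()) > 50 or any(hint in p for hint in COMPLEX_HINTS):
        return "large-llm"   # heavier, more capable, more power-hungry
    return "small-slm"       # cheaper, faster, lower energy
```

Even a crude router like this keeps the bulk of everyday traffic on the low-power path, reserving LLM-scale energy draw for the queries that actually need it.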