This new approach, published in the journal Future Generation Computer Systems, optimises the software of language models without modifying their behaviour, improving both execution time and energy efficiency.
Researchers at the University of Cádiz are leading a study that proposes a new way to reduce the energy footprint of large language models, such as those behind ChatGPT or Gemini, technologies that are increasingly present in virtual assistants, education and scientific research. The work proposes optimising the software that runs these systems to improve their performance without modifying the artificial intelligence (AI) models themselves.
The growth of these models has brought major advances in human–technology interaction, but also high energy and computational demands, particularly during response generation. The daily energy consumption of queries to models such as ChatGPT is equivalent to that of thousands of households and produces emissions comparable to those of a car travelling 80,000 kilometres (almost twice around the world). In this context, the work carried out at the UCA addresses the challenge of reducing the environmental impact of these AI models from an innovative angle: the inference engines, that is, the programs that use pre-trained AI models, such as ChatGPT, to generate responses in real time.
The proposal is based on an automated tool capable of optimising the code of these engines through a genetic algorithm inspired by natural evolution processes. This system intelligently explores a large number of possible software improvements and selects those that enable more efficient use of hardware, generating optimised versions that reduce energy consumption and speed up execution.
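The evolutionary loop described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' tool: the `toy_fitness` function, the `WEIGHTS` table and every parameter value are invented stand-ins for what, in the real system, would be an actual measurement of the patched inference engine's energy use and execution time. Each bit in a genome stands for one hypothetical code transformation being switched on or off.

```python
import random

def evolve(fitness, genome_len, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Minimal genetic algorithm: maximise `fitness` over bitstrings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]

    def tournament():
        # Binary tournament selection: pick two, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, genome_len)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with small per-gene probability.
            child = [g ^ (rng.random() < mutation_rate) for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Invented stand-in for "measure the effect of each transformation":
# positive weights model changes that help, negative ones changes that hurt.
WEIGHTS = [3, -1, 2, -2, 4, 1, -3, 2]

def toy_fitness(genome):
    return sum(w * g for w, g in zip(WEIGHTS, genome))

best = evolve(toy_fitness, genome_len=len(WEIGHTS))
```

In a realistic setting the fitness evaluation would compile and run the modified engine and return a combined score of measured energy and runtime, which is far more expensive than this toy function; that cost is precisely why a guided search such as a genetic algorithm is preferred over exhaustively trying every combination.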
Tests carried out with language models of different sizes show significant results: energy consumption fell by more than 13%, while execution time decreased by nearly 20%. These savings are substantial compared with the generic optimisation techniques currently in use.
Unlike existing strategies, which focus on reducing the size or complexity of models—and therefore their response capacity—this work proposes improving efficiency by acting directly on the software that executes them. This makes it possible to move towards more sustainable and agile artificial intelligence without affecting its performance.
These results open up new lines of research aimed at reducing the environmental impact of artificial intelligence and contribute to the development of more efficient and environmentally friendly technologies, in a context of rapid expansion of these systems at scale.
This research has been carried out by Javier Jareño, José Miguel Aragón-Jurado, Juan Carlos De La Torre, Patricia Ruiz and Bernabé Dorronsoro, all members of the School of Engineering of Cádiz and the research group Graphical Methods, Optimization, and Learning (GOAL). Funding from the Regional Ministry of University, Research and Innovation of the Government of Andalusia made this work possible within the framework of the gCODE project.

Bibliographic reference: Jareño, J., Aragón-Jurado, J. M., De La Torre, J. C., Ruiz, P., & Dorronsoro, B. (2026). Energy-efficient large language models. Future Generation Computer Systems, 182, 108483. ISSN 0167-739X. https://doi.org/10.1016/j.future.2026.108483
