o3-mini and the environmental rebound effect of reasoning models
Another striking example of AI's environmental rebound effect, entirely obscured by a lack of transparency: the rushed public release of OpenAI's o3-mini model.
Last week, the Chinese company DeepSeek unveiled its R1 model, misleadingly positioned as energy-efficient when all that is actually established is that it is substantially cheaper for users than OpenAI's equivalent models and ChatGPT - roughly 27x less expensive per token (a token being roughly equivalent to a generated word) - a pricing figure that reveals little about actual energy consumption. The release caused a market upheaval, competitively forcing ChatGPT's parent company to respond by rolling out its new "reasoning" model, o3-mini, to hundreds of millions of ChatGPT users.
What exactly are these newly fashionable "reasoning" models? Unlike traditional LLMs, which generate a response directly, these models are engineered with a crucial intermediate step: systematically decomposing the problem into logical steps before producing an answer. This "chain of thought" approach lets the model detect and correct potential errors along the way, emulating a form of reflection through an "inner monologue" (though we should be wary of anthropomorphization), and it proves particularly effective on complex scientific tasks.
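To make the mechanism concrete, here is a minimal sketch contrasting direct generation with an explicit chain-of-thought request, written against the OpenAI Python client. The model name and the step-by-step prompt are illustrative assumptions: true reasoning models run this phase internally, while the sketch merely emulates it through prompting.

```python
# Minimal sketch: direct generation vs. an explicit "chain of thought"
# request, using the OpenAI Python client. The model name and the
# step-by-step prompt are illustrative assumptions; real reasoning
# models run this phase internally rather than via prompting.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = ("A bat and a ball cost $1.10 together; the bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

# Standard LLM behaviour: one pass, the answer is generated directly.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model name
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought style: ask for intermediate steps and self-checking
# before the final answer, emulating the hidden reasoning phase.
reasoned = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": question + " Think step by step, check each step "
                              "for errors, then give the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(reasoned.choices[0].message.content)
```

The second request typically produces a much longer output, and that extra length is precisely where the additional compute, and energy, goes.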
Announced in December 2024, OpenAI's o3 model substantially outperformed its predecessors on highly complex logical tasks and on advanced questions in quantum physics and mathematics. Today, OpenAI announces o3-mini, a lighter model with mysterious lineage: an evolution of o1-mini? An architectural improvement of o1? A distilled version of the powerful o3? The company maintains complete opacity on these questions while announcing responses 24% faster and prices 63% lower for users than o1-mini.
The fundamental issue with these models' energy consumption is this: whereas a standard LLM's consumption scales roughly linearly with the length of its visible output, a reasoning model's chain of thought - and thus its energy consumption - dynamically adapts to the complexity of the problem. This architecture makes direct performance or efficiency comparisons particularly difficult: the same model can consume vastly different resources depending on the task. Scott Chamberlin's empirical measurements of DeepSeek-R1 are revealing: an average 87% increase in energy consumption compared to a standard LLM of similar size, attributable solely to this reasoning phase.
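A toy cost model makes the nonlinearity visible; every number in it is an assumed illustrative value, not a measurement of any real model.

```python
# Toy cost model of why reasoning models break linear assumptions.
# ENERGY_PER_TOKEN_WH is an assumed illustrative value, not a measurement.
ENERGY_PER_TOKEN_WH = 0.04  # assumed average energy per generated token

def query_energy_wh(answer_tokens: int, reasoning_tokens: int = 0) -> float:
    """Energy for one query, counting hidden reasoning tokens as well."""
    return ENERGY_PER_TOKEN_WH * (answer_tokens + reasoning_tokens)

# Standard LLM: cost tracks the visible answer, easy or hard alike.
print(query_energy_wh(answer_tokens=300))              # 12.0 Wh

# Reasoning model: two queries with identical 300-token answers can
# differ by orders of magnitude once the hidden chain is counted.
print(query_energy_wh(300, reasoning_tokens=500))      # 32.0 Wh
print(query_energy_wh(300, reasoning_tokens=20_000))   # 812.0 Wh
```

The point of the sketch: two queries that look identical to the user (same 300-token answer) can differ enormously in real cost, which is exactly what makes aggregate efficiency claims so hard to verify.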
The numbers become truly staggering for o3 on the most complex tasks. An analysis by Boris Gamazaychikov of the ARC-AGI benchmark runs showed that a single complex task can consume up to 1,785 kWh - around 700 kg of CO2 for a single query, roughly the footprint of 100 beef-based meals or a Paris-New York flight.
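For readers who want to check the order of magnitude: the 700 kg figure is consistent with multiplying that energy by a grid carbon intensity of about 0.39 kg CO2e/kWh. That intensity is an assumption on my part (close to a US grid average); the original analysis's exact factor may differ.

```python
# Order-of-magnitude check on the 700 kg figure. The grid carbon
# intensity is an assumption (~0.39 kg CO2e/kWh, close to a US grid
# average); the original analysis's exact factor may differ.
task_energy_kwh = 1785
carbon_intensity_kg_per_kwh = 0.39  # assumed
emissions_kg = task_energy_kwh * carbon_intensity_kg_per_kwh
print(f"{emissions_kg:.0f} kg CO2e")  # 696 kg, i.e. ~700 kg per query
```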
Until now, unlimited access to the reasoning models (o1 and o1-mini) required a €200/month ChatGPT Pro subscription, which Sam Altman acknowledged was operating at a loss. Today, under competitive pressure, OpenAI is releasing a new model with no transparency and no guidance for users on the appropriate use cases for these computational behemoths.
While optimizations will inevitably emerge, as Dr Sasha Luccioni noted in MIT Technology Review, we should be extremely wary of the excitement generated by DeepSeek, which will push the AI giants to embed these reasoning models everywhere, without transparency or explanation - with one guaranteed loser: our environment.