Deep reinforcement learning (DRL) has achieved remarkable performance across a wide range of tasks, yet its growing computational demands raise significant environmental and economic concerns. While prior work has largely emphasized learning performance and sample efficiency, the energy consumption, carbon emissions, and monetary cost of DRL training remain poorly understood. In this work, we present a systematic energy benchmarking study of seven widely used DRL algorithms, namely DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN, evaluated on ten Atari 2600 benchmarks under identical hardware and software configurations. Each algorithm is trained for one million steps, with real-time power measurements used to estimate total energy usage, CO2-equivalent emissions, and electricity cost. Our results over ten runs reveal substantial disparities in energy efficiency across algorithms: derivative-free methods such as ARS achieve performance comparable to DQN while consuming approximately 77% less energy. In contrast, distributional and recurrent approaches incur substantially higher environmental and monetary overheads: for example, RecurrentPPO produces over 1344% more CO2 emissions and incurs more than 1367% higher national monetary cost than the most energy-efficient alternative (ARS), while QR-DQN exhibits moderately higher emissions relative to DQN. These findings demonstrate that algorithmic choice alone can induce multi-fold differences in energy efficiency without sacrificing performance, highlighting the importance of incorporating energy and sustainability metrics into the evaluation and design of future DRL systems.
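The conversion pipeline described above (power samples → energy → emissions → cost) can be sketched as follows. This is a minimal illustration, not the authors' measurement code; the function names, the grid carbon-intensity factor, and the electricity price are illustrative placeholders, not values from the study.

```python
# Hedged sketch: turning periodic power readings from a training run into
# energy (kWh), CO2-equivalent emissions (kg), and electricity cost (USD).
# The intensity and price constants below are placeholder assumptions.

def energy_kwh(power_samples_w, interval_s):
    """Integrate instantaneous power (watts), sampled every `interval_s`
    seconds, into total energy in kilowatt-hours (trapezoidal rule)."""
    joules = sum(
        (a + b) / 2.0 * interval_s
        for a, b in zip(power_samples_w, power_samples_w[1:])
    )
    return joules / 3.6e6  # 1 kWh = 3.6e6 J

def co2e_kg(kwh, grid_intensity_kg_per_kwh=0.475):
    """Emissions from energy use; 0.475 kg CO2e/kWh is a placeholder
    grid-average intensity, not a figure from the paper."""
    return kwh * grid_intensity_kg_per_kwh

def cost_usd(kwh, price_usd_per_kwh=0.12):
    """Monetary cost at a placeholder national electricity tariff."""
    return kwh * price_usd_per_kwh

# Example: a constant 250 W draw sampled once per second for one hour
# integrates to 0.25 kWh.
samples = [250.0] * 3601
e = energy_kwh(samples, 1.0)
print(e, co2e_kg(e), cost_usd(e))
```

In a real setup the power samples would come from hardware counters (e.g. GPU power telemetry) rather than a fixed list, and the intensity factor would be chosen to match the local grid.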
Supervisor
Prof. Govind Chhimpa
Program
B.Tech. CSE
License
CC BY 4.0
Published
29 March 2026
Department
Computer Science & Engineering