Reinforcement Learning Shows Cost-Optimal HVAC Systems with 30-Year Life-Cycle Analysis

Researchers are tackling the complex problem of designing cost-effective and efficient heating, ventilation, and air conditioning (HVAC) systems for commercial buildings. Tanay Raghunandan Srinivasa, Vivek Deulkar, Aviruch Bhatia, and Vishal Garg, all from Plaksha University, present a novel approach utilising reinforcement learning to simultaneously optimise both the sizing and operation of chillers and thermal energy storage units. This work is significant because it addresses the inherent trade-off between expensive chiller capacity and more affordable thermal storage, aiming to minimise life-cycle costs over a 30-year period while guaranteeing reliable cooling performance. Their method learns an optimal control policy for chiller operation and then integrates this with a life-cycle cost analysis to identify the ideal combination of equipment capacities, ultimately offering a pathway to substantial energy and financial savings.
The research addresses a critical challenge in HVAC design: balancing the disproportionately high cost of increasing chiller capacity against the relatively lower cost of increasing thermal energy storage (TES) capacity.
This work presents a method for co-designing chiller and TES systems, ensuring reliable cooling while minimising overall expenditure. The team formulated the chiller operation problem as a finite-horizon Markov Decision Process, utilising a Deep Q Network (DQN) to determine the optimal chiller part-load ratio.
This DQN policy was trained using historical cooling demand and electricity price data, explicitly accounting for chiller inefficiencies and TES charging/discharging losses. By learning to minimise electricity costs, the policy effectively manages the interplay between chiller operation and TES utilisation.
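The hour-by-hour interplay between chiller output, TES charging/discharging losses, and time-varying prices can be sketched as a toy simulation step. The capacities, loss factors, part-load COP curve, and loss-of-load penalty below are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of one hour of chiller + TES operation, under assumed
# parameters (none of these numbers come from the paper).
CHILLER_CAP = 700.0                      # assumed max cooling output per hour
TES_CAP = 1500.0                         # assumed storage capacity
ETA_CHARGE, ETA_DISCHARGE = 0.95, 0.95   # assumed TES loss factors

def cop(plr):
    """Hypothetical part-load efficiency curve: best near PLR ~ 0.8."""
    return 3.0 + 2.0 * (1.0 - (plr - 0.8) ** 2)

def step(tes_level, plr, demand, price):
    """Advance one hour; return (new TES level, reward, unmet cooling load)."""
    cooling = plr * CHILLER_CAP
    if cooling >= demand:
        # Surplus cooling charges the TES, subject to losses and capacity.
        tes_level = min(TES_CAP, tes_level + (cooling - demand) * ETA_CHARGE)
        unmet = 0.0
    else:
        # The deficit is drawn from the TES, subject to discharge losses.
        needed = (demand - cooling) / ETA_DISCHARGE
        drawn = min(tes_level, needed)
        tes_level -= drawn
        unmet = (needed - drawn) * ETA_DISCHARGE
    electricity = cooling / cop(plr) if plr > 0 else 0.0
    # Reward: negative electricity cost, with a large loss-of-load penalty.
    reward = -price * electricity - 1e3 * unmet
    return tes_level, reward, unmet
```

Minimising the cumulative reward's negation over a finite horizon is what the DQN policy learns to do in the paper's formulation.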
Experiments involved evaluating the trained policy across various chiller-TES sizing configurations, restricting analysis to those that guarantee zero loss of cooling load. A subsequent life-cycle cost minimisation process then identified the most cost-effective infrastructure design: optimal chiller and thermal energy storage capacities of 700 and 1500 units, respectively.
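The two-stage sizing search described above might be sketched as follows, with `evaluate_policy` and `life_cycle_cost` as hypothetical stand-ins for the trained DQN rollout and the paper's cost model:

```python
def select_design(candidates, evaluate_policy, life_cycle_cost):
    """Keep only (chiller, TES) sizings with zero loss of load, then
    pick the feasible configuration with minimum life-cycle cost.
    evaluate_policy and life_cycle_cost are caller-supplied stand-ins."""
    feasible = []
    for chiller_cap, tes_cap in candidates:
        annual_cost, unmet_load = evaluate_policy(chiller_cap, tes_cap)
        if unmet_load == 0.0:  # zero-loss-of-cooling-load constraint
            feasible.append((chiller_cap, tes_cap, annual_cost))
    # Minimise life-cycle cost over the feasible set.
    return min(feasible, key=lambda c: life_cycle_cost(c[0], c[1], c[2]))
```

With stub cost models this reduces to a straightforward filter-then-minimise over the candidate grid; the real evaluation in the paper runs the trained policy against historical demand and price traces.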
This breakthrough reveals a powerful method for optimising HVAC systems, moving beyond conventional designs that often prioritise meeting peak demand at the expense of long-term cost efficiency. The research establishes a framework for intelligently managing cooling production and storage, leveraging time-varying electricity prices to reduce operational expenses. The work opens possibilities for integrating renewable energy sources and creating more sustainable and economical building climate control systems.
Optimising HVAC life-cycle costs via reinforcement learning and Markov Decision Processes requires careful consideration
Scientists developed a reinforcement learning methodology to jointly optimise the operation and sizing of cooling infrastructure for commercial HVAC systems over a 30-year life-cycle. The research focused on a cooling system comprising a fixed-capacity electric chiller and a thermal energy storage (TES) unit, operated to satisfy stochastic hourly cooling demands alongside time-varying electricity prices.
Life-cycle cost was calculated considering both capital expenditure and discounted operating costs, encompassing electricity consumption and maintenance. To address the significant asymmetry in capital costs, where increasing chiller capacity is substantially more expensive than equivalent TES capacity increases, the team formulated the chiller operation problem as a finite-horizon Markov Decision Process (MDP).
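A minimal sketch of such a life-cycle cost calculation: capital expenditure plus the present value of recurring annual operating costs. The discount rate here is an illustrative assumption, not the paper's figure:

```python
def life_cycle_cost(capex, annual_opex, years=30, discount_rate=0.05):
    """Capital cost plus the discounted sum of annual operating costs
    (electricity + maintenance) over the planning horizon.
    The 5% discount rate is an assumed placeholder."""
    pv_opex = sum(annual_opex / (1 + discount_rate) ** t
                  for t in range(1, years + 1))
    return capex + pv_opex
```

Because operating costs compound over 30 years, even a modest annual saving from smarter operation can outweigh a sizeable difference in upfront equipment cost, which is what makes the chiller-versus-TES trade-off worth optimising.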
This MDP determined the chiller part-load ratio (PLR) for a given infrastructure configuration. Scientists then employed a Deep Q-Network (DQN) with a constrained action space to solve the MDP, training the DQN to minimise electricity cost using historical cooling demand and electricity price data. For each candidate chiller-TES sizing configuration, the trained DQN policy was evaluated to ensure full satisfaction of cooling demand.
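A constrained action space of this kind can be sketched as action masking over a discretised set of part-load ratios: only PLR values that keep the TES within bounds and serve the demand are eligible, and the greedy choice is taken among those. The PLR grid, capacities, and simplified feasibility rule below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

PLR_ACTIONS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # assumed discretised PLRs

def feasible_mask(tes_level, demand, chiller_cap=700.0, tes_cap=1500.0):
    """Mark PLR choices that neither overflow the TES nor leave demand
    unserved. Simplified rule: charge/discharge losses are ignored."""
    cooling = PLR_ACTIONS * chiller_cap
    new_level = tes_level + cooling - demand
    return (new_level >= 0.0) & (new_level <= tes_cap)

def constrained_greedy(q_values, tes_level, demand):
    """Greedy DQN action selection restricted to the feasible set."""
    mask = feasible_mask(tes_level, demand)
    q = np.where(mask, q_values, -np.inf)  # mask out infeasible PLRs
    return int(np.argmax(q))
```

Masking infeasible actions at selection time keeps every executed action safe without requiring the Q-network itself to learn the hard constraints.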
The study pioneered a life-cycle cost minimisation process performed on this feasible set of configurations, ultimately identifying the cost-optimal infrastructure design. Experiments revealed an optimal chiller capacity of 700 units and a corresponding TES capacity of 1500 units. This approach enables precise co-design, balancing capital expenditure with operational efficiency and demonstrating a 7.6% cost reduction compared to fixed-schedule strategies, as validated through simulations of a 2020 cooling season. The technique reveals how reinforcement learning can effectively learn system characteristics and enhance cost efficiency in complex HVAC systems.
Optimal HVAC capacity determined via reinforcement learning and life-cycle cost analysis offers significant energy savings
Scientists achieved optimal chiller and thermal energy storage capacities of 700 and 1500 units, respectively, through a novel reinforcement learning approach to HVAC system design. The research focused on minimising life-cycle cost over a 30-year horizon, considering both capital expenditure and discounted operating costs, including electricity consumption and maintenance.
Experiments revealed a significant asymmetry in capital costs, where increasing chiller capacity by one unit is approximately 4.3 times more expensive than an equivalent increase in thermal energy storage capacity. The team formulated the chiller operation problem as a finite-horizon Markov Decision Process, controlling the chiller part-load ratio to meet stochastic hourly cooling demands under time-varying electricity prices.
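To make the asymmetry concrete, a back-of-the-envelope check with hypothetical unit prices in the stated 4.3:1 ratio (the prices themselves are not from the paper):

```python
# Illustrative check of the capital-cost asymmetry: with assumed unit
# prices in a 4.3:1 ratio, the same capital budget buys about 4.3x more
# TES capacity than chiller capacity.
chiller_price_per_unit = 430.0   # assumed
tes_price_per_unit = 100.0       # assumed (4.3x cheaper per unit)
budget = 43_000.0
chiller_units = budget / chiller_price_per_unit   # 100.0 units of chiller
tes_units = budget / tes_price_per_unit           # 430.0 units of TES
ratio = tes_units / chiller_units                 # 4.3
```

This is why over-provisioning TES and operating the chiller intelligently can beat simply buying a larger chiller.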
A Deep Q Network (DQN) with a constrained action space was implemented to solve the MDP, minimising electricity cost based on historical cooling demand and price data. Results demonstrate the DQN policy effectively learns to operate the chiller, enabling evaluation of various chiller-TES sizing configurations.
Measurements confirm that the approach successfully identifies configurations fully satisfying cooling demand, subsequently performing life-cycle cost minimisation to pinpoint the most cost-effective infrastructure design. The study quantified annual electricity cost and loss-of-load tradeoffs across multiple configurations, as detailed in Table 1, ultimately delivering a cost-optimal solution.
This breakthrough delivers a method for co-designing chiller and TES systems, offering a pathway to reduced operating costs and improved energy efficiency in commercial HVAC applications. The research also shows a 7.6% cost reduction during a simulated cooling season when compared to fixed-schedule strategies.
Reinforcement learning optimises chiller and thermal storage lifecycle costs through intelligent control
Scientists have demonstrated a method for jointly optimising the operation and sizing of cooling infrastructure in commercial HVAC systems using reinforcement learning. The research focused on minimising life-cycle costs over a 30-year period, considering both initial capital expenditure and ongoing operating expenses such as electricity and maintenance.
A system comprising an electric chiller and a thermal energy storage (TES) unit was investigated, operated to meet fluctuating cooling demands and time-varying electricity prices. The key achievement lies in co-designing the chiller and TES capacities, addressing the challenge of asymmetric capital costs where increasing chiller capacity is significantly more expensive than increasing TES capacity.
Researchers formulated the chiller operation as a Markov Decision Process and solved it using a Deep Q Network, incentivising low electricity costs and penalising cooling load loss. This approach identified an optimal configuration of 700 kWhth chiller capacity and 1500 kWhth TES capacity, demonstrating that operational intelligence can effectively substitute for simply over-provisioning capital-intensive equipment.
The authors acknowledge that the study’s findings are specific to the modelled system and the historical data used for training the reinforcement learning agent. They suggest future work could explore the application of this methodology to more complex HVAC systems and diverse climate conditions. Furthermore, investigating the robustness of the learned policy to unforeseen changes in electricity pricing or cooling demand patterns would be valuable. Overall, this research validates reinforcement learning as a useful tool for estimating optimal operational costs and planning cooling infrastructure, offering economic benefits through reduced initial investment and lower operating expenses.
👉 More information
🗞 Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems
🧠 ArXiv: https://arxiv.org/abs/2601.22880



