Deep Reinforcement Learning-Based "Grid Mind" and Its Demonstration Project Application

Xu Chunlei; Wu Haiwei; Diao Ruisheng; Hu Xunhui; Li Lei; Shi Di

Article abstract

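Xu Chunlei, Wu Haiwei, Diao Ruisheng, Hu Xunhui, Li Lei, Shi Di. Deep reinforcement learning-based grid mind and field demonstration application[J]. Power Demand Side Management, 2021, 23(4): 73-78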

Deep reinforcement learning-based grid mind and field demonstration application

Submitted: 2021-03-02  Revised: 2021-05-30

DOI: 10.3969/j.issn.1009-1831.2021.04.014

Keywords: artificial intelligence; intelligent dispatch and control; deep reinforcement learning; grid security

Funding: Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. (J2020058)

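Author	Affiliation
Xu Chunlei	State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210024
Wu Haiwei	State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210024
Diao Ruisheng	Zhibo Energy Technology (Jiangsu) Co., Ltd. (智博能源科技（江苏）有限公司), Nanjing 211302
Hu Xunhui	NARI Technology Co., Ltd., Nanjing 211106
Li Lei	NARI Technology Co., Ltd., Nanjing 211106
Shi Di	Zhibo Energy Technology (Jiangsu) Co., Ltd. (智博能源科技（江苏）有限公司), Nanjing 211302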

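Chinese abstract (translated):

With the steadily increasing penetration of renewable energy and power electronic devices, together with high-power hybrid AC/DC interconnection, the dynamics, randomness and uncertainty of the power grid have grown significantly, posing new challenges to the secure and stable operation of the power system. To address more effectively the security problems caused by rapid fluctuations of voltage and power flow, an intelligent decision-support method for grid dispatch and control based on a maximum-entropy deep reinforcement learning algorithm is proposed. It considers multiple control objectives simultaneously and performs online optimal control of the grid operating mode. The method models grid dispatch and control decisions as a Markov decision process, trains multi-threaded agents, and uses a periodic online training mechanism to keep improving the agents' control performance. Prototype decision-support software developed with this method is deployed at the State Grid Jiangsu Electric Power Dispatching and Control Center, where it interacts directly with the grid dispatch and control system environment, learns autonomously, and continually improves the agent's decision-making ability. The trained agent can give effective control strategies within milliseconds for combined control objectives including voltage limit violations, tie-line power flow limit violations and network losses.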
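For orientation, the maximum-entropy objective that this family of deep reinforcement learning algorithms typically optimizes can be written as below. This is the standard textbook form, not a formula taken from the article; the symbols (policy \pi, state s_t, action a_t, reward r, temperature \alpha, entropy \mathcal{H}) are conventional notation assumed here for illustration.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

The entropy term rewards the policy for staying stochastic, which encourages exploration; \alpha sets the trade-off between reward and entropy.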

English abstract:

With the increasing penetration of renewable energy and power electronics-based devices, and the hybrid operation of AC/DC power networks with heavy power transfer, the dynamics, stochasticity and uncertainty of the power grid are increasing markedly, threatening its secure operation. To effectively resolve security issues caused by fast variations of voltage and line flows, a method based on a maximum-entropy deep reinforcement learning algorithm is presented for providing online decision support in smart grid operation; it can consider multiple control objectives simultaneously. The method formulates decision-making for grid operation as a Markov decision process, trains multi-threaded soft actor-critic agents, and uses a periodic online training mechanism to continuously improve their control performance. The prototype developed with this method has been deployed in the control center of SGCC Jiangsu Electric Power Company, where it interacts with the live energy management system and learns its control policy adaptively. The well-trained agent can provide effective control actions within milliseconds to mitigate voltage violations, line flow violations and network losses.
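The abstract names three control objectives (voltage violations, line flow violations and losses) but does not give the reward used to train the agent. The sketch below shows one plausible way such objectives could be folded into a single scalar reward. The `GridSnapshot` container, the voltage band of 0.95 to 1.05 p.u. and the weights are all hypothetical choices made for illustration, not values from the article.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GridSnapshot:
    """Hypothetical container for one power-flow result (not from the article)."""
    bus_voltages_pu: Sequence[float]  # bus voltage magnitudes, per unit
    line_flows_mva: Sequence[float]   # flows on monitored lines, MVA
    line_limits_mva: Sequence[float]  # limits of the same lines, MVA
    losses_mw: float                  # total active-power loss, MW


def reward(snap: GridSnapshot,
           v_min: float = 0.95, v_max: float = 1.05,       # assumed voltage band
           w_voltage: float = 1.0, w_flow: float = 1.0,    # assumed weights
           w_loss: float = 0.01) -> float:
    """Return the negative weighted sum of violations and losses."""
    # Total distance of bus voltages outside the [v_min, v_max] band.
    voltage_violation = sum(
        max(0.0, v_min - v) + max(0.0, v - v_max)
        for v in snap.bus_voltages_pu
    )
    # Overload of each line as a fraction of its limit.
    flow_violation = sum(
        max(0.0, abs(flow) - limit) / limit
        for flow, limit in zip(snap.line_flows_mva, snap.line_limits_mva)
    )
    return -(w_voltage * voltage_violation
             + w_flow * flow_violation
             + w_loss * snap.losses_mw)
```

With this shape, a violation-free, lossless snapshot scores zero and every violation or megawatt of loss lowers the score, so an agent maximizing it is pushed toward secure, low-loss operating points. How the weights should be balanced is a design decision the abstract does not address.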
