
SCIENTIA SINICA Informationis, Volume 48, Issue 10: 1430-1449(2018) https://doi.org/10.1360/N112018-00072

Smart generation control based on deep reinforcement learning with the ability of action self-optimization

  • Received: May 11, 2018
  • Accepted: Jun 11, 2018
  • Published: Oct 16, 2018

Abstract

Random disturbances caused by the large-scale integration of new energy and distributed energy sources affect the safe and economical operation of interconnected power grids. This study proposes DDRQN-AD, a deep reinforcement learning algorithm with action self-optimization. Compared with traditional centralized automatic generation control systems, DDRQN-AD identifies the optimal strategy more easily and handles the stochastic disturbances caused by the extensive integration of new energy and distributed energy sources into interconnected power grids, so as to maximize the utilization of new energy. Simulation results for a two-area microgrid load-frequency control power system model and the Guangdong Power Grid model show that the proposed algorithm can reduce carbon emissions and enhance the utilization rate of new energy sources. Moreover, the robustness and learning ability of DDRQN-AD are stronger than those of traditional smart methods.


Funded by

National Natural Science Foundation of China (51707102)

National Natural Science Foundation of China (61603212)


Supplement

Appendix

Table A1 (continued)



  • Figure 1

    (Color online) The schematic diagram of action discovery

  • Figure 2

    (Color online) Control schematic diagram of SGC system based on DDRQN-AD

  • Figure 3

    (Color online) Active power of wind power, photovoltaic and electric vehicles

  • Figure 4

    (Color online) The two-area micro-grid LFC power system model

  • Figure 5

    (Color online) The pre-learning effect of DDRQN-AD in area-A

  • Figure 6

    (Color online) Control performance of different algorithms under pulse disturbance. (a) Output pulse of various algorithms; (b) the $\vert\Delta f\vert$ and $\vert\mathrm{ACE}\vert$ of various algorithms

  • Figure 7

    (Color online) Control performance of different algorithms under random pulse disturbance. (a) Output results under random pulse disturbance; (b) the $\vert\Delta f\vert$ and $\vert\mathrm{ACE}\vert$ of various algorithms

  • Figure 8

    (Color online) The adjusted active power of wind power, photovoltaic and electric vehicles

  • Figure 9

    (Color online) Effect of DDRQN-AD under white noise disturbance

  • Figure 10

    (Color online) Performance statistics of the three algorithms under white noise disturbance

  • Figure 11

    (Color online) Guangdong power grid model

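  • Table A1   Unit parameter statistics of Guangdong power grid
    Area | Unit type | Unit | $\Delta P_k^{\max}$ (MW) | $\Delta P_k^{\min}$ (MW) | $B_k$ (kg/kWh) | Unit state (summer) | Unit state (winter)
    Yuebei | Coal-fired | G1 | 120 | -120 | 0.99 | Starting up | Maintenance
    Yuebei | Coal-fired | G2 | 120 | -120 | 0.99 | Starting up | Starting up
    Yuebei | Coal-fired | G3 | 120 | -120 | 0.99 | Closing down | Starting up
    Yuebei | Coal-fired | G4 | 135 | -135 | 0.99 | Starting up | Starting up
    Yuebei | Coal-fired | G5 | 135 | -135 | 0.99 | Starting up | Starting up
    Yuebei | Coal-fired | G6 | 300 | -300 | 0.99 | Starting up | Starting up
    Yuebei | Coal-fired | G7 | 300 | -300 | 0.99 | Starting up | Starting up
    Yuebei | Coal-fired | G8 | 320 | -320 | 0.89 | Starting up | Starting up
    Yuebei | Gas-fired | G9 | 188 | -188 | 0.5 | Starting up | Starting up
    Yuebei | Hydropower | G10 | 180 | 0 | 0 | Starting up | 50% capacity
    Yuexi | Coal-fired | G11 | 500 | -500 | 0.89 | Starting up | Starting up
    Yuexi | Coal-fired | G12 | 330 | -330 | 0.89 | Starting up | Starting up
    Yuexi | Coal-fired | G13 | 125 | -125 | 0.99 | Starting up | Maintenance
    Yuexi | Coal-fired | G14 | 125 | -125 | 0.99 | Closing down | Starting up
    Yuexi | Coal-fired | G15 | 150 | -150 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G16 | 150 | -150 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G17 | 150 | -150 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G18 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G19 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G20 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G21 | 660 | -660 | 0.87 | Starting up | Starting up
    Yuexi | Coal-fired | G22 | 180 | -180 | 0.99 | Starting up | Starting up
    Yuexi | Coal-fired | G23 | 180 | -180 | 0.99 | Starting up | Starting up
    Yuexi | Gas-fired | G24 | 280 | -280 | 0.5 | Starting up | Starting up
    Yuexi | Gas-fired | G25 | 200 | -200 | 0.5 | Starting up | Starting up
    Yuexi | Gas-fired | G26 | 200 | -200 | 0.5 | Starting up | Starting up
    Yuexi | Gas-fired | G27 | 200 | -200 | 0.5 | Starting up | Starting up
    Yuexi | Oil-fuel | G28 | 120 | -120 | 0.7 | Starting up | Maintenance
    Yuexi | Oil-fuel | G29 | 120 | -120 | 0.7 | Closing down | Starting up
    Zhusanjiao | Coal-fired | G30 | 600 | -600 | 0.89 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G31 | 100 | -100 | 0.99 | Starting up | Maintenance
    Zhusanjiao | Coal-fired | G32 | 100 | -100 | 0.99 | Starting up | Maintenance
    Zhusanjiao | Coal-fired | G33 | 200 | -200 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G34 | 200 | -200 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G35 | 200 | -200 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G36 | 210 | -210 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G37 | 240 | -240 | 0.99 | Starting up | Starting up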
  • Table 1   SGC parameter settings
    Case | $\delta$ | $\alpha$ | $\alpha^{-}$ | $\gamma$
    Ideal environment | 0.1 | 0.5 | 0.5 | 0.95
    Nonideal environment | 0.3 | 0.1 | 0.1 | 0.9
  •   

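    Algorithm 1 DDRQN-AD algorithm

    Require: For every agent $m$, initialize the reward function $R(1)$, the action set $A(1)$, and the weights $\theta_{1}$ and $\theta^-_{1}$.

    Set the algorithm parameters $\delta$, $\gamma$, $\alpha$, $\alpha^-$.

    Set the initial state $s_{1}$, the initial internal state $h_{1}=0$, and $\nabla\theta=0$.

    Start

    1. Select and execute an exploratory action $a^m_t$ based on the action probability distribution.

    2. Observe the next state $s_{t+1}$.

    3. Record the state observation $o^m_{t+1}$ and the internal state $h^m_t$.

    4. Obtain a short-term reward signal $R(t)$ from Eq. (8).

    5. Compute the target Q-value function $y^m_t$ according to Eq. (6).

    6. Compute the loss function error $L^m_t$ according to Eq. (1).

    7. Update the weights $\theta_{i+1}$ and $\theta^-_{i+1}$ according to Eqs. (3) and (4).

    8. Search for and evaluate new actions according to Eq. (7).

    9. Update the action set $A(t)$ to $A(t+1)$.

    10. Set $t=t+1$ and return to step 1.

    End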
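
The control flow of Algorithm 1 (act, observe, compute a target, update, then grow the action set) can be illustrated with a deliberately simplified sketch. This is not the DDRQN-AD implementation: Eqs. (1), (3), (4), (6), (7) and (8) are not reproduced on this page, so the sketch swaps in a tabular Q-table for the recurrent network, a standard Q-learning target, a toy reward, and a midpoint rule for proposing new actions. The plant dynamics, the function name `run_sketch`, and all numeric values are assumptions made only for illustration.

```python
import random


def run_sketch(steps=2000, seed=0, discover_every=200, max_actions=5):
    """Toy tabular loop with a growing action set (all modelling choices assumed)."""
    rng = random.Random(seed)
    states = range(-2, 3)            # assumed state: binned frequency deviation
    actions = [-1.0, 1.0]            # assumed initial action set A(1)
    q = {(s, a): 0.0 for s in states for a in actions}
    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative step size, discount, exploration
    s = 2
    for t in range(steps):
        # Choose an exploratory action (epsilon-greedy stands in for the
        # action probability distribution).
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: q[(s, x)])
        # Observe the next state (toy dynamics: the action offsets the deviation).
        nxt = max(-2, min(2, round(s - a)))
        # Short-term reward: placeholder for Eq. (8).
        r = -abs(nxt)
        # Target value: standard Q-learning target, placeholder for Eq. (6).
        y = r + gamma * max(q[(nxt, x)] for x in actions)
        # Move the estimate toward the target: placeholder for Eqs. (1), (3), (4).
        q[(s, a)] += alpha * (y - q[(s, a)])
        s = nxt
        # Propose a new action and extend A(t): placeholder for Eq. (7).
        if (t + 1) % discover_every == 0 and len(actions) < max_actions:
            best, second = sorted(actions, key=lambda x: q[(s, x)], reverse=True)[:2]
            candidate = (best + second) / 2
            if candidate not in actions:
                actions.append(candidate)
                for state in states:
                    q[(state, candidate)] = 0.0
    return actions, q


actions, q = run_sketch()
print(sorted(actions))
```

In this toy setting the initial actions can only overshoot around zero deviation, so the first discovered action is the "do nothing" midpoint 0.0. That is the kind of gap an action-discovery step is meant to fill, though the paper's own search and evaluation rule may differ.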

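  • Table A1 (continued)
    Area | Unit type | Unit | $\Delta P_k^{\max}$ (MW) | $\Delta P_k^{\min}$ (MW) | $B_k$ (kg/kWh) | Unit state (summer) | Unit state (winter)
    Zhusanjiao | Coal-fired | G38 | 240 | -240 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G39 | 280 | -280 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G40 | 280 | -280 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G41 | 280 | -280 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G42 | 250 | -250 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G43 | 250 | -250 | 0.99 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G44 | 360 | -360 | 0.89 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G45 | 360 | -360 | 0.89 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G46 | 400 | -400 | 0.89 | Starting up | Starting up
    Zhusanjiao | Coal-fired | G47 | 400 | -400 | 0.89 | Starting up | Starting up
    Zhusanjiao | Gas-fired | G48 | 180 | -180 | 0.5 | Starting up | Starting up
    Zhusanjiao | Gas-fired | G49 | 180 | -180 | 0.5 | Starting up | Starting up
    Zhusanjiao | Gas-fired | G50 | 180 | -180 | 0.5 | Starting up | Starting up
    Zhusanjiao | Oil-fuel | G51 | 150 | -150 | 0.7 | Closing down | Starting up
    Zhusanjiao | Oil-fuel | G52 | 150 | -150 | 0.7 | Closing down | Starting up
    Zhusanjiao | Oil-fuel | G53 | 180 | -180 | 0.7 | Starting up | Starting up
    Zhusanjiao | Oil-fuel | G54 | 180 | -180 | 0.7 | Starting up | Starting up
    Zhusanjiao | Oil-fuel | G55 | 180 | -180 | 0.7 | Starting up | Starting up
    Zhusanjiao | Hydropower | G56 | 300 | 0 | 0 | Starting up | 50% capacity
    Zhusanjiao | Hydropower | G57 | 300 | 0 | 0 | Starting up | 50% capacity
    Zhusanjiao | Hydropower | G58 | 400 | 0 | 0 | Starting up | 50% capacity
    Yuedong | Coal-fired | G59 | 100 | -100 | 0.99 | Starting up | Maintenance
    Yuedong | Coal-fired | G60 | 196 | -196 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G61 | 296 | -296 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G62 | 180 | -180 | 0.99 | Closing down | Starting up
    Yuedong | Coal-fired | G63 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G64 | 180 | -180 | 0.99 | Closing down | Starting up
    Yuedong | Coal-fired | G65 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G66 | 180 | -180 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G67 | 100 | -100 | 0.99 | Starting up | Maintenance
    Yuedong | Coal-fired | G68 | 168 | -168 | 0.99 | Closing down | Starting up
    Yuedong | Coal-fired | G69 | 60 | -60 | 0.99 | Starting up | Maintenance
    Yuedong | Coal-fired | G70 | 210 | -210 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G71 | 350 | -350 | 0.89 | Starting up | Starting up
    Yuedong | Coal-fired | G72 | 240 | -240 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G73 | 240 | -240 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G74 | 240 | -240 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G75 | 240 | -240 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G76 | 200 | -200 | 0.99 | Starting up | Starting up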
  • Table 2   Control performance of different algorithms under step disturbance
    Algorithm | Overshoot (%) | Steady-state error (%) | Rise time (s)
    DDRQN-AD | 7.08 | 0.57 | 138
    DDRQN | 7.34 | 5.83 | 202
    PDWoLF-PHC($\lambda$) | 7.38 | 3.98 | 190
    SARSA-AD | 7.38 | 7.24 | 186
    DWoLF-PHC($\lambda$) | 7.54 | 7.57 | 318
    WoLF-PHC | 7.40 | 12.42 | 198
    R($\lambda$) | 7.36 | 20.84 | 222
    Q($\lambda$) | 8.24 | 12.28 | 254
    Q | 8.13 | 13.69 | 534
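
To make the rise-time column of Table 2 easier to compare, the short sketch below computes how much shorter the DDRQN-AD rise time is relative to each other algorithm. The numbers are copied from the table; the helper name `relative_reduction` and the ASCII spelling `lambda` in the keys are illustrative choices, not notation from the paper.

```python
# Rows copied from Table 2: (overshoot %, steady-state error %, rise time s).
table2 = {
    "DDRQN-AD": (7.08, 0.57, 138),
    "DDRQN": (7.34, 5.83, 202),
    "PDWoLF-PHC(lambda)": (7.38, 3.98, 190),
    "SARSA-AD": (7.38, 7.24, 186),
    "DWoLF-PHC(lambda)": (7.54, 7.57, 318),
    "WoLF-PHC": (7.40, 12.42, 198),
    "R(lambda)": (7.36, 20.84, 222),
    "Q(lambda)": (8.24, 12.28, 254),
    "Q": (8.13, 13.69, 534),
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


ours = table2["DDRQN-AD"]
rise_time_gain = {
    name: relative_reduction(row[2], ours[2])
    for name, row in table2.items()
    if name != "DDRQN-AD"
}
for name, gain in rise_time_gain.items():
    print(f"{name}: rise time {gain:.1f}% shorter with DDRQN-AD")
```

The same helper can be applied to the overshoot and steady-state error columns by changing the tuple index.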
  • Table 3   Statistics of Guangdong power grid in summer under different algorithms
    Area | Algorithm | $\vert\mathrm{ACE}\vert$ (MW) | CPS1 (%) | $\vert\Delta f\vert$ (Hz) | CE
    Yuebei | DDRQN-AD | 4.8465 | 199.9606 | 0.0036 | 637.8061
    Yuebei | DDRQN | 13.2599 | 199.5169 | 0.0066 | 653.1144
    Yuebei | PDWoLF-PHC($\lambda$) | 30.8923 | 197.6574 | 0.0096 | 687.4272
    Yuebei | DWoLF-PHC($\lambda$) | 62.0124 | 194.6082 | 0.0140 | 689.4484
    Yuebei | Q($\lambda$) | 82.0249 | 188.5264 | 0.0154 | 694.0980
    Yuexi | DDRQN-AD | 9.3702 | 199.9714 | 0.0063 | 671.2834
    Yuexi | DDRQN | 19.5084 | 198.1847 | 0.0096 | 688.7363
    Yuexi | PDWoLF-PHC($\lambda$) | 45.5341 | 195.3937 | 0.0122 | 692.6057
    Yuexi | DWoLF-PHC($\lambda$) | 72.5696 | 189.6666 | 0.0141 | 693.1040
    Yuexi | Q($\lambda$) | 105.0339 | 178.8297 | 0.0155 | 699.8637
    Zhusanjiao | DDRQN-AD | 9.3545 | 199.5875 | 0.0054 | 633.6197
    Zhusanjiao | DDRQN | 18.1722 | 198.8693 | 0.0068 | 652.1616
    Zhusanjiao | PDWoLF-PHC($\lambda$) | 45.7089 | 195.0776 | 0.0098 | 683.5096
    Zhusanjiao | DWoLF-PHC($\lambda$) | 80.8745 | 191.5694 | 0.0142 | 687.8286
    Zhusanjiao | Q($\lambda$) | 139.9966 | 173.0605 | 0.0157 | 694.8414
    Yuedong | DDRQN-AD | 2.9283 | 199.8459 | 0.0065 | 635.2505
    Yuedong | DDRQN | 9.7626 | 199.1897 | 0.0069 | 657.7666
    Yuedong | PDWoLF-PHC($\lambda$) | 22.7016 | 197.2192 | 0.0100 | 671.7122
    Yuedong | DWoLF-PHC($\lambda$) | 61.9700 | 194.6535 | 0.0144 | 675.5289
    Yuedong | Q($\lambda$) | 102.4672 | 190.5930 | 0.0157 | 698.7767
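  • Table A1 (continued)
    Area | Unit type | Unit | $\Delta P_k^{\max}$ (MW) | $\Delta P_k^{\min}$ (MW) | $B_k$ (kg/kWh) | Unit state (summer) | Unit state (winter)
    Yuedong | Coal-fired | G77 | 200 | -200 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G78 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G79 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G80 | 220 | -220 | 0.99 | Starting up | Starting up
    Yuedong | Coal-fired | G81 | 350 | -350 | 0.89 | Starting up | Starting up
    Yuedong | Coal-fired | G82 | 350 | -350 | 0.89 | Starting up | Starting up
    Yuedong | Gas-fired | G83 | 250 | -250 | 0.5 | Starting up | Starting up
    Yuedong | Gas-fired | G84 | 250 | -250 | 0.5 | Starting up | Starting up
    Yuedong | Gas-fired | G85 | 250 | -250 | 0.5 | Starting up | Starting up
    Yuedong | Gas-fired | G86 | 250 | -250 | 0.5 | Starting up | Starting up
    Yuedong | Gas-fired | G87 | 288 | -288 | 0.5 | Starting up | Starting up
    Yuedong | Gas-fired | G88 | 360 | -360 | 0.5 | Starting up | Starting up
    Yuedong | Gas-fired | G89 | 100 | -100 | 0.5 | Starting up | Maintenance
    Yuedong | Oil-fuel | G90 | 240 | -240 | 0.7 | Starting up | Starting up
    Yuedong | Oil-fuel | G91 | 240 | -240 | 0.7 | Starting up | Starting up
    Yuedong | Oil-fuel | G92 | 120 | -120 | 0.7 | Closing down | Starting up
    Yuedong | Hydropower | G93 | 244 | 0 | 0 | Starting up | 50% capacity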
  • Table 4   Statistics of Guangdong power grid in winter under different algorithms
    Area | Algorithm | $\vert\mathrm{ACE}\vert$ (MW) | CPS1 (%) | $\vert\Delta f\vert$ (Hz) | CE
    Yuebei | DDRQN-AD | 4.8690 | 199.8558 | 0.0030 | 704.9624
    Yuebei | DDRQN | 11.5191 | 198.6730 | 0.0050 | 719.7761
    Yuebei | PDWoLF-PHC($\lambda$) | 38.7529 | 195.3053 | 0.0101 | 721.6102
    Yuebei | DWoLF-PHC($\lambda$) | 70.2284 | 194.1612 | 0.0110 | 723.7792
    Yuebei | Q($\lambda$) | 135.8834 | 190.3734 | 0.0137 | 737.4928
    Yuexi | DDRQN-AD | 3.8699 | 199.6796 | 0.0029 | 681.6042
    Yuexi | DDRQN | 10.7540 | 198.4822 | 0.0052 | 698.4800
    Yuexi | PDWoLF-PHC($\lambda$) | 30.8557 | 194.6218 | 0.0101 | 707.6413
    Yuexi | DWoLF-PHC($\lambda$) | 71.3868 | 193.3833 | 0.0111 | 717.5449
    Yuexi | Q($\lambda$) | 133.4363 | 192.6690 | 0.0138 | 725.5942
    Zhusanjiao | DDRQN-AD | 4.6580 | 199.0981 | 0.0027 | 648.7999
    Zhusanjiao | DDRQN | 15.4619 | 198.3565 | 0.0052 | 670.1390
    Zhusanjiao | PDWoLF-PHC($\lambda$) | 32.3917 | 194.2996 | 0.0116 | 672.3745
    Zhusanjiao | DWoLF-PHC($\lambda$) | 77.1797 | 192.6169 | 0.0112 | 674.3519
    Zhusanjiao | Q($\lambda$) | 139.7009 | 179.0764 | 0.0143 | 696.9169
    Yuedong | DDRQN-AD | 4.8149 | 199.9593 | 0.0031 | 644.2032
    Yuedong | DDRQN | 10.5419 | 198.5774 | 0.0056 | 659.5584
    Yuedong | PDWoLF-PHC($\lambda$) | 32.3602 | 195.2387 | 0.0099 | 680.9719
    Yuedong | DWoLF-PHC($\lambda$) | 72.2484 | 193.1077 | 0.0115 | 689.6905
    Yuedong | Q($\lambda$) | 147.6071 | 191.5688 | 0.0137 | 702.1414
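
The abstract's claim about reduced carbon emissions can be checked against the CE columns of Tables 3 and 4. The sketch below computes, per season and area, how much lower the DDRQN-AD value is than the Q($\lambda$) value. The figures are copied from the two tables and kept in the tables' own (unstated) units; the dictionary layout and the function name `ce_reduction_percent` are assumptions made for this illustration.

```python
# CE values copied from Table 3 (summer) and Table 4 (winter).
ce = {
    "summer": {
        "Yuebei": {"DDRQN-AD": 637.8061, "Q(lambda)": 694.0980},
        "Yuexi": {"DDRQN-AD": 671.2834, "Q(lambda)": 699.8637},
        "Zhusanjiao": {"DDRQN-AD": 633.6197, "Q(lambda)": 694.8414},
        "Yuedong": {"DDRQN-AD": 635.2505, "Q(lambda)": 698.7767},
    },
    "winter": {
        "Yuebei": {"DDRQN-AD": 704.9624, "Q(lambda)": 737.4928},
        "Yuexi": {"DDRQN-AD": 681.6042, "Q(lambda)": 725.5942},
        "Zhusanjiao": {"DDRQN-AD": 648.7999, "Q(lambda)": 696.9169},
        "Yuedong": {"DDRQN-AD": 644.2032, "Q(lambda)": 702.1414},
    },
}


def ce_reduction_percent(season, area):
    """Relative CE reduction of DDRQN-AD versus Q(lambda), in percent."""
    row = ce[season][area]
    return 100.0 * (row["Q(lambda)"] - row["DDRQN-AD"]) / row["Q(lambda)"]


reductions = {
    (season, area): ce_reduction_percent(season, area)
    for season in ce
    for area in ce[season]
}
for (season, area), value in reductions.items():
    print(f"{season:6s} {area:10s} CE reduced by {value:.2f}%")
```

Swapping `"Q(lambda)"` for another algorithm's CE column gives the corresponding comparison against that baseline.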
