
SCIENTIA SINICA Informationis, Volume 51, Issue 2: 247 (2021) https://doi.org/10.1360/SSI-2020-0316

Real-time scheduling for carrier-borne aircraft support operations: a reinforcement learning approach

  • Received: Oct 12, 2020
  • Accepted: Nov 18, 2020
  • Published: Jan 26, 2021

Abstract


Funded by

National Natural Science Foundation of China (61972362, 62036010, 61602420, 61822701, 61772474, 61872324, 61802351)

China Postdoctoral Science Foundation (2018M630836, 2018M632802)

Natural Science Foundation of Henan Province (202300410378)

Key Research, Development and Promotion Project of Henan Province (192102310476)



  • Figure 1

    (Color online) Real-time scheduling framework for support tasks based on reinforcement learning

  • Figure 2

    (Color online) Typical process of carrier-borne aircraft support operations

  • Algorithm 1   Task scheduling strategy learning algorithm

    Require: Current state $s_i$, action $a_i$, reward $R$, next state $s_{i+1}$.

    Output: Next action $a_{i+1}$.

    Initialize the environment and target network;

    while training is not completed do

    #Step 1: collect experience;

    Randomly choose the next action $a_{i+1}$, or set $a_{i+1}=\max_a Q^*(\phi(s_i,a_i,R,s_{i+1}))$;

    Calculate the values of the target network and the evaluation network, and store the transition in $M_Q$;

    #Step 2: update parameters;

    Sample a batch of transitions $\{s_i,a_i,R,s_{i+1}\}$ from $M_Q$;

    Update the target network and evaluation network;

    end while
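
    To make Algorithm 1 concrete, the following is a minimal sketch of a DQN-style learner with an evaluation network and a target network, written in Python with PyTorch. The network architecture, state encoding, hyperparameters, and the names QNet, choose_action, store, and update are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM = 8      # assumed length of the encoded state phi(s_i, a_i, R, s_{i+1})
NUM_ACTIONS = 6    # assumption: one action per candidate support station

class QNet(nn.Module):
    """Small fully connected Q-network (architecture is an assumption)."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS))

    def forward(self, x):
        return self.layers(x)

eval_net, target_net = QNet(), QNet()
target_net.load_state_dict(eval_net.state_dict())
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)              # experience store M_Q
gamma, epsilon, batch_size = 0.9, 0.1, 32

def choose_action(state):
    """Step 1: epsilon-greedy choice between a random action and the greedy action."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    with torch.no_grad():
        return int(eval_net(state.unsqueeze(0)).argmax())

def store(s, a, r, s_next):
    """Step 1: append one transition (s_i, a_i, R, s_{i+1}) to M_Q."""
    memory.append((s, a, r, s_next))

def update():
    """Step 2: sample a batch of transitions and update both networks."""
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)
    s = torch.stack([t[0] for t in batch])
    a = torch.tensor([t[1] for t in batch])
    r = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s_next = torch.stack([t[3] for t in batch])

    q = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s_i, a_i)
    with torch.no_grad():                                      # R + gamma * max_a Q_target
        q_target = r + gamma * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q, q_target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    target_net.load_state_dict(eval_net.state_dict())          # copied less often in practice

# Example of one interaction step with dummy data:
s = torch.zeros(STATE_DIM)
a = choose_action(s)
store(s, a, 1.0, torch.zeros(STATE_DIM))
update()   # no-op until the memory holds at least one batch
```

    In this sketch, choose_action and store correspond to Step 1 of Algorithm 1 and update to Step 2; the target network is refreshed after every update only for brevity.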

  • Table 1   Details of carrier-borne aircraft support tasks
    Support task Support time (s) Requires resources
    W1 1 No
    W2 3 Yes
    W3 3 Yes
    W4 3 Yes
    W5 3 No
    W6 4 Yes
    W7 4 No
    W8 4 Yes
    W9 2 No
    W10 2 No
    W11 5 Yes
    W12 6 No
    W13 6 Yes
    W14 4 Yes
    W15 1 No
  • Table 2   Details of aircraft carrier support stations (example 1)
    Support station Number of resources Resources
    P1 4 W2, W3, W4, W11
    P2 5 W2, W3, W6, W13, W14
    P3 5 W4, W6, W8, W13, W14
    P4 8 W2, W3, W4, W6, W8, W11, W13, W14
    P5 6 W2, W3, W8, W11, W13, W14
    P6 6 W2, W4, W6, W8, W13, W14
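
    For reference, the problem instance of example 1 (Tables 1 and 2) can be written down directly as plain Python structures. The field layout and the helper candidate_stations below are illustrative assumptions rather than the paper's data format; they only restate the tables.

```python
# Illustrative encoding of Tables 1 and 2 (example 1); the layout is an assumption.
# Each task maps to (support time in s, whether it needs station resources).
support_tasks = {
    "W1": (1, False), "W2": (3, True),   "W3": (3, True),   "W4": (3, True),
    "W5": (3, False), "W6": (4, True),   "W7": (4, False),  "W8": (4, True),
    "W9": (2, False), "W10": (2, False), "W11": (5, True),  "W12": (6, False),
    "W13": (6, True), "W14": (4, True),  "W15": (1, False),
}

# Each support station maps to the set of tasks whose resources it provides.
support_stations = {
    "P1": {"W2", "W3", "W4", "W11"},
    "P2": {"W2", "W3", "W6", "W13", "W14"},
    "P3": {"W4", "W6", "W8", "W13", "W14"},
    "P4": {"W2", "W3", "W4", "W6", "W8", "W11", "W13", "W14"},
    "P5": {"W2", "W3", "W8", "W11", "W13", "W14"},
    "P6": {"W2", "W4", "W6", "W8", "W13", "W14"},
}

def candidate_stations(task):
    """Stations able to serve a task; resource-free tasks are unconstrained here (assumption)."""
    _, needs_resources = support_tasks[task]
    if not needs_resources:
        return set(support_stations)
    return {p for p, resources in support_stations.items() if task in resources}

print(sorted(candidate_stations("W11")))   # ['P1', 'P4', 'P5']
```

    With this encoding, candidate_stations("W11") returns P1, P4, and P5, matching Table 2.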
  • Table 3   Execution sequences of support tasks
    Sequence Task execution order
    S1 W1, W2, W3, W4, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15
    S2 W1, W2, W4, W3, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15
    S3 W1, W3, W2, W4, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15
    S4 W1, W2, W3, W4, W5, W6, W8, W7, W9, W10, W11, W12, W13, W14, W15
    S5 W1, W2, W3, W4, W5, W6, W7, W8, W9, W10, W11, W12, W14, W13, W15
    S6 W1, W2, W3, W4, W5, W6, W8, W7, W9, W10, W11, W12, W14, W13, W15
    S7 W1, W4, W3, W2, W5, W6, W8, W7, W9, W10, W11, W12, W14, W13, W15
    S8 W1, W3, W4, W2, W5, W6, W8, W7, W9, W10, W11, W12, W14, W13, W15
    S9 W1, W3, W4, W2, W5, W6, W8, W7, W9, W10, W11, W12, W13, W14, W15
    S10 W1, W3, W4, W2, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15
  • Table 4   Details of aircraft carrier support stations (example 2)
    Support station Number of resources Resources
    P1 3 W2, W3, W11
    P2 5 W2, W3, W6, W13, W14
    P3 3 W6, W13, W14
    P4 8 W2, W3, W4, W6, W8, W11, W13, W14
    P5 8 W2, W3, W4, W6, W8, W11, W13, W14
    P6 6 W2, W3, W6, W8, W13, W14
    P7 8 W2, W3, W4, W6, W8, W11, W13, W14
    P8 8 W2, W3, W4, W6, W8, W11, W13, W14
    P9 6 W2, W3, W4, W6, W8, W11
    P10 8 W2, W3, W4, W6, W8, W11, W13, W14
    P11 7 W2, W3, W6, W8, W11, W13, W14
    P12 5 W2, W3, W4, W6, W8
    P13 2 W3, W11
    P14 4 W8, W11, W13, W14
    P15 8 W2, W3, W4, W6, W8, W11, W13, W14
    P16 6 W2, W3, W4, W11, W13, W14
  • Table 5   Comparison of average task response time (example 1)
    Sequence Ours Q-learning SA LP GA
    S1 0.0055 0.0069 0.509 0.636 0.464
    S2 0.0048 0.0067 0.477 0.508 0.502
    S3 0.0017 0.0051 0.490 0.607 0.525
    S4 0.0083 0.0135 0.341 0.657 0.478
    S5 0.0014 0.0128 0.508 0.518 0.499
    S6 0.0055 0.0086 0.634 0.572 0.463
    S7 0.0013 0.0146 0.438 0.696 0.454
    S8 0.0014 0.0083 0.507 0.710 0.576
    S9 0.0084 0.0015 0.596 0.602 0.507
    S10 0.0061 0.0092 0.426 0.648 0.474
  • Table 6   Comparison of average task response time (example 2)
    Sequence Ours Q-learning SA LP GA
    S1 0.0275 0.3319 23.792 19.02 21.22
    S2 0.0156 0.3219 23.586 20.28 18.67
    S3 0.0152 0.3253 24.772 22.66 20.33
    S4 0.0352 0.2292 18.764 19.60 27.09
    S5 0.0121 0.3209 21.755 19.75 25.65
    S6 0.0234 0.3281 25.815 18.96 20.20
    S7 0.0128 0.2287 20.083 19.19 22.33
    S8 0.0195 0.3224 20.313 24.77 21.17
    S9 0.0115 0.2261 21.696 18.70 19.75
    S10 0.0129 0.3283 20.797 22.46 22.38