
SCIENCE CHINA Information Sciences, Volume 63, Issue 6: 169101 (2020). https://doi.org/10.1007/s11432-018-9820-3

Reinforcement learning with actor-critic for knowledge graph reasoning

  • Received: Jun 6, 2018
  • Accepted: Jan 31, 2019
  • Published: May 9, 2020


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61590924, 61433002) and the Science and Technology Innovation Action Plan Project of the Shanghai Science and Technology Commission (Grant No. 18511104200).


Supplement

Appendixes A and B.


References

[1] Wang Q, Liu J, Luo Y F, et al. Knowledge base completion via coupled path ranking. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016. 1308--1318.

[2] Chen W H, Xiong W H, Yan X F, et al. Variational knowledge graph reasoning. 2018. arXiv preprint.

[3] Lao N, Cohen W W. Fast query execution for retrieval models based on path-constrained random walks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010. 881--888.

[4] Wang W Y, Cohen W W. Joint information extraction and reasoning: a scalable statistical relational learning approach. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, 2015. 355--364.

[5] Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data. In: Proceedings of Neural Information Processing Systems, 2013. 2787--2795.

[6] Xiong W H, Hoang T, Wang W Y. DeepPath: a reinforcement learning method for knowledge graph reasoning. 2017. arXiv preprint.

[7] Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning, 2014. 387--395.

[8] Carlson A, Betteridge J, Kisiel B, et al. Toward an architecture for never-ending language learning. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.

Table 1  Fact prediction results (MAP)$^{\rm a)}$

    Tasks                            ACRL    DeepPath  TransE  TransD  Improvement (%)
    personBornInLocation             0.4878  0.2895    0.2652  0.0924  +68.50
    athletePlaysForTeam              0.4384  0.2435    0.1400  0.1494  +80.04
    teamPlaysSport                   0.4120  0.3084    0.3567  0.1216  +33.59
    agentBelongsToOrganization       0.3287  0.3308    0.3498  0.1286  -0.64
    athletePlaysInLeague             0.5199  0.5059    0.4676  0.0762  +2.77
    organizationHeadQuarteredInCity  0.6420  0.5929    0.2569  0.1520  +8.28
    organizationHiredPerson          0.5257  0.4475    0.3073  0.3554  +17.47
    ...
    Overall                          0.5170  0.4638    0.3622  0.1720  +11.47

    a) The bold number represents the best result among the four methods for each task; the improvement column gives the relative gain of ACRL over DeepPath.
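For context on the metric, MAP is the mean, over a task's test queries, of each query's average precision on its ranked candidate facts. The following minimal Python sketch shows the computation; the rankings themselves would come from each model's scores, and the toy inputs below are invented for illustration.

    import numpy as np

    def average_precision(ranked_labels):
        # ranked_labels: 1 for a true fact, 0 for a false one, ordered by
        # the model's confidence from highest to lowest.
        hits, precisions = 0, []
        for rank, label in enumerate(ranked_labels, start=1):
            if label:
                hits += 1
                precisions.append(hits / rank)
        return float(np.mean(precisions)) if precisions else 0.0

    def mean_average_precision(per_query_labels):
        # MAP for one task: the mean of the per-query average precisions.
        return float(np.mean([average_precision(q) for q in per_query_labels]))

    # Toy example: two queries whose candidate facts were ranked by a model.
    print(mean_average_precision([[1, 0, 1, 0], [0, 1, 0, 0]]))  # 0.6667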

Algorithm 1 Training procedure

    Initialize parameters $\theta$ of the actor-critic network;
    for episode $\leftarrow$ 1 to $N$ do
        Initialize entity pair $\langle e_1, e_2 \rangle$;
        while ${\rm num\_path} < {\rm max\_path}$ do
            Randomly BFS, obtain $\langle e, r \rangle$;
            if $r \ne \emptyset$ then
                Embed $\langle e, r \rangle$ as $\langle s, a \rangle$;
                Save $\langle s, a \rangle$ to $\varepsilon_{\rm pos}$;
            end if
        end while
        for $\langle s, a \rangle$ in $\varepsilon_{\rm pos}$ do
            $k \propto \nabla_\theta \log\pi(a|s) \cdot A^\pi(s,a)$;
        end for
    end for
    for episode $\leftarrow$ 1 to $N$ do
        Initialize state vector $s_0$;
        while ${\rm num\_step} < {\rm max\_step}$ do
            $a_1, a_2 \sim \pi(a|s_0)_{\rm top2}$, $a_3 \sim \pi(a|s_0)$;
            Choose the best action $a$ according to $R_t$;
            Save $\langle s, a \rangle$ to $\kappa_{\rm pos}$ or $\kappa_{\rm neg}$;
        end while
        for $\langle s, a \rangle$ in $\kappa_{\rm pos}$ do
            $k \propto \nabla_\theta \sum_t \log\pi(a = r_t|s_t) \cdot R_{\rm total}$;
        end for
        for $\langle s, a \rangle$ in $\kappa_{\rm neg}$ do
            $k \propto \nabla_\theta \sum_t \log\pi(a = r_t|s_t) \cdot (-1)$;
        end for
    end for
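To make the update steps in Algorithm 1 concrete, the sketch below implements one generic actor-critic gradient step in PyTorch: the actor ascends along $\nabla_\theta \log\pi(a|s) \cdot A^\pi(s,a)$ while the critic supplies the baseline $V(s)$. The network sizes, state embedding, and reward values are assumptions made here for illustration, not the paper's exact architecture (see Appendixes A and B for that).

    import torch
    import torch.nn as nn

    # Assumed dimensions: states are fixed-size embeddings of the current
    # entity and its gap to the target; actions index the KG relations.
    STATE_DIM, N_RELATIONS, HIDDEN = 200, 400, 512

    class ActorCritic(nn.Module):
        """Shared trunk with a policy head (actor) over relations
        and a scalar value head (critic)."""
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(STATE_DIM, HIDDEN), nn.ReLU())
            self.actor = nn.Linear(HIDDEN, N_RELATIONS)  # logits over relations
            self.critic = nn.Linear(HIDDEN, 1)           # state value V(s)

        def forward(self, s):
            h = self.trunk(s)
            return torch.log_softmax(self.actor(h), dim=-1), self.critic(h).squeeze(-1)

    net = ActorCritic()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    def update(states, actions, returns):
        # Advantage A(s, a) = R - V(s); the actor follows
        # grad log pi(a|s) * A, while the critic regresses V(s) toward R.
        log_probs, values = net(states)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        advantage = returns - values.detach()
        actor_loss = -(chosen * advantage).mean()
        critic_loss = (returns - values).pow(2).mean()
        opt.zero_grad()
        (actor_loss + critic_loss).backward()
        opt.step()

    # Toy batch: four transitions with total rewards (+1 success, -1 failure).
    s = torch.randn(4, STATE_DIM)
    a = torch.randint(0, N_RELATIONS, (4,))
    R = torch.tensor([1.0, 1.0, -1.0, 1.0])
    update(s, a, R)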
