SCIENCE CHINA Information Sciences, Volume 63, Issue 6: 169101 (2020) https://doi.org/10.1007/s11432-018-9820-3

## Reinforcement learning with actor-critic for knowledge graph reasoning

• Accepted: Jan 31, 2019
• Published: May 9, 2020

### Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61590924, 61433002) and the Science and Technology Innovation Action Plan Project of the Shanghai Science and Technology Commission (Grant No. 18511104200).

### Supplement

Appendixes A and B.

### References

[1] Wang Q, Liu J, Luo Y F, et al. Knowledge base completion via coupled path ranking. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016. 1308--1318.

[2] Chen W H, Xiong W H, Yan X F, et al. Variational knowledge graph reasoning. 2018. ArXiv preprint.

[3] Lao N, Cohen W W. Fast query execution for retrieval models based on path-constrained random walks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010. 881--888.

[4] Wang W Y, Cohen W W. Joint information extraction and reasoning: a scalable statistical relational learning approach. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, 2015. 355--364.

[5] Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data. In: Proceedings of Neural Information Processing Systems, 2013. 2787--2795.

[6] Xiong W H, Hoang T, Wang W Y. DeepPath: a reinforcement learning method for knowledge graph reasoning. 2017. ArXiv preprint.

[7] Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning, 2014. 387--395.

[8] Carlson A, Betteridge J, Kisiel B, et al. Toward an architecture for never-ending language learning. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.

Table 1  Fact prediction results (MAP)$^{a)}$

| Tasks | ACRL | DeepPath | TransE | TransD | Improvement (%) |
|---|---|---|---|---|---|
| personBornInLocation | **0.4878** | 0.2895 | 0.2652 | 0.0924 | +68.50 |
| athletePlaysForTeam | **0.4384** | 0.2435 | 0.1400 | 0.1494 | +80.04 |
| teamPlaysSport | **0.4120** | 0.3084 | 0.3567 | 0.1216 | +33.59 |
| agentBelongsToOrganization | 0.3287 | 0.3308 | **0.3498** | 0.1286 | $-$0.64 |
| athletePlaysInLeague | **0.5199** | 0.5059 | 0.4676 | 0.0762 | +2.77 |
| organizationHeadQuarteredInCity | **0.6420** | 0.5929 | 0.2569 | 0.1520 | +8.28 |
| organizationHiredPerson | **0.5257** | 0.4475 | 0.3073 | 0.3554 | +17.47 |
| $\cdots$ | | | | | |
| Overall | **0.5170** | 0.4638 | 0.3622 | 0.1720 | +11.47 |

a) Bold numbers mark the best result among the four methods for each task.
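Table 1 reports mean average precision (MAP) over the fact-prediction queries of each task. A minimal sketch of how MAP is computed, assuming each query yields a ranked list of candidate facts with binary relevance labels (the function names and toy data below are illustrative, not from the paper):

```python
def average_precision(ranked_labels):
    """AP for one query: ranked_labels[i] is 1 if the i-th ranked
    candidate is a true fact, else 0."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this hit
    return score / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of per-query average precisions."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two toy queries: the ranks of the true facts determine the score.
queries = [[1, 0, 1, 0], [0, 1, 0, 0]]
print(mean_average_precision(queries))  # ≈ 0.667
```

A method that ranks true facts nearer the top of each list scores higher, which is why MAP is the standard summary metric for this kind of link-prediction comparison.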


**Algorithm 1** Training procedure

```
Initialize parameters θ of the actor-critic network
for episode ← 1 to N do
    Initialize entity pair ⟨e_1, e_2⟩
    while num_path < max_path do
        Random BFS to obtain ⟨e, r⟩
        if r ≠ ∅ then
            Embed ⟨e, r⟩ as ⟨s, a⟩
            Save ⟨s, a⟩ to ε_pos
        end if
    end while
    for ⟨s, a⟩ in ε_pos do
        Update θ along ∇_θ log π(a|s) · A^π(s, a)
    end for
end for
for episode ← 1 to N do
    Initialize state vector s_0
    while num_step < max_step do
        Sample a_1, a_2 ∼ π(a|s_0)_top2 and a_3 ∼ π(a|s_0)
        Choose the best action a according to the reward R_t
        Save ⟨s, a⟩ to κ_pos / κ_neg
    end while
    for ⟨s, a⟩ in κ_pos do
        Update θ along ∇_θ Σ_t log π(a = r_t | s_t) · R_total
    end for
    for ⟨s, a⟩ in κ_neg do
        Update θ along ∇_θ Σ_t log π(a = r_t | s_t) · (−1)
    end for
end for
```
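The parameter updates in Algorithm 1 are policy-gradient (REINFORCE-style) steps: successful paths are reinforced in proportion to their reward, failed ones with reward $-1$. A minimal sketch of one such update for a linear softmax policy in NumPy, where the state/action dimensions, learning rate, and all variable names are illustrative assumptions rather than the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, lr = 8, 4, 0.1
W = rng.normal(size=(n_actions, state_dim))  # policy parameters theta

def policy(s):
    """Softmax policy pi(a|s) over candidate relations (actions)."""
    logits = W @ s
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

def reinforce_update(episode, reward):
    """theta <- theta + lr * reward * sum_t grad log pi(a_t|s_t).
    Positive paths use reward = R_total; negative paths use reward = -1."""
    global W
    grad = np.zeros_like(W)
    for s, a in episode:
        p = policy(s)
        # gradient of log-softmax w.r.t. W: (one_hot(a) - pi(.|s)) outer s
        g = -np.outer(p, s)
        g[a] += s
        grad += g
    W += lr * reward * grad

# One toy episode: reinforce action 2 in a fixed state.
s = rng.normal(size=state_dim)
before = policy(s)[2]
reinforce_update([(s, 2)], reward=1.0)
after = policy(s)[2]  # probability of the rewarded action increases
```

With a positive reward the rewarded action's probability rises; passing reward −1 for a path in κ_neg pushes the same gradient in the opposite direction, which is exactly the sign structure of the two final loops above.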


Copyright 2020 CHINA SCIENCE PUBLISHING & MEDIA LTD. All rights reserved.