SCIENCE CHINA Information Sciences, Volume 63, Issue 6: 169101 (2020) https://doi.org/10.1007/s11432-018-9820-3

## Reinforcement learning with actor-critic for knowledge graph reasoning

• Accepted: Jan 31, 2019
• Published: May 9, 2020

### Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61590924, 61433002) and the Science and Technology Innovation Action Plan Project of the Shanghai Science and Technology Commission (Grant No. 18511104200).

### Supplement

Appendixes A and B.

Table 1. Fact prediction results (MAP)$^{\rm a)}$

| Tasks | ACRL | DeepPath | TransE | TransD | Improvement (%) |
| --- | --- | --- | --- | --- | --- |
| personBornInLocation | **0.4878** | 0.2895 | 0.2652 | 0.0924 | +68.50 |
| athletePlaysForTeam | **0.4384** | 0.2435 | 0.1400 | 0.1494 | +80.04 |
| teamPlaysSport | **0.4120** | 0.3084 | 0.3567 | 0.1216 | +33.59 |
| agentBelongsToOrganization | 0.3287 | 0.3308 | **0.3498** | 0.1286 | $-0.64$ |
| athletePlaysInLeague | **0.5199** | 0.5059 | 0.4676 | 0.0762 | +2.77 |
| organizationHeadQuarteredInCity | **0.6420** | 0.5929 | 0.2569 | 0.1520 | +8.28 |
| organizationHiredPerson | **0.5257** | 0.4475 | 0.3073 | 0.3554 | +17.47 |
| $\cdots$ | | | | | |
| Overall | **0.5170** | 0.4638 | 0.3622 | 0.1720 | +11.47 |

a) The bold number represents the best result among the four methods for every task.
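
The Improvement column appears to be the relative MAP gain of ACRL over DeepPath, in percent. That reading is inferred from the numbers and is not stated in the table itself. The short check below recomputes it from the rounded MAP values; the function name `relative_improvement` and the 0.02 tolerance are illustrative choices, and small differences are expected because the published column was presumably computed from unrounded scores.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# (ACRL MAP, DeepPath MAP, reported improvement in %), copied from Table 1.
rows = {
    "personBornInLocation": (0.4878, 0.2895, 68.50),
    "athletePlaysForTeam": (0.4384, 0.2435, 80.04),
    "teamPlaysSport": (0.4120, 0.3084, 33.59),
    "agentBelongsToOrganization": (0.3287, 0.3308, -0.64),
    "athletePlaysInLeague": (0.5199, 0.5059, 2.77),
    "organizationHeadQuarteredInCity": (0.6420, 0.5929, 8.28),
    "organizationHiredPerson": (0.5257, 0.4475, 17.47),
    "Overall": (0.5170, 0.4638, 11.47),
}

# Absolute gap between the recomputed and the reported percentage, per task.
gaps = {
    task: abs(relative_improvement(acrl, deeppath) - reported)
    for task, (acrl, deeppath, reported) in rows.items()
}

for task, gap in gaps.items():
    print(f"{task}: gap to reported value = {gap:.3f} percentage points")
```

Under this reading, the single negative entry (agentBelongsToOrganization) is the one task where ACRL scores slightly below DeepPath.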

Algorithm 1 Training procedure

Initialize parameters $\theta$ of the actor-critic network;

for episode $\Leftarrow 1$ to $N$ do

Initialize entity pair $\langle e_1, e_2 \rangle$;

while ${\rm num\_path} < {\rm max\_path}$ do

Run a randomized BFS to obtain $\langle e, r \rangle$;

if $r \ne \emptyset$ then

Embed $\langle e, r \rangle$ as $\langle s, a \rangle$;

Save $\langle s, a \rangle$ to $\varepsilon_{\rm pos}$;

end if

end while

for $\langle s, a \rangle$ in $\varepsilon_{\rm pos}$ do

$k \propto \nabla_\theta \log \pi(a|s) \cdot A^\pi(s,a)$;

end for

end for

for episode $\Leftarrow 1$ to $N$ do

Initialize state vector $s_0$;

while ${\rm num\_step} < {\rm max\_step}$ do

$a_1, a_2 \sim \pi(a|s_0)_{\rm top2}$, $a_3 \sim \pi(a|s_0)$;

Choose the best action $a \sim R_t$;

Save $\langle s, a \rangle$ to $\kappa_{\rm pos}$ / $\kappa_{\rm neg}$;

end while

for $\langle s, a \rangle$ in $\kappa_{\rm pos}$ do

$k \propto \nabla_\theta \sum_t \log \pi(a = r_t | s_t) \cdot R_{\rm total}$;

end for

for $\langle s, a \rangle$ in $\kappa_{\rm neg}$ do

$k \propto \nabla_\theta \sum_t \log \pi(a = r_t | s_t) \cdot (-1)$;

end for

end for
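
The second loop of Algorithm 1 can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a linear-softmax policy $\pi(a|s) = {\rm softmax}(Ws)$ in place of the actor-critic network, and the helper names (`candidate_actions`, `best_action`, `episode_update`) and the learning rate are hypothetical. It shows the three ingredients named in the listing: two top-probability candidates plus one sampled candidate, a reward-based choice among them, and a gradient step on $\sum_t \log \pi(a_t|s_t)$ weighted by $R_{\rm total}$ for successful episodes or by $-1$ for failed ones.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def candidate_actions(probs, rng):
    """Return the two most probable actions plus one action sampled from probs."""
    top2 = np.argsort(probs)[-2:][::-1]
    sampled = rng.choice(len(probs), p=probs)
    return [int(top2[0]), int(top2[1]), int(sampled)]

def best_action(candidates, reward_fn):
    """Pick the candidate with the highest reward (reward_fn is a placeholder)."""
    return max(candidates, key=reward_fn)

def grad_log_pi(W, s, a):
    """Gradient of log pi(a|s) with respect to W for pi = softmax(W @ s)."""
    probs = softmax(W @ s)
    onehot = np.zeros_like(probs)
    onehot[a] = 1.0
    return np.outer(onehot - probs, s)

def episode_update(W, trajectory, weight, lr=0.1):
    """One gradient-ascent step on weight * sum_t log pi(a_t|s_t).

    Use weight = R_total for a successful episode and weight = -1 for a failed one.
    """
    grad = sum(grad_log_pi(W, s, a) for s, a in trajectory)
    return W + lr * weight * grad

# Toy usage: 3 actions, 2-dimensional state, one-step episode.
rng = np.random.default_rng(0)
W = np.zeros((3, 2))
s = np.array([1.0, 0.0])
cands = candidate_actions(np.array([0.1, 0.6, 0.3]), rng)
chosen = best_action(cands, reward_fn=lambda a: 1.0 if a == 2 else 0.0)
W_pos = episode_update(W, [(s, 0)], weight=1.0)   # reinforce action 0
W_neg = episode_update(W, [(s, 0)], weight=-1.0)  # penalize action 0
p_before = softmax(W @ s)[0]
p_pos = softmax(W_pos @ s)[0]
p_neg = softmax(W_neg @ s)[0]
```

In the toy run, the positive update raises the probability of the reinforced action and the negative update lowers it. The per-step advantage weighting $A^\pi(s,a)$ used in the first loop could be sketched the same way by replacing the single episode weight with one weight per $\langle s, a \rangle$ pair.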

