SCIENTIA SINICA Informationis, Volume 50, Issue 2: 221-238(2020) https://doi.org/10.1360/N112018-00223

## Research on a link-prediction method based on a hierarchical hybrid-feature graph

• AcceptedNov 15, 2018
• PublishedFeb 12, 2020
Share
Rating

### Abstract

Entities in the real world are often interconnected, forming heterogeneous information networks. Link-prediction is a necessary technique for predicting the existence of unobserved or future links in heterogeneous information networks. It is useful to make users better understand the generation and evolution of networks. However, current techniques lack the effective fusion of multiple features, often leading to nonsensical results. Also, it is difficult for them to adapt to the heterogeneity and dynamics of heterogeneous-information networks. In this paper, we present a hierarchical-hybrid-feature-graph (HHFG) model by fully considering structural, semantic, and time features. Also, an HHFG based link-prediction algorithm is proposed to effectively guarantee accuracy. On one hand, it performs a random walk on HHFG based upon hybrid features. On the other hand, parameters such as feature weights and transition coefficients are learned by the gradient-descent method. The experiments demonstrate the feasibility and effectiveness of our key techniques.

### Funded by

• Figure 1

(Color online) Hierarchical hybrid feature graph (HHFG)

• Figure 2

(Color online) The basic idea of HHFG-based link prediciton algorithmn

• Table 1   Dataset
 Level Number of nodes Number of edges Semantic feature level 2867 98677 Structure feature level 9682 15864 Time feature level 15864 39656
•

Algorithm 1 HHFG-based link prediciton algorithmn

Convergence

Require:Heterogeneous information network ${G}$, source node set ${S}$, integer ${k}$;

Output:$\{N_1,\ldots,~N_{|S|}|~N_i\subseteq~V\wedge~|N_i|=k\wedge~1\leq~i\leq~|S|\}$;

Divide $G$ into $G_F$ and $G_L$;

Construct HHFG (denoted as $G&apos;_{\rm~sem}\cup~G&apos;_{\rm~str}\cup~G&apos;_{\rm~time}$) based on $G_F$;

Extract features $F&apos;$ from $G&apos;_{\rm~sem}\cup~G&apos;_{\rm~str}\cup~G&apos;_{\rm~time}$;

$S&apos;={\rm~findSimNodes}(G_F,S)$;

Initialize $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$;

repeat

for $s&apos;\in~S&apos;$

$P={\rm~compRWProb}(F&apos;,~W_{\rm~I},~W_{\rm~II}~,~W_{\rm~III},~W_{\rm~IV})$;

end for

Update $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$ based on $G_L$;

until

• Table 2   Parameters setting
 $F_1$ $F_2$ $F_3$ $F_4$ $F_5$ $F_6$ $F_7$ $F_8$ $F_9$ $F_{10}$ $\mu$ $\eta$ 0.0143 0.0115 $-$0.0258 $-$0.0371 0.0083 0.0236 $-$0.003 0.0601 0.0251 0.054 0.255 0.211
• Table 3   Comparision of link prediction methods based on different features with Precision, Recall and AUC
 Method Precision Recall AUC CN 0.103 0.099 0.608 Jaccard 0.105 0.099 0.652 RW 0.106 0.105 0.741 DW 0.115 0.112 0.746 SS 0.123 0.123 0.753 GT 0.125 0.124 0.759 SF 0.129 0.124 0.767 HHFG 0.131 0.124 0.769
• Table 4   Comparision of link prediction methods based on different features with P@K
 Method P@1 P@5 P@10 P@20 P@50 CN 0.142 0.131 0.102 0.099 0.043 Jaccard 0.147 0.128 0.112 0.099 0.037 RW 0.152 0.128 0.121 0.099 0.029 DW 0.163 0.142 0.135 0.099 0.037 SS 0.175 0.161 0.129 0.108 0.043 GT 0.179 0.171 0.131 0.101 0.043 SF 0.183 0.176 0.142 0.106 0.037 HHFG 0.191 0.177 0.139 0.106 0.043
• Table 5   Comparision of link prediction methods based on different features division and transition probability computing strategies with Precision, Recall and AUC
 Method Precision Recall AUC HIN 0.124 0.124 0.747 NTC 0.126 0.124 0.752 HHFG 0.131 0.124 0.769
• #### 0

Citations

• Altmetric

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有