SCIENTIA SINICA Informationis, Volume 50 , Issue 2 : 221-238(2020) https://doi.org/10.1360/N112018-00223

## Research on a link-prediction method based on a hierarchical hybrid-feature graph

• AcceptedNov 15, 2018
• PublishedFeb 12, 2020
Share
Rating

### Abstract

Entities in the real world are often interconnected, forming heterogeneous information networks. Link-prediction is a necessary technique for predicting the existence of unobserved or future links in heterogeneous information networks. It is useful to make users better understand the generation and evolution of networks. However, current techniques lack the effective fusion of multiple features, often leading to nonsensical results. Also, it is difficult for them to adapt to the heterogeneity and dynamics of heterogeneous-information networks. In this paper, we present a hierarchical-hybrid-feature-graph (HHFG) model by fully considering structural, semantic, and time features. Also, an HHFG based link-prediction algorithm is proposed to effectively guarantee accuracy. On one hand, it performs a random walk on HHFG based upon hybrid features. On the other hand, parameters such as feature weights and transition coefficients are learned by the gradient-descent method. The experiments demonstrate the feasibility and effectiveness of our key techniques.

### References

[1] Zhou T, Lü L, Zhang Y C. Predicting missing links via local information. Eur Phys J B 2009, 71: 623--630. Google Scholar

[2] Mitzenmacher M. A brief history of generative models for power law and lognormal distributions. Internet Math 2004, 1: 226--251. Google Scholar

[3] Salton G, McGill M J. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1986. Google Scholar

[4] Li X Y, Du N, Li H. A deep learning approach to link prediction in dynamic networks. In: Proceedings of SIAM International Conference on Data Mining, 2014. 289--297. Google Scholar

[5] Wang D X, Cui P, Zhu W W. Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 1225--1234. Google Scholar

[6] Lü L, Jin C H, Zhou T. Similarity index based on local paths for link prediction of complex networks. Phys Rev E 2009, 80: 046122. Google Scholar

[7] Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 1998, 30: 107--117. Google Scholar

[8] Jeh G, Widom J. SimRank: a measure of structural-context similarity. In: Proceedings of the 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2002. 538--543. Google Scholar

[9] Leskovec J, Horvitz E. Planetary-scale views on a large instant-messaging network. In: Proceedings of the 17th International Conference on World Wide Web, 2008. 915--924. Google Scholar

[10] Akcora C G, Carminati B, Ferrari E. User similarities on social networks. Soc Netw Anal Min 2013, 3: 475--495. Google Scholar

[11] Aiello L M, Barrat A, Schifanella R, et al. Friendship prediction and homophily in social media. ACM Trans Web 2012, 6: 1--33. Google Scholar

[12] Huang Z, Lin D K J. The time-series link prediction problem with applications in communication surveillance. Inform J Comput 2009, 21: 286--303. Google Scholar

[13] Tylenda T, Angelova R, Bedathur S. Towards time-aware link prediction in evolving social networks. In: Proceedings of the 3rd Workshop on Social Network Mining and Analysis, 2009. Google Scholar

[14] Sun Y, Han J, Aggarwal C C, et al. When will it happen? — relationship prediction in heterogeneous information networks. In: Proceedings of the 5th ACM International Conference on Web Search and Data Mining, 2012. 663--672. Google Scholar

[15] Lee C, Nick B, Brandes U, et al. Link prediction with social vector clocks. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013. 784--792. Google Scholar

[16] Liu M, Liu B Q, Liu F. 社会网络中序列行为的链接值及事件结果预测. Sci Sin-Inf, 2015, 45: 1558-1573 CrossRef Google Scholar

[17] Bao Z F, Zeng Y, Tay Y C. SonLP: social network link prediction by principal component regression. In: Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2013. 364--371. Google Scholar

[18] Sun Y Z, Han J W, Yan X F, et al. PathSim: meta path-based top-K similarity search in heterogeneous information networks. In: Proceedings of the VLDB Endowment, 2011. 992--1003. Google Scholar

[19] Sun Y Z, Barber R, Gupta M, et al. Co-author relationship prediction in heterogeneous bibliographic networks.łinebreak In: Proceedings of International Conference on Advances in Social Networks Analysis and Mining, 2011. 121--128. Google Scholar

[20] Huang L W, Li D Y, Ma Y T. A meta path-based link prediction model for heterogeneous information networks. Chinese J Comput, 2014, 37: 848--858. Google Scholar

[21] Chen N, Zhu J, Xia F, et al. Discriminative relational topic models. IEEE Trans Pattern Anal Mach Intell 2015, 37: 973--986. Google Scholar

[22] Wang H, Shi X J, Yeung D Y. Relational deep learning: a deep latent variable model for link prediction. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017. 2688--2694. Google Scholar

[23] Hao Z G, Zhang W X, Chen Z. Link prediction in online social networks based on supervised joint denoising model. Sci Sin Inform 2017, 47: 1551--1565 [郝占刚, 章伟雄, 陈政. 基于监督联合去噪模型的社交网络链接预测. 中国科学: 信息科学, 2017, 47: 1551--1565]. Google Scholar

[24] Chen H X, Yin H Z, Wang W Q, et al. PME: projected metric embedding on heterogeneous networks for link prediction. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018. 1177--1186. Google Scholar

[25] Yin H Z, Hu Z T, Zhou X F, et al. Discovering interpretable geo-social communities for user behavior prediction. In: Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016. 942--953. Google Scholar

[26] Yin H Z, Zou L, Nguyen Q, et al. Joint event-partner recommendation in event-based social networks. In: Proceedings of the 34th IEEE International Conference on Data Engineering, 2018. Google Scholar

[27] Xie M, Yin H Z, Xu F J, et al. Learning graph-based POI embedding for location-based recommendation.łinebreak In: Proceedings of ACM International Conference on Information and Knowledge Management, 2016. 15--24. Google Scholar

[28] Fujiwara Y, Nakatsuji M, Onizuka M, et al. Fast and exact top-K search for random walk with restart. In: Proceedings of the VLDB Endowment, 2012. 442--453. Google Scholar

[29] Lü L, Zhou T. Link prediction in complex networks: a survey. Phys A-Stat Mech Appl 2011, 390: 1150--1170. Google Scholar

[30] Powers D M W. Evaluation: from precision, recall and F-Measure to ROC, informedness, markedness & correlation. J Mach Learn Technol, 2011, 2: 37--63. Google Scholar

[31] Bryan P, Rami A, Steven S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2014. 701--710. Google Scholar

• Figure 1

(Color online) Hierarchical hybrid feature graph (HHFG)

• Figure 2

(Color online) The basic idea of HHFG-based link prediciton algorithmn

• Figure 3

(Color online) The update strategy of parameters based on gradient descent

• Figure 4

(Color online) The variety of Loss and AUC during parameter optimization

• Figure 5

(Color online) Comparision of link prediction methods based on different features with ROC curve

• Figure 6

(Color online) Comparision of link prediction methods based on different features division and transition probability computing strategies with ROC curve

• Table 1   Dataset
 Level Number of nodes Number of edges Semantic feature level 2867 98677 Structure feature level 9682 15864 Time feature level 15864 39656
•

Algorithm 1 HHFG-based link prediciton algorithmn

Input:Heterogeneous information network ${G}$, source node set ${S}$, integer ${k}$;

Output:$\{N_1,\ldots,~N_{|S|}|~N_i\subseteq~V\wedge~|N_i|=k\wedge~1\leq~i\leq~|S|\}$;

1: Divide $G$ into $G_F$ and $G_L$;

2: Construct HHFG (denoted as $G'_{\rm sem}\cup G'_{\rm str}\cup G'_{\rm time}$) based on $G_F$;

3: Extract features $F'$ from $G'_{\rm sem}\cup G'_{\rm str}\cup G'_{\rm time}$;

4: $S'={\rm findSimNodes}(G_F,S)$;

5: Initialize $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$;

6:repeat

7: for {$s'\in S'$}

8: $P={\rm compRWProb}(F', W_{\rm I}, W_{\rm II} , W_{\rm III}, W_{\rm IV})$;

9:end for

10: Update $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$ based on $G_L$;

11: untilConvergence

12: Construct HHFG (denoted as $G_{\rm sem}\cup G_{\rm str}\cup G_{\rm time}$) based on $G$;

13: Extract features $F$ from $G_{\rm sem}\cup G_{\rm str}\cup G_{\rm time}$;

14:FOR{$s\in S$}

15: $P={\rm compRWProb}(F, W_{\rm I}, W_{\rm II} , W_{\rm III}, W_{\rm IV})$;

16: $N_i={\rm sort}(P,k)$;

17: end for

18: Return $\{N_1,\ldots, N_{|S|}\}$.

• Table 2   Parameters setting
 $F_1$ $F_2$ $F_3$ $F_4$ $F_5$ $F_6$ $F_7$ $F_8$ $F_9$ $F_{10}$ $\mu$ $\eta$ 0.0143 0.0115 $-$0.0258 $-$0.0371 0.0083 0.0236 $-$0.003 0.0601 0.0251 0.054 0.255 0.211
• Table 3   Comparision of link prediction methods based on different features with Precision, Recall and AUC
 Method Precision Recall AUC CN 0.103 0.099 0.608 Jaccard 0.105 0.099 0.652 RW 0.106 0.105 0.741 DW 0.115 0.112 0.746 SS 0.123 0.123 0.753 GT 0.125 0.124 0.759 SF 0.129 0.124 0.767 HHFG 0.131 0.124 0.769
• Table 4   Comparision of link prediction methods based on different features with P@K
 Method P@1 P@5 P@10 P@20 P@50 CN 0.142 0.131 0.102 0.099 0.043 Jaccard 0.147 0.128 0.112 0.099 0.037 RW 0.152 0.128 0.121 0.099 0.029 DW 0.163 0.142 0.135 0.099 0.037 SS 0.175 0.161 0.129 0.108 0.043 GT 0.179 0.171 0.131 0.101 0.043 SF 0.183 0.176 0.142 0.106 0.037 HHFG 0.191 0.177 0.139 0.106 0.043
• Table 5   Comparision of link prediction methods based on different features division and transition probability computing strategies with Precision, Recall and AUC
 Method Precision Recall AUC HIN 0.124 0.124 0.747 NTC 0.126 0.124 0.752 HHFG 0.131 0.124 0.769

Citations

• #### 0

Altmetric

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有