logo

SCIENTIA SINICA Informationis, Volume 50 , Issue 2 : 221-238(2020) https://doi.org/10.1360/N112018-00223

Research on a link-prediction method based on a hierarchical hybrid-feature graph

More info
  • ReceivedAug 19, 2018
  • AcceptedNov 15, 2018
  • PublishedFeb 12, 2020

Abstract

Entities in the real world are often interconnected, forming heterogeneous information networks. Link-prediction is a necessary technique for predicting the existence of unobserved or future links in heterogeneous information networks. It is useful to make users better understand the generation and evolution of networks. However, current techniques lack the effective fusion of multiple features, often leading to nonsensical results. Also, it is difficult for them to adapt to the heterogeneity and dynamics of heterogeneous-information networks. In this paper, we present a hierarchical-hybrid-feature-graph (HHFG) model by fully considering structural, semantic, and time features. Also, an HHFG based link-prediction algorithm is proposed to effectively guarantee accuracy. On one hand, it performs a random walk on HHFG based upon hybrid features. On the other hand, parameters such as feature weights and transition coefficients are learned by the gradient-descent method. The experiments demonstrate the feasibility and effectiveness of our key techniques.


Funded by

国家重点研发计划课题(2018YFB1003404)

国家自然科学基金(61472070,U1435216,61672142)

中国国家留学基金委(201806085016)


References

[1] Zhou T, Lü L, Zhang Y C. Predicting missing links via local information. Eur Phys J B 2009, 71: 623--630. Google Scholar

[2] Mitzenmacher M. A brief history of generative models for power law and lognormal distributions. Internet Math 2004, 1: 226--251. Google Scholar

[3] Salton G, McGill M J. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1986. Google Scholar

[4] Li X Y, Du N, Li H. A deep learning approach to link prediction in dynamic networks. In: Proceedings of SIAM International Conference on Data Mining, 2014. 289--297. Google Scholar

[5] Wang D X, Cui P, Zhu W W. Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 1225--1234. Google Scholar

[6] Lü L, Jin C H, Zhou T. Similarity index based on local paths for link prediction of complex networks. Phys Rev E 2009, 80: 046122. Google Scholar

[7] Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 1998, 30: 107--117. Google Scholar

[8] Jeh G, Widom J. SimRank: a measure of structural-context similarity. In: Proceedings of the 8th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2002. 538--543. Google Scholar

[9] Leskovec J, Horvitz E. Planetary-scale views on a large instant-messaging network. In: Proceedings of the 17th International Conference on World Wide Web, 2008. 915--924. Google Scholar

[10] Akcora C G, Carminati B, Ferrari E. User similarities on social networks. Soc Netw Anal Min 2013, 3: 475--495. Google Scholar

[11] Aiello L M, Barrat A, Schifanella R, et al. Friendship prediction and homophily in social media. ACM Trans Web 2012, 6: 1--33. Google Scholar

[12] Huang Z, Lin D K J. The time-series link prediction problem with applications in communication surveillance. Inform J Comput 2009, 21: 286--303. Google Scholar

[13] Tylenda T, Angelova R, Bedathur S. Towards time-aware link prediction in evolving social networks. In: Proceedings of the 3rd Workshop on Social Network Mining and Analysis, 2009. Google Scholar

[14] Sun Y, Han J, Aggarwal C C, et al. When will it happen? — relationship prediction in heterogeneous information networks. In: Proceedings of the 5th ACM International Conference on Web Search and Data Mining, 2012. 663--672. Google Scholar

[15] Lee C, Nick B, Brandes U, et al. Link prediction with social vector clocks. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013. 784--792. Google Scholar

[16] Liu M, Liu B Q, Liu F. 社会网络中序列行为的链接值及事件结果预测. Sci Sin-Inf, 2015, 45: 1558-1573 CrossRef Google Scholar

[17] Bao Z F, Zeng Y, Tay Y C. SonLP: social network link prediction by principal component regression. In: Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2013. 364--371. Google Scholar

[18] Sun Y Z, Han J W, Yan X F, et al. PathSim: meta path-based top-K similarity search in heterogeneous information networks. In: Proceedings of the VLDB Endowment, 2011. 992--1003. Google Scholar

[19] Sun Y Z, Barber R, Gupta M, et al. Co-author relationship prediction in heterogeneous bibliographic networks.łinebreak In: Proceedings of International Conference on Advances in Social Networks Analysis and Mining, 2011. 121--128. Google Scholar

[20] Huang L W, Li D Y, Ma Y T. A meta path-based link prediction model for heterogeneous information networks. Chinese J Comput, 2014, 37: 848--858. Google Scholar

[21] Chen N, Zhu J, Xia F, et al. Discriminative relational topic models. IEEE Trans Pattern Anal Mach Intell 2015, 37: 973--986. Google Scholar

[22] Wang H, Shi X J, Yeung D Y. Relational deep learning: a deep latent variable model for link prediction. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017. 2688--2694. Google Scholar

[23] Hao Z G, Zhang W X, Chen Z. Link prediction in online social networks based on supervised joint denoising model. Sci Sin Inform 2017, 47: 1551--1565 [郝占刚, 章伟雄, 陈政. 基于监督联合去噪模型的社交网络链接预测. 中国科学: 信息科学, 2017, 47: 1551--1565]. Google Scholar

[24] Chen H X, Yin H Z, Wang W Q, et al. PME: projected metric embedding on heterogeneous networks for link prediction. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018. 1177--1186. Google Scholar

[25] Yin H Z, Hu Z T, Zhou X F, et al. Discovering interpretable geo-social communities for user behavior prediction. In: Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016. 942--953. Google Scholar

[26] Yin H Z, Zou L, Nguyen Q, et al. Joint event-partner recommendation in event-based social networks. In: Proceedings of the 34th IEEE International Conference on Data Engineering, 2018. Google Scholar

[27] Xie M, Yin H Z, Xu F J, et al. Learning graph-based POI embedding for location-based recommendation.łinebreak In: Proceedings of ACM International Conference on Information and Knowledge Management, 2016. 15--24. Google Scholar

[28] Fujiwara Y, Nakatsuji M, Onizuka M, et al. Fast and exact top-K search for random walk with restart. In: Proceedings of the VLDB Endowment, 2012. 442--453. Google Scholar

[29] Lü L, Zhou T. Link prediction in complex networks: a survey. Phys A-Stat Mech Appl 2011, 390: 1150--1170. Google Scholar

[30] Powers D M W. Evaluation: from precision, recall and F-Measure to ROC, informedness, markedness & correlation. J Mach Learn Technol, 2011, 2: 37--63. Google Scholar

[31] Bryan P, Rami A, Steven S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2014. 701--710. Google Scholar

  • Figure 1

    (Color online) Hierarchical hybrid feature graph (HHFG)

  • Figure 2

    (Color online) The basic idea of HHFG-based link prediciton algorithmn

  • Figure 3

    (Color online) The update strategy of parameters based on gradient descent

  • Figure 4

    (Color online) The variety of Loss and AUC during parameter optimization

  • Figure 5

    (Color online) Comparision of link prediction methods based on different features with ROC curve

  • Figure 6

    (Color online) Comparision of link prediction methods based on different features division and transition probability computing strategies with ROC curve

  • Table 1   Dataset
    Level Number of nodes Number of edges
    Semantic feature level 2867 98677
    Structure feature level 9682 15864
    Time feature level 15864 39656
  •   

    Algorithm 1 HHFG-based link prediciton algorithmn

    Input:Heterogeneous information network ${G}$, source node set ${S}$, integer ${k}$;

    Output:$\{N_1,\ldots,~N_{|S|}|~N_i\subseteq~V\wedge~|N_i|=k\wedge~1\leq~i\leq~|S|\}$;

    1: Divide $G$ into $G_F$ and $G_L$;

    2: Construct HHFG (denoted as $G'_{\rm sem}\cup G'_{\rm str}\cup G'_{\rm time}$) based on $G_F$;

    3: Extract features $F'$ from $G'_{\rm sem}\cup G'_{\rm str}\cup G'_{\rm time}$;

    4: $S'={\rm findSimNodes}(G_F,S)$;

    5: Initialize $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$;

    6:repeat

    7: for {$s'\in S'$}

    8: $P={\rm compRWProb}(F', W_{\rm I}, W_{\rm II} , W_{\rm III}, W_{\rm IV})$;

    9:end for

    10: Update $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$ based on $G_L$;

    11: untilConvergence

    12: Construct HHFG (denoted as $G_{\rm sem}\cup G_{\rm str}\cup G_{\rm time}$) based on $G$;

    13: Extract features $F$ from $G_{\rm sem}\cup G_{\rm str}\cup G_{\rm time}$;

    14:FOR{$s\in S$}

    15: $P={\rm compRWProb}(F, W_{\rm I}, W_{\rm II} , W_{\rm III}, W_{\rm IV})$;

    16: $N_i={\rm sort}(P,k)$;

    17: end for

    18: Return $\{N_1,\ldots, N_{|S|}\}$.

  • Table 2   Parameters setting
    $F_1$ $F_2$ $F_3$ $F_4$ $F_5$ $F_6$ $F_7$ $F_8$ $F_9$ $F_{10}$ $\mu$ $\eta$
    0.0143 0.0115 $-$0.0258 $-$0.0371 0.0083 0.0236 $-$0.003 0.0601 0.0251 0.054 0.255 0.211
  • Table 3   Comparision of link prediction methods based on different features with Precision, Recall and AUC
    Method Precision Recall AUC
    CN 0.103 0.099 0.608
    Jaccard 0.105 0.099 0.652
    RW 0.106 0.105 0.741
    DW 0.115 0.112 0.746
    SS 0.123 0.123 0.753
    GT 0.125 0.124 0.759
    SF 0.129 0.124 0.767
    HHFG 0.131 0.124 0.769
  • Table 4   Comparision of link prediction methods based on different features with P@K
    Method P@1 P@5 P@10 P@20 P@50
    CN 0.142 0.131 0.102 0.099 0.043
    Jaccard 0.147 0.128 0.112 0.099 0.037
    RW 0.152 0.128 0.121 0.099 0.029
    DW 0.163 0.142 0.135 0.099 0.037
    SS 0.175 0.161 0.129 0.108 0.043
    GT 0.179 0.171 0.131 0.101 0.043
    SF 0.183 0.176 0.142 0.106 0.037
    HHFG 0.191 0.177 0.139 0.106 0.043
  • Table 5   Comparision of link prediction methods based on different features division and transition probability computing strategies with Precision, Recall and AUC
    Method Precision Recall AUC
    HIN 0.124 0.124 0.747
    NTC 0.126 0.124 0.752
    HHFG 0.131 0.124 0.769

Copyright 2020  CHINA SCIENCE PUBLISHING & MEDIA LTD.  中国科技出版传媒股份有限公司  版权所有

京ICP备14028887号-23       京公网安备11010102003388号