logo

SCIENTIA SINICA Informationis, Volume 50, Issue 2: 221-238(2020) https://doi.org/10.1360/N112018-00223

Research on a link-prediction method based on a hierarchical hybrid-feature graph

More info
  • ReceivedAug 19, 2018
  • AcceptedNov 15, 2018
  • PublishedFeb 12, 2020

Abstract

Entities in the real world are often interconnected, forming heterogeneous information networks. Link-prediction is a necessary technique for predicting the existence of unobserved or future links in heterogeneous information networks. It is useful to make users better understand the generation and evolution of networks. However, current techniques lack the effective fusion of multiple features, often leading to nonsensical results. Also, it is difficult for them to adapt to the heterogeneity and dynamics of heterogeneous-information networks. In this paper, we present a hierarchical-hybrid-feature-graph (HHFG) model by fully considering structural, semantic, and time features. Also, an HHFG based link-prediction algorithm is proposed to effectively guarantee accuracy. On one hand, it performs a random walk on HHFG based upon hybrid features. On the other hand, parameters such as feature weights and transition coefficients are learned by the gradient-descent method. The experiments demonstrate the feasibility and effectiveness of our key techniques.


Funded by

国家重点研发计划课题(2018YFB1003404)

国家自然科学基金(61472070,U1435216,61672142)

中国国家留学基金委(201806085016)

  • Figure 1

    (Color online) Hierarchical hybrid feature graph (HHFG)

  • Figure 2

    (Color online) The basic idea of HHFG-based link prediciton algorithmn

  • Table 1   Dataset
    Level Number of nodes Number of edges
    Semantic feature level 2867 98677
    Structure feature level 9682 15864
    Time feature level 15864 39656
  •   

    Algorithm 1 HHFG-based link prediciton algorithmn

    Convergence

    Require:Heterogeneous information network ${G}$, source node set ${S}$, integer ${k}$;

    Output:$\{N_1,\ldots,~N_{|S|}|~N_i\subseteq~V\wedge~|N_i|=k\wedge~1\leq~i\leq~|S|\}$;

    Divide $G$ into $G_F$ and $G_L$;

    Construct HHFG (denoted as $G'_{\rm~sem}\cup~G'_{\rm~str}\cup~G'_{\rm~time}$) based on $G_F$;

    Extract features $F'$ from $G'_{\rm~sem}\cup~G'_{\rm~str}\cup~G'_{\rm~time}$;

    $S'={\rm~findSimNodes}(G_F,S)$;

    Initialize $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$;

    repeat

    for $s'\in~S'$

    $P={\rm~compRWProb}(F',~W_{\rm~I},~W_{\rm~II}~,~W_{\rm~III},~W_{\rm~IV})$;

    end for

    Update $W_{\rm~I}$, $W_{\rm~II}$, $W_{\rm~III}$ and $W_{\rm~IV}$ based on $G_L$;

    until

  • Table 2   Parameters setting
    $F_1$ $F_2$ $F_3$ $F_4$ $F_5$ $F_6$ $F_7$ $F_8$ $F_9$ $F_{10}$ $\mu$ $\eta$
    0.0143 0.0115 $-$0.0258 $-$0.0371 0.0083 0.0236 $-$0.003 0.0601 0.0251 0.054 0.255 0.211
  • Table 3   Comparision of link prediction methods based on different features with Precision, Recall and AUC
    Method Precision Recall AUC
    CN 0.103 0.099 0.608
    Jaccard 0.105 0.099 0.652
    RW 0.106 0.105 0.741
    DW 0.115 0.112 0.746
    SS 0.123 0.123 0.753
    GT 0.125 0.124 0.759
    SF 0.129 0.124 0.767
    HHFG 0.131 0.124 0.769
  • Table 4   Comparision of link prediction methods based on different features with P@K
    Method P@1 P@5 P@10 P@20 P@50
    CN 0.142 0.131 0.102 0.099 0.043
    Jaccard 0.147 0.128 0.112 0.099 0.037
    RW 0.152 0.128 0.121 0.099 0.029
    DW 0.163 0.142 0.135 0.099 0.037
    SS 0.175 0.161 0.129 0.108 0.043
    GT 0.179 0.171 0.131 0.101 0.043
    SF 0.183 0.176 0.142 0.106 0.037
    HHFG 0.191 0.177 0.139 0.106 0.043
  • Table 5   Comparision of link prediction methods based on different features division and transition probability computing strategies with Precision, Recall and AUC
    Method Precision Recall AUC
    HIN 0.124 0.124 0.747
    NTC 0.126 0.124 0.752
    HHFG 0.131 0.124 0.769

Copyright 2020 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1