
SCIENTIA SINICA Informationis, Volume 50 , Issue 8 : 1197-1216(2020) https://doi.org/10.1360/N112018-00182

Coupling network vertex representation learning based on network embedding method

  • Received: Jul 15, 2018
  • Accepted: Sep 5, 2019
  • Published: Aug 6, 2020

Abstract

Network representation learning is a fundamental problem in network data analysis: by learning representation vectors, network vertices can be characterized more accurately. With the development of deep learning, embedding methods have been widely applied to vertex representation learning. As network data have grown in both scale and modality, the research focus has gradually shifted from single-network mining to coupling-network mining. This paper first analyzes the state of research on embedding methods for single networks and compares their advantages and disadvantages. It then presents CWCNE, a model for coupling network embedding, whose random-walk and training algorithms are adapted to the characteristics of coupling networks. The validity of the proposed model was verified on social, academic, film, poetry, and work coupling network data, with good results on community detection, entity recognition, and label classification tasks.


Funded by

National Natural Science Foundation of China (61170112)

Beijing Natural Science Foundation (4172016)

General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201710011006)

Open Project of the Key Laboratory of the Ministry of Public Security



  • Figure 1

    TV drama – film coupling network

  • Figure 2

    Model: CWCNE

  • Figure 3

    Results of community detection on the social coupling network. (a) Topology of the social coupling network; (b) visualization of vertex representation vectors

  • Figure 4

    Modularity comparison of algorithms with different $k$ on the film coupling network

  • Figure 5

    Entity recognition results on the social coupling network

  • Figure 6

    Effect of the known proportion of coupling vertices on entity recognition

  • Figure 7

    P-R curves for the binary label classification task

  • Figure 8

    The effect of walk length on coupling vertex vectors. (a) Walk length in the current network; (b) walk length in the coupling network

  • Figure 9

    The effect of vector dimension on coupling vertex vectors

  •   

    Algorithm 1 CouplingSkipWalk

    Require: $G_{1}=(V_{1},E_{1})$, $G_{2}=(V_{2},E_{2})$, coupling vertex set CV, embedding size $d$, walks per vertex $\gamma$, walk lengths $t, t'$, window size $\omega$;
    Initialization: sample $\Phi$ from $\mathbb{R}^{|V_{1}\cup V_{2}|\times d}$;
    Build a binary tree $T$ over $V_{1}\cup V_{2}$;
    for $i=0$ to $\gamma$ do
        $O={\rm shuffle}(V_{1}\cup V_{2})$;
        for $v_{i}\in O$ do
            if $v_{i}\not\in{\rm CV}$ then
                $W_{v_{i}}={\rm RandomWalk}(G, v_{i}, t)$;
            end if
            if $v_{i}\in{\rm CV}$ then
                $W_{v_{i}}={\rm concat}({\rm RandomWalk}(G, v_{i}, t-t'),\ {\rm RandomWalk}(G', v_{i}', t'))$;
            end if
            ${\rm CouplingSkipGram}(\Phi, W_{v_{i}}, \omega, {\rm CV})$;
        end for
    end for
    Output: matrix of vertex representations $\Phi\in\mathbb{R}^{|V_{1}\cup V_{2}|\times d}$.

    Here $G$ denotes the network containing $v_{i}$, $G'$ the other network, and $v_{i}'$ the coupled counterpart of $v_{i}$.
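The walk-generation step of Algorithm 1 can be sketched in Python. This is a minimal illustration, not the authors' implementation: `adj1`/`adj2` are hypothetical adjacency dicts for the two networks, and `coupling` maps each coupling vertex to its counterpart in the other network. A non-coupling vertex gets an ordinary walk of length $t$; a coupling vertex gets a walk of length $t-t'$ in its own network concatenated with a walk of length $t'$ started from its counterpart.

```python
import random

def random_walk(adj, start, length, rng=random):
    """Uniform random walk of `length` vertices over an adjacency dict."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj.get(walk[-1])
        if not nbrs:          # dead end: stop early
            break
        walk.append(rng.choice(list(nbrs)))
    return walk

def coupling_skip_walk(adj1, adj2, coupling, start, t, t_prime, rng=random):
    """CouplingSkipWalk-style walk (sketch).

    Non-coupling vertices: an ordinary walk of length t in their own network.
    Coupling vertices: a walk of length t - t' in network 1 concatenated with
    a walk of length t' starting from the coupled counterpart in network 2.
    """
    if start not in coupling:
        return random_walk(adj1, start, t, rng)
    head = random_walk(adj1, start, t - t_prime, rng)
    tail = random_walk(adj2, coupling[start], t_prime, rng)
    return head + tail
```

For example, with `coupling = {'a': 'A'}`, a walk started at `'a'` crosses into the second network after $t-t'$ steps, so the skip-gram windows later span vertices from both networks.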

  • Table 1   Basic statistics of the six coupling network datasets
    Dataset Vertices in network 1 Edges in network 1 Vertices in network 2 Edges in network 2 Coupling vertices
    SCN 22 74 25 103 8
    ACN 2985414 25965384 1053188 3916907 733592
    FCN 9274 138065 2805 293848 1947
    PCN 3429936 27519883 92385 687327 46260
    WCN 372971 919276 17365 38247 15406
    MCN 10000 20000 10000 10000 2000
  •   

    Algorithm 2 CouplingSkipGram

    for $v_{j}\in W_{v_{i}}$ do
        for $u_{k}\in W_{v_{i}}[j-\omega : j+\omega]$ do
            $J(\Phi)=-\log{\rm Pr}(u_{k}\mid\Phi(v_{j}))$;
            $\Phi=\Phi-\alpha\,\frac{\partial J}{\partial\Phi}$;
            if $u_{k}\in{\rm CV}$ then
                $J(\Phi)=-\log{\rm Pr}(u_{k}'\mid\Phi(v_{j}))$;
                $\Phi=\Phi-\alpha\,\frac{\partial J}{\partial\Phi}$;
            end if
        end for
    end for

    Here $u_{k}'$ denotes the coupled counterpart of coupling vertex $u_{k}$ in the other network; the extra gradient step ties the representations of coupled vertices together.
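The update loop of Algorithm 2 can be sketched with numpy. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses a full softmax over all vertices instead of the hierarchical softmax implied by the binary tree of Algorithm 1, and the names `Phi` (input vectors), `Psi` (context vectors), and `coupling` (vertex to counterpart) are hypothetical. When a context vertex is a coupling vertex, a second SGD step is taken toward its counterpart.

```python
import numpy as np

def coupling_skip_gram_step(Phi, Psi, walk, window, coupling, alpha=0.025):
    """One pass of a CouplingSkipGram-style update (sketch).

    Phi, Psi: |V| x d input / context embedding matrices (updated in place).
    For each (center, context) pair inside `window`, take an SGD step on the
    softmax cross-entropy -log Pr(context | Phi[center]); if the context is a
    coupling vertex, take an extra step for its coupled counterpart.
    """
    for j, v in enumerate(walk):
        lo, hi = max(0, j - window), min(len(walk), j + window + 1)
        for k in range(lo, hi):
            if k == j:
                continue
            targets = [walk[k]]
            if walk[k] in coupling:           # extra update for the counterpart
                targets.append(coupling[walk[k]])
            for u in targets:
                scores = Psi @ Phi[v]                 # logits over all vertices
                p = np.exp(scores - scores.max())
                p /= p.sum()                          # softmax probabilities
                grad_out = p.copy()
                grad_out[u] -= 1.0                    # d(-log softmax)/d scores
                h = Phi[v].copy()
                Phi[v] -= alpha * (Psi.T @ grad_out)  # update center vector
                Psi -= alpha * np.outer(grad_out, h)  # update context vectors
    return Phi, Psi
```

A full softmax costs O(|V|) per pair, which is why the paper's setting would use a hierarchical softmax (the binary tree $T$) or negative sampling at scale; the gradient structure of the coupling-aware extra step is the same either way.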

  • Table 2   Modularity comparison for community detection on coupling networks
    Dataset DeepWalk Node2vec LINE SDNE CWCNE
    SCN ($k=5$) 0.334 0.345 0.274 0.312 0.381
    ACN ($k=8$) 0.396 0.431 0.337 0.423 0.426
    FCN ($k=7$) 0.383 0.408 0.372 0.392 0.490
    PCN ($k=13$) 0.435 0.443 0.391 0.413 0.391
    WCN ($k=8$) 0.351 0.385 0.349 0.391 0.373
    MCN ($k=46$) 0.524 0.507 0.493 0.537 0.598
  • Table 3   Modularity of community detection on single networks
    Dataset Network DeepWalk Node2vec LINE SDNE CWCNE
    SCN SCN1 0.349 0.355 0.294 0.337 0.358
    SCN2 0.341 0.337 0.289 0.341 0.349
    ACN ACN1 0.423 0.429 0.342 0.425 0.425
    ACN2 0.411 0.423 0.318 0.412 0.415
    FCN FCN1 0.462 0.481 0.351 0.477 0.487
    FCN2 0.468 0.479 0.357 0.472 0.479
    PCN PCN1 0.440 0.460 0.403 0.429 0.463
    PCN2 0.439 0.452 0.398 0.424 0.457
    WCN WCN1 0.360 0.368 0.351 0.341 0.368
    WCN2 0.357 0.365 0.364 0.352 0.367
    MCN MCN1 0.516 0.503 0.486 0.529 0.572
    MCN2 0.504 0.491 0.481 0.523 0.548
  • Table 4   Entity recognition accuracy on coupling networks
    Dataset DeepWalk Node2vec LINE SDNE CWCNE
    SCN 0.187 0.187 0.174 0.192 0.250
    ACN 0.389 0.391 0.382 0.391 0.394
    FCN 0.437 0.437 0.425 0.439 0.452
    PCN 0.692 0.687 0.673 0.698 0.735
    WCN 0.517 0.526 0.527 0.531 0.564
    MCN 0.604 0.618 0.607 0.611 0.635
  • Table 5   Significance analysis of entity recognition results
    Dataset (%) SCN ACN FCN PCN WCN MCN
    CWCNE vs. DeepWalk 6.3** 0.5 1.5** 4.3** 4.7** 3.1**
    CWCNE vs. Node2vec 6.3** 0.3 1.5** 4.8** 3.8** 1.7**
    CWCNE vs. LINE 7.6** 1.2** 2.7** 6.2** 3.7** 2.8**
    CWCNE vs. SDNE 5.8** 0.3** 1.3** 3.7** 3.3** 2.4**
  • Table 6   Binary label classification results
    Dataset Measure DeepWalk Node2vec LINE SDNE CWCNE
    SCN $P$ 0.751 0.872 0.793 0.852 0.875
    $R$ 0.681 0.603 0.640 0.623 0.619
    $F$1 0.714 0.712 0.708 0.720 0.725
    ACN $P$ 0.814 0.836 0.803 0.837 0.851
    $R$ 0.659 0.729 0.631 0.697 0.783
    $F$1 0.728 0.778 0.707 0.761 0.815
    FCN $P$ 0.688 0.687 0.675 0.683 0.693
    $R$ 0.573 0.598 0.551 0.579 0.604
    $F$1 0.625 0.629 0.607 0.628 0.645
    PCN $P$ 0.937 0.922 0.908 0.925 0.950
    $R$ 0.892 0.872 0.883 0.868 0.895
    $F$1 0.913 0.896 0.895 0.896 0.921
  • Table 7   Results of the multi-class classification task
    Dataset Measure DeepWalk Node2vec LINE SDNE CWCNE
    SCN ($N=9$) $P$ 0.276 0.301 0.283 0.298 0.307
    $R$ 0.201 0.248 0.237 0.246 0.246
    $F$1 0.232 0.272 0.258 0.270 0.273
    ACN ($N=6$) $P$ 0.338 0.374 0.352 0.369 0.375
    $R$ 0.308 0.352 0.318 0.341 0.352
    $F$1 0.322 0.363 0.334 0.354 0.363
    PCN ($N=12$) $P$ 0.122 0.173 0.143 0.164 0.183
    $R$ 0.073 0.102 0.088 0.097 0.097
    $F$1 0.091 0.128 0.109 0.122 0.128
    WCN ($N=8$) $P$ 0.325 0.321 0.315 0.322 0.336
    $R$ 0.179 0.183 0.168 0.173 0.185
    $F$1 0.231 0.233 0.219 0.225 0.238
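Tables 2 and 3 report modularity, which measures how much denser the detected communities are than expected under a random (configuration-model) rewiring. As a hedged sketch of how such an evaluation could be computed once vertices are grouped into $k$ communities (e.g., by clustering the learned vectors), the following pure-Python function implements Newman's $Q=\frac{1}{2m}\sum_{ij}(A_{ij}-\frac{k_ik_j}{2m})\,\delta(c_i,c_j)$; the argument names are illustrative, not from the paper.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a vertex partition.

    edges: iterable of undirected (u, v) pairs; community: dict vertex -> label.
    Q = (fraction of edges inside communities)
        - (expected fraction under the configuration model).
    """
    edges = list(edges)
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # observed fraction of intra-community edges
    e_in = sum(1 for u, v in edges if community[u] == community[v]) / m
    # expected fraction: sum over communities of (degree share / 2m)^2
    deg_sum = defaultdict(float)
    for node, k in deg.items():
        deg_sum[community[node]] += k
    expected = sum((s / (2 * m)) ** 2 for s in deg_sum.values())
    return e_in - expected
```

For two triangles joined by a single bridge edge, splitting at the bridge gives $Q = 6/7 - 1/2 \approx 0.357$, in the same range as the values reported in Table 2.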

Copyright 2020 CHINA SCIENCE PUBLISHING & MEDIA LTD.