SCIENTIA SINICA Informationis, Volume 50 , Issue 8 : 1197-1216(2020) https://doi.org/10.1360/N112018-00182

Coupling network vertex representation learning based on network embedding method

• Accepted: Sep 5, 2019
• Published: Aug 6, 2020

Abstract

Network representation learning is a fundamental problem in network data analysis: by learning representation vectors, network vertices can be characterized more accurately. With the development of deep learning, embedding methods have been widely used for network vertex representation learning. As network data have grown in scale and diversified in modality, the research focus has gradually shifted from single-network mining to coupling-network mining. This paper first analyzes the research status of embedding methods for single networks and compares their advantages and disadvantages. It then presents CWCNE, a model for coupling network embedding, whose random walk and training algorithms are adapted to the features of coupling networks. The validity of the proposed model was verified using social, academic, film, poetry, and work coupling network data, and good results were obtained on community detection, entity recognition, and label classification tasks.

References

[1] Tan S L, Guan Z Y, Cai D, et al. Mapping users across networks by manifold alignment on hypergraph. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, 2014. 159--165.

[2] Zhao D, Wang L, Li S. Immunization of epidemics in multiplex networks. PLoS ONE, 2014, 9: e112018.

[3] Wang W, Tang M, Zhang H F. Epidemic spreading on complex networks with general degree and weight distributions. Phys Rev E, 2014, 90: 042803.

[4] Zhang H F, Shu P P, Tang M, et al. Preferential imitation of vaccinating behavior can invalidate the targeted subsidy on complex network. ArXiv preprint, 2015.

[5] Lerman K, Ghosh R. Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. Comput Sci, 2010, 52: 166--176.

[6] Li R Q, Tang M, Hui P M. Epidemic spreading on multi-relational networks. Acta Phys Sin, 2013, 62: 168903. DOI: 10.7498/aps.62.168903.

[7] Cozzo E, Baños R A, Meloni S. Contact-based social contagion in multiplex networks. Phys Rev E, 2013, 88: 050801.

[8] Zhang X. Multilayer networks science: concepts, theories and data. Complex Syst Complex Sci, 2015, 12: 103--107.

[9] Mollgaard A, Zettler I, Dammeyer J, et al. Measure of node similarity in multilayer networks. ArXiv preprint, 2016.

[10] Granell C, Gómez S, Arenas A. Competing spreading processes on multiplex networks: awareness and epidemics. Phys Rev E, 2014, 90: 012808.

[11] Zafarani R, Liu H. Connecting users across social media sites: a behavioral-modeling approach. In: Proceedings of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Chicago, 2013. 41--49.

[12] Zafarani R, Liu H. Connecting corresponding identities across communities. In: Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media, San Jose, 2009. 354--357.

[13] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of the 27th Conference on Neural Information Processing Systems, Lake Tahoe, 2013. 3111--3119.

[14] Matsuno R, Murata T. MELL: effective embedding method for multiplex networks. In: Proceedings of the 27th World Wide Web Conference, Lyon, 2018. 1261--1268.

[15] Le Q V, Mikolov T. Distributed representations of sentences and documents. Comput Sci, 2014, 4: 1188--1196.

[16] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014. 701--710.

[17] Gligorijevic V, Panagakis Y, Zafeiriou S. Non-negative matrix factorizations for multiplex network analysis. IEEE Trans Pattern Anal, 2018, 41: 928--940.

[18] Lee D D, Seung H S. Learning the parts of objects by non-negative matrix factorization. Nature, 1999, 401: 788--791.

[19] Feng T, Li S Z, Shum H Y, et al. Local non-negative matrix factorization as a visual representation. In: Proceedings of the 2nd International Conference on Development and Learning, Cambridge, 2002. 178--183.

[20] Guillamet D, Vitrià J, Schiele B. Introducing a weighted non-negative matrix factorization for image classification. Pattern Recognition Lett, 2003, 24: 2447--2454.

[21] Wang Y, Jia Y, Hu C, et al. Fisher non-negative matrix factorization for learning local features. In: Proceedings of the Asian Conference on Computer Vision, Jeju Island, 2004. 27--30.

[22] Gao Y, Church G. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 2005, 21: 3970--3975.

[23] Yi L, Rong J, Liu Y. Semi-supervised multi-label learning by constrained non-negative matrix factorization. In: Proceedings of the 21st National Conference on Artificial Intelligence, Boston, 2006. 421--426.

[24] Carmona-Saez P, Pascual-Marqui R D, Tirado F, et al. Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinform, 2006, 7: 205--208.

[25] Tang J, Qu M, Wang M Z, et al. LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, Florence, 2015. 1067--1077.

[26] Grover A, Leskovec J. node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2016. 855--864.

[27] Zhou N, Zhao W X, Zhang X. A general multi-context embedding model for mining human trajectory data. IEEE Trans Knowl Data Eng, 2016, 28: 1945--1958.

[28] Cui P, Wang X, Pei J. A survey on network embedding. IEEE Trans Knowl Data Eng, 2019, 31: 833--852.

[29] Wang D X, Cui P, Zhu W W. Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 2016. 1225--1234.

[30] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks. ArXiv preprint, 2016.

[31] Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, 2017. 1024--1034.

[32] Chen J, Ma T, Xiao C. FastGCN: fast learning with graph convolutional networks via importance sampling. ArXiv preprint, 2018.

[33] Pan S R, Hu R Q, Long G D, et al. Adversarially regularized graph autoencoder for graph embedding. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, 2018. 2609--2615.

[34] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313: 504--507.

[35] Wang H W, Wang J, Wang J L, et al. GraphGAN: graph representation learning with generative adversarial nets. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, 2018. 2508--2515.

[36] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of the 28th Conference on Neural Information Processing Systems, Montreal, 2014. 2672--2680.

[37] Qu M, Tang J, Shang J B, et al. An attention-based collaboration framework for multi-view network representation learning. In: Proceedings of the 26th ACM International Conference on Information and Knowledge Management, Singapore, 2017. 1767--1776.

[38] Xu L C, Wei X K, Cao J N, et al. Embedding of embedding (EOE): joint embedding for coupled heterogeneous networks. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining, Cambridge, 2017. 741--749.

[39] Tang J, Cai K K, Su Z, et al. BigNet 2016: first workshop on big network analytics. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, München, 2016. 2505--2506.

[40] Fortunato S. Community detection in graphs. Phys Rep, 2010, 486: 75--174.

[41] Kong X, Zhang J, Yu P S. Inferring anchor links across multiple heterogeneous social networks. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, New York, 2013. 179--188.

• Figure 1

TV drama – film coupling network

• Figure 2

Model: CWCNE

• Figure 3

Results of community detection on the social coupling network. (a) Topology of the social coupling network; (b) visualization of vertex representation vectors

• Figure 4

Modularity comparison of algorithms with different $k$ on the film coupling network

• Figure 5

Entity recognition results on the social coupling network

• Figure 6

Effect of the known proportion of coupling vertices on entity recognition

• Figure 7

Precision-recall curves for binary label classification

• Figure 8

Effect of walk length on the coupling vertex vectors. (a) Current-network walk length; (b) coupling-network walk length

• Figure 9

Effect of vector dimension on the coupling vertex vectors

• Algorithm 1 CouplingSkipWalk

Require: $G_1=(V_1,E_1)$, $G_2=(V_2,E_2)$, coupling vertex set $\mathrm{CV}$, embedding size $d$, walks per vertex $\gamma$, walk lengths $t, t'$, window size $\omega$;

Initialization: sample $\Phi$ from $\mathbb{R}^{|V_1 \cup V_2| \times d}$;

Build a binary tree $T$ over $V_1 \cup V_2$;

for $i = 0$ to $\gamma$ do

$O = \mathrm{Shuffle}(V_1 \cup V_2)$;

for each $v_i \in O$ do

if $v_i \not\in \mathrm{CV}$ then

$W_{v_i} = \mathrm{RandomWalk}(G, v_i, t)$;

end if

if $v_i \in \mathrm{CV}$ then

$W_{v_i} = \mathrm{concat}(\mathrm{RandomWalk}(G, v_i, t-t'), \mathrm{RandomWalk}(G', v_i', t'))$;

end if

$\mathrm{CouplingSkipGram}(\Phi, W_{v_i}, \omega, \mathrm{CV})$;

end for

end for

Output: matrix of vertex representations $\Phi \in \mathbb{R}^{|V_1 \cup V_2| \times d}$.
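The walk-sampling loop of Algorithm 1 can be sketched in Python. The adjacency-dict graph representation, the `coupling` mapping from a coupling vertex to its counterpart, and all function names are assumptions of this sketch, not the paper's implementation; for brevity it iterates only over the vertices of network 1, whereas the algorithm shuffles all of $V_1 \cup V_2$:

```python
import random

def random_walk(adj, start, length):
    """Truncated random walk of at most `length` vertices over adjacency dict `adj`."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj.get(walk[-1])
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(random.choice(list(neighbors)))
    return walk

def coupling_skip_walk(adj1, adj2, coupling, t, t_prime):
    """One sampling pass of CouplingSkipWalk (sketch).

    Non-coupling vertices walk `t` steps in their own network; a coupling
    vertex walks t - t' steps in its own network, then t' steps in the
    coupled network starting from its counterpart `coupling[v]`.
    """
    walks = []
    for v in adj1:
        if v not in coupling:
            walks.append(random_walk(adj1, v, t))
        else:
            head = random_walk(adj1, v, t - t_prime)
            tail = random_walk(adj2, coupling[v], t_prime)
            walks.append(head + tail)
    return walks
```

Each returned walk would then be fed to the CouplingSkipGram update of Algorithm 2 as the sequence $W_{v_i}$.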

• Table 1   Basic statistical indicators of the 6 coupled network datasets

Dataset | Vertices in network 1 | Edges in network 1 | Vertices in network 2 | Edges in network 2 | Coupling vertices
SCN | 22 | 74 | 25 | 103 | 8
ACN | 2985414 | 25965384 | 1053188 | 3916907 | 733592
FCN | 9274 | 138065 | 2805 | 293848 | 1947
PCN | 3429936 | 27519883 | 92385 | 687327 | 46260
WCN | 372971 | 919276 | 17365 | 38247 | 15406
MCN | 10000 | 20000 | 10000 | 10000 | 2000
• Algorithm 2 CouplingSkipGram

for each $v_j \in W_{v_i}$ do

for each $u_k \in W_{v_i}[j-\omega : j+\omega]$ do

$J(\Phi) = -\log \Pr(u_k \mid \Phi(v_j))$;

$\Phi = \Phi - \alpha \frac{\partial J}{\partial \Phi}$;

if $u_k \in \mathrm{CV}$ then

$J(\Phi) = -\log \Pr(u_k \mid \Phi(v_j))$;

$\Phi = \Phi - \alpha \frac{\partial J}{\partial \Phi}$;

end if

end for

end for
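The per-window update in Algorithm 2 is a stochastic gradient step on the skip-gram loss $-\log \Pr(u_k \mid \Phi(v_j))$. A minimal sketch of one such step, using a full softmax over all vertices (the algorithm itself uses a hierarchical softmax via the binary tree $T$; the plain softmax and the shared input/output vectors here are simplifying assumptions):

```python
import math

def skipgram_step(phi, vocab, v_j, u_k, alpha=0.025):
    """One SGD step on -log Pr(u_k | phi[v_j]) with a full softmax.

    phi: dict vertex -> embedding (list of floats); input and output
    vectors are shared here for brevity. Returns the loss before the step.
    """
    d = len(phi[v_j])
    scores = {u: sum(phi[v_j][i] * phi[u][i] for i in range(d)) for u in vocab}
    m = max(scores.values())                      # shift for numerical stability
    exps = {u: math.exp(s - m) for u, s in scores.items()}
    z = sum(exps.values())
    probs = {u: e / z for u, e in exps.items()}
    # gradient of -log softmax w.r.t. phi[v_j]: sum_u (p_u - [u == u_k]) * phi[u]
    grad = [sum((probs[u] - (1.0 if u == u_k else 0.0)) * phi[u][i] for u in vocab)
            for i in range(d)]
    phi[v_j] = [phi[v_j][i] - alpha * grad[i] for i in range(d)]
    return -math.log(probs[u_k])
```

Repeating the step for the same $(v_j, u_k)$ pair decreases the loss, which is what the doubled update for coupling vertices in Algorithm 2 amounts to: context vertices in $\mathrm{CV}$ receive extra gradient weight.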

• Table 2   Modularity comparison for community detection on the coupled networks

Dataset | DeepWalk | Node2vec | LINE | SDNE | CWCNE
SCN ($k=5$) | 0.334 | 0.345 | 0.274 | 0.312 | 0.381
ACN ($k=8$) | 0.396 | 0.431 | 0.337 | 0.423 | 0.426
FCN ($k=7$) | 0.383 | 0.408 | 0.372 | 0.392 | 0.490
PCN ($k=13$) | 0.435 | 0.443 | 0.391 | 0.413 | 0.391
WCN ($k=8$) | 0.351 | 0.385 | 0.349 | 0.391 | 0.373
MCN ($k=46$) | 0.524 | 0.507 | 0.493 | 0.537 | 0.598
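The modularity $Q$ reported in these community detection experiments is the standard Newman measure, $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$. A minimal pure-Python computation, for illustration only (the edge-list representation and function name are assumptions of this sketch):

```python
def modularity(edges, communities):
    """Newman modularity Q of an undirected graph.

    edges: list of (u, v) pairs; communities: dict vertex -> community id.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # observed fraction of intra-community edges
    q = sum(1.0 / m for u, v in edges if communities[u] == communities[v])
    # minus the expected fraction under a random degree-preserving rewiring
    for u in degree:
        for v in degree:
            if communities[u] == communities[v]:
                q -= degree[u] * degree[v] / (4.0 * m * m)
    return q
```

For example, two triangles joined by a single bridge edge, partitioned triangle-by-triangle, give $Q = 6/7 - 0.5 \approx 0.357$.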
• Table 3   Modularity of single-network community detection

Dataset | Network | DeepWalk | Node2vec | LINE | SDNE | CWCNE
SCN | SCN1 | 0.349 | 0.355 | 0.294 | 0.337 | 0.358
SCN | SCN2 | 0.341 | 0.337 | 0.289 | 0.341 | 0.349
ACN | ACN1 | 0.423 | 0.429 | 0.342 | 0.425 | 0.425
ACN | ACN2 | 0.411 | 0.423 | 0.318 | 0.412 | 0.415
FCN | FCN1 | 0.462 | 0.481 | 0.351 | 0.477 | 0.487
FCN | FCN2 | 0.468 | 0.479 | 0.357 | 0.472 | 0.479
PCN | PCN1 | 0.440 | 0.460 | 0.403 | 0.429 | 0.463
PCN | PCN2 | 0.439 | 0.452 | 0.398 | 0.424 | 0.457
WCN | WCN1 | 0.360 | 0.368 | 0.351 | 0.341 | 0.368
WCN | WCN2 | 0.357 | 0.365 | 0.364 | 0.352 | 0.367
MCN | MCN1 | 0.516 | 0.503 | 0.486 | 0.529 | 0.572
MCN | MCN2 | 0.504 | 0.491 | 0.481 | 0.523 | 0.548
• Table 4   Entity recognition accuracy on the coupled networks

Dataset | DeepWalk | Node2vec | LINE | SDNE | CWCNE
SCN | 0.187 | 0.187 | 0.174 | 0.192 | 0.250
ACN | 0.389 | 0.391 | 0.382 | 0.391 | 0.394
FCN | 0.437 | 0.437 | 0.425 | 0.439 | 0.452
PCN | 0.692 | 0.687 | 0.673 | 0.698 | 0.735
WCN | 0.517 | 0.526 | 0.527 | 0.531 | 0.564
MCN | 0.604 | 0.618 | 0.607 | 0.611 | 0.635
• Table 5   Significance analysis of entity recognition results

Comparison (%) | SCN | ACN | FCN | PCN | WCN | MCN
CWCNE vs. DeepWalk | 6.3** | 0.5 | 1.5** | 4.3** | 4.7** | 3.1**
CWCNE vs. Node2vec | 6.3** | 0.3 | 1.5** | 4.8** | 3.8** | 1.7**
CWCNE vs. LINE | 7.6** | 1.2** | 2.7** | 6.2** | 3.7** | 2.8**
CWCNE vs. SDNE | 5.8** | 0.3** | 1.3** | 3.7** | 3.3** | 2.4**
• Table 6   Binary label classification results

Dataset | Measure | DeepWalk | Node2vec | LINE | SDNE | CWCNE
SCN | $P$ | 0.751 | 0.872 | 0.793 | 0.852 | 0.875
SCN | $R$ | 0.681 | 0.603 | 0.640 | 0.623 | 0.619
SCN | $F1$ | 0.714 | 0.712 | 0.708 | 0.720 | 0.725
ACN | $P$ | 0.814 | 0.836 | 0.803 | 0.837 | 0.851
ACN | $R$ | 0.659 | 0.729 | 0.631 | 0.697 | 0.783
ACN | $F1$ | 0.728 | 0.778 | 0.707 | 0.761 | 0.815
FCN | $P$ | 0.688 | 0.687 | 0.675 | 0.683 | 0.693
FCN | $R$ | 0.573 | 0.598 | 0.551 | 0.579 | 0.604
FCN | $F1$ | 0.625 | 0.629 | 0.607 | 0.628 | 0.645
PCN | $P$ | 0.937 | 0.922 | 0.908 | 0.925 | 0.950
PCN | $R$ | 0.892 | 0.872 | 0.883 | 0.868 | 0.895
PCN | $F1$ | 0.913 | 0.896 | 0.895 | 0.896 | 0.921
• Table 7   Results of the multi-class classification task

Dataset | Measure | DeepWalk | Node2vec | LINE | SDNE | CWCNE
SCN ($N=9$) | $P$ | 0.276 | 0.301 | 0.283 | 0.298 | 0.307
SCN ($N=9$) | $R$ | 0.201 | 0.248 | 0.237 | 0.246 | 0.246
SCN ($N=9$) | $F1$ | 0.232 | 0.272 | 0.258 | 0.270 | 0.273
ACN ($N=6$) | $P$ | 0.338 | 0.374 | 0.352 | 0.369 | 0.375
ACN ($N=6$) | $R$ | 0.308 | 0.352 | 0.318 | 0.341 | 0.352
ACN ($N=6$) | $F1$ | 0.322 | 0.363 | 0.334 | 0.354 | 0.363
PCN ($N=12$) | $P$ | 0.122 | 0.173 | 0.143 | 0.164 | 0.183
PCN ($N=12$) | $R$ | 0.073 | 0.102 | 0.088 | 0.097 | 0.097
PCN ($N=12$) | $F1$ | 0.091 | 0.128 | 0.109 | 0.122 | 0.128
WCN ($N=8$) | $P$ | 0.325 | 0.321 | 0.315 | 0.322 | 0.336
WCN ($N=8$) | $R$ | 0.179 | 0.183 | 0.168 | 0.173 | 0.185
WCN ($N=8$) | $F1$ | 0.231 | 0.233 | 0.219 | 0.225 | 0.238
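The $P$, $R$, and $F1$ values reported in Tables 6 and 7 are the standard precision, recall, and F1 measures. A minimal binary-classification sketch (the function name and 0/1 label encoding are assumptions of this sketch):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 from parallel 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For the multi-class task of Table 7, the same quantities are computed per class and averaged over the $N$ classes.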


Copyright 2020 China Science Publishing & Media Ltd. All rights reserved.