SCIENCE CHINA Information Sciences, Volume 63 , Issue 10 : 202103(2020) https://doi.org/10.1007/s11432-018-9943-9

## Dynamic network embedding via incremental skip-gram with negative sampling

• AcceptedJun 10, 2019
• PublishedSep 18, 2020
Share
Rating

### Abstract

Network representation learning, as an approach to learn low dimensional representations of vertices, has attracted considerable research attention recently.It has been proven extremely useful in many machine learning tasks over large graph.Most existing methods focus on learning the structural representations of vertices in a static network, but cannot guarantee an accurate and efficient embedding in a dynamic network scenario.The fundamental problem of continuously capturing the dynamic properties in an efficient way for a dynamic network remains unsolved.To address this issue, we present an efficient incremental skip-gram algorithm with negative sampling for dynamic network embedding, and provide a set of theoretical analyses to characterize the performance guarantee.Specifically, we first partition a dynamic network into the updated, including addition/deletion of links and vertices, and the retained networks over time.Then we factorize the objective function of network embedding into the added, vanished and retained parts of the network. Next we provide a new stochastic gradient-based method, guided by the partitions of the network, to update the nodes and the parameter vectors. The proposed algorithm is proven to yield an objective function value with a bounded difference to that of the original objective function.The first order moment of the objective difference converges in order of $\mathbb{O}(\frac{1}{n^2})$, and the second order moment of the objective difference can be stabilized in order of $\mathbb{O}(1)$.Experimental results show that our proposal can significantly reduce the training time while preserving the comparable performance.We also demonstrate the correctness of the theoretical analysis and the practical usefulness of the dynamic network embedding.We perform extensive experiments on multiple real-world large network datasets over multi-label classification and link prediction tasks to evaluate the effectiveness and efficiency of the proposed framework, and up to 22 times speedup has been achieved.

### Acknowledgment

This work was supported by National Key RD Program of China (Grant No. 2016YFB1000103) and National Natural Science Foundation of China (Grant Nos. 61872022, 61772151, 61421003, SKLSDE-2018ZX-16).

### Supplement

Appendix

To prove the Lemma 1, we begin by examining the upper- and lower-bounds of ${\rm~E}[X_{i,w}Y_{j,v}Y_{k,v}]$.

Lemma 1. For any $j$ and $k$ such that $j\le~k$, we have \begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&\leqslant\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2+2j+k-2}{jk}, \\ {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&\geqslant\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2}{jk}. \end{aligned} \tag{25}

proof We have \begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&= {\rm E}\left[X_{i,w}\left(\frac{1}{j}\sum_{l=1}^{j}X_{l,v}\right)\left(\frac{1}{k}\sum_{m=1}^{k}X_{m,v}\right)\right] \\ &= \sum_{l=1}^{j}\sum_{m=1}^{k}\frac{{\rm E}[X_{i,w}X_{l,v}X_{m,v}]}{jk}. \end{aligned} \tag{26}

To prove the lemma, we rewrite the expression by splitting the set of $(l,m)$ into two subsets. Let $S_{i}^{(j,k)}~(j\le~k)$ be a set of $(l,m)$ such that $X_{i,w}$, $X_{l,v}$, and $X_{m,v}$ are independent from each other (i.e., $i,l$ and $m$ are all different), and let $\bar{S}_{i}^{(j,k)}$ be its complementary set, \begin{aligned} S_{i}^{(j,k)} = &\{(l,m)\in \{1,2,3,\ldots,j\}\times\{1,2,3,\ldots,k\}|i\neq l\wedge l\neq m\wedge m \neq i \}, \\ \bar{S}_{i}^{(j,k)} = &\{1,2,3,\ldots,j\}\times\{1,2,3,\ldots,k\}/S_{i}^{(j,k)}. \end{aligned} \tag{27}

Then ${\rm~E}[X_{i,w}Y_{j,v}Y_{k,v}]$ is upper-bounded as

\begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}] &= \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}]{\rm E}[X_{l,v}]{\rm E}[X_{m,v}]}{jk}+ \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}X_{l,v}X_{m,v}]}{jk} \\ & \leq \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{\mu_{w}\mu_{v}^2}{jk} + \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{1}{jk} \\ &=\frac{|S_{i}^{(j,k)}|\mu_{w}\mu_{v}^2+|\bar{S}_{i}^{(j,k)}|}{jk}, \end{aligned} \tag{28}

where the inequality holds because $X_{i,w}$, $Y_{l,v}$, and $Y_{m,v}$ are binary random variables and thus ${\rm~E}[X_{i,w}Y_{l,v}Y_{m,v}]\leq~1$. Here, we have $\bar{S}_{i}^{(j,k)}=2j+k-2$, because $\bar{S}_{i}^{(j,k)}$ includes $j$ elements such that $l=m$ and also includes $k-1$ and $j-1$ elements such that $i=l\neq~m$ and $i=m\neq~l$, respectively. And we consequently have $|S_{i}^{(j,k)}|~=~jk-|\bar{S}_{i}^{(j,k)}|~=~jk-2j-k+2$. Therefore, the upper-bound can be rewritten as \begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&\leqslant\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2+2j+k-2}{jk}. \end{aligned} \tag{29} Similarly, by making use of $0\leq~{\rm~E}[X_{i,w}Y_{l,v}Y_{m,v}]$, the lower-bound can be derived: \begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}] &= \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}]{\rm E}[X_{l,v}]{\rm E}[X_{m,v}]}{jk}+ \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}X_{l,v}X_{m,v}]}{jk} \\ & \geq \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{\mu_{w}\mu_{v}^2}{jk} + \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{0}{jk} \\ &=\frac{|S_{i}^{(j,k)}|\mu_{w}\mu_{v}^2}{jk}=\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2}{jk}. \end{aligned} \tag{30}

### References

[1] Hamilton W L, Ying R, Leskovec J. Representation learning on graphs: Methods and applications. In: Proceedings of IEEE Data Engineering Bulletin, 2017. Google Scholar

[2] Cavallari S, Zheng V W, Cai H Y, et al. Learning community embedding with community detection and node embedding on graphs. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2017. 377--386. Google Scholar

[3] Shi C, Hu B, Zhao W X. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans Knowl Data Eng, 2019, 31: 357-370 CrossRef Google Scholar

[4] Hu R J, Aggarwal C C, Ma S, et al. An embedding approach to anomaly detection. In: Proceedings of 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, 2016. 385--396. Google Scholar

[5] Grover A, Leskovec J. node2vec: Scalable feature learning for networks. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 855--864. Google Scholar

[6] Perozzi B, Al-Rfou R, Skiena S. Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. New York: ACM, 2014. 701--710. Google Scholar

[7] Tang J, Qu M, Wang M Z, et al. Line: Large-scale information network embedding. In: Proceedings of International World Wide Web Conference, 2015. 1067--1077. Google Scholar

[8] Tang J, Qu M, Mei Q Z. Pte: Predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2015. 1165--1174. Google Scholar

[9] He Y, Li J X, Song Y Q, He M, et al. Time-evolving text classification with deep neural networks. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence Press, 2018. 2241--2247. Google Scholar

[10] Ren X, He W Q, Qu M, et al. Label noise reduction in entity typing by heterogeneous partial-label embedding. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 1825--1834. Google Scholar

[11] Wang D X, Cui P, Zhu W W. Structural deep network embedding. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 1225--1234. Google Scholar

[12] Cui P, Wang X, Pei J. A Survey on Network Embedding. IEEE Trans Knowl Data Eng, 2019, 31: 833-852 CrossRef Google Scholar

[13] Li C Z, Wang S Z, Yang D J, et al. Ppne: property preserving network embedding. In: Proceedings of International Conference on Database Systems for Advanced Applications, Springer, 2017. 163--179. Google Scholar

[14] Yang D J, Wang S Z, Li C Z, et al. From properties to links: Deep network embedding on incomplete graphs. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. New York: ACM, 2017. 367--376. Google Scholar

[15] Zhang H M, Qiu L W, Yi L L, et al. Scalable multiplex network embedding. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence Press, 2018. 3082--3088. Google Scholar

[16] Trivedi R, Dai H J, Wang Y C, et al. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In: Proceedings of International Conference on Machine Learning, 2017. 3462--3471. Google Scholar

[17] Zuo Y, Liu G N, Lin H, et al. Embedding temporal network via neighborhood formation. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018. 2857--2866. Google Scholar

[18] Zhu L H, Guo D, Yin J M. Scalable Temporal Latent Space Inference for Link Prediction in Dynamic Social Networks. IEEE Trans Knowl Data Eng, 2016, 28: 2765-2777 CrossRef Google Scholar

[19] Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of Annual Conference on Neural Information Processing Systems, 2017. Google Scholar

[20] Chen J F, Zhang Q, Huang X J. Incorporate group information to enhance network embedding. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2016. 1901--1904. Google Scholar

[21] Cao S S, Lu W, Xu Q K. Grarep:learning graph representations with global structural information. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2015. 891--900. Google Scholar

[22] Yang C, Sun M S, Liu Z Y, et al. Fast network embedding enhancement via high order proximity approximation. In: Proceedings of International Joint Conference on Artificial Intelligence, 2017. Google Scholar

[23] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of Annual Conference on Neural Information Processing Systems, 2013. 1--9. Google Scholar

[24] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. Computer Science, 2013. Google Scholar

[25] Morin F, Bengio Y. Hierarchical probabilistic neural network language model. In: Proceedings of International Conference on Artificial Intelligence and Statistics, 2005. 5: 246--252. Google Scholar

[26] Gutmann M U, Hyvarinen A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research, 2012, 13: 307--361. Google Scholar

[27] Levy O , Goldberg Y. Neural word embedding as implicit matrix factorization. 2014. In: Proceedings of Advances in Neural Information Processing Systems 27 (NIPS 2014). Google Scholar

[28] Mnih A, Teh Y W. A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of the 29th International Coference on International Conference on Machine Learning, 2012. 419--426. Google Scholar

[29] Ribeiro L F R, Saverese P H P, Figueiredo D R. struc2vec : Learning node representations from structural identity. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017. 385--394. Google Scholar

[30] Claire D, Marinka Z, David H, Jure L. Learning structural node embeddings via diffusion wavelets. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018. 1320--1329. Google Scholar

[31] Li J D, Dani H, Hu X, et al. Attributed network embedding for learning in a dynamic environment. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2017. 387--396. Google Scholar

[32] Jian L, Li J D, Liu H. Toward online node classification on streaming networks. In: Proceedings of International Conference on Data Mining and Knowledge Discovery, 2018. 231--257. Google Scholar

[33] Xu K S, Hero A O. Dynamic stochastic blockmodels: Statistical models for time-evolving networks. In: Proceedings of International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation, 2013. 201--210. Google Scholar

[34] Zhou L, Yang Y, Ren X, et al. Dynamic Network Embedding by Modelling Triadic Closure Process In: Proceedings of AAAI Conference on Artificial Intelligence, 2018. Google Scholar

[35] Du L, Wang Y, Song G J, et al. Dynamic network embedding: An extended approach for skip-gram based network embedding. In: Proceedings of International Joint Conference on Artificial Intelligence, 2018. 2086--2092. Google Scholar

[36] Peng H, Li J X, Song Y Q, et al. Incrementally learning the hierarchical softmax function for neural language models. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017. Google Scholar

[37] Kaji N, Kobayashi H. Incremental skip-gram model with negative sampling. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2017. Google Scholar

[38] Rudolph M, Blei D. Dynamic embeddings for language evolution. In: Proceedings of the 2018 World Wide Web Conference, pages 1003--1011. International World Wide Web Conferences Steering Committee, 2018. 1003--1011. Google Scholar

[39] Peng H, Bao M J, Li J X. Incremental term representation learning for social network analysis. Future Generation Comput Syst, 2018, 86: 1503-1512 CrossRef Google Scholar

[40] Zafarani, Liu H. Social computing data repository. Google Scholar

[41] Tang L, Liu H. Relational learning via latent social dimensions. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009. 817--826. Google Scholar

[42] Leskovec J, Mcauley J J. Learning to discover social circles in ego networks. In: Proceedings of Annual Conference on Neural Information Processing Systems, 2012. Google Scholar

[43] Leskovec J, Kleinberg J, Faloutsos C. Graph evolution. ACM Trans Knowl Discov Data, 2007, 1: 2-es CrossRef Google Scholar

[44] Yang J, Leskovec J. Defining and evaluating network communities based on ground-truth. In: Proceedings of ACM SIGKDD Workshop on Mining Data Semantics, 2012. 181--213. Google Scholar

[45] Fan R E, Chang K W, Cho Jui Hsieh, Xiang Rui Wang, and Chih Jen Lin. Liblinear: A library for large linear classification. Journal of Machine Learning Research, 2008, 9: 1871--1874. Google Scholar

[46] Dong Y X, Chawla N V, Swami A. metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017. 135--144. Google Scholar

[47] Goyal P, Ferrara E. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Syst, 2018, 151: 78-94 CrossRef Google Scholar

• Figure 1

(Color online) An illustration of the temporal evolving of the dynamic network. The green vertices and edges constitute the initial network in time $t$. The vertex $V_{6}$ (marked in orange color) and the corresponding edge $(V_{1},V_{6})$ (marked in orange color) emerge in time $t+1$. The vertex $V_{3}$ (marked in dotted line) and the corresponding edge $(V_{1},V_{3})$ (marked in dotted line) vanish in time $t+2$.

• Figure 2

(Color online) Number of the neighbor nodes with different settings.

• Figure 3

(Color online) Training time of different methods under different settings. (a) DeepWalk; (b) node2vec.

• Figure 4

(Color online) Speedup performances in dynamic DeepWalk (a) and dynamic node2vec (b) with different settings.

• Figure 5

(Color online) First-order moment (a) and second-order moment (b) with different change rates.

• Table 1

Table 1Statistics of the dynamic network datasets

 Name $|V|$ $|E|$ Label Time step Wikipedia 1985098 1000924086 7 16 BlogCatalog 10312 333983 39 20 Flickr 80513 5899882 195 100 Facebook 1715256 22613981 – 24 ArXiv 18722 198110 195 16 DBLP 524061 20580238 100 730
• Table 2

Table 2The evolving procedure of the dynamic DBLP network

 Window sliding rate (%) 0.1 0.5 1 1.5 2.00 Edge (+) 2018 4547 7266 9937 13498 Edge ($-$) 1527 3381 6464 9884 12860 Window sliding rate (%) 2.5 3 3.5 4 – Edge (+) 17054 22842 26340 29217 – Edge ($-$) 16515 19353 22852 25860 –
• Table 3

Table 3Multi-label classification results by DeepWalk (%)$^{\rm~a)}$

 Item DS Metric 10% 20% 30% 40% 50% 60% 70% 80% 90% Dy BC Mi-F1 36.02 36.21 39.61 40.28 41.11 41.29 41.51 41.47 42.05 Gl BC Mi-F1 36.00 36.20 39.60 40.30 41.00 41.30 41.50 41.50 42.00 Dy BC Ma-F1 21.31 23.81 25.31 26.29 27.33 27.60 27.90 28.18 28.92 Gl BC Ma-F1 21.30 23.80 25.30 26.30 27.30 27.60 27.90 28.20 28.90 Dy Fl Mi-F1 32.30 34.59 36.11 36.88 37.21 37.77 38.05 38.43 38.88 Gl Fl Mi-F1 32.44 34.61 35.94 36.79 37.21 37.79 38.13 38.41 38.76 Dy Fl Ma-F1 13,91 17.14 19.74 21.16 22.07 22.75 23.55 24.11 24.78 Gl Fl Ma-F1 14.06 17.17 19.69 21.11 22.05 22.78 23.62 24.10 24.72 Dy Wp Mi-F1 78.87 79.91 80.42 80.72 80.93 81.15 81.27 81.33 81.42 Gl Wp Mi-F1 78.86 79.93 80.41 80.69 80.93 81.16 81.25 81.35 81.43 Dy Wp Ma-F1 78.72 79.74 80.33 80.56 80.81 80.93 81.11 81.21 81.21 Gl Wp Ma-F1 78.71 79.76 80.32 80.50 80.81 80.94 81.10 81.23 81.31

a) We show the best results with boldface.

• Table 4

Table 4Multi-label classification results by node2vec (%)$^{\rm~a)}$

 Item DS Metric 10% 20% 30% 40% 50% 60% 70% 80% 90% Dy BC Mi-F1 36.71 37.19 39.99 40.30 41.29 42.06 41.44 42.57 42.87 Gl BC Mi-F1 36.70 37.17 39.98 40.30 41.27 42.06 41.46 42.58 42.86 Dy BC Ma-F1 21.40 23.97 25.37 26.39 27.51 27.69 27.96 28.21 28.97 Gl BC Ma-F1 21.40 23.96 25.37 26.38 27.50 27.70 27.97 28.21 28.96 Dy Fl Mi-F1 33.59 35.15 37.11 37.93 38.26 38.91 38.99 39.14 39.44 Gl Fl Mi-F1 33.57 35.16 36.96 37.84 38.27 38.90 38.95 39.17 39.42 Dy Fl Ma-F1 14.11 18.21 20.41 22.25 23.27 23.26 24.72 25.81 25.91 Gl Fl Ma-F1 14.12 18.18 20.43 22.24 23.30 23.28 24.68 25.79 25.94 Dy Wp Mi-F1 79.05 80.05 80.75 80.89 81.39 81.33 81.55 81.62 81.69 Gl Wp Mi-F1 79.04 80.05 80.74 80.87 81.38 81.31 81.55 81.63 81.69 Dy Wp Ma-F1 78.97 79.82 80.60 80.71 81.28 80.26 81.11 81.47 81.57 Gl Wp Ma-F1 78.96 79.83 80.59 80.70 81.27 80.25 81.10 81.48 81.56

a) We show the best results with boldface.

• Table 5

Table 5Multi-label classification results comparison of different embedding methods

 Dataset Algorithms Micro-F1 Macro-F1 BC Dynamic DeepWalk 42.05% 28.92% BC Dynamic node2vec 42.87% 28.97% BC DANE[31] 43.27% 29.12% BC Dynamic SBM[33] 39.41% 22.63% BC DNE[35] 40.75% 26.13% Fl Dynamic DeepWalk 38.88% 24.78% Fl Dynamic node2vec 39.44% 25.91% Fl DANE[31] 32.81% 20.75% Fl Dynamic SBM[33] 36.52% 23.87% Fl DNE[35] 37.87% 24.28%
• Table 6

Table 6Area under curve (AUC) scores for link prediction

 Dataset Algorithm Average Hadamard Weighted-L1 Weighted-L2 Fb Dynamic DeepWalk 0.7268 0.9548 0.9474 0.9536 Fb Global DeepWalk 0.7261 0.9544 0.9461 0.9535 Fb Dynamic node2vec 0.7266 0.9555 0.9504 0.9526 Fb Global node2vec 0.7264 0.9554 0.9503 0.9524 Ax Dynamic DeepWalk 0.7058 0.9275 0.8186 0.8278 Ax Global Deepwalk 0.7056 0.9274 0.8183 0.8276 Ax Dynamic node2vec 0.7204 0.9305 0.8371 0.8474 Ax Global node2vec 0.7203 0.9305 0.8371 0.8474

Citations

• #### 0

Altmetric

Copyright 2020  CHINA SCIENCE PUBLISHING & MEDIA LTD.  中国科技出版传媒股份有限公司  版权所有