logo

SCIENCE CHINA Information Sciences, Volume 63 , Issue 11 : 212205(2020) https://doi.org/10.1007/s11432-020-2827-9

An evolutionary autoencoder for dynamic community detection

More info
  • ReceivedFeb 15, 2020
  • AcceptedMar 12, 2020
  • PublishedSep 28, 2020

Abstract

Dynamic community detection is significant for controlling and capturing the temporal features of networks. The evolutionary clustering framework provides a temporal smoothness constraint for simultaneously maximizing the clustering quality at the current time step and minimizing the clustering deviation between two successive time steps. Based on this framework, some existing methods, such as the evolutionary spectral clustering and evolutionary nonnegative matrix factorization, aim to look for the low-dimensional representation by mapping reconstruction. However, such reconstruction does not address the nonlinear characteristics of networks. In this paper, we propose a semi-supervised algorithm (sE-Autoencoder) to overcome the effects of nonlinear property on the low-dimensional representation. Our proposed method extends the typical nonlinear reconstruction model to the dynamic network by constructing a temporal matrix. More specifically, the potential community characteristics and the previous clustering, as the prior information, are incorporated into the loss function as a regularization term. Experimental results on synthetic and real-world datasets demonstrate that the proposed method is effective and superior to other methods for dynamic community detection.


Acknowledgment

This work was supported by National Natural Science Foundation of China (Grant Nos. U1803263, 6197618, 11931015, 81961138010) and Natural Science Foundation of Chongqing (Grant Nos. cstc2018jcyjAX0274, cstc2019- jcyj-zdxmX0025), Key Area RD Program of Guangdong Province (Grant No. 2019B010137004), Key Area RD Program of Shaanxi Province (Grant No. 2019ZDLGY17-07), and the Fundamental Research Funds for the Central Universities (Grant No. 3102019PJ006).


References

[1] Li A, Cornelius S P, Liu Y Y. The fundamental advantages of temporal networks. Science, 2017, 358: 1042-1046 CrossRef ADS arXiv Google Scholar

[2] Cao J, Bu Z, Wang Y. Detecting Prosumer-Community Groups in Smart Grids From the Multiagent Perspective. IEEE Trans Syst Man Cybern Syst, 2019, 49: 1652-1664 CrossRef Google Scholar

[3] Gosak M, Markovi? R, Dolen?ek J. Network science of biological systems at different scales: A review. Phys Life Rev, 2018, 24: 118-135 CrossRef ADS Google Scholar

[4] Li X, Kurths J, Gao C. A Hybrid Algorithm for Estimating Origin-Destination Flows. IEEE Access, 2018, 6: 677-687 CrossRef Google Scholar

[5] Jalili M, Perc M. Information cascades in complex networks. J Complex Networks, 2017, 74, CrossRef Google Scholar

[6] Pesantez-Cabrera P, Kalyanaraman A. Efficient Detection of Communities in Biological Bipartite Networks. IEEE/ACM Trans Comput Biol Bioinf, 2019, 16: 258-271 CrossRef Google Scholar

[7] Yamir M, Matjaz P. Focus on multilayer networks. New Journal of Physics, 2020, 22: 010201 DOI: 10.1088/1367-2630/ab4fcb. Google Scholar

[8] Li X, Wang Z, Gao C. Reasoning human emotional responses from large-scale social and public media. Appl Math Computation, 2017, 310: 182-193 CrossRef Google Scholar

[9] Hajek B, Wu Y, Xu J. Information Limits for Recovering a Hidden Community. IEEE Trans Inform Theor, 2017, 63: 4729-4745 CrossRef Google Scholar

[10] Gao C, Liang M, Li X. IEEE/ACM Trans Comput Biol Bioinf, 2018, 15: 1916-1928 CrossRef Google Scholar

[11] Zhu P, Dai X, Li X. Community detection in temporal networks via a spreading process. EPL, 2019, 126: 48001 CrossRef Google Scholar

[12] PR-Index: Using the h-Index and PageRank for Determining True Impact. PLoS ONE, 2016, 11: e0161755 CrossRef Google Scholar

[13] Huttlin E L, Bruckner R J, Paulo J A. Architecture of the human interactome defines protein communities and disease networks. Nature, 2017, 545: 505-509 CrossRef Google Scholar

[14] Helbing D, Brockmann D, Chadefaux T. Saving Human Lives: What Complexity Science and Information Systems can Contribute. J Stat Phys, 2015, 158: 735-781 CrossRef ADS arXiv Google Scholar

[15] Network-Based Modeling for Characterizing Human Collective Behaviors During Extreme Events. IEEE Trans Syst Man Cybern Syst, 2017, 47: 171-183 CrossRef Google Scholar

[16] Rossetti G, Cazabet R. Community Discovery in Dynamic Networks. ACM Comput Surv, 2018, 51: 1-37 CrossRef Google Scholar

[17] Lei Tang , Huan Liu , Jianping Zhang . Identifying Evolving Groups in Dynamic Multimode Networks. IEEE Trans Knowl Data Eng, 2012, 24: 72-85 CrossRef Google Scholar

[18] Asur S, Parthasarathy S, Ucar D. An event-based framework for characterizing the evolutionary behavior of interaction graphs. In: Prcoeedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007. 913--921. Google Scholar

[19] Chakrabarti D, Kumar R, Tomkins A. Evolutionary clustering. In: Prcoeedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006. 554--560. Google Scholar

[20] Chi Y, Song X, Zhou D. On evolutionary spectral clustering. ACM Trans Knowl Discov Data, 2009, 3: 1-30 CrossRef Google Scholar

[21] Ma X, Dong D. Evolutionary Nonnegative Matrix Factorization Algorithms for Community Detection in Dynamic Networks. IEEE Trans Knowl Data Eng, 2017, 29: 1045-1058 CrossRef Google Scholar

[22] Gerlach M, Peixoto T P, Altmann E G. A network approach to topic models. Sci Adv, 2018, 4: eaaq1360 CrossRef ADS arXiv Google Scholar

[23] Peel L, Larremore D B, Clauset A. The ground truth about metadata and community detection in networks. Sci Adv, 2017, 3: e1602548 CrossRef ADS arXiv Google Scholar

[24] Shao J M, Zhang Z, Yu Z J, et al. Community detection and link prediction via cluster driven low rank matrix completion. In: Prcoeedings of the International Joint Conference on Artificial Intelligence, 2019. 3382--3388. Google Scholar

[25] Yang L, Cao X C, He D X, et al. Modularity based community detection with deep learning. In: Prcoeedings of the International Joint Conference on Artificial Intelligence, 2016. 2252--2258. Google Scholar

[26] Yang L, Cao X, Jin D. A Unified Semi-Supervised Community Detection Framework Using Latent Space Graph Regularization. IEEE Trans Cybern, 2015, 45: 2585-2598 CrossRef Google Scholar

[27] Ak?ay E. Collapse and rescue of cooperation in evolving dynamic networks. Nat Commun, 2018, 9: 2692 CrossRef ADS Google Scholar

[28] Gao C, Chen Z, Li X. Multiobjective discrete particle swarm optimization for community detection in dynamic networks. EPL, 2018, 122: 28001 CrossRef ADS Google Scholar

[29] Lin Y R, Chi Y, Zhu S. Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discov Data, 2009, 3: 1-31 CrossRef Google Scholar

[30] Kim M S, Han J. A particle-and-density based evolutionary clustering method for dynamic networks. Proc VLDB Endow, 2009, 2: 622-633 CrossRef Google Scholar

[31] Folino F, Pizzuti C. An Evolutionary Multiobjective Approach for Community Discovery in Dynamic Networks. IEEE Trans Knowl Data Eng, 2014, 26: 1838-1852 CrossRef Google Scholar

[32] Deb K, Pratap A, Agarwal S. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Computat, 2002, 6: 182-197 CrossRef Google Scholar

[33] de Jesús Rubio J, Angelov P, Pacheco J. Uniformly Stable Backpropagation Algorithm to Train a Feedforward Neural Network. IEEE Trans Neural Netw, 2011, 22: 356-366 CrossRef Google Scholar

[34] Girvan M, Newman M E J. Community structure in social and biological networks. Proc Natl Acad Sci USA, 2002, 99: 7821-7826 CrossRef ADS arXiv Google Scholar

[35] Greene D, Doyle D, Cunningham P. Tracking the evolution of communities in dynamic social networks. In: Prcoeedings of International Conference on Advances in Social Networks Analysis and Mining, 2010. 176--183. Google Scholar

[36] Danon L, Díaz-Guilera A, Duch J. Comparing community structure identification. J Stat Mech, 2005, 2005(09): P09008 CrossRef Google Scholar

[37] Newman M E J, Girvan M. Finding and evaluating community structure in networks. Phys Rev E, 2004, 69: 026113 CrossRef ADS arXiv Google Scholar

[38] Li Z, Zhang S, Wang R S. Quantitative function for community detection. Phys Rev E, 2008, 77: 036109 CrossRef ADS Google Scholar

[39] Wang P, Gao L, Ma X. Dynamic community detection based on network structural perturbation and topological similarity. J Stat Mech, 2017, 2017(1): 013401 CrossRef Google Scholar

[40] Mucha P J, Richardson T, Macon K. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science, 2010, 328: 876-878 CrossRef ADS arXiv Google Scholar

[41] Liu F, Choi D, Xie L. Global spectral clustering in dynamic networks. Proc Natl Acad Sci USA, 2018, 115: 927-932 CrossRef Google Scholar

[42] Liu F, Wu J, Xue S. Detecting the evolving community structure in dynamic social networks. World Wide Web, 2020, 23: 715-733 CrossRef Google Scholar

[43] Wang C Y, Deng Y, Li X H, et al. A label-based nature heuristic algorithm for dynamic community detection. In: Proceedings of Pacific Rim International Conference on Artificial Intelligence, 2019. 621--632. Google Scholar

  • Figure 1

    (Color online) The overview of the sE-Autoencoder algorithm, which consists of (a) adjacent matrix $A_t$ construction, (b) similarity matrix $S_t$ construction, (c) deep nonlinear reconstruction based on the stacked autoencoder, protectłinebreak (d) the potential community structure and the previous clustering, as the prior information, which are reconstructed in the semi-supervised autoencoder based on the evolutionary clustering framework, (e) the low dimension representation based on the feature extraction, (f) nonlinear mapping which fulfills the real-world feature, and (g) $K$-means adopted to discover community structure in nonlinear representation. There is a significant difference between nonlinear properties of the real-world and the linear mapping based on the NMF highlighted by the gray background. In contrast to the NMF, the nonlinear feature embodied in our proposed method is more suitable for characterizing the real social networks.

  • Figure 2

    (Color online) The process of constructing the similarity matrix. A network at $t$ is first converted as an adjacent matrix $A$. Then the similarity matrix $S_{t}$ is computed by (2). More specifically, the numbers of neighbor nodes ${\rm~Neighbor}(v_{i})$ and ${\rm~Neighbor}(v_{j})$ are computed in $v_{i}$ and $v_{j}$, respectively. The common neighbor nodes are obtained by the intersection between ${\rm~Neighbor}(v_{i})$ and ${\rm~Neighbor}(v_{j})$.

  • Figure 3

    (Color online) An example of the autoencoder. The input ${\boldsymbol~S}$ is obtained by (2). The process of encoding maps the similarity matrix ${\boldsymbol~S}$ to a low-dimension embedding ${\boldsymbol~H}$ by (3). The process of decoding restores the latent representation ${\boldsymbol~H}$ to the matrix ${\boldsymbol~M}$ by (4). The low-dimensional encoding ${\boldsymbol~H}$ in the hidden layer of the autoencoder block is known as a community-oriented network representation. We adopt $K$-means to detect the ${\boldsymbol~H}$ community structure. The same color represents the same community. Thus the network $G_{t}$ has discovered two communities.

  • Figure 4

    (Color online) Comparisons of NMI and ERROR among different algorithms on SYN dataset in which the average degree is 16 and $Z$ = 5. sE-Autoencoder outperforms other algorithms at every time step. Similarly, the low ERROR indicates that our proposed method has better robustness. (a) NMI when $nC$% = 10%; (b) NMI when $nC$% = 30%; (c) ERROR when $nC$% = 10%; (d) ERROR when $nC$% = 30%.

  • Figure 5

    (Color online) Comparisons of NMIand ERRORamong different algorithms on SYN dataset in which the average degree is 16 and $Z$ = 6. Although sE-Autoencoder has a little fluctuation as time goes on, such a method demonstrates the capacity of detecting dynamic community structure. In values of ERROR sE-Autoencoder performs more effectively than other algorithms expect for PisCES. (a) NMI when $nC%=10%$; (b) NMI when $nC%=30%$; (c) ERROR when $nC%=10%$; (d) ERROR when $nC%=30%$.

  • Figure 6

    (Color online) Comparisons of NMI, Qand Damong different algorithms on SYN-FIX dataset. The NMI values of sE-Autoencoder, ESPRA, GenLouvain, DECS and L-DMGAPSO are always 1 in (a). When $Z=5$, sE-Autoencoder, ESPRA, GenLouvain, DECS and L-DMGAPSO almost detect original dynamic communities and their curves are overlapped, while the accuracy of other algorithms decreases slightly. (a) NMI when $Z$ = 3; (b) $Q$ when $Z$ = 3; (c) $D$ when $Z$ = 3; protect łinebreak (d) NMI when $Z$ = 5; (e) $Q$ when $Z$ = 5; (f) $D$ when $Z$ = 5.

  • Figure 7

    (Color online) Comparisons of NMI, Qand Damong different algorithms on SYN-VAR dataset. Although the values of $Q$ and $D$ are the same in both sE-Autoencoder and GenLouvain, our method performs better than GenLouvain in (a). In (b) and (c), PisCES slightly precedes our method in the first time step, but sE-Autoencoder and GenLouvain are superior to comparable methods in residual time steps. These curves reveal that sE-Autoencoder has great capacity and stability. (a) NMI when $Z$ = 3; (b) $Q$ when $Z$ = 3; (c) $D$ when $Z$=3; (d) NMI when $Z$ = 5; (e) $Q$ when $Z$ = 5; (f) $D$ when $Z$ = 5.

  • Figure 8

    (Color online) Comparisons of NMI among different algorithms on SYN-EVENT dataset. sE-Autoencoder is more efficient than other algorithms in most time steps. Although sE-NMF approximates our method in (c), our method performs effectively in (a), (b) and (d). (a) Birth and death; (b) expansion and contraction; (c) intermittent communties; (d) merging and splitting.

  • Figure 9

    (Color online) Results of community detection from time step 1 to 9 with an interval of 2 on SYN-FIX dataset. The exchanging flows depict the changes of members of communities. The same color represents the same community at $t$ time step. The same shape denotes vertexes in the same community at $t-2$ time step. Every community has a tiny change, which conforms with the evolutionary clustering framework.

  • Figure 10

    (Color online) Running time of the sE-Autoencoder algorithm against (a) the number of vertexes in the networks, and (b) density of the networks.

  • Table 1  

    Table 1Datasets and their layers configuration$^{\rm~a)}$

    Datasets $|V|$ $|E|$ $T$ Layers configuration
    SYN 128 19950 20 128-64-32-16
    SYN-FIX 128 10240 10 128-64-32-16
    SYN-VAR 256 30562 10 256-128-64-32
    SYN-EVENT 2000 94486 10 2000-1024-512-256
    $G_{1}$ Cellphone Calls 400 9834 10 400-256-128-64
    $G_{2}$ Enron Mail 151 33124 12 151-128-64-32
    $G_{3}$ High School 327 188508 9 327-256-128-64
    $G_{4}$ Hospital 75 32424 9 75-64-32-16
    $G_{5}$ Hypertext 113 20818 5 113-64-32-16
    $G_{6}$ Java 376 40915 66 376-256-128-64
    $G_{7}$ Rados 167 82927 10 167-128-64-32

    a

  • Table 2  

    Table 2Performance on real-world datasets$^{\rm~a)}$

    2*Method ${\rm~NMI}_{\rm~avg}$
    cmidrule (lr)2-8 $G_{1}$ $G_{2}$ $G_{3}$ $G_{4}$ $G_{5}$ $G_{6}$ $G_{7}$
    sE-Autoencoder 0.6816 0.5786 0.7413 0.3426 0.4062 0.8762 0.3825
    sE-NMF 0.2851 0.4647 0.5469 0.1053 0.0124 0.5129 0.1373
    DYNMOGA 0.4367 0.5107 0.6983 0.1590 0.0681 0.7972 0.0959
    FacetNet 0.3753 0.5184 0.7241 0.1651 0.0312 0.6440 0.1629
    ESPRA 0.3445 0.4837 0.6998 0.2417 0.2099 0.7373 0.1550
    GenLouvain 0.2903 0.5781 0.7321 0.3673 0.1457 0.7607 0.2171
    PisCES 0.2025 0.2974 0.6443 0.0878 0.0149 0.6726 0.1626
    DECS 0.6059 0.5783 0.5285 0.3661 0.2245 0.7499 0.2602
    L-DMGAPSO 0.4778 0.4761 0.6752 0.1431 0.0709 0.7589 0.1228

    a

Copyright 2020  CHINA SCIENCE PUBLISHING & MEDIA LTD.  中国科技出版传媒股份有限公司  版权所有

京ICP备14028887号-23       京公网安备11010102003388号