logo

SCIENTIA SINICA Informationis, Volume 49, Issue 5: 613-629(2019) https://doi.org/10.1360/N112018-00149

Optimal community detection method based on average mutual information

More info
  • ReceivedJun 8, 2018
  • AcceptedSep 6, 2018
  • PublishedMay 15, 2019

Abstract

This study proposes an optimal community detection method based on average mutual information (AMI). We calculate the optimal community partition using AMI. This method is applied to the non-overlapping community detection algorithm (GN) and the overlapping community detection algorithm (COPRA), and the AMI-GN algorithm and the AMI-COPRA algorithm are generated respectively. Compared with the performance of the original GN, FN, and IE algorithm, experimental results show that AMI-GN algorithm improves the quality of community detection. Furthermore, compared with the performance of the original COPRA algorithm and the LPPB algorithm, experimental results show that the AMI-COPRA algorithm improves the stability of the original COPRA algorithm, reduces the average number of iterations, and accelerates the convergence speed of the algorithm. Moreover, compared with the LPPB algorithm, the AMI-COPRA algorithm reveals that the quality of the community shows a little difference, but is more stable than the LPPB algorithm. Our study shows that the AMI-based methods can improve the performance of typical non-overlapping community discovery algorithms and overlapping community discovery algorithms.


Funded by

国家自然科学基金(61602186)

广东省科技计划项目(2015B010103002,2016B050502001)


References

[1] Qiao S J, Han N, Zhang K F, et al. Algorithm for detecting overlapping communities from complex network big data. J Softw, 2017, 28: 631--647. Google Scholar

[2] Bai L, Cheng X Q, Liang J Y. Fast graph clustering with a new description model for community detection. Inf Sci, 2017, 388-389: 37-47 CrossRef Google Scholar

[3] Atay Y, Koc I, Babaoglu I. Community detection from biological and social networks: A comparative analysis of metaheuristic algorithms. Appl Soft Computing, 2017, 50: 194-211 CrossRef Google Scholar

[4] Liu B Y, Wang C R, Wang C, et al. Microblog community discovery algorithm based on dynamic topic model with multidimensional data fusion. J Softw, 2017, 28: 246--261. Google Scholar

[5] Yang G, Zheng W P, Wang W J, et al. Community detection algorithm based on weighted dense subgraphs. J Softw, 2017, 28: 3103--3114. Google Scholar

[6] Huang F L, Yu G, Zhang J L, et al. Mining topic sentiment in micro-blogging based on micro-blogger social relation. J Softw, 2017, 28: 694--707. Google Scholar

[7] Wilson S J, Wilkins A D, Lin C H, et al. Discovery of functional and disease pathway by community detection protein-protein interaction networks. Pac Symp Biocomput, 2016, 22: 336--347. Google Scholar

[8] Kernighan B W, Lin S. An Efficient Heuristic Procedure for Partitioning Graphs. Bell Syst Technical J, 1970, 49: 291-307 CrossRef Google Scholar

[9] Barnes E R. An algorithm for partitioning the nodes of a graph. Siam J Alg Disc Math, 1982, 3: 303-304 DOI: 10.1109/CDC.1981.269534. Google Scholar

[10] Girvan M, Newman M E J. Community structure in social and biological networks. In: Proceedings of the National Academy of Sciences of the United States of America, 2002. 99: 7821--7826. Google Scholar

[11] Newman M E J. Fast algorithm for detecting community structure in networks. Phys Rev E, 2004, 69: 066133 CrossRef PubMed ADS Google Scholar

[12] Deng X, Wang B, Wu B, et al. Research and evaluation on modularity modeling in community detecting of complex network based on information entropy. In: Proceedings of the 3rd IEEE International Conference on Secure Software Integration and Reliability Improvement, 2009. 297--302. Google Scholar

[13] Palla G, Derényi I, Farkas I. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef PubMed ADS Google Scholar

[14] Lancichinetti A, Fortunato S, Kertész J. Detecting the overlapping and hierarchical community structure in complex networks. New J Phys, 2009, 11: 033015 CrossRef ADS Google Scholar

[15] Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E, 2007, 76: 036106 CrossRef PubMed ADS arXiv Google Scholar

[16] Gregory S. Finding overlapping communities in networks by label propagation. New J Phys, 2010, 12: 103018 CrossRef ADS arXiv Google Scholar

[17] Xie J, Szymanski B K. Towards linear time overlapping community detection in social networks. In: Proceedings of th 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kuala Lumpur, 2012. 25--36. Google Scholar

[18] Chen J, Wan Y. Research on label propagation algorithm based on modularity maximization in the social network. J Commun, 2017, 38: 25--33. Google Scholar

[19] Li C Y, Tang Y, Lin H. Parallel overlapping community detection algorithm in complex networks based on label propagation. Sci Sin inf Sci, 2016, 46: 212-227 CrossRef Google Scholar

[20] Wu Z H, Lin Y F, Gregory S. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks. J Comput Sci Technol, 2012, 27: 468-479 CrossRef Google Scholar

[21] Liu S H, Zhu F X, Gan L. A label-propagation-probability-based algorithm for overlapping community detection. Chin J Comput, 2016, 39: 717--729. Google Scholar

[22] Zhang C L, Wang Y L, Wu Y J, et al. Multi-label propagation algorithm for overlapping community discovery based on information entropy and local correlation. J Chin Comput Syst, 2016, 37: 1645--1650. Google Scholar

[23] Xin Y, Yang J, Xie Z Q. A semantic overlapping community detection algorithm based on semantic data elds. Sci Sin inf Sci, 2015, 45: 918-933 CrossRef Google Scholar

[24] Nagarajan N, Sen P, Chaoji V. Community detection in content-sharing social networks. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, 2013. 82--89. Google Scholar

[25] Blei D M, Ng A Y, Jordan M I. Latent dirichlet allocation. J Mach Learn Res, 2003, 3: 993--1022. Google Scholar

[26] Duan L, Zhu X Y. Microblog community detection method based on community spatio-temporal topic model. J Univ Electron Sci Tech China, 2014, 43: 465--468. Google Scholar

[27] Yang J, Xin Y, Xie Z Q. Semantics social network community detection algorithm based on topic comprehensive factor analysis. J Comput Res Dev, 2014, 51: 559--569. Google Scholar

[28] Xin Y, Yang J, Xie Z Q. An overlapping semantic community structure detecting algorithm by label propagation. Acta Autom Sin, 2014, 40: 2262--2275. Google Scholar

[29] Xin Y, Yang J, Xie Z Q. An overlapping community structure detecting algorithm in semantic social network based on block field. Acta Autom Sin, 2015, 41: 362--375. Google Scholar

[30] Xin Y, Yang J, Xie Z Q. A semantic overlapping community detecting algorithm in social networks based on random walk. J Comput Res Dev, 2015, 52: 499--511. Google Scholar

[31] Xin Y, Yang J, Xie Z Q. Link-Block method for the semantic overlapping community detection. J Softw, 2016, 27: 363--380. Google Scholar

[32] Perozzi B, Al-Rfou R, Skiena S. Deepwalk: online learning of social representations, In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014. 701--710. Google Scholar

[33] Grover A, Leskovec J. Node2vec: scalable feature learning for networks, In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2016. 855--864. Google Scholar

[34] Han Z M, Tan X S, Chen Y, et al. NCSS: An effective and efficient complex network community detection algorithm. Sci Sin Inform, 2016, 46: 431--444. Google Scholar

[35] Newman M E J. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E, 2006, 74: 036104 CrossRef PubMed ADS Google Scholar

[36] Palla G, Derényi I, Farkas I. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef PubMed ADS Google Scholar

[37] Gregory S. An algorithm to find overlapping community structure in networks. In: Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin: Springer, 2007. 91--102. Google Scholar

[38] Boguná M, Pastor-Satorras R, Díaz-Guilera A. Models of social networks based on social distance attachment. Phys Rev E, 2004, 70: 056122 CrossRef PubMed ADS Google Scholar

[39] Nicosia V, Mangioni G, Carchiolo V. Extending the definition of modularity to directed graphs with overlapping communities. J Stat Mech, 2009, 2009: P03024 CrossRef ADS arXiv Google Scholar

  • Figure 1

    Two cases of the community partition. (a) No community splits; (b) community splits into two communities

  • Figure 2

    Label initialization

  • Figure 3

    Label update phase

  • Figure 4

    (Color online) Schematic of different nodes. (a) Karate dataset; (b) Dolphin dataset

  • Figure 5

    (Color online) Comparison of AMI-COPRA with other two algorithms on average $Q_{ov}$ value

  • Figure 6

    (Color online) Comparison result of AMI-COPRA with COPRA algorithm. (a) Standard deviation $Q_{ov}$; (b) average number of iterations

  • Table 1   Experiment dataset
    Dataset Node number Edge number Community number
    Karate 34 78 2
    Dolphin 62 159 2
    Polbooks 105 441 3
  •   

    Algorithm 1 AMI-GN

    Require:The network represented by the form $G(E,~V~)$, and $E$ represents the total number of nodes in the network, $V$ represents the total number of edges in the network;

    Output:Optimal community structure Max_CS in the network;

    Max_$I_{P}$=INT_MIN;

    for each $i$ in $E$

    Calculate all edge betweennesses and save in array EdgeBetweeness;

    while No community splits do

    Delete edge with the highest EdgeBetweeness value;

    end while

    Calculate $I_{P};$

    if $I_{P}>$Max_$I_{P}$ then

    Max_$I_{P}$=$I_{P}$;

    Record the before and after community structure of community partition corresponding to Max_$I_{P}$;

    end if

    end for

    Calculate the information entropy of the before and after community structure of community partition corresponding to Max_$I_{P}$ and save in Entropy_Before and Entropy_After separately;

    if Entropy_Before$>$Entropy_After then

    Save the community structure of after this partition into Max_CS;

    else

    Save the community structure of before this partition into Max_CS;

    end if

    return Max_CS;

  • Table 2   Comparison of AMI-GN with the other three classical algorithms
    Dataset Index AMI-GN GN FN IE
    2*Karate CN 2 5 2 2
    NMI 0.84 0.58 0.84 0.84
    2*Dolphin CN 2 5 3 2
    NMI 0.89 0.55 0.65 0.89
    2*Polbooks CN 3 5 3 2
    NMI 0.58 0.56 0.57 0.55
  •   

    Algorithm 2 Label update algorithm based on information entropy

    Require:

    Output:

    minEntropy=Float.MAX_VALUE;

    for each label in labelMap

    label_c=label.getLabelC();

    if updatedNodeMap.containsKey(label_c) then

    updatedNodeMap.put(label_c, Float.valueOf(1)+(float)updatedNodeMap.get(key));

    else

    updatedNodeMap.put(label_c, Float.valueOf(1));

    end if

    for each label_c in updatedNodeMap

    currentNodeNumber = updatedNodeMap.get(label_c);

    $x$ = currentNodeNumber/totalNodeNumber;

    currentEntropy$+=$ (float)$(x\cdot(-1)\cdot$Math.log($x$)/Math.log(2));

    end for

    if currentEntropy $<$ minEntropy then

    minEntropy = currentEntropy;

    Record corresponding tag pair and Assign to updateLabelPair;

    end if

    end for

    return updateLabelPair.

  • Table 3   Comparison of AMI-GN algorithm with currently approved partition result
    Dataset Total number of different nodes Different node index
    Karate 1 3
    Dolphin 1 40
    Polbooks 17 $1,5,7,8,19,29,47,49,53,59,65,66,69,77,78,104,105$
  • Table 4   Link analysis of different node in Polbooks dataset
    Node Links in community (R) Links between community (R) Links in community (A) Links between community (A)
    1 2 4, 0 6 0, 0
    5 3 3, 2 5 3, 0
    7 4 7, 0 11 0, 0
    8 1 4, 3 4 3, 1
    19 1 2, 0 3 0, 0
    29 1 2, 0 2 1, 0
    47 0 3, 1 3 1, 0
    49 0 4, 0 4 0, 0
    53 3 1, 1 3 2, 0
    59 5 5, 3 6 4, 3
    65 5 2, 2 6 3, 0
    66 4 2, 1 5 2, 0
    69 3 1, 0 3 1, 0
    77 0 3, 10 11 2, 0
    78 2 5, 0 5 1, 1
    104 1 1, 0 2 0, 0
    105 2 1, 0 2 1, 0
  • Table 5   Experiment dataset
    Dataset Reference Node number Edge number
    Karate [34] 34 78
    Dolphin [34] 62 159
    Football [34] 115 613
    Netscience [35] 379 914
    Protein-protein [13] 2445 6265
    Blogs [36] 3982 6803
    PGP [37] 10680 24316
    Blogs2 [36] 30557 82301
    Cond-mat-2003 [16] 27519 116181
  • Table 6   Comparison of AMI-COPRA with other two algorithms
    Dataset Algorithm Average $Q_{ov}$ Standard deviation $Q_{ov}$ Average number of iterations
    3*Karate COPRA 0.459 0.244 11.3
    LPPB 0.733 0
    AMI-COPRA 0.720 0 7
    3*Dolphin COPRA 0.665 0.069 13.1
    LPPB 0.702 0.018
    AMI-COPRA 0.728 0 6
    3*Football COPRA 0.691 0.030 7.6
    LPPB
    AMI-COPRA 0.707 0.020 5.8
    3*Netscience COPRA 0.812 0.026 24.3
    LPPB
    AMI-COPRA 0.841 0.006 15
    3*Protein-protein COPRA 0.552 0.066 56.2
    LPPB
    AMI-COPRA 0.710 0.001 73.4
    3*Blogs COPRA 0.748 0.007 166.8
    LPPB 0.756 0.002
    AMI-COPRA 0.765 0 151
    3*PGP COPRA 0.788 0.017 249.6
    LPPB 0.796 0.009
    AMI-COPRA 0.793 0 213
    3*Blogs2 COPRA 0.604 0.032 172.5
    LPPB
    AMI-COPRA 0.608 0.005 53
    3*Cond-mat-2003 COPRA 0.675 0.057 331.4
    LPPB
    AMI-COPRA 0.687 0.005 61.7

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1