SCIENTIA SINICA Informationis, Volume 49, Issue 5: 613-629(2019) https://doi.org/10.1360/N112018-00149

## Optimal community detection method based on average mutual information

• AcceptedSep 6, 2018
• PublishedMay 15, 2019
Share
Rating

### Abstract

This study proposes an optimal community detection method based on average mutual information (AMI). We calculate the optimal community partition using AMI. This method is applied to the non-overlapping community detection algorithm (GN) and the overlapping community detection algorithm (COPRA), and the AMI-GN algorithm and the AMI-COPRA algorithm are generated respectively. Compared with the performance of the original GN, FN, and IE algorithm, experimental results show that AMI-GN algorithm improves the quality of community detection. Furthermore, compared with the performance of the original COPRA algorithm and the LPPB algorithm, experimental results show that the AMI-COPRA algorithm improves the stability of the original COPRA algorithm, reduces the average number of iterations, and accelerates the convergence speed of the algorithm. Moreover, compared with the LPPB algorithm, the AMI-COPRA algorithm reveals that the quality of the community shows a little difference, but is more stable than the LPPB algorithm. Our study shows that the AMI-based methods can improve the performance of typical non-overlapping community discovery algorithms and overlapping community discovery algorithms.

### References

[1] Qiao S J, Han N, Zhang K F, et al. Algorithm for detecting overlapping communities from complex network big data. J Softw, 2017, 28: 631--647. Google Scholar

[2] Bai L, Cheng X Q, Liang J Y. Fast graph clustering with a new description model for community detection. Inf Sci, 2017, 388-389: 37-47 CrossRef Google Scholar

[3] Atay Y, Koc I, Babaoglu I. Community detection from biological and social networks: A comparative analysis of metaheuristic algorithms. Appl Soft Computing, 2017, 50: 194-211 CrossRef Google Scholar

[4] Liu B Y, Wang C R, Wang C, et al. Microblog community discovery algorithm based on dynamic topic model with multidimensional data fusion. J Softw, 2017, 28: 246--261. Google Scholar

[5] Yang G, Zheng W P, Wang W J, et al. Community detection algorithm based on weighted dense subgraphs. J Softw, 2017, 28: 3103--3114. Google Scholar

[6] Huang F L, Yu G, Zhang J L, et al. Mining topic sentiment in micro-blogging based on micro-blogger social relation. J Softw, 2017, 28: 694--707. Google Scholar

[7] Wilson S J, Wilkins A D, Lin C H, et al. Discovery of functional and disease pathway by community detection protein-protein interaction networks. Pac Symp Biocomput, 2016, 22: 336--347. Google Scholar

[8] Kernighan B W, Lin S. An Efficient Heuristic Procedure for Partitioning Graphs. Bell Syst Technical J, 1970, 49: 291-307 CrossRef Google Scholar

[9] Barnes E R. An algorithm for partitioning the nodes of a graph. Siam J Alg Disc Math, 1982, 3: 303-304 DOI: 10.1109/CDC.1981.269534. Google Scholar

[10] Girvan M, Newman M E J. Community structure in social and biological networks. In: Proceedings of the National Academy of Sciences of the United States of America, 2002. 99: 7821--7826. Google Scholar

[11] Newman M E J. Fast algorithm for detecting community structure in networks. Phys Rev E, 2004, 69: 066133 CrossRef PubMed ADS Google Scholar

[12] Deng X, Wang B, Wu B, et al. Research and evaluation on modularity modeling in community detecting of complex network based on information entropy. In: Proceedings of the 3rd IEEE International Conference on Secure Software Integration and Reliability Improvement, 2009. 297--302. Google Scholar

[13] Palla G, Derényi I, Farkas I. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef PubMed ADS Google Scholar

[14] Lancichinetti A, Fortunato S, Kertész J. Detecting the overlapping and hierarchical community structure in complex networks. New J Phys, 2009, 11: 033015 CrossRef ADS Google Scholar

[15] Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E, 2007, 76: 036106 CrossRef PubMed ADS arXiv Google Scholar

[16] Gregory S. Finding overlapping communities in networks by label propagation. New J Phys, 2010, 12: 103018 CrossRef ADS arXiv Google Scholar

[17] Xie J, Szymanski B K. Towards linear time overlapping community detection in social networks. In: Proceedings of th 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kuala Lumpur, 2012. 25--36. Google Scholar

[18] Chen J, Wan Y. Research on label propagation algorithm based on modularity maximization in the social network. J Commun, 2017, 38: 25--33. Google Scholar

[19] Li C Y, Tang Y, Lin H. Parallel overlapping community detection algorithm in complex networks based on label propagation. Sci Sin inf Sci, 2016, 46: 212-227 CrossRef Google Scholar

[20] Wu Z H, Lin Y F, Gregory S. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks. J Comput Sci Technol, 2012, 27: 468-479 CrossRef Google Scholar

[21] Liu S H, Zhu F X, Gan L. A label-propagation-probability-based algorithm for overlapping community detection. Chin J Comput, 2016, 39: 717--729. Google Scholar

[22] Zhang C L, Wang Y L, Wu Y J, et al. Multi-label propagation algorithm for overlapping community discovery based on information entropy and local correlation. J Chin Comput Syst, 2016, 37: 1645--1650. Google Scholar

[23] Xin Y, Yang J, Xie Z Q. A semantic overlapping community detection algorithm based on semantic data elds. Sci Sin inf Sci, 2015, 45: 918-933 CrossRef Google Scholar

[24] Nagarajan N, Sen P, Chaoji V. Community detection in content-sharing social networks. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, 2013. 82--89. Google Scholar

[25] Blei D M, Ng A Y, Jordan M I. Latent dirichlet allocation. J Mach Learn Res, 2003, 3: 993--1022. Google Scholar

[26] Duan L, Zhu X Y. Microblog community detection method based on community spatio-temporal topic model. J Univ Electron Sci Tech China, 2014, 43: 465--468. Google Scholar

[27] Yang J, Xin Y, Xie Z Q. Semantics social network community detection algorithm based on topic comprehensive factor analysis. J Comput Res Dev, 2014, 51: 559--569. Google Scholar

[28] Xin Y, Yang J, Xie Z Q. An overlapping semantic community structure detecting algorithm by label propagation. Acta Autom Sin, 2014, 40: 2262--2275. Google Scholar

[29] Xin Y, Yang J, Xie Z Q. An overlapping community structure detecting algorithm in semantic social network based on block field. Acta Autom Sin, 2015, 41: 362--375. Google Scholar

[30] Xin Y, Yang J, Xie Z Q. A semantic overlapping community detecting algorithm in social networks based on random walk. J Comput Res Dev, 2015, 52: 499--511. Google Scholar

[31] Xin Y, Yang J, Xie Z Q. Link-Block method for the semantic overlapping community detection. J Softw, 2016, 27: 363--380. Google Scholar

[32] Perozzi B, Al-Rfou R, Skiena S. Deepwalk: online learning of social representations, In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014. 701--710. Google Scholar

[33] Grover A, Leskovec J. Node2vec: scalable feature learning for networks, In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2016. 855--864. Google Scholar

[34] Han Z M, Tan X S, Chen Y, et al. NCSS: An effective and efficient complex network community detection algorithm. Sci Sin Inform, 2016, 46: 431--444. Google Scholar

[35] Newman M E J. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E, 2006, 74: 036104 CrossRef PubMed ADS Google Scholar

[36] Palla G, Derényi I, Farkas I. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef PubMed ADS Google Scholar

[37] Gregory S. An algorithm to find overlapping community structure in networks. In: Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin: Springer, 2007. 91--102. Google Scholar

[38] Boguná M, Pastor-Satorras R, Díaz-Guilera A. Models of social networks based on social distance attachment. Phys Rev E, 2004, 70: 056122 CrossRef PubMed ADS Google Scholar

[39] Nicosia V, Mangioni G, Carchiolo V. Extending the definition of modularity to directed graphs with overlapping communities. J Stat Mech, 2009, 2009: P03024 CrossRef ADS arXiv Google Scholar

• Figure 1

Two cases of the community partition. (a) No community splits; (b) community splits into two communities

• Figure 2

Label initialization

• Figure 3

Label update phase

• Figure 4

(Color online) Schematic of different nodes. (a) Karate dataset; (b) Dolphin dataset

• Figure 5

(Color online) Comparison of AMI-COPRA with other two algorithms on average $Q_{ov}$ value

• Figure 6

(Color online) Comparison result of AMI-COPRA with COPRA algorithm. (a) Standard deviation $Q_{ov}$; (b) average number of iterations

• Table 1   Experiment dataset
 Dataset Node number Edge number Community number Karate 34 78 2 Dolphin 62 159 2 Polbooks 105 441 3
•

Algorithm 1 AMI-GN

Require:The network represented by the form $G(E,~V~)$, and $E$ represents the total number of nodes in the network, $V$ represents the total number of edges in the network;

Output:Optimal community structure Max_CS in the network;

Max_$I_{P}$=INT_MIN;

for each $i$ in $E$

Calculate all edge betweennesses and save in array EdgeBetweeness;

while No community splits do

Delete edge with the highest EdgeBetweeness value;

end while

Calculate $I_{P};$

if $I_{P}>$Max_$I_{P}$ then

Max_$I_{P}$=$I_{P}$;

Record the before and after community structure of community partition corresponding to Max_$I_{P}$;

end if

end for

Calculate the information entropy of the before and after community structure of community partition corresponding to Max_$I_{P}$ and save in Entropy_Before and Entropy_After separately;

if Entropy_Before$>$Entropy_After then

Save the community structure of after this partition into Max_CS;

else

Save the community structure of before this partition into Max_CS;

end if

return Max_CS;

• Table 2   Comparison of AMI-GN with the other three classical algorithms
 Dataset Index AMI-GN GN FN IE 2*Karate CN 2 5 2 2 NMI 0.84 0.58 0.84 0.84 2*Dolphin CN 2 5 3 2 NMI 0.89 0.55 0.65 0.89 2*Polbooks CN 3 5 3 2 NMI 0.58 0.56 0.57 0.55
•

Algorithm 2 Label update algorithm based on information entropy

Require:

Output:

minEntropy=Float.MAX_VALUE;

for each label in labelMap

label_c=label.getLabelC();

if updatedNodeMap.containsKey(label_c) then

updatedNodeMap.put(label_c, Float.valueOf(1)+(float)updatedNodeMap.get(key));

else

updatedNodeMap.put(label_c, Float.valueOf(1));

end if

for each label_c in updatedNodeMap

currentNodeNumber = updatedNodeMap.get(label_c);

$x$ = currentNodeNumber/totalNodeNumber;

currentEntropy$+=$ (float)$(x\cdot(-1)\cdot$Math.log($x$)/Math.log(2));

end for

if currentEntropy $<$ minEntropy then

minEntropy = currentEntropy;

Record corresponding tag pair and Assign to updateLabelPair;

end if

end for

return updateLabelPair.

• Table 3   Comparison of AMI-GN algorithm with currently approved partition result
 Dataset Total number of different nodes Different node index Karate 1 3 Dolphin 1 40 Polbooks 17 $1,5,7,8,19,29,47,49,53,59,65,66,69,77,78,104,105$
• Table 4   Link analysis of different node in Polbooks dataset
 Node Links in community (R) Links between community (R) Links in community (A) Links between community (A) 1 2 4, 0 6 0, 0 5 3 3, 2 5 3, 0 7 4 7, 0 11 0, 0 8 1 4, 3 4 3, 1 19 1 2, 0 3 0, 0 29 1 2, 0 2 1, 0 47 0 3, 1 3 1, 0 49 0 4, 0 4 0, 0 53 3 1, 1 3 2, 0 59 5 5, 3 6 4, 3 65 5 2, 2 6 3, 0 66 4 2, 1 5 2, 0 69 3 1, 0 3 1, 0 77 0 3, 10 11 2, 0 78 2 5, 0 5 1, 1 104 1 1, 0 2 0, 0 105 2 1, 0 2 1, 0
• Table 5   Experiment dataset
 Dataset Reference Node number Edge number Karate [34] 34 78 Dolphin [34] 62 159 Football [34] 115 613 Netscience [35] 379 914 Protein-protein [13] 2445 6265 Blogs [36] 3982 6803 PGP [37] 10680 24316 Blogs2 [36] 30557 82301 Cond-mat-2003 [16] 27519 116181
• Table 6   Comparison of AMI-COPRA with other two algorithms
 Dataset Algorithm Average $Q_{ov}$ Standard deviation $Q_{ov}$ Average number of iterations 3*Karate COPRA 0.459 0.244 11.3 LPPB 0.733 0 – AMI-COPRA 0.720 0 7 3*Dolphin COPRA 0.665 0.069 13.1 LPPB 0.702 0.018 – AMI-COPRA 0.728 0 6 3*Football COPRA 0.691 0.030 7.6 LPPB – – – AMI-COPRA 0.707 0.020 5.8 3*Netscience COPRA 0.812 0.026 24.3 LPPB – – – AMI-COPRA 0.841 0.006 15 3*Protein-protein COPRA 0.552 0.066 56.2 LPPB – – – AMI-COPRA 0.710 0.001 73.4 3*Blogs COPRA 0.748 0.007 166.8 LPPB 0.756 0.002 – AMI-COPRA 0.765 0 151 3*PGP COPRA 0.788 0.017 249.6 LPPB 0.796 0.009 – AMI-COPRA 0.793 0 213 3*Blogs2 COPRA 0.604 0.032 172.5 LPPB – – – AMI-COPRA 0.608 0.005 53 3*Cond-mat-2003 COPRA 0.675 0.057 331.4 LPPB – – – AMI-COPRA 0.687 0.005 61.7

Citations

• #### 0

Altmetric

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有