logo

SCIENCE CHINA Information Sciences, Volume 59, Issue 7: 070103(2016) https://doi.org/10.1007/s11432-016-5578-9

Detecting protein complexes from DPINs by density based clustering with Pigeon-Inspired \\Optimization Algorithm

More info
  • ReceivedMar 11, 2016
  • AcceptedMay 10, 2016
  • PublishedJun 16, 2016

Abstract

Detecting protein complexes is crucial to understand principles of cellular organization. Plenty evidences have indicated that sub-graphs with high density in protein-protein interaction (PPI) network, especially dynamic PPI network (DPIN), usually correspond to protein complexes. As a well-known density-based clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) has been used in many areas due to its simplicity and the ability to detect clusters of different sizes and shapes. However, one of its limitations is that the performance of DBSCAN depends on two specified parameters $\varepsilon$ and MinPts, where $\varepsilon$ represents the maximum radius of a neighborhood from an observing point while MinPts means the minimum number of data points contained in such a neighborhood. In this article, we develop a new method named as P-DBSCAN to detect protein complexes in DPIN by using Pigeon-Inspired Optimization (PIO) Algorithm to optimize the parameters $\varepsilon$ and MinPts in DBSCAN. The experiments on DIP and MIPS datasets show that P-DBSCAN outperforms the state-of-the-art methods for protein complex detection in terms of several criteria such as precision, recall and f-measure.


Funded by

National Natural Science Foundation of China(61502290)

National Natural Science Foundation of China(61401263)

Industrial Research Project of Science and Technology in Shaanxi Province(2015GY016)

Fundamental Research Funds for the Central Universities Shaanxi Normal University(GK201501008)

Graduated Student Innovation Foundation of Shaanxi Normal University(2015CXS030)

China Postdoctoral Science Foundation(2015M582606)


Acknowledgment

Acknowledgments

This work was supported by National Natural Science Foundation of China (Grant Nos. 61502290, 61401263), Industrial Research Project of Science and Technology in Shaanxi Province (Grant No. 2015GY016), Fundamental Research Funds for the Central Universities, Shaanxi Normal University (Grant No. GK201501008), Graduated Student Innovation Foundation of Shaanxi Normal University (Grant No. 2015CXS030) and China Postdoctoral Science Foundation (Grant No. 2015M582606).


References

[1] Uetz P, Giot L, Cagney G, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 2000, 403: 623-627 CrossRef Google Scholar

[2] Zhu H, Bilgin M, Bangham R, et al. Global analysis of protein activities using proteome chips. Science, 2001, 293: 2101-2105 CrossRef Google Scholar

[3] Xenarios I, Salwínski L, Duan X J, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucl Acids Res, 2002, 30: 303-305 CrossRef Google Scholar

[4] Güldener U, Münsterkötter M, Kastenmüller G, et al. CYGD: the comprehensive yeast genome database. Nucl Acids Res, 2005, 33: 364-368 Google Scholar

[5] Cherry J M. SGD: Saccharomyces Genome Database. Nucl Acids Res, 1998, 26: 73-79 CrossRef Google Scholar

[6] Montanez G, Cho Y R. Predicting false positives of protein-protein interaction data by semantic similarity measures. Curr Bioinform, 2013, 8: 339-346 CrossRef Google Scholar

[7] Li M, Zheng R, Zhang H, et al. Effective identification of essential proteins based on priori knowledge, network topology and gene expressions. Methods, 2014, 67: 325-333 CrossRef Google Scholar

[8] Watts D J, Strogatz S H. Collective dynamics of `small-world' networks. Nature, 1998, 393: 440-442 CrossRef Google Scholar

[9] Antonio S, Paul O M. Small-world network approach to identify key residues in protein-protein interaction. Proteins, 2005, 58: 672-682 Google Scholar

[10] Rives A W, Galitski T. Modular organization of cellular networks. Proc Nat Acad Sci USA, 2003, 100: 1128-1133 CrossRef Google Scholar

[11] Palla G, Dernyi I, Farkas I J, et al. Uncoverring the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef Google Scholar

[12] Adamcsek B, Palla G, Farkas I, et al. CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 2006, 22: 1021-1023 CrossRef Google Scholar

[13] Altaf-Ul-Amin M, Shinbo Y, Mihara K, et al. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinform, 2006, 7: 207-228 CrossRef Google Scholar

[14] Li M, Chen J, Wang J, et al. Modifying the DPClus algorithm for identifying protein complexes based on new topological structures. BMC Bioinform, 2008, 9: 398-413 CrossRef Google Scholar

[15] Peng J, Mona S. SPICi: a fast clustering algorithm for large biological networks. Bioinformatics, 2010, 26: 1105-1111 CrossRef Google Scholar

[16] Liu G, Wong L, Chua H N. Complex discovery from weighted PPI networks. Bioinformatics, 2009, 25: 1891-1897 CrossRef Google Scholar

[17] Leung H C M, Xiang Q, Yiu S M, et al. Predicting protein complexes from PPI data: a core-attachment approach. J Comput Biol, 2009, 16: 133-144 CrossRef Google Scholar

[18] {Wang J X, Liu B B}, Li M, et al. Identifying protein complexes from interaction networks based on clique percolation and distance restriction. BMC Genom, 2010, 11: S10-S24 Google Scholar

[19] Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, 1996. 226--231. Google Scholar

[20] Duan H B, Qiao P X. Pigeon-inspired optimization: a new swarm intelligence optimizer for air robot path planning. Int J Intell Comput Cybern, 2014, 7: 24-37 CrossRef Google Scholar

[21] Lei X J, Wu S, Ge L, et al. Clustering and overlapping modules detection in PPI network based on IBFO. Proteomics, 2013, 13: 278-290 CrossRef Google Scholar

[22] Lei X J, Tian J F, Ge L, et al. The clustering model and algorithm of PPI network based on propagating mechanism of artificial bee colony. Inform Sci, 2013, 247: 21-39 CrossRef Google Scholar

[23] Lv Q, Wu H J, Wu J Z, et al. A parallel ant colonies approach to de novo prediction of protein backbone in CASP8/9. Sci China Inf Sci, 2013, 56: 108103-39 Google Scholar

[24] Lei X J, Wang F, Wu F X, et al. Protein complex identification through Markov clustering with firefly algorithm on dynamic protein-protein interaction networks. Inf Sci, 2016, 329: 303-316 CrossRef Google Scholar

[25] Lei X J, Ying C, Wu F X, et al. Clustering PPI data by combining FA and SHC method. BMC Genom, 2015, 16: S3-S12 Google Scholar

[26] Zhao J, Zhou R. Pigeon-inspired optimization applied to constrained gliding trajectories. Nonlinear Dyn, 2015, 82: 1781-1795 CrossRef Google Scholar

[27] Li C, Duan H B. Target detection approach for UAVs via improved Pigeon-inspired Optimization and Edge Potential Function. Aerosp Sci Technol, 2014, 39: 352-360 CrossRef Google Scholar

[28] Sun H, Duan H B. PID controller design based on Prey-Predator Pigeon-Inspired Optimization algorithm. In: Proceedings of the International Conference on Mechatronics and Automation, Tianjin, 2014. 1416--1421. Google Scholar

[29] {Wang J X}, Li M, Chen J, et al. A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. IEEE/ACM Trans Comput Biol Bioinform. 2011, 8: 607--620. Google Scholar

[30] van Dongen S. Graph clustering by flow simulation. Dissertation for Doctoral Degree. Center for Math and Computer Science (CWI), University of Utrecht. 2000. Google Scholar

[31] King A D, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics, 2004, 20: 3013-3020 CrossRef Google Scholar

[32] Zhang A D. Protein interaction networks. New York: Cambridge University Press, 2009. Google Scholar

[33] Radicchi F, Castellano C, Cecconi F, et al. Defining and identifying communities in networks. Proc Nat Acad Sci USA, 2004, 101: 2658-2663 CrossRef Google Scholar

[34] Washburn M P, Wolters D, Yates J R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol, 2001, 19: 242-247 CrossRef Google Scholar

[35] Cho Y R, Hwang H, Ramanathan M, et al. Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinform, 2007, 8: 265-277 CrossRef Google Scholar

[36] Wang J X, Peng X Q, Li M, et al. Construction and application of dynamic protein interaction network based on time course gene expression data. Proteomics, 2013, 13: 301-312 CrossRef Google Scholar

[37] Tu B P, Kudlicki A, Rowicka M, et al. Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science, 2005, 310: 1152-1158 CrossRef Google Scholar

[38] Pu S, Wong J, Turner B, et al. Up-to-date catalogues of yeast protein complexes. Nucl Acids Res 2009, 37: 825--831. Google Scholar

[39] Mewes H W, Amid C, Arnold R, et al. MIPS: analysis and annotation of proteins from whole genomes. Nucl Acids Res, 2004, 32: 41-44 Google Scholar

[40] Tang Y, Li M, Wang J X. CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems, 2015, 127: 67-72 CrossRef Google Scholar

Copyright 2019 Science China Press Co., Ltd. 《中国科学》杂志社有限责任公司 版权所有

京ICP备18024590号-1