SCIENTIA SINICA Informationis, Volume 46, Issue 4: 461-475(2016) https://doi.org/10.1360/N112015-00109

Novel protein-function prediction using a directed hybrid graph

  • Received: May 20, 2015
  • Accepted: Jun 11, 2015
  • Published: Apr 13, 2016


Proteins carry out many essential activities in an organism, and accurate annotation of their functions advances both life-science research and its applications. High-throughput techniques now generate proteomic and genomic data in volumes that low-throughput wet-lab techniques cannot annotate. Large-scale protein-function prediction based on computational models is therefore one of the key tasks of the post-genomic era. Current machine-learning methods often focus on predicting the functions of completely unlabeled proteins. They ignore the fact that the labels of already labeled proteins are incomplete, and their accuracy suffers as a result. In this paper, we design a directed Hybrid Graph (dHG) based on the gene ontology hierarchy and the protein-protein interaction network, and then predict novel functions by performing a random walk with restart on the dHG. The proposed dHG can predict new functions both for partially labeled proteins and for completely unlabeled proteins. Experimental results on yeast and human proteins show that dHG outperforms other related methods across various evaluation metrics and requires less running time.
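The idea of a random walk with restart on a hybrid graph can be illustrated with a minimal sketch. The abstract does not give the dHG construction, edge weights, or restart probability, so everything below is an assumption made for illustration: the toy graph, the unit edge weights, the edge directions (protein to annotated term, child term to parent term), the handling of nodes without outgoing edges, and the name `random_walk_with_restart` are hypothetical and are not the authors' implementation.

```python
import numpy as np

def random_walk_with_restart(W, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a directed weighted graph.

    W[i, j] is the weight of the directed edge i -> j. Returns the
    stationary visiting probabilities of a walker that starts at `seed`
    and jumps back to it with probability `restart` at every step.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalise; nodes without outgoing edges keep an all-zero row.
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    e = np.zeros(n)
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (P.T @ p) + restart * e
        # Assumption: a walker stuck on a node with no outgoing edges
        # returns to the seed, so the scores keep summing to one.
        p_next += (1.0 - p_next.sum()) * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy hybrid graph (hypothetical): three proteins and three GO terms.
P0, P1, P2, T0, T1, T2 = range(6)
W = np.zeros((6, 6))
for a, b in [(P0, P1), (P1, P2)]:   # protein-protein interactions (both ways)
    W[a, b] = W[b, a] = 1.0
W[P0, T2] = 1.0                     # known annotation: P0 -> T2
W[P1, T1] = 1.0                     # known annotation: P1 -> T1
W[T2, T1] = 1.0                     # GO hierarchy: child T2 -> parent T1
W[T1, T0] = 1.0                     # GO hierarchy: child T1 -> root T0

# P2 is completely unlabeled; walk from it and read off the term scores.
scores = random_walk_with_restart(W, seed=P2)
term_ranking = sorted([T0, T1, T2], key=lambda t: -scores[t])
print({f"T{t - T0}": round(float(scores[t]), 4) for t in term_ranking})
```

In this sketch the scores on the term nodes serve as a ranking of candidate functions for the seed protein. The same call with a partially labeled protein as the seed would rank terms it is not yet annotated with, which mirrors the two prediction settings named in the abstract.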


Copyright 2019 Science China Press Co., Ltd. All rights reserved.