SCIENCE CHINA Life Sciences, Volume 62, Issue 4: 594-608(2019) https://doi.org/10.1007/s11427-019-9483-6

Topological evolution of coexpression networks by new gene integration maintains the hierarchical and modular structures in human ancestors

  • Received Oct 15, 2018
  • Accepted Nov 5, 2018
  • Published Mar 21, 2019

Abstract

We analyze the global structure and evolution of human gene coexpression networks driven by new gene integration. Using a Pearson correlation coefficient threshold of 0.5 (coefficient greater than or equal to 0.5), we find that the coexpression network consists of 334 small components and one “giant” connected subnet comprising 6,317 interacting genes. This network has a power-law degree distribution and small-world properties. The average clustering coefficient of younger genes is larger than that of elderly genes (0.6685 vs. 0.5762). In particular, we find that younger genes with a larger degree also show a hierarchical architecture. The younger genes play an important role in the overall pivotability of the network, and the network contains few redundant duplicate genes. Moreover, we find that gene duplication and orphan genes are the two dominant evolutionary forces shaping this network. Both duplicate genes and orphan genes develop new links through a “rich-gets-richer” mechanism. With the gradual integration of new genes into the ancestral network, most topological features of the network gradually increase. However, the exponent of the degree distribution and the modularity coefficient of the whole network do not change significantly, which implies that the evolution of coexpression networks maintains the hierarchical and modular structures in human ancestors.
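As a rough illustration of the thresholding step described in the abstract, the following Python sketch links two genes when the Pearson correlation coefficient of their expression profiles is at least 0.5 and then lists the connected components, largest first. The toy expression values, the gene names, the function names, and the choice to drop genes that have no coexpression partner are assumptions made for this example; they are not taken from the study.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles (gene -> values across four samples).
TOY_EXPRESSION = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 0, 1, 0],
    "E": [2, 0, 2, 0],
    "F": [1, 2, 3, 5],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation undefined, treat as no link
    return cov / (sd_x * sd_y)


def build_network(expression, threshold=0.5):
    """Link gene pairs with PCC >= threshold; drop genes without any link."""
    adjacency = {gene: set() for gene in expression}
    for g, h in combinations(expression, 2):
        if pearson(expression[g], expression[h]) >= threshold:
            adjacency[g].add(h)
            adjacency[h].add(g)
    # Assumption of this sketch: only interacting genes stay in the network.
    return {gene: nbrs for gene, nbrs in adjacency.items() if nbrs}


def connected_components(adjacency):
    """Return the connected components as sets, largest first."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


network = build_network(TOY_EXPRESSION)
components = connected_components(network)
print([sorted(c) for c in components])
```

In this toy data set, genes A, B and F form the largest component, D and E form a small one, and C is left out because none of its correlations reaches the threshold.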


Funded by

National Natural Science Foundation of China (11571272, 11631012)

National Science and Technology Major Project of China (2012ZX10002001)

Natural Science Foundation of Shaanxi Province (2015JQ1011)

China Postdoctoral Science Foundation (2014M560755)


Acknowledgment

We thank Profs. Yicang Zhou and Yanni Xiao for their valuable discussion. This work was supported by grants from the National Natural Science Foundation of China (11571272, 11201368 and 11631012), the National Science and Technology Major Project of China (2012ZX10002001), the Natural Science Foundation of Shaanxi Province (2015JQ1011) and the China Postdoctoral Science Foundation (2014M560755).


Interest statement

The author(s) declare that they have no conflict of interest.


Supplement

SUPPORTING INFORMATION

File S1   Pearson correlation coefficients (PCC), gene age (Branch 0–13) and human gene coexpression network (Obayashi et al., 2012).

File S2   Human gene age dataset, “Highly Reliable Age Inference” (Zhang et al., 2010, 2011).

File S3   Human gene origination mechanism dataset, “Highly Reliable Mechanism Inference” (Zhang et al., 2010, 2011).

The supporting information is available online at http://life.scichina.com and https://link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.


References

[1] Albert R., Jeong H., Barabási A.L. Error and attack tolerance of complex networks. Nature, 2000, 406: 378-382

[2] Barkai N., Leibler S. Robustness in simple biochemical networks. Nature, 1997, 387: 913-917

[3] Barabási A.L., Albert R. Emergence of scaling in random networks. Science, 1999, 286: 509-512

[4] Barabási A.L., Oltvai Z.N. Network biology: understanding the cell’s functional organization. Nat Rev Genet, 2004, 5: 101-113

[5] Chung F., Lu L., Dewey T.G., Galas D.J. Duplication models for biological networks. J Comput Biol, 2003, 10: 677-687

[6] Cohen R., Havlin S. Scale-free networks are ultrasmall. Phys Rev Lett, 2003, 90: 058701

[7] Clauset A., Newman M.E.J., Moore C. Finding community structure in very large networks. Phys Rev E, 2004, 70: 066111

[8] Chung W.Y., Albert R., Albert I., Nekrutenko A., Makova K.D. Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network. BMC Bioinformatics, 2006, 7: 46, 1-14

[9] Crombach A., Hogeweg P. Evolution of evolvability in gene regulatory networks. PLoS Comput Biol, 2008, 4: e1000112

[10] Chen S., Krinsky B.H., Long M. New genes as drivers of phenotypic evolution. Nat Rev Genet, 2013, 14: 645-660

[11] Girvan M., Newman M.E.J. Community structure in social and biological networks. Proc Natl Acad Sci USA, 2002, 99: 7821-7826

[12] Horvath S., Dong J. Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol, 2008, 4: e1000117

[13] Hedges S.B., Marin J., Suleski M., Paymer M., Kumar S. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol, 2015, 32: 835-845

[14] Jeong H., Tombor B., Albert R., Oltvai Z.N., Barabási A.L. The large-scale organization of metabolic networks. Nature, 2000, 407: 651-654

[15] Jordan I.K., Mariño-Ramírez L., Wolf Y.I., Koonin E.V. Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol, 2004, 21: 2058-2070

[16] Lee H.K., Hsu A.K., Sajdak J., Qin J., Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Res, 2004, 14: 1085-1094

[17] Li M., Li Q., Ganegoda G.U., Wang J.X., Wu F.X., Pan Y. Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks. Sci China Life Sci, 2014, 57: 1064-1071

[18] Maslov S., Sneppen K. Specificity and stability in topology of protein networks. Science, 2002, 296: 910-913

[19] Oldham M.C., Horvath S., Geschwind D.H. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci USA, 2006, 103: 17973-17978

[20] Obayashi T., Okamura Y., Ito S., Tadaka S., Motoike I.N., Kinoshita K. COXPRESdb: a database of comparative gene coexpression networks of eleven species for mammals. Nucleic Acids Res, 2012, 41: D1014-D1020

[21] Pastor-Satorras R., Smith E., Solé R.V. Evolving protein interaction networks through gene duplication. J Theor Biol, 2003, 222: 199-210

[22] Prieto C., Risueño A., Fontanillo C., De las Rivas J. Human gene coexpression landscape: confident network derived from tissue transcriptomic profiles. PLoS ONE, 2008, 3: e3911

[23] Ravasz E., Somera A.L., Mongru D.A., Oltvai Z.N., Barabási A.L. Hierarchical organization of modularity in metabolic networks. Science, 2002, 297: 1551-1555

[24] Shen-Orr S.S., Milo R., Mangan S., Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet, 2002, 31: 64-68

[25] Sorrells T.R., Johnson A.D. Making sense of transcription networks. Cell, 2015, 161: 714-723

[26] Tautz D., Domazet-Lošo T. The evolutionary origin of orphan genes. Nat Rev Genet, 2011, 12: 692-702

[27] Watts D.J., Strogatz S.H. Collective dynamics of ‘small-world’ networks. Nature, 1998, 393: 440-442

[28] Wagner A. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol, 2001, 18: 1283-1292

[29] Wagner A. How the global structure of protein interaction networks evolves. Proc R Soc London Ser B-Biol Sci, 2003, 270: 457-466

[30] Yu H., Mitra R., Yang J., Li Y.Y., Zhao Z.M. Algorithms for network-based identification of differential regulators from transcriptome data: a systematic evaluation. Sci China Life Sci, 2014, 57: 1090-1102

[31] Zhang Y.F., Zhang R., Su B. Diversity and evolution of microRNA gene clusters. Sci China Ser C-Life Sci, 2009, 52: 261-266

[32] Zhang Y.E., Vibranovski M.D., Landback P., Marais G.A.B., Long M. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol, 2010, 8: e1000494

[33] Zhang Y.E., Landback P., Vibranovski M.D., Long M. Accelerated recruitment of new brain development genes into the human genome. PLoS Biol, 2011, 9: e1001179

[34] Zhang W., Landback P., Gschwend A.R., Shen B., Long M. New genes drive the evolution of gene interaction networks in the human and mouse genomes. Genome Biol, 2015, 16: 202

  • Figure 1

    Part of the gene age-specific coexpression network in humans. Each circle corresponds to a gene (node), and each link (edge) connecting two genes indicates that the two genes are coexpressed. The color and number of a node indicate the gene age and branch number, respectively. Younger genes are assigned larger branch numbers.

  • Figure 2

    The degree and degree distribution of the human gene coexpression network. A, The gene age-specific average degree. B, The gene age-specific proportion of hub genes: the number of hub genes with degree k≥6 as a fraction of the total number of genes in the corresponding gene age group. C, The degree distribution of the whole gene coexpression network. D, The log degree distribution of the whole gene coexpression network. E, The degree distribution of younger genes (branch 1–13).

  • Figure 3

    Average shortest path length and distribution. A, The gene age-specific average shortest path length. B, The distribution of shortest path length for the whole network. C, The distribution of shortest path length for elderly genes and younger genes in the whole network.

  • Figure 4

    Average clustering coefficient and distribution. A, The gene age-specific average clustering coefficient. B, The average clustering coefficient of genes with degree k. C, The log average clustering coefficient of genes with degree k. D, The average clustering coefficient of younger genes with degree k.

  • Figure 5

    Average node betweenness and average edge betweenness. A, The gene age-specific average node betweenness of the whole network. B, The gene age-specific average edge betweenness of the whole network.

  • Figure 6

    Fraction of duplicate genes with shared interactions. A, Histogram of the fraction of duplicate genes that share at least one common interacting gene with their parents. B, The fraction of duplicate genes that eventually show a parent-child interaction.

  • Figure 7

    The evolutionary process of a gene duplication and divergence. Shortly after a gene duplication, the parental gene P and duplicate gene C will interact with the same genes. Eventually, some or all of the common interactions will be lost, and new interactions may be gained by the duplicate gene C. In the rightmost panel, gene C has lost two common interactions and gained two new interaction partners.

  • Figure 8

    The fraction of new interaction partners with different degrees. A, The new interaction partners developed by a duplicate gene. B, The new interaction partners developed by an orphan gene.

  • Figure 9

    A flow diagram illustrating the evolutionary process of the human gene coexpression network.

  • Figure 10

    The impact of younger genes on the topological structure of the ancestral gene coexpression network. A, The influence of younger genes on the number of interacting genes in the whole network. B, The influence of younger genes on the average degree of interacting genes. C, The influence of younger genes on the average shortest path length of the whole network. D, The influence of younger genes on the average clustering coefficient of the whole network. E, The influence of younger genes on the average node betweenness of the whole network. F, The influence of younger genes on the average edge betweenness of the whole network. G, The influence of younger genes on the total number of disconnected components in the whole network. H, The influence of younger genes on the size of the largest connected component. I, The influence of younger genes on the exponent of the degree distribution of the whole network. J, The influence of younger genes on the modularity coefficient of the whole network.

  • Table 1   Gene age and gene numbers of each branch

    | Branch | Species | Divergence time (Myr) | Gene age (Myr) | # of genes | # of genes in HGCN | # of DNA-based duplicates in HGCN | # of RNA-based duplicates in HGCN | # of orphan genes in HGCN |
    |---|---|---|---|---|---|---|---|---|
    | 0 | Zebrafish | –435.0 | ≤–400.0 | 12,189 | 5,115 | | | |
    | 1 | X. tropicalis | –352.0 | –393.5 | 2,186 | 681 | 492 | 11 | 51 |
    | 2 | Lizard | –312.0 | –332.0 | 668 | 195 | 115 | 9 | 16 |
    | 3 | Platypus | –177.0 | –244.5 | 1,200 | 395 | 219 | 8 | 42 |
    | 4 | Opossum | –159.0 | –168.0 | 1,070 | 265 | 152 | 8 | 13 |
    | 5 | Elephant | –105.0 | –132.0 | 1,144 | 255 | 142 | 17 | 13 |
    | 6 | Cow | –96.0 | –100.5 | 162 | 38 | 21 | 1 | 1 |
    | 7 | Mouse | –91.9 | –93.95 | 66 | 22 | 16 | 1 | 0 |
    | 8 | Marmoset | –43.0 | –67.45 | 154 | 50 | 29 | 1 | 2 |
    | 9 | Rhesus | –29.3 | –36.15 | 120 | 41 | 15 | 1 | 0 |
    | 10 | Gibbon | –20.2 | –24.75 | 82 | 26 | 9 | 2 | 0 |
    | 11 | Orangutan | –15.8 | –18.00 | 77 | 25 | 14 | 0 | 0 |
    | 12 | Chimpanzee | –6.7 | –11.25 | 162 | 43 | 19 | 0 | 1 |
    | 13 | Human | 0.0 | –3.35 | 182 | 19 | 4 | 2 | 3 |

    Time and age are given in millions of years (Myr); –3.35 Myr means 3.35 million years ago. Gene age is calculated as the midpoint of each branch. For example, genes assigned to branch 13 are shown at –3.35 Myr, the midpoint of the interval from –6.7 to 0.0 Myr. The gene age of the oldest branch (branch 0) is set as older than –400.0 Myr. HGCN denotes the human gene coexpression network. # denotes number.
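The midpoint rule in the note to Table 1 can be checked with a few lines of Python. The sketch below takes the divergence times listed in Table 1 and, for branches 1 to 13, averages the divergence time of the previous branch and that of the branch itself. The function name and the rounding to two decimals are choices made for this example. Branch 0 is skipped because its age is fixed by convention rather than computed.

```python
# Divergence times (Myr) for branches 0..13, as listed in Table 1.
DIVERGENCE_MYR = [
    -435.0, -352.0, -312.0, -177.0, -159.0, -105.0, -96.0,
    -91.9, -43.0, -29.3, -20.2, -15.8, -6.7, 0.0,
]


def branch_midpoint_ages(divergence):
    """Gene age of branch i = midpoint of divergence[i - 1] and divergence[i].

    Branch 0 is omitted: Table 1 sets its age by convention (older than
    -400.0 Myr) instead of deriving it from an interval.
    """
    return {
        branch: round((divergence[branch - 1] + divergence[branch]) / 2, 2)
        for branch in range(1, len(divergence))
    }


ages = branch_midpoint_ages(DIVERGENCE_MYR)
for branch, age in ages.items():
    print(f"branch {branch:2d}: {age:8.2f} Myr")
```

For branch 13 this gives the midpoint of –6.7 and 0.0 Myr, that is –3.35 Myr, which agrees with the example in the table note.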

  • Table 2   Global statistical features of human gene coexpression network

    | Types | Whole network | Whole network: elderly genes (Branch 0) | Whole network: younger genes (Branch 1–13) | Giant connected component | Giant connected component: elderly genes (Branch 0) | Giant connected component: younger genes (Branch 1–13) |
    |---|---|---|---|---|---|---|
    | Number of genes | 7,170 | 5,115 | 2,055 | 6,317 | 4,602 | 1,715 |
    | Average degree | 36 | 37 | 31 | 40 | 41 | 37 |
    | Exponent of degree distribution (95% CI) | 1.264 (1.221, 1.306) | 1.193 (1.149, 1.236) | 0.985 (0.924, 1.046) | 1.248 (1.205, 1.291) | 1.181 (1.138, 1.225) | 0.952 (0.893, 1.010) |
    | Average shortest path length | 4.20 | 4.11 | 4.46 | 4.20 | 4.11 | 4.47 |
    | Average clustering coefficient | 0.6019 | 0.5762 | 0.6685 | 0.5960 | 0.5720 | 0.6606 |
    | Average node betweenness | 8911.0 | 9270.6 | 8016.0 | 10114.1 | 10303.9 | 9604.9 |
    | Average edge betweenness | 659.5 | 658.8 | 660.5 | 664.8 | 662.5 | 667.9 |
    | Modularity | 0.6622 | | | 0.6579 | | |
    | Number of components | 335 | | | 1 | | |
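Per-group averages such as the average degree and average clustering coefficient in Table 2 can be sketched as follows. The example computes each gene's degree and local clustering coefficient in the whole network and then averages them separately over elderly genes (branch 0) and younger genes (branch 1–13). The toy network, the branch assignments, the function names, and the convention that a gene with fewer than two neighbours has a clustering coefficient of 0 are assumptions of this sketch and may differ from the procedure used in the study.

```python
from itertools import combinations

# Hypothetical toy network (gene -> set of coexpressed genes).
TOY_NETWORK = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
# Hypothetical branch assignment: 0 = elderly, 1..13 = younger.
TOY_BRANCH = {"A": 0, "B": 0, "C": 3, "D": 13}


def local_clustering(adjacency, gene):
    """Fraction of neighbour pairs of `gene` that are themselves linked."""
    neighbours = adjacency[gene]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for genes with fewer than 2 neighbours
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * linked / (k * (k - 1))


def group_averages(adjacency, genes):
    """Average degree and average clustering coefficient over `genes`."""
    genes = [g for g in genes if g in adjacency]
    avg_degree = sum(len(adjacency[g]) for g in genes) / len(genes)
    avg_clustering = sum(local_clustering(adjacency, g) for g in genes) / len(genes)
    return avg_degree, avg_clustering


elderly = [g for g, b in TOY_BRANCH.items() if b == 0]
younger = [g for g, b in TOY_BRANCH.items() if b >= 1]
print("elderly:", group_averages(TOY_NETWORK, elderly))
print("younger:", group_averages(TOY_NETWORK, younger))
```

Degrees and neighbour links are taken from the whole network even when averaging over one age group, so a younger gene's links to elderly genes still count.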
