SCIENCE CHINA Life Sciences, Volume 62, Issue 4: 489-497(2019) https://doi.org/10.1007/s11427-018-9449-0

Transposable elements significantly contributed to the core promoters in the human genome

  • Received: Sep 20, 2018
  • Accepted: Oct 18, 2018
  • Published: Mar 19, 2019

Abstract

Transposable elements (TEs) are major components of the human genome, constituting at least half of it. More than half a century ago, Barbara McClintock, and later Roy Britten and Eric Davidson, postulated that TEs might be major players in host gene regulation. We scanned a large amount of data produced by the ENCODE project for active transcription factor binding sites (TFBSs) located in TE-originated parts of polymerase II promoters. In total, more than 35,000 promoters in six different tissues were analyzed, and over 26,000 of them harbored TEs. Moreover, these TEs usually provide one or more TFBSs to the host promoters, so that more than 6% of the active TFBSs in these regions are located in TE-originated sequences. Rewiring of transcription circuits played a major role in mammalian evolution and consequently increased the functional and morphological diversity of mammals. In this large-scale analysis, we demonstrate that TEs contributed a large fraction of human TFBSs. Interestingly, these TFBSs usually act in a tissue-specific manner. Thus, our study shows that TEs played a significant role in shaping expression patterns in mammals, and in humans in particular. Furthermore, since several TE families are still active in our genome, they continue to influence not only our genome architecture but also gene function in a broader sense.
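The core operation described above, finding TFBSs that fall within TE-originated parts of a promoter, amounts to an interval-containment test. The following is a minimal sketch only, not the pipeline used in the study: the function names, the half-open coordinate convention, the rule that a site must lie entirely inside a TE segment, and all coordinates are assumptions made for illustration.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tfbs_inside_te(tfbs_sites, te_segments):
    """Return the TFBSs that lie entirely inside a TE-derived segment.

    Assumption: "located in a TE" means full containment; a partial-overlap
    rule would need a different test.
    """
    merged = merge_intervals(te_segments)
    starts = [start for start, _ in merged]
    inside = []
    for site_start, site_end in tfbs_sites:
        # Index of the last merged TE segment starting at or before the site.
        i = bisect_right(starts, site_start) - 1
        if i >= 0 and site_end <= merged[i][1]:
            inside.append((site_start, site_end))
    return inside


# Hypothetical coordinates within one promoter (not data from the study).
te_segments = [(100, 400), (350, 500), (800, 900)]
tfbs_sites = [(120, 135), (480, 495), (495, 510), (850, 860), (600, 612)]

te_derived = tfbs_inside_te(tfbs_sites, te_segments)
fraction = len(te_derived) / len(tfbs_sites)
print(te_derived, fraction)
```

Merging the TE segments first lets each site be checked with a single binary search, which matters when tens of thousands of promoters are scanned.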


Acknowledgment

This work was funded by the Institute of Bioinformatics, Muenster, Germany. We acknowledge support from the Open Access Publication Fund of the University of Muenster.


Interest statement

The author(s) declare that they have no conflict of interest.


Supplement

SUPPORTING INFORMATION

Table S1. The complete list of the datasets used in the promoter analysis

Table S2. Gene categories used in the study

Table S3. Pathways enriched in the gene set whose promoters harbor TE-derived sequences

Table S4. Pathways enriched in the gene set whose promoters were devoid of TEs

The supporting information is available online at http://life.scichina.com and http://link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.


  • Figure 1

    Distribution of major types of TEs in the human genome. The promoter regions are depicted in the inner circle, while TE distribution in the rest of the human genome is presented in the outer circle.

  • Figure 2

    Distribution of TE-derived sequences in the pol II promoter regions.

  • Figure 3

    Fraction of promoter area occupied by TE-originated sequences. The promoters were analyzed using a sliding-window approach with a window size of 50 nt and a sliding step of 5 nt. Because each data point was plotted at the middle of its window, the first point lies at position –25 and represents the window from –1 to –50.

  • Figure 4

    Distribution of TFBSs among the different TE families located within promoter regions.

  • Figure 5

    Pairwise comparison of TFBS uniqueness. Each cell lists the fraction of TFBSs in the query tissue (Y-axis) as compared to the reference tissue (X-axis). The results for all TFs available for analysis were concatenated. A, TFBSs located in the TE-originated sequences. B, TFBSs located in the whole promoters.
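The sliding-window procedure described in the Figure 3 caption (50-nt windows, 5-nt step, each point plotted at the window midpoint) can be sketched as follows. This is an illustrative sketch for a single promoter under stated assumptions: the function name, the boolean-mask representation, and the 100-nt example region are hypothetical and not taken from the study.

```python
def te_fraction_windows(te_mask, window=50, step=5):
    """Compute the TE-derived fraction in sliding windows upstream of a TSS.

    te_mask[i] is True if upstream position -(i + 1) is TE-derived, so
    index 0 corresponds to position -1. Returns (midpoint, fraction) pairs.
    """
    points = []
    for offset in range(0, len(te_mask) - window + 1, step):
        covered = sum(te_mask[offset:offset + window])
        # The window covering -1..-50 is plotted at -25, as in the caption.
        midpoint = -(offset + window // 2)
        points.append((midpoint, covered / window))
    return points


# Hypothetical 100-nt upstream region in which positions -61..-100 are TE-derived.
te_mask = [False] * 60 + [True] * 40
points = te_fraction_windows(te_mask)
for midpoint, fraction in points:
    print(midpoint, fraction)
```

A curve like the one in Figure 3 would presumably be obtained by averaging these per-window fractions over all analyzed promoters. That averaging step is an assumption and is not shown here.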

  • Table 1   Number of available TFs experimentally analyzed in different tissues

    Tissue group              Number of TFs
    Blood                     360
    Liver                     212
    Kidney                    204
    Breast                    93
    Stem cells                50
    Lung                      50
    Reproductive organs       24
    Bone marrow               18
    Fibroblast                15
    Nerve cells               11
    Digestive tract           10
    Prostate gland            6
    Blood transport           5
    Muscle                    5
    Pancreas                  5
    Skin                      4
    Lymph nodes or similar    3
    Spleen                    3
    Parathyroid               2
    Adrenal gland             2
    Fat                       2
    Retina                    1

  • Table 2   Transposon distribution in different genomic regions

    TE family     Genome                                  Promoter regions                        Non-promoter regions
                  No. of elements   No. of nucleotides    No. of elements   No. of nucleotides    No. of elements   No. of nucleotides
    LINE          1,516,226         641,953,033           16,210            3,211,688             1,500,016         638,741,345
    SINE          1,779,271         392,908,499           32,368            6,655,074             1,746,903         386,253,425
    LTR           725,763           268,434,413           6,549             1,719,349             719,214           266,715,064
    DNA           489,391           103,055,478           6,487             1,042,442             482,904           102,013,036
    Retroposon    5,397             4,223,296             50                7,089                 5,347             4,216,207
    Unknown       5,531             737,222               55                6,043                 5,476             731,179
    Total         4,521,579         1,411,311,941         61,719            12,641,685            4,459,860         1,398,670,256

  • Table 3   Human genes whose promoters almost completely originated in TEs

    Gene ID               Gene name         Fraction of TE-derived sequences    TE elements        Gene type
    ENSG00000154415.7     PPP1R3A           0.95                                LTR                Protein coding
    ENSG00000166228.8     PCBD1             0.92                                LINE               Protein coding
    ENSG00000233480.1     AP000946.2        0.94                                LINE               LincRNA
    ENSG00000257729.2     RP11-788H18.1     0.95                                Different types    LincRNA
    ENSG00000258969.1     RP11-305B6.3      0.93                                LTR                LincRNA
    ENSG00000267543.1     AC015802.3-201    1.00                                Different types    Sense intronic
    ENSG00000272386.1     AC015802.5        1.00                                Different types    Sense intronic
    ENSG00000285191.1     AC090679.2        0.95                                Different types    LincRNA

  • Table 4   Pathways overrepresented in different tissue comparisons of TFBSs

    Tissue        Pathway name                                                  KEGG number
    Blood         Aminoacyl-tRNA biosynthesis                                   hsa00970
                  Basal transcription factors                                   hsa03022
                  Citrate cycle (TCA cycle)                                     hsa00020
                  DNA replication                                               hsa03030
                  NF-κB signaling pathway                                       hsa04064
                  Nucleotide excision repair                                    hsa03420
                  Oxidative phosphorylation                                     hsa00190
                  Proteasome                                                    hsa03050
                  Protein export                                                hsa03060
                  Ribosome                                                      hsa03010
                  RNA transport                                                 hsa03013
                  SNARE interactions in vesicular transport                     hsa04130
                  Ubiquinone and other terpenoid-quinone biosynthesis           hsa00130
    Breast        Metabolic pathways                                            hsa01100
                  Non-alcoholic fatty liver disease (NAFLD)                     hsa04932
                  Oxidative phosphorylation                                     hsa00190
    Kidney        Nucleotide excision repair                                    hsa03420
                  Non-alcoholic fatty liver disease (NAFLD)                     hsa04932
                  Oxidative phosphorylation                                     hsa00190
    Liver         Bacterial invasion of epithelial cells                        hsa05100
                  Carbon metabolism                                             hsa01200
                  Citrate cycle (TCA cycle)                                     hsa00020
                  Non-alcoholic fatty liver disease (NAFLD)                     hsa04932
                  Ribosome                                                      hsa03010
    Lung          Epithelial cell signaling in Helicobacter pylori infection    hsa05120
                  Ribosome                                                      hsa03010
    Stem cells    No enrichment
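The nucleotide counts in Table 2 allow a simple comparison of TE class composition inside and outside promoter regions. The sketch below is illustrative arithmetic on the published counts, not an analysis from the study; the dictionary layout and the function name are assumptions. It reports each class's share of TE-derived nucleotides, not its share of total promoter sequence, because total promoter length is not given in the table.

```python
# Nucleotide counts copied from Table 2: (promoter regions, non-promoter regions).
TE_NUCLEOTIDES = {
    "LINE": (3_211_688, 638_741_345),
    "SINE": (6_655_074, 386_253_425),
    "LTR": (1_719_349, 266_715_064),
    "DNA": (1_042_442, 102_013_036),
    "Retroposon": (7_089, 4_216_207),
    "Unknown": (6_043, 731_179),
}


def class_shares(counts, column):
    """Return each TE class's share of all TE-derived nucleotides in one column."""
    total = sum(values[column] for values in counts.values())
    return {name: values[column] / total for name, values in counts.items()}


promoter_shares = class_shares(TE_NUCLEOTIDES, 0)
non_promoter_shares = class_shares(TE_NUCLEOTIDES, 1)

# Print one row per TE class: promoter share, then non-promoter share.
for name in TE_NUCLEOTIDES:
    print(f"{name:<11}{promoter_shares[name]:8.1%}{non_promoter_shares[name]:8.1%}")
```

With these counts, SINEs account for roughly half of the TE-derived nucleotides in promoter regions, whereas LINEs are the largest class elsewhere.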

Copyright 2019 Science China Press Co., Ltd. All rights reserved.