Transposable elements (TEs) are major components of the human genome, constituting at least half of it. More than half a century ago, Barbara McClintock, and later Roy Britten and Eric Davidson, postulated that they might be major players in host gene regulation. We scanned a large amount of data produced by the ENCODE project for active transcription factor binding sites (TFBSs) located in the TE-originated parts of RNA polymerase II promoters. In total, more than 35,000 promoters in six different tissues were analyzed, and over 26,000 of them harbored TEs. Moreover, these TEs usually provide one or more TFBSs to the host promoters; as a result, more than 6% of the active TFBSs in these promoter regions are located in TE-originated sequences. Rewiring of transcription circuits played a major role in mammalian evolution and consequently increased mammalian functional and morphological diversity. In this large-scale analysis, we have demonstrated that TEs contributed a large fraction of human TFBSs. Interestingly, these TFBSs usually act in a tissue-specific manner. Thus, our study shows that TEs played a significant role in shaping expression patterns in mammals, and in humans in particular. Furthermore, since several TE families are still active in our genome, they continue to influence not only our genome architecture but also gene function in a broader sense.
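The central measurement in the abstract, deciding whether an active TFBS lies in a TE-originated part of a promoter and what fraction of TFBSs do, reduces to interval overlap. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: coordinates are half-open `(start, end)` pairs on one chromosome, TE intervals are sorted and non-overlapping, a TFBS counts as TE-derived only if it lies entirely within one TE interval, and the function name `te_derived_fraction` is hypothetical.

```python
from bisect import bisect_right


def te_derived_fraction(tfbs_sites, te_intervals):
    """Fraction of TFBSs lying entirely within a TE-derived interval.

    Hypothetical helper. Assumes half-open (start, end) coordinates on a
    single chromosome and TE intervals that are sorted by start and do not
    overlap each other.
    """
    te_starts = [start for start, _ in te_intervals]
    inside = 0
    for site_start, site_end in tfbs_sites:
        # With sorted, non-overlapping TEs, the only TE that can contain the
        # site is the last one starting at or before the site's start.
        i = bisect_right(te_starts, site_start) - 1
        if i >= 0 and site_end <= te_intervals[i][1]:
            inside += 1
    return inside / len(tfbs_sites) if tfbs_sites else 0.0
```

For example, with TEs at `(100, 400)` and `(900, 1200)`, the sites `(120, 135)` and `(950, 962)` count as TE-derived while `(390, 410)` (straddling a TE border) and `(500, 512)` do not, giving a fraction of 0.5. Requiring full containment is a design choice made here for illustration; counting any partial overlap would be an equally plausible rule.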
This work was funded by the Institute of Bioinformatics, Muenster, Germany. We acknowledge support from the Open Access Publication Fund of the University of Muenster.
The author(s) declare that they have no conflict of interest.
SUPPORTING INFORMATION The supporting information is available online at
[1] Banville D., Boie Y. Retroviral long terminal repeat is the promoter of the gene encoding the tumor-associated calcium-binding protein oncomodulin in the rat. J Mol Biol, 1989, 207: 481-490
[2] Britten R.J., Davidson E.H. Gene regulation for higher cells: a theory. Science, 1969, 165: 349-357
[3] Britten R.J., Kohne D.E. Repeated sequences in DNA. Hundreds of thousands of copies of DNA sequences have been incorporated into the genomes of higher organisms. Science, 1968, 161: 529-540
[4] Brosius J. Retroposons--seeds of evolution. Science, 1991, 251: 753
[5] Chuong E.B., Rumi M.A.K., Soares M.J., Baker J.C. Endogenous retroviruses function as species-specific enhancer elements in the placenta. Nat Genet, 2013, 45: 325-329
[6] Cordaux R., Batzer M.A. The impact of retrotransposons on human genome evolution. Nat Rev Genet, 2009, 10: 691-703
[7] de Koning A.P.J., Gu W., Castoe T.A., Batzer M.A., Pollock D.D. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet, 2011, 7: e1002384
[8] Doolittle W.F., Sapienza C. Selfish genes, the phenotype paradigm and genome evolution. Nature, 1980, 284: 601-603
[9] Feschotte C. Transposable elements and the evolution of regulatory networks. Nat Rev Genet, 2008, 9: 397-405
[10] Finnegan D.J. Eukaryotic transposable elements and genome evolution. Trends Genet, 1989, 5: 103-107
[11] Hamdi H.K., Nishio H., Tavis J., Zielinski R., Dugaiczyk A. Alu-mediated phylogenetic novelties in gene regulation and development. J Mol Biol, 2000, 299: 931-939
[12] Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S., et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res, 2012, 22: 1760-1774
[13] Hickey D.A. Selfish DNA: a sexually-transmitted nuclear parasite. Genetics, 1982, 101: 519-531
[14] Jordan I.K., Rogozin I.B., Glazko G.V., Koonin E.V. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet, 2003, 19: 68-72
[15] Kanehisa M., Furumichi M., Tanabe M., Sato Y., Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res, 2017, 45: D353-D361
[16] Kazazian H.H., Moran J.V. The impact of L1 retrotransposons on the human genome. Nat Genet, 1998, 19: 19-24
[17] King M.C., Wilson A.C. Evolution at two levels in humans and chimpanzees. Science, 1975, 188: 107-116
[18] Korenberg J.R., Rykowski M.C. Human genome organization: Alu, lines, and the molecular structure of metaphase chromosome bands. Cell, 1988, 53: 391-400
[19] Kunarso G., Chia N.Y., Jeyakani J., Hwang C., Lu X., Chan Y.S., Ng H.H., Bourque G. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet, 2010, 42: 631-634
[20] Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature, 2001, 409: 860-921
[21] Lynch V.J., Leclerc R.D., May G., Wagner G.P. Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat Genet, 2011, 43: 1154-1159
[22] Makalowski W. SINEs as a genomic scrap yard: an essay on genomic evolution. In: Maraia R., ed. The Impact of Short Interspersed Elements (SINEs) on the Host Genome. Austin, TX: R.G. Landes, 1995, pp.
[23] Makalowski W. Genomic scrap yard: how genomes utilize all that junk. Gene, 2000, 259: 61-67
[24] Malamy M.H., Fiandt M., Szybalski W. (1972). Electron-microscopy of polar insertions in lac Operon of
[25] McClintock B. The origin and behavior of mutable loci in maize. Proc Natl Acad Sci USA, 1950, 36: 344-355
[26] McClintock B. Intranuclear systems controlling gene action and mutation. Brookhaven Symp Biol, 1956: 58-74
[27] Orgel L.E., Crick F.H.C. Selfish DNA: the ultimate parasite. Nature, 1980, 284: 604-607
[28] Simonti C.N., Pavlicev M., Capra J.A. Transposable element exaptation into regulatory regions is rare, influenced by evolutionary age, and subject to pleiotropic constraints. Mol Biol Evol, 2017, 34: 2856-2869
[29] Sloan C.A., Chan E.T., Davidson J.M., Malladi V.S., Strattan J.S., Hitz B.C., Gabdank I., Narayanan A.K., Ho M., Lee B.T., et al. ENCODE data at the ENCODE portal. Nucleic Acids Res, 2016, 44: D726-D732
[30] Sverdlov E.D. Perpetually mobile footprints of ancient infections in human genome. FEBS Lett, 1998, 428: 1-6
[31] Thornburg B.G., Gotea V., Makałowski W. Transposable elements as a significant source of transcription regulating signals. Gene, 2006, 365: 104-110
[32] Trizzino M., Kapusta A., Brown C.D. Transposable elements generate regulatory novelty in a tissue-specific fashion. BMC Genomics, 2018, 19: 468
[33] Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science, 2001, 291: 1304-1351
[34] Wang J., Vasaikar S., Shi Z., Greer M., Zhang B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res, 2017, 45: W130-W137
[35] Waring M., Britten R.J. Nucleotide sequence repetition: a rapidly reassociating fraction of mouse DNA. Science, 1966, 154: 791-794
[36] Wicker T., Sabot F., Hua-Van A., Bennetzen J.L., Capy P., Chalhoub B., Flavell A., Leroy P., Morgante M., Panaud O., et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet, 2007, 8: 973-982
[37] Wingender E., Schoeps T., Haubrock M., Krull M., Dönitz J. TFClass: expanding the classification of human transcription factors to their mammalian orthologs. Nucleic Acids Res, 2018, 46: D343-D347
Figure 1
Distribution of major types of TEs in the human genome. The promoter regions are depicted in the inner circle, while TE distribution in the rest of the human genome is presented in the outer circle.
Figure 2
Distribution of TE-derived sequences in the pol II promoter regions.
Figure 3
Fraction of promoter area occupied by TE-originated sequences. The promoters were analyzed using a sliding-window approach with a window size of
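The sliding-window measurement in the Figure 3 caption can be sketched as follows. The caption's window size is not recoverable here, so `window` and `step` are hypothetical parameters, as is the function name `te_coverage_profile`; coordinates are assumed to be 0-based, half-open, and relative to the promoter start. Bases covered by overlapping TE annotations are counted once.

```python
def te_coverage_profile(promoter_length, te_intervals, window, step):
    """Fraction of each sliding window covered by TE-derived bases.

    Hypothetical sketch: coordinates are 0-based, half-open and relative
    to the promoter start; window and step are illustrative parameters.
    """
    # Mark every promoter base covered by at least one TE (no double counting).
    covered = [0] * promoter_length
    for start, end in te_intervals:
        for pos in range(max(start, 0), min(end, promoter_length)):
            covered[pos] = 1
    # Prefix sums give each window's covered-base count in constant time.
    prefix = [0]
    for flag in covered:
        prefix.append(prefix[-1] + flag)
    profile = []
    for w_start in range(0, promoter_length - window + 1, step):
        w_end = w_start + window
        profile.append((prefix[w_end] - prefix[w_start]) / window)
    return profile
```

For a 10-bp toy promoter with one TE at `(2, 6)`, a 4-bp window moved in 2-bp steps yields `[0.5, 1.0, 0.5, 0.0]`. The per-base mask keeps the logic obvious; for real promoter lengths, an interval-based approach would avoid allocating one entry per base.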
Figure 4
Distribution of TFBSs across TE families within promoter regions.
Figure 5
Pairwise comparison of TFBS uniqueness. Fraction of the query (
Tissue group | Number of TFs |
Blood | 360 |
Liver | 212 |
Kidney | 204 |
Breast | 93 |
Stem cells | 50 |
Lung | 50 |
Reproductive organs | 24 |
Bone marrow | 18 |
Fibroblast | 15 |
Nerve cells | 11 |
Digestive tract | 10 |
Prostate gland | 6 |
Blood transport | 5 |
Muscle | 5 |
Pancreas | 5 |
Skin | 4 |
Lymph nodes or similar | 3 |
Spleen | 3 |
Parathyroid | 2 |
Adrenal gland | 2 |
Fat | 2 |
Retina | 1 |
TE family | Genome: number of elements | Genome: number of nucleotides | Promoter regions: number of elements | Promoter regions: number of nucleotides | Non-promoter regions: number of elements | Non-promoter regions: number of nucleotides |
LINE | 1,516,226 | 641,953,033 | 16,210 | 3,211,688 | 1,500,016 | 638,741,345 |
SINE | 1,779,271 | 392,908,499 | 32,368 | 6,655,074 | 1,746,903 | 386,253,425 |
LTR | 725,763 | 268,434,413 | 6,549 | 1,719,349 | 719,214 | 266,715,064 |
DNA | 489,391 | 103,055,478 | 6,487 | 1,042,442 | 482,904 | 102,013,036 |
Retroposon | 5,397 | 4,223,296 | 50 | 7,089 | 5,347 | 4,216,207 |
Unknown | 5,531 | 737,222 | 55 | 6,043 | 5,476 | 731,179 |
Total | 4,521,579 | 1,411,311,941 | 61,719 | 12,641,685 | 4,459,860 | 1,398,670,256 |
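Because each genome-wide count in the table above should equal the promoter count plus the non-promoter count, the table can be checked and summarized with a few lines of arithmetic. The sketch below copies the nucleotide columns verbatim, verifies that partition, and computes each family's share of TE-derived nucleotides inside versus outside promoters. The helper names are hypothetical; the numbers are the table's own.

```python
# Nucleotide counts per TE family, copied from the table above:
# (genome, promoter regions, non-promoter regions)
NUCLEOTIDES = {
    "LINE": (641_953_033, 3_211_688, 638_741_345),
    "SINE": (392_908_499, 6_655_074, 386_253_425),
    "LTR": (268_434_413, 1_719_349, 266_715_064),
    "DNA": (103_055_478, 1_042_442, 102_013_036),
    "Retroposon": (4_223_296, 7_089, 4_216_207),
    "Unknown": (737_222, 6_043, 731_179),
}


def check_partition(table):
    """Verify genome = promoter + non-promoter for every family."""
    return all(g == p + n for g, p, n in table.values())


def family_shares(table, column):
    """Share of each family among all TE nucleotides in one column
    (0 = genome, 1 = promoter regions, 2 = non-promoter regions)."""
    total = sum(row[column] for row in table.values())
    return {family: row[column] / total for family, row in table.items()}


if __name__ == "__main__":
    assert check_partition(NUCLEOTIDES)
    promoter = family_shares(NUCLEOTIDES, 1)
    rest = family_shares(NUCLEOTIDES, 2)
    for family in NUCLEOTIDES:
        print(f"{family:<11} promoters {promoter[family]:6.1%}  rest {rest[family]:6.1%}")
```

On these numbers, SINEs account for about 52.6% of TE-derived nucleotides in promoter regions versus about 27.6% elsewhere, while LINEs account for about 25.4% in promoters versus about 45.7% elsewhere. These ratios are simple arithmetic on the table, not additional results.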
Gene ID | Gene name | Fraction of TE-derived sequence | TE type | Gene type |
ENSG00000154415.7 |  | 0.95 | LTR | Protein coding |
ENSG00000166228.8 |  | 0.92 | LINE | Protein coding |
ENSG00000233480.1 |  | 0.94 | LINE | lincRNA |
ENSG00000257729.2 |  | 0.95 | Different types | lincRNA |
ENSG00000258969.1 |  | 0.93 | LTR | lincRNA |
ENSG00000267543.1 |  | 1.00 | Different types | Sense intronic |
ENSG00000272386.1 |  | 1.00 | Different types | Sense intronic |
ENSG00000285191.1 |  | 0.95 | Different types | lincRNA |
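The "Fraction of TE-derived sequence" column above is, in essence, the share of promoter bases covered by TE annotations; a value of 1.00 means the whole promoter is TE-derived. A minimal sketch of that calculation follows, assuming half-open genomic coordinates and assuming that TE annotations are clipped to the promoter boundaries and merged so that overlapping annotations are not counted twice. The function name `te_fraction` is hypothetical, and the source does not state how the authors handled overlaps.

```python
def te_fraction(promoter_start, promoter_end, te_intervals):
    """Fraction of a promoter's bases covered by TE-derived sequence.

    Hypothetical helper: half-open coordinates; TE intervals are clipped to
    the promoter and merged so overlapping annotations are counted once.
    """
    clipped = sorted(
        (max(s, promoter_start), min(e, promoter_end))
        for s, e in te_intervals
        if s < promoter_end and e > promoter_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (promoter_end - promoter_start)
```

For a promoter spanning `(0, 1000)` with TEs at `(-50, 300)`, `(200, 500)` and `(800, 1200)`, the merged, clipped coverage is 700 bp, so the function returns 0.7.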
Tissue | Pathway name | KEGG ID |
Blood | Aminoacyl-tRNA biosynthesis | hsa00970 |
Blood | Basal transcription factors | hsa03022 |
Blood | Citrate cycle (TCA cycle) | hsa00020 |
Blood | DNA replication | hsa03030 |
Blood | NF-κB signaling pathway | hsa04064 |
Blood | Nucleotide excision repair | hsa03420 |
Blood | Oxidative phosphorylation | hsa00190 |
Blood | Proteasome | hsa03050 |
Blood | Protein export | hsa03060 |
Blood | Ribosome | hsa03010 |
Blood | RNA transport | hsa03013 |
Blood | SNARE interactions in vesicular transport | hsa04130 |
Blood | Ubiquinone and other terpenoid-quinone biosynthesis | hsa00130 |
Breast | Metabolic pathways | hsa01100 |
Breast | Non-alcoholic fatty liver disease (NAFLD) | hsa04932 |
Breast | Oxidative phosphorylation | hsa00190 |
Kidney | Nucleotide excision repair | hsa03420 |
Kidney | Non-alcoholic fatty liver disease (NAFLD) | hsa04932 |
Kidney | Oxidative phosphorylation | hsa00190 |
Liver | Bacterial invasion of epithelial cells | hsa05100 |
Liver | Carbon metabolism | hsa01200 |
Liver | Citrate cycle (TCA cycle) | hsa00020 |
Liver | Non-alcoholic fatty liver disease (NAFLD) | hsa04932 |
Liver | Ribosome | hsa03010 |
Lung | Epithelial cell signaling in Helicobacter pylori infection | hsa05120 |
Lung | Ribosome | hsa03010 |
Stem cells | No enrichment | |
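The table does not state which statistical test produced these pathway enrichments. The citation of WebGestalt [34] suggests an over-representation analysis, which is conventionally a one-sided hypergeometric test followed by a multiple-testing correction such as Benjamini-Hochberg. The sketch below shows that calculation under those assumptions only; the function names are hypothetical, and any gene counts passed to it are placeholders, not the study's data.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (one-sided enrichment p-value).

    population: genes in the reference set (N)
    successes:  reference genes annotated to the pathway (K)
    draws:      genes in the query set (n)
    k:          query genes annotated to the pathway
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Exact integer arithmetic for the tail, then a single division.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Using exact integer binomial coefficients avoids underflow in the tail sum for moderate gene-set sizes; a production analysis would more likely call a statistics library's hypergeometric survival function.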
Copyright 2019 Science China Press Co., Ltd. All rights reserved.