SCIENCE CHINA Life Sciences, Volume 62, Issue 4: 467-488 (2019) https://doi.org/10.1007/s11427-018-9458-0

Characterization and evolutionary dynamics of complex regions in eukaryotic genomes

  • Received Oct 1, 2018
  • Accepted Nov 5, 2018
  • Published Feb 22, 2019


Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes, repeated either in a tandem array or in relatively close proximity. The repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization: they are frequently misassembled or entirely absent from genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions. It is becoming increasingly consequential as evidence continues to accumulate for a central role of complex genomic regions in human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenome-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow accurate functional and evolutionary studies of complex genomic regions, underpinning the generation of genotype-phenotype maps of unprecedented resolution.


This work was supported by a National Science Foundation Grant (MCB-1157876) to J.M.R.

Interest statement

The author(s) declare that they have no conflict of interest.


[1] Abel H.J., Duncavage E.J.. Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches. Cancer Genets, 2013, 206: 432-440 CrossRef PubMed Google Scholar

[2] Absalan, F., and Ronaghi, M. (2007). Molecular inversion probe assay. Methods Mol Biol 396, 315-330. Google Scholar

[3] Abu Bakar S., Hollox E.J., Armour J.A.L.. Allelic recombination between distinct genomic locations generates copy number diversity in human β-defensins. Proc Natl Acad Sci USA, 2009, 106: 853-858 CrossRef PubMed ADS Google Scholar

[4] Abyzov A., Urban A.E., Snyder M., Gerstein M.. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res, 2011, 21: 974-984 CrossRef PubMed Google Scholar

[5] Adams M.D., Celniker S.E., Holt R.A., Evans C.A, Gocayne J.D., Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F., et al. The genome sequence of Drosophila melanogaster. Science, 2000, 287: 2185-2195 CrossRef ADS Google Scholar

[6] Alberts, B. (2008). Molecular Biology of the Cell, 5th edn (New York: Garland Science). Google Scholar

[7] Alkan C., Coe B.P., Eichler E.E.. Genome structural variation discovery and genotyping. Nat Rev Genet, 2011a, 12: 363-376 CrossRef PubMed Google Scholar

[8] Alkan C., Sajjadian S., Eichler E.E.. Limitations of next-generation genome sequence assembly. Nat Methods, 2011b, 8: 61-65 CrossRef PubMed Google Scholar

[9] Ananiev E.V., Chamberlin M.A., Klaiber J., Svitashev S.. Microsatellite megatracts in the maize (Zea mays L.) genome. Genome, 2005, 48: 1061-1069 CrossRef PubMed Google Scholar

[10] Andersson D.I., Jerlström-Hultqvist J., Näsvall J.. Evolution of new functions de novo and from preexisting genes. Cold Spring Harb Perspect Biol, 2015, 7: a017996 CrossRef PubMed Google Scholar

[11] Anhuf D., Eggermann T., Rudnik-Schöneborn S., Zerres K.. Determination of SMN1 and SMN2 copy number using TaqMan™ technology. Hum Mutat, 2003, 22: 74-78 CrossRef PubMed Google Scholar

[12] Arabidopsis Genome Initiative. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, Nature 408, 796–815. Google Scholar

[13] Arguello J.R., Chen Y., Yang S., Wang W., Long M.. Origination of an X-linked testes chimeric gene by illegitimate recombination in Drosophila. PLoS Genet, 2006, 2: e77 CrossRef PubMed Google Scholar

[14] Arguello J.R., Connallon T.. Gene duplication and ectopic gene conversion in Drosophila. Genes, 2011, 2: 131-151 CrossRef PubMed Google Scholar

[15] Assogba B.S., Milesi P., Djogbénou L.S., Berthomieu A., Makoundou P., Baba-Moussa L.S., Fiston-Lavier A.S., Belkhir K., Labbé P., Weill M.. The ace-1 locus is amplified in all resistant anopheles gambiae mosquitoes: fitness consequences of homogeneous and heterogeneous duplications. PLoS Biol, 2016, 14: e2000618 CrossRef PubMed Google Scholar

[16] Baltimore D.. Gene conversion: some implications for immunoglobulin genes. Cell, 1981, 24: 592-594 CrossRef Google Scholar

[17] Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol, 2012, 19: 455-477 CrossRef PubMed Google Scholar

[18] Bass C., Field L.M.. Gene amplification and insecticide resistance. Pest Manag Sci, 2011, 67: 886-890 CrossRef PubMed Google Scholar

[19] Bellos E., Johnson M.R., Coin L.J.M.. cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data. Genome Biol, 2012, 13: R120 CrossRef PubMed Google Scholar

[20] Bennett-Baker P.E., Mueller J.L.. CRISPR-mediated isolation of specific megabase segments of genomic DNA. Nucleic Acids Res, 2017, 45: e165 CrossRef PubMed Google Scholar

[21] Bentley D.R., Balasubramanian S., Swerdlow H.P., Smith G.P., Milton J., Brown C.G., Hall K.P., Evers D.J., Barnes C.L., Bignell H.R., et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 2008, 456: 53-59 CrossRef PubMed ADS Google Scholar

[22] Bergthorsson U., Andersson D.I., Roth J.R.. Ohno’s dilemma: Evolution of new genes under continuous selection. Proc Natl Acad Sci USA, 2007, 104: 17004-17009 CrossRef PubMed ADS Google Scholar

[23] Berlin K., Koren S., Chin C.S., Drake J.P., Landolin J.M., Phillippy A.M.. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol, 2015, 33: 623-630 CrossRef PubMed Google Scholar

[24] Béziat V., Traherne J.A., Liu L.L., Jayaraman J., Enqvist M., Larsson S., Trowsdale J., Malmberg K.J.. Influence of KIR gene copy number on natural killer cell education. Blood, 2013, 121: 4703-4707 CrossRef PubMed Google Scholar

[25] Bleidorn C.. Third generation sequencing: technology and its potential impact on evolutionary biodiversity research. Systatics Biodiversity, 2016, 14: 1-8 CrossRef Google Scholar

[26] Bresler M., Sheehan S., Chan A.H., Song Y.S.. Telescoper: de novo assembly of highly repetitive regions. Bioinformatics, 2012, 28: i311-i317 CrossRef PubMed Google Scholar

[27] Buermans H.P.J., Vossen R.H.A.M., Anvar S.Y., Allard W.G., Guchelaar H.J., White S.J., den Dunnen J.T., Swen J.J., van der Straaten T.. Flexible and scalable full-length CYP2D6 long amplicon PacBio sequencing. Human Mutat, 2017, 38: 310-316 CrossRef PubMed Google Scholar

[28] Campbell P.J., Stephens P.J., Pleasance E.D., O'Meara S., Li H., Santarius T., Stebbings L.A., Leroy C., Edkins S., Hardy C., et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet, 2008, 40: 722-729 CrossRef PubMed Google Scholar

[29] Cardoso-Moreira M., Arguello J.R., Gottipati S., Harshman L.G., Grenier J.K., Clark A.G.. Evidence for the fixation of gene duplications by positive selection in Drosophila. Genome Res, 2016, 26: 787-798 CrossRef PubMed Google Scholar

[30] Carpenter D., Dhar S., Mitchell L.M., Fu B., Tyson J., Shwan N.A.A., Yang F., Thomas M.G., Armour J.A.L.. Obesity, starch digestion and amylase: association between copy number variants at human salivary (AMY1) and pancreatic (AMY2) amylase genes. Human Mol Genets, 2015, 24: 3472-3480 CrossRef PubMed Google Scholar

[31] Carvalho C.M.B., Lupski J.R.. Mechanisms underlying structural variant formation in genomic disorders. Nat Rev Genet, 2016, 17: 224-238 CrossRef PubMed Google Scholar

[32] Casola C., Ganote C.L., Hahn M.W.. Nonallelic gene conversion in the genus Drosophila. Genetics, 2010, 185: 95-103 CrossRef PubMed Google Scholar

[33] Chaisson M.J.P., Huddleston J., Dennis M.Y., Sudmant P.H., Malig M., Hormozdiari F., Antonacci F., Surti U., Sandstrom R., Boitano M., et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature, 2015a, 517: 608-611 CrossRef PubMed ADS Google Scholar

[34] Chaisson M.J.P., Wilson R.K., Eichler E.E.. Genetic variation and the de novo assembly of human genomes. Nat Rev Genet, 2015b, 16: 627-640 CrossRef PubMed Google Scholar

[35] Chakraborty M., Baldwin-Brown J.G., Long A.D., Emerson J.J.. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res, 2016, 15: gkw654 CrossRef PubMed Google Scholar

[36] Chakraborty M., VanKuren N.W., Zhao R., Zhang X., Kalsow S., Emerson J.J.. Hidden genetic variation shapes the structure of functional elements in Drosophila. Nat Genet, 2018, 50: 20-25 CrossRef PubMed Google Scholar

[37] Charrier C., Joshi K., Coutinho-Budd J., Kim J.E., Lambert N., de Marchena J., Jin W.L., Vanderhaeghen P., Ghosh A., Sassa T., et al. Inhibition of SRGAP2 function by its human-specific paralogs induces neoteny during spine maturation. Cell, 2012, 149: 923-935 CrossRef PubMed Google Scholar

[38] Chen K., Wallis J.W., McLellan M.D., Larson D.E., Kalicki J.M., Pohl C.S., McGrath S.D., Wendl M.C., Zhang Q., Locke D.P., et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods, 2009, 6: 677-681 CrossRef PubMed Google Scholar

[39] Chen S., Krinsky B.H., Long M.. New genes as drivers of phenotypic evolution. Nat Rev Genet, 2013, 14: 645-660 CrossRef PubMed Google Scholar

[40] Chin C.S., Alexander D.H., Marks P., Klammer A.A., Drake J., Heiner C., Clum A., Copeland A., Huddleston J., Eichler E.E., et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods, 2013, 10: 563-569 CrossRef PubMed Google Scholar

[41] Chin C.S., Peluso P., Sedlazeck F.J., Nattestad M., Concepcion G.T., Clum A., Dunn C., O’Malley R., Figueroa-Balderas R., Morales-Cruz A., et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods, 2016, 13: 1050-1054 CrossRef PubMed Google Scholar

[42] Chung H., Bogwitz M.R., McCart C., Andrianopoulos A., Ffrench-Constant R.H., Batterham P., Daborn P.J.. Cis-regulatory elements in the Accord retrotransposon result in tissue-specific expression of the Drosophila melanogaster insecticide resistance gene Cyp6g1. Genetics, 2007, 175: 1071-1077 CrossRef PubMed Google Scholar

[43] Church D.M., Goodstadt L., Hillier L.W., Zody M.C., Goldstein S., She X., Bult C.J., Agarwala R., Cherry J.L., DiCuccio M., et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol, 2009, 7: e1000112 CrossRef PubMed Google Scholar

[44] Clarke J., Wu H.C., Jayasinghe L., Patel A., Reid S., Bayley H.. Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotech, 2009, 4: 265-270 CrossRef PubMed ADS Google Scholar

[45] Clifton B.D., Librado P., Yeh S.D., Solares E.S., Real D.A., Jayasekera S.U., Zhang W., Shi M., Park R.V., Magie R.D., et al. Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila. Mol Biol Evol, 2017, 34: 51-65 CrossRef PubMed Google Scholar

[46] Conrad D.F., Hurles M.E.. The population genetics of structural variation. Nat Genet, 2007, 39: S30-S36 CrossRef PubMed Google Scholar

[47] Conrad D.F., Pinto D., Redon R., Feuk L., Gokcumen O., Zhang Y., Aerts J., Andrews T.D., Barnes C., Campbell P., et al. Origins and functional impact of copy number variation in the human genome. Nature, 2010, 464: 704-712 CrossRef PubMed ADS Google Scholar

[48] C. elegans Sequencing Consortium. (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018. CrossRef Google Scholar

[49] Deng C., Cheng C.H.C., Ye H., He X., Chen L.. Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proc Natl Acad Sci USA, 2010, 107: 21593-21598 CrossRef PubMed ADS Google Scholar

[50] Dennis M.Y., Harshman L., Nelson B.J., Penn O., Cantsilieris S., Huddleston J., Antonacci F., Penewit K., Denman L., Raja A., et al. The evolution and population diversity of human-specific segmental duplications. Nat ecol evol, 2017, 1: 0069 CrossRef PubMed Google Scholar

[51] Dennis M.Y., Nuttle X., Sudmant P.H., Antonacci F., Graves T.A., Nefedov M., Rosenfeld J.A., Sajjadian S., Malig M., Kotkiewicz H., et al. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell, 2012, 149: 912-922 CrossRef PubMed Google Scholar

[52] Des Marais D.L., Rausher M.D.. Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature, 2008, 454: 762-765 CrossRef PubMed ADS Google Scholar

[53] Dopman E.B., Hartl D.L.. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci USA, 2007, 104: 19920-19925 CrossRef PubMed ADS Google Scholar

[54] Dujon B.. Yeast evolutionary genomics. Nat Rev Genet, 2010, 11: 512-524 CrossRef PubMed Google Scholar

[55] Dunn B., Richter C., Kvitek D.J., Pugh T., Sherlock G.. Analysis of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number variants distributed in diverse yeast strains from differing industrial environments. Genome Res, 2012, 22: 908-924 CrossRef PubMed Google Scholar

[56] Earl D., Bradnam K., St. John J., Darling A., Lin D., Fass J., Yu H.O.K., Buffalo V., Zerbino D.R., Diekhans M., et al. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res, 2011, 21: 2224-2241 CrossRef PubMed Google Scholar

[57] Eid J., Fehr A., Gray J., Luong K., Lyle J., Otto G., Peluso P., Rank D., Baybayan P., Bettman B., et al. Real-time DNA sequencing from single polymerase molecules. Science, 2009, 323: 133-138 CrossRef PubMed ADS Google Scholar

[58] Eirin-Lopez, J.M., Rebordinos, L., Rooney, A.P., and Rozas, J. (2012). The birth-and-death evolution of multigene families revisited. Genome Dynam 7, 170–196. Google Scholar

[59] Emerson J.J., Cardoso-Moreira M., Borevitz J.O., Long M.. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science, 2008, 320: 1629-1631 CrossRef PubMed ADS Google Scholar

[60] Ersfeld, K. (2004). Fiber-FISH: fluorescence in situ hybridization on stretched DNA. Methods Mol Biol 270, 395-402. Google Scholar

[61] Faucon F., Dusfour I., Gaude T., Navratil V., Boyer F., Chandre F., Sirisopa P., Thanispong K., Juntarajumnong W., Poupardin R., et al. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing. Genome Res, 2015, 25: 1347-1359 CrossRef PubMed Google Scholar

[62] Fawcett J.A., Innan H.. Spreading good news. eLife, 2015, 4: e07108 CrossRef PubMed Google Scholar

[63] Feyereisen R., Dermauw W., Van Leeuwen T.. Genotype to phenotype, the molecular and physiological dimensions of resistance in arthropods. Pesticide Biochem Physiol, 2015, 121: 61-77 CrossRef PubMed Google Scholar

[64] Fiddes I.T., Lodewijk G.A., Mooring M., Bosworth C.M., Ewing A.D., Mantalas G.L., Novak A.M., van den Bout A., Bishara A., Rosenkrantz J.L., et al. Human-specific NOTCH2NL genes affect Notch signaling and cortical neurogenesis. Cell, 2018, 173: 1356-1369.e22 CrossRef PubMed Google Scholar

[65] Florio M., Albert M., Taverna E., Namba T., Brandl H., Lewitus E., Haffner C., Sykes A., Wong F.K., Peters J., et al. Human-specific gene ARHGAP11B promotes basal progenitor amplification and neocortex expansion. Science, 2015, 347: 1465-1470 CrossRef PubMed ADS Google Scholar

[66] Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., and Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545. Google Scholar

[67] Francino M.P.. An adaptive radiation model for the origin of new gene functions. Nat Genet, 2005, 37: 573-578 CrossRef PubMed Google Scholar

[68] Gabrieli T., Sharim H., Fridman D., Arbib N., Michaeli Y., Ebenstein Y.. Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH). Nucleic Acids Res, 2018, 46: e87 CrossRef PubMed Google Scholar

[69] Gao L.Z., Innan H.. Very low gene duplication rate in the yeast genome. Science, 2004, 306: 1367-1370 CrossRef PubMed ADS Google Scholar

[70] Gnerre S., Maccallum I., Przybylski D., Ribeiro F.J., Burton J.N., Walker B.J., Sharpe T., Hall G., Shea T.P., Sykes S., et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA, 2011, 108: 1513-1518 CrossRef PubMed ADS Google Scholar

[71] Golicz A.A., Batley J., Edwards D.. Towards plant pangenomics. Plant Biotechnol J, 2016, 14: 1099-1105 CrossRef PubMed Google Scholar

[72] Green P.. Against a whole-genome shotgun. Genome Res, 1997, 7: 410-417 CrossRef Google Scholar

[73] Gu W., Zhang F., Lupski J.R.. Mechanisms for human genomic rearrangements. PathoGenetics, 2008, 1: 4 CrossRef PubMed Google Scholar

[74] Gu Z., Steinmetz L.M., Gu X., Scharfe C., Davis R.W., Li W.H.. Role of duplicate genes in genetic robustness against null mutations. Nature, 2003, 421: 63-66 CrossRef PubMed ADS Google Scholar

[75] Guillemaud, T., Lenormand, T., Bourguet, D., Chevillon, C., Pateur, N., and Raymond, M. (1998). Evolution of resistance in Culex pipiens: allele replacemente and changing environment. Evolution 52, 443–453. Google Scholar

[76] Gurevich A., Saveliev V., Vyahhi N., Tesler G.. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 2013, 29: 1072-1075 CrossRef PubMed Google Scholar

[77] Hahn M.W.. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered, 2009, 100: 605-617 CrossRef PubMed Google Scholar

[78] Hahn M.W., Han M.V., Han S.G.. Gene family evolution across 12 Drosophila genomes. PLoS Genet, 2007, 3: e197 CrossRef PubMed Google Scholar

[79] Harewood, L., Chaignat, E., and Alexandre., R. (2012). Structural variation and its effects on expresson. Methods Mol Biol 838, 173-186. Google Scholar

[80] Hastings P.J., Lupski J.R., Rosenberg S.M., Ira G.. Mechanisms of change in gene copy number. Nat Rev Genet, 2009, 10: 551-564 CrossRef PubMed Google Scholar

[81] Hemingway J., Hawkes N.J., McCarroll L., Ranson H.. The molecular basis of insecticide resistance in mosquitoes. Insect Biochem Mol Biol, 2004, 34: 653-665 CrossRef PubMed Google Scholar

[82] Hendrickson H., Slechta E.S., Bergthorsson U., Andersson D.I., Roth J.R.. Amplification-mutagenesis: Evidence that “directed” adaptive mutation and general hypermutability result from growth with a selected gene amplification. Proc Natl Acad Sci USA, 2002, 99: 2164-2169 CrossRef PubMed ADS Google Scholar

[83] Hindson B.J., Ness K.D., Masquelier D.A., Belgrader P., Heredia N.J., Makarewicz A.J., Bright I.J., Lucero M.Y., Hiddessen A.L., Legler T.C., et al. High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal Chem, 2011, 83: 8604-8610 CrossRef PubMed Google Scholar

[84] Hollox E.J.. Copy number variation of beta-defensins and relevance to disease. Cytogenet Genome Res, 2008, 123: 148-155 CrossRef PubMed Google Scholar

[85] Hollox, E.J. (2012). The challenges of studying complex and dynamic regions of the human genome. Methods Mol Biol 838, 187–207. Google Scholar

[86] Hollox, E.J., and Abujaber, R. (2017). Evolution and diversity of defensins in vertebrates. In Evolutionary Biology: Self/Nonself Evolution, Species and Complex Traits Evolution, Methods and Concepts, P. Pontarotti, ed. (Cham, Switzerland: Springer), pp. 27–50. Google Scholar

[87] Hollox E.J., Barber J.C.K., Brookes A.J., Armour J.A.L.. Defensins and the dynamic genome: What we can learn from structural variation at human chromosome band 8p23.1. Genome Res, 2008a, 18: 1686-1697 CrossRef PubMed Google Scholar

[88] Hollox E.J., Huffmeier U., Zeeuwen P.L.J.M., Palla R., Lascorz J., Rodijk-Olthuis D., van de Kerkhof P.C.M., Traupe H., de Jongh G., den Heijer M., et al. Psoriasis is associated with increased β-defensin genomic copy number. Nat Genet, 2008b, 40: 23-25 CrossRef PubMed Google Scholar

[89] Hormozdiari F., Alkan C., Eichler E.E., Sahinalp S.C.. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res, 2009, 19: 1270-1278 CrossRef PubMed Google Scholar

[90] Hoskins R.A., Carlson J.W., Kennedy C., Acevedo D., Evans-Holm M., Frise E., Wan K.H., Park S., Mendez-Lago M., Rossi F., et al. Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science, 2007, 316: 1625-1628 CrossRef PubMed ADS Google Scholar

[91] Huddleston J., Eichler E.E.. An incomplete understanding of human genetic variation. Genetics, 2016, 202: 1251-1254 CrossRef PubMed Google Scholar

[92] Huddleston J., Ranade S., Malig M., Antonacci F., Chaisson M., Hon L., Sudmant P.H., Graves T.A., Alkan C., Dennis M.Y., et al. Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res, 2014, 24: 688-696 CrossRef PubMed Google Scholar

[93] Hughes A.L.. The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond B, 1994, 256: 119-124 CrossRef PubMed Google Scholar

[94] Innan H.. A two-locus gene conversion model with selection and its application to the human RHCE and RHD genes. Proc Natl Acad Sci USA, 2003, 100: 8793-8798 CrossRef PubMed ADS Google Scholar

[95] Innan H.. Population genetic models of duplicated genes. Genetica, 2009, 137: 19-37 CrossRef PubMed Google Scholar

[96] Innan H., Kondrashov F.. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet, 2010, 11: 97-108 CrossRef PubMed Google Scholar

[97] Iqbal Z., Caccamo M., Turner I., Flicek P., McVean G.. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet, 2012, 44: 226-232 CrossRef PubMed Google Scholar

[98] Istrail S., Sutton G.G., Florea L., Halpern A.L., Mobarry C.M., Lippert R., Walenz B., Shatkay H., Dew I., Miller J.R., et al. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci USA, 2004, 101: 1916-1921 CrossRef PubMed ADS Google Scholar

[99] James C.P., Bajaj-Elliott M., Abujaber R., Forya F., Klein N., David A.L., Hollox E.J., Peebles D.M.. Human beta defensin (HBD) gene copy number affects HBD2 protein levels: impact on cervical bactericidal immunity in pregnancy. Eur J Hum Genet, 2018, 26: 434-439 CrossRef PubMed Google Scholar

[100] Jayaswal V., Jimenez J., Magie R., Nguyen K., Clifton B., Yeh S., Ranz J.M.. A species-specific multigene family mediates differential sperm displacement in Drosophila melanogaster. Evolution, 2018, 72: 399-403 CrossRef PubMed Google Scholar

[101] Jiang W., Johnson C., Jayaraman J., Simecek N., Noble J., Moffatt M.F., Cookson W.O., Trowsdale J., Traherne J.A.. Copy number variation leads to considerable diversity for B but not A haplotypes of the human KIR genes encoding NK cell receptors. Genome Res, 2012a, 22: 1845-1854 CrossRef PubMed Google Scholar

[102] Jiang W., Johnson C., Simecek N., López-Álvarez M.R., Di D., Trowsdale J., Traherne J.A.. qKAT: a high-throughput qPCR method for KIR gene copy number and haplotype determination. Genome Med, 2016, 8: 99 CrossRef PubMed Google Scholar

[103] Jiang W., Zhao X., Gabrieli T., Lou C., Ebenstein Y., Zhu T.F.. Cas9-assisted targeting of chromosome segments CATCH enables one-step targeted cloning of large gene clusters. Nat Commun, 2015, 6: 8101 CrossRef PubMed ADS Google Scholar

[104] Jiang Y., Wang Y., Brudno M.. PRISM: pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants. Bioinformatics, 2012b, 28: 2576-2583 CrossRef PubMed Google Scholar

[105] Jugulam M., Niehues K., Godar A.S., Koo D.H., Danilova T., Friebe B., Sehgal S., Varanasi V.K., Wiersma A., Westra P., et al. Tandem amplification of a chromosomal segment harboring 5-enolpyruvylshikimate-3-phosphate synthase locus confers glyphosate resistance in Kochia scoparia. Plant Physiol, 2014, 166: 1200-1207 CrossRef PubMed Google Scholar

[106] Kaessmann H.. Origins, evolution, and phenotypic impact of new genes. Genome Res, 2010, 20: 1313-1326 CrossRef PubMed Google Scholar

[107] Kajitani R., Toshimoto K., Noguchi H., Toyoda A., Ogura Y., Okuno M., Yabana M., Harada M., Nagayasu E., Maruyama H., et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res, 2014, 24: 1384-1395 CrossRef PubMed Google Scholar

[108] Katju V.. In with the old, in with the new: the promiscuity of the duplication process engenders diverse pathways for novel gene creation. Int J Evol Biol, 2012, 2012(2): 1-24 CrossRef PubMed Google Scholar

[109] Katju V., Bergthorsson U.. Copy-number changes in evolution: rates, fitness effects and adaptive significance. Front Genet, 2013, 4: 273 CrossRef Google Scholar

[110] Kondrashov, F.A. (2010). Gene dosage and duplication. In Evolution after Gene Duplication, K. Dittmar, and D. Liberles, ed. (Wiley-Blackwell), pp. 57–76. Google Scholar

[111] Kondrashov F.A.. Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc R Soc B-Biol Sci, 2012, 279: 5048-5057 CrossRef PubMed Google Scholar

[112] Korbel J.O., Abyzov A., Mu X.J., Carriero N., Cayting P., Zhang Z., Snyder M., Gerstein M.B.. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol, 2009, 10: R23 CrossRef PubMed Google Scholar

[113] Korbel J.O., Urban A.E., Affourtit J.P., Godwin B., Grubert F., Simons J.F., Kim P.M., Palejev D., Carriero N.J., Du L., et al. Paired-end mapping reveals extensive structural variation in the human genome. Science, 2007, 318: 420-426 CrossRef PubMed ADS Google Scholar

[114] Koren S., Schatz M.C., Walenz B.P., Martin J., Howard J.T., Ganapathy G., Wang Z., Rasko D.A., McCombie W.R., Jarvis E.D., et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol, 2012, 30: 693-700 CrossRef PubMed Google Scholar

[115] Koren S., Walenz B.P., Berlin K., Miller J.R., Bergman N.H., Phillippy A.M.. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res, 2017, 27: 722-736 CrossRef PubMed Google Scholar

[116] Krsticevic F.J., Schrago C.G., Carvalho A.B.. Long-read single molecule sequencing to resolve tandem gene copies: The Mst77Y region on the Drosophila melanogaster Y chromosome. G3, 2015, 5: 1145-1150 CrossRef PubMed Google Scholar

[117] Kulathinal, R.J., Sawyer, S.A., Bustamante, C.D., Nurminsky, D., Ponce, R., Ranz, J.M., and Hartl, D.L. (2004). Selective sweep in the evolution of a new sperm-specific gene in Drosophila. In Selective Sweep, D. Nurminsky, ed. (Austin, Texas: Kluwer Academic/Plenum Publishers), pp. 1–12. Google Scholar

[118] Kurtz S., Phillippy A., Delcher A.L., Smoot M., Shumway M., Antonescu C., Salzberg S.L.. Versatile and open software for comparing large genomes. Genome Biol, 2004, 5: R12 CrossRef PubMed Google Scholar

[119] Labbé P., Berthomieu A., Berticat C., Alout H., Raymond M., Lenormand T., Weill M.. Independent duplications of the acetylcholinesterase gene conferring insecticide resistance in the mosquito Culex pipiens. Mol Biol Evol, 2007, 24: 1056-1067 CrossRef PubMed Google Scholar

[120] Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature, 2001, 409: 860-921 CrossRef PubMed Google Scholar

[121] Layer R.M., Chiang C., Quinlan A.R., Hall I.M.. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol, 2014, 15: R84 CrossRef PubMed Google Scholar

[122] Li H.. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 2016, 32: 2103-2110 CrossRef PubMed Google Scholar

[123] Lin Y., Yuan J., Kolmogorov M., Shen M.W., Chaisson M., Pevzner P.A.. Assembly of long error-prone reads using de Bruijn graphs. Proc Natl Acad Sci USA, 2016, 113: E8396-E8405 CrossRef PubMed Google Scholar

[124] Livak K.J., Schmittgen T.D.. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods, 2001, 25: 402-408 CrossRef PubMed Google Scholar

[125] Long M., VanKuren N.W., Chen S., Vibranovski M.D.. New gene evolution: little did we know. Annu Rev Genet, 2013, 47: 307-333 CrossRef PubMed Google Scholar

[126] Luo R., Liu B., Xie Y., Li Z., Huang W., Yuan J., He G., Chen Y., Pan Q., Liu Y., et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 2012, 1: 18 CrossRef PubMed Google Scholar

[127] Lupski J.R.. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genets, 1998, 14: 417-422 CrossRef Google Scholar

[128] Lupski J.R., Stankiewicz P.. Genomic disorders: Molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet, 2005, 1: e49-633 CrossRef PubMed Google Scholar

[129] Mardis E.R.. Next-generation sequencing platforms. Annu Rev Anal Chem, 2013, 6: 287-303 CrossRef PubMed ADS Google Scholar

[130] Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005, 437: 376-380 CrossRef PubMed ADS Google Scholar

[131] Marques-Bonet T., Girirajan S., Eichler E.E.. The origins and impact of primate segmental duplications. Trends Genets, 2009, 25: 443-454 CrossRef PubMed Google Scholar

[132] Martin M.P., Bashirova A., Traherne J., Trowsdale J., Carrington M.. Cutting edge: Expansion of the KIR locus by unequal crossing over. J Immunol, 2003, 171: 2192-2195

[133] Martins W.F.S., Subramaniam K., Steen K., Mawejje H., Liloglou T., Donnelly M.J., Wilding C.S.. Detection and quantitation of copy number variation in the voltage-gated sodium channel gene of the mosquito Culex quinquefasciatus. Sci Rep, 2017, 7: 5821

[134] McCoy R.C., Taylor R.W., Blauwkamp T.A., Kelley J.L., Kertesz M., Pushkarev D., Petrov D.A., Fiston-Lavier A.S.. Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS ONE, 2014, 9: e106689

[135] McKernan K.J., Peckham H.E., Costa G.L., McLaughlin S.F., Fu Y., Tsung E.F., Clouser C.R., Duncan C., Ichikawa J.K., Lee C.C., et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res, 2009, 19: 1527-1541

[136] Medvedev P., Stanciu M., Brudno M.. Computational methods for discovering structural variation with next-generation sequencing. Nat Methods, 2009, 6: S13-S20

[137] Miller J.R., Zhou P., Mudge J., Gurtowski J., Lee H., Ramaraj T., Walenz B.P., Liu J., Stupar R.M., Denny R., et al. Hybrid assembly with long and short reads improves discovery of gene family expansions. BMC Genomics, 2017, 18: 541

[138] Mohajeri K., Cantsilieris S., Huddleston J., Nelson B.J., Coe B.P., Campbell C.D., Baker C., Harshman L., Munson K.M., Kronenberg Z.N., et al. Interchromosomal core duplicons drive both evolutionary instability and disease susceptibility of the Chromosome 8p23.1 region. Genome Res, 2016, 26: 1453-1467

[139] Mouches C., Pasteur N., Berge J.B., Hyrien O., Raymond M., de Saint Vincent B.R., de Silvestri M., Georghiou G.P.. Amplification of an esterase gene is responsible for insecticide resistance in a California Culex mosquito. Science, 1986, 233: 778-780

[140] Mouse Genome Sequencing Consortium, Waterston R.H., Lindblad-Toh K., Birney E., Rogers J., Abril J.F., Agarwal P., Agarwala R., Ainscough R., Alexandersson M., et al. Initial sequencing and comparative analysis of the mouse genome. Nature, 2002, 420: 520-562

[141] Myers E.W., Sutton G.G., Delcher A.L., Dew I.M., Fasulo D.P., Flanigan M.J., Kravitz S.A., Mobarry C.M., Reinert K.H.J., Remington K.A., et al. A whole-genome assembly of Drosophila. Science, 2000, 287: 2196-2204

[142] Nagylaki T., Petes T.D.. Intrachromosomal gene conversion and the maintenance of sequence homogeneity among repeated genes. Genetics, 1982, 100: 315-337

[143] Näsvall J., Sun L., Roth J.R., Andersson D.I.. Real-time evolution of new genes by innovation, amplification, and divergence. Science, 2012, 338: 384-387

[144] Nei M., Rooney A.P.. Concerted and birth-and-death evolution of multigene families. Annu Rev Genet, 2005, 39: 121-152

[145] Nguyen D.Q., Webber C., Hehir-Kwa J., Pfundt R., Veltman J., Ponting C.P.. Reduced purifying selection prevails over positive selection in human copy number variant evolution. Genome Res, 2008, 18: 1711-1723

[146] Nguyen H.T., Boocock J., Merriman T.R., Black M.A.. SRBreak: A read-depth and split-read framework to identify breakpoints of different events inside simple copy-number variable regions. Front Genet, 2016, 7: 160

[147] Nijkamp J.F., van den Broek M.A., Geertman J.M.A., Reinders M.J.T., Daran J.M.G., de Ridder D.. De novo detection of copy number variation by co-assembly. Bioinformatics, 2012, 28: 3195-3202

[148] Nurminsky D., De Aguiar D., Bustamante C.D., Hartl D.L.. Chromosomal effects of rapid gene evolution in Drosophila melanogaster. Science, 2001, 291: 128-130

[149] Nurminsky D.I., Nurminskaya M.V., De Aguiar D., Hartl D.L.. Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature, 1998, 396: 572-575

[150] Nuttle X., Giannuzzi G., Duyzend M.H., Schraiber J.G., Narvaiza I., Sudmant P.H., Penn O., Chiatante G., Malig M., Huddleston J., et al. Emergence of a Homo sapiens-specific gene family and chromosome 16p11.2 CNV susceptibility. Nature, 2016, 536: 205-209

[151] Nuttle X., Huddleston J., O'Roak B.J., Antonacci F., Fichera M., Romano C., Shendure J., Eichler E.E.. Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions. Nat Methods, 2013, 10: 903-909

[152] O’Roak B.J., Vives L., Fu W., Egertson J.D., Stanaway I.B., Phelps I.G., Carvill G., Kumar A., Lee C., Ankenman K., et al. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science, 2012, 338: 1619-1622

[153] Obbard D.J., Maclennan J., Kim K.W., Rambaut A., O’Grady P.M., Jiggins F.M.. Estimating divergence dates and substitution rates in the Drosophila phylogeny. Mol Biol Evol, 2012, 29: 3459-3473

[154] Ohno S.. Evolution by Gene Duplication. New York: Springer-Verlag, 1970

[155] Ohta T.. Allelic and nonallelic homology of a supergene family. Proc Natl Acad Sci USA, 1982, 79: 3251-3254

[156] Osada N., Innan H.. Duplication and gene conversion in the Drosophila melanogaster genome. PLoS Genet, 2008, 4: e1000305

[157] Owen R.P., Sangkuhl K., Klein T.E., Altman R.B.. Cytochrome P450 2D6. Pharmacogenet Genomics, 2009, 19: 559-562

[158] Parham P.. Influence of KIR diversity on human immunity. Adv Exp Med Biol, 2005, 560: 47-50

[159] Parham P., Norman P.J., Abi-Rached L., Guethlein L.A.. Human-specific evolution of killer cell immunoglobulin-like receptor recognition of major histocompatibility complex class I molecules. Philos Trans R Soc B-Biol Sci, 2012, 367: 800-811

[160] Parra G., Bradnam K., Korf I.. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics, 2007, 23: 1061-1067

[161] Perry G.H., Dominy N.J., Claw K.G., Lee A.S., Fiegler H., Redon R., Werner J., Villanea F.A., Mountain J.L., Misra R., et al. Diet and the evolution of human amylase gene copy number variation. Nat Genet, 2007, 39: 1256-1260

[162] Pillai S., Gopalan V., Lam A.K.Y.. Review of sequencing platforms and their applications in phaeochromocytoma and paragangliomas. Crit Rev Oncol/Hematol, 2017, 116: 58-67

[163] Pinheiro L.B., Coleman V.A., Hindson C.M., Herrmann J., Hindson B.J., Bhat S., Emslie K.R.. Evaluation of a droplet digital polymerase chain reaction format for DNA copy number quantification. Anal Chem, 2012, 84: 1003-1011

[164] Pirooznia M., Goes F.S., Zandi P.P.. Whole-genome CNV analysis: advances in computational approaches. Front Genet, 2015, 6: 138

[165] Ponce R., Hartl D.L.. The evolution of the novel Sdic gene cluster in Drosophila melanogaster. Gene, 2006, 376: 174-183

[166] Ponchel F., Toomes C., Bransfield K., Leong F.T., Douglas S.H., Field S.L., Bell S.M., Combaret V., Puisieux A., Mighell A.J., et al. Real-time PCR based on SYBR-Green I fluorescence: an alternative to the TaqMan assay for a relative quantification of gene rearrangements, gene amplifications and micro gene deletions. BMC Biotechnol, 2003, 3: 18

[167] Pyo C.W., Wang R., Vu Q., Cereb N., Yang S.Y., Duh F.M., Wolinsky S., Martin M.P., Carrington M., Geraghty D.E.. Recombinant structures expand and contract inter and intragenic diversification at the KIR locus. BMC Genomics, 2013, 14: 89

[168] Ranz J.M., Parsch J.. Newly evolved genes: moving from comparative genomics to functional studies in model systems. Bioessays, 2012, 34: 477-483

[169] Ranz J.M., Ponce A.R., Hartl D.L., Nurminsky D.. Origin and evolution of a new gene expressed in the Drosophila sperm axoneme. Genetica, 2003, 118: 233-244

[170] Rausch T., Zichner T., Schlattl A., Stütz A.M., Benes V., Korbel J.O.. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics, 2012, 28: i333-i339

[171] Raymond M., Poulin E., Boiroux V., Dupont E., Pasteur N.. Stability of insecticide resistance due to amplification of esterase genes in Culex pipiens. Heredity, 1993, 70: 301-307

[172] Redon R., Ishikawa S., Fitch K.R., Feuk L., Perry G.H., Andrews T.D., Fiegler H., Shapero M.H., Carson A.R., Chen W., et al. Global variation in copy number in the human genome. Nature, 2006, 444: 444-454

[173] Reisner W., Larsen N.B., Silahtaroglu A., Kristensen A., Tommerup N., Tegenfeldt J.O., Flyvbjerg H.. Single-molecule denaturation mapping of DNA in nanofluidic channels. Proc Natl Acad Sci USA, 2010, 107: 13294-13299

[174] Remnant E.J., Good R.T., Schmidt J.M., Lumb C., Robin C., Daborn P.J., Batterham P.. Gene duplication in the major insecticide target site, Rdl, in Drosophila melanogaster. Proc Natl Acad Sci USA, 2013, 110: 14705-14710

[175] Ritz A., Bashir A., Sindi S., Hsu D., Hajirasouliha I., Raphael B.J.. Characterization of structural variants with single molecule and hybrid sequencing approaches. Bioinformatics, 2014, 30: 3458-3466

[176] Rodrigo G., Fares M.A.. Intrinsic adaptive value and early fate of gene duplication revealed by a bottom-up approach. eLife, 2018, 7: e29739

[177] Rogers R.L., Bedford T., Hartl D.L.. Formation and longevity of chimeric and duplicate genes in Drosophila melanogaster. Genetics, 2009, 181: 313-322

[178] Rogers R.L., Cridland J.M., Shao L., Hu T.T., Andolfatto P., Thornton K.R.. Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans. Mol Biol Evol, 2014, 31: 1750-1766

[179] Salzberg S.L., Phillippy A.M., Zimin A., Puiu D., Magoc T., Koren S., Treangen T.J., Schatz M.C., Delcher A.L., Roberts M., et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res, 2012, 22: 557-567

[180] Sedlazeck F.J., Rescheneder P., Smolka M., Fang H., Nattestad M., von Haeseler A., Schatz M.C.. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods, 2018, 15: 461-468

[181] She X., Liu G., Ventura M., Zhao S., Misceo D., Roberto R., Cardone M.F., Rocchi M., NISC Comparative Sequencing Program, Green E.D., et al. A preliminary comparative analysis of primate segmental duplications shows elevated substitution rates and a great-ape expansion of intrachromosomal duplications. Genome Res, 2006, 16: 576-583

[182] Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M.. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 2015, 31: 3210-3212

[183] Sindi S.S., Onal S., Peng L.C., Wu H.T., Raphael B.J.. An integrative probabilistic model for identification of structural variation in sequencing data. Genome Biol, 2012, 13: R22

[184] Spofford J.B.. Heterosis and the evolution of duplications. Am Nat, 1969, 103: 407-432

[185] Stancu M.C., van Roosmalen M.J., Renkens I., Nieboer M.M., Middelkamp S., de Ligt J., Pregno G., Giachino D., Mandrile G., Espejo Valle-Inclan J., et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat Commun, 2017, 8: 1326

[186] Staňková H., Hastie A.R., Chan S., Vrána J., Tulpová Z., Kubaláková M., Visendi P., Hayashi S., Luo M., Batley J., et al. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes. Plant Biotechnol J, 2016, 14: 1523-1531

[187] Stranger B.E., Forrest M.S., Dunning M., Ingle C.E., Beazley C., Thorne N., Redon R., Bird C.P., de Grassi A., Lee C., et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science, 2007, 315: 848-853

[188] Sudmant P.H., Rausch T., Gardner E.J., Handsaker R.E., Abyzov A., Huddleston J., Zhang Y., Ye K., Jun G., Hsi-Yang Fritz M., et al. An integrated map of structural variation in 2,504 human genomes. Nature, 2015, 526: 75-81

[189] Tettelin H., Masignani V., Cieslewicz M.J., Donati C., Medini D., Ward N.L., Angiuoli S.V., Crabtree J., Jones A.L., Durkin A.S., et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. Proc Natl Acad Sci USA, 2005, 102: 13950-13955

[190] Traherne J.A., Martin M., Ward R., Ohashi M., Pellett F., Gladman D., Middleton D., Carrington M., Trowsdale J.. Mechanisms of copy number variation and hybrid gene formation in the KIR immune gene complex. Hum Mol Genet, 2010, 19: 737-751

[191] Trappe K., Emde A.K., Ehrlich H.C., Reinert K.. Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone. Bioinformatics, 2014, 30: 3484-3490

[192] Traut W., Rahn I.M., Winking H., Kunze B., Weichenhan D.. Evolution of a 6–200 Mb long-range repeat cluster in the genus Mus. Chromosoma, 2001, 110: 247-252

[193] VanKuren N.W., Long M.. Gene duplicates resolving sexual conflict rapidly evolved essential gametogenesis functions. Nat Ecol Evol, 2018, 2: 705-712

[194] Veitia R.A.. Exploring the etiology of haploinsufficiency. Bioessays, 2002, 24: 175-184

[195] Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science, 2001, 291: 1304-1351

[196] Voskoboynik A., Neff N.F., Sahoo D., Newman A.M., Pushkarev D., Koh W., Passarelli B., Fan H.C., Mantalas G.L., Palmeri K.J., et al. The genome sequence of the colonial chordate, Botryllus schlosseri. eLife, 2013, 2: e00569

[197] Walker B.J., Abeel T., Shea T., Priest M., Abouelliel A., Sakthikumar S., Cuomo C.A., Zeng Q., Wortman J., Young S.K., et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE, 2014, 9: e112963

[198] Walsh J.B.. Sequence-dependent gene conversion: can duplicated genes diverge fast enough to escape conversion? Genetics, 1987, 117: 543-557

[199] Weber J.L., Myers E.W.. Human whole-genome shotgun sequencing. Genome Res, 1997, 7: 401-409

[200] Weichenhan D., Kunze B., Winking H., van Geel M., Osoegawa K., de Jong P.J., Traut W.. Source and component genes of a 6–200 Mb gene cluster in the house mouse. Mamm Genome, 2001, 12: 590-594

[201] Wondji C.S., Irving H., Morgan J., Lobo N.F., Collins F.H., Hunt R.H., Coetzee M., Hemingway J., Ranson H.. Two duplicated P450 genes are associated with pyrethroid resistance in Anopheles funestus, a major malaria vector. Genome Res, 2009, 19: 452-459

[202] Xi R., Hadjipanayis A.G., Luquette L.J., Kim T.M., Lee E., Zhang J., Johnson M.D., Muzny D.M., Wheeler D.A., Gibbs R.A., et al. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci USA, 2011, 108: E1128-E1136

[203] Xie C., Tammi M.T.. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics, 2009, 10: 80

[204] Yao R., Zhang C., Yu T., Li N., Hu X., Wang X., Wang J., Shen Y.. Evaluation of three read-depth based CNV detection tools using whole-exome sequencing data. Mol Cytogenet, 2017, 10: 30

[205] Ye C., Hill C.M., Wu S., Ruan J., Ma Z.S.. DBG2OLC: Efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci Rep, 2016, 6: 31900

[206] Ye K., Schulz M.H., Long Q., Apweiler R., Ning Z.. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 2009, 25: 2865-2871

[207] Yeh S.D., Do T., Abbassi M., Ranz J.M.. Functional relevance of the newly evolved sperm dynein intermediate chain multigene family in Drosophila melanogaster males. Commun Integr Biol, 2012a, 5: 462-465

[208] Yeh S.D., Do T., Chan C., Cordova A., Carranza F., Yamamoto E.A., Abbassi M., Gandasetiawan K.A., Librado P., Damia E., et al. Functional evidence that a recently evolved Drosophila sperm-specific gene boosts sperm competition. Proc Natl Acad Sci USA, 2012b, 109: 2043-2048

[209] Yoon S., Xuan Z., Makarov V., Ye K., Sebat J.. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res, 2009, 19: 1586-1592

[210] Zhang B., Sambono J.L., Morgan J.A.T., Venus B., Rolls P., Lew-Tabor A.E.. An evaluation of quantitative PCR assays (TaqMan® and SYBR Green) for the detection of Babesia bigemina and Babesia bovis, and a novel fluorescent-ITS1-PCR capillary electrophoresis method for genotyping B. bovis isolates. Vet Sci, 2016, 3: 23

[211] Zhang F., Carvalho C.M.B., Lupski J.R.. Complex human chromosomal and genomic rearrangements. Trends Genet, 2009, 25: 298-307

[212] Zhang J., Wang J., Wu Y.. An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data. BMC Bioinformatics, 2012, 13: S6

[213] Zhang Z.D., Du J., Lam H., Abyzov A., Urban A.E., Snyder M., Gerstein M.. Identification of genomic indels and structural variations using split reads. BMC Genomics, 2011, 12: 375

[214] Zhao M., Wang Q., Wang Q., Jia P., Zhao Z.. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics, 2013, 14: S1

[215] Zhao Q., Feng Q., Lu H., Li Y., Wang A., Tian Q., Zhan Q., Lu Y., Zhang L., Huang T., et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat Genet, 2018, 50: 278-284

[216] Zhou J., Lemos B., Dopman E.B., Hartl D.L.. Copy-number variation: the balance between gene dosage and expression in Drosophila melanogaster. Genome Biol Evol, 2011, 3: 1014-1024

[217] Zimin A.V., Marçais G., Puiu D., Roberts M., Salzberg S.L., Yorke J.A.. The MaSuRCA genome assembler. Bioinformatics, 2013, 29: 2669-2677

[218] Zimin A.V., Puiu D., Luo M.C., Zhu T., Koren S., Marçais G., Yorke J.A., Dvořák J., Salzberg S.L.. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res, 2017, 27: 787-792

Figure 1

    The Sdic multigene family of D. melanogaster. A, Each Sdic repeat is composed of three parts: a defective copy of a transposable element (TE); a truncated version of the parental gene AnxB10 (AnxB10-like); and the Sdic transcriptional unit (Nurminsky et al., 1998; Ponce and Hartl, 2006; Ranz et al., 2003). Sdic combines one stretch derived from AnxB10, which contributes only to the promoter region, and three discontinuous stretches from sw that contribute to the transcribed region of Sdic. The first exon (*) derives from formerly intronic sw sequence. This structure is largely conserved across repeats, except for the 3′-most quarter of each transcriptional unit. B, Organization of the Sdic region across successive releases of the genome assembly of the reference strain ISO-1, as available in FlyBase. Sdic copy number (CN) increased from four to seven in the latest release. C, Reconstruction of the Sdic region from a genome assembly scaffolded with PacBio sequencing data (Berlin et al., 2015; Clifton et al., 2017). Sdic CN decreased from seven to six relative to the reference assembly. Sdic copies are color-coded only in this most reliable reconstruction of the region, to denote the nucleotide-level differences among them.

Table 1   Modern genome sequencing technologies

    Sequencing system | Technology | Read lengths | Single-pass error rate | Sequencing output | Advantages | Disadvantages
    ABI 3730xl | Sanger sequencing | 900 bp |  | 2.76 Mb/run | Highest accuracy | Short reads; very low throughput
    454 GS-FLX |  | Up to 1,000 bp |  | 0.7 Gb/run | High accuracy; longest SGS reads | Short reads; low throughput; very high cost
    Life Technologies SOLiD 5500xl | Ligation and two-base coding | 75 bp (fragment); 75 bp + 35 bp (paired-end); up to 60 bp + 60 bp (mate-paired) |  | 160 Gb/run | Highest SGS accuracy | Very short reads; high cost; poor ability to resolve repetitive regions
    HiSeq 4000 / MiSeq (v3 kit) / NextSeq 550 | Sequencing by synthesis | 150 bp ×2 / 300 bp ×2 / 150 bp ×2 |  | 1.3–1.5 Tb / 13.2–15 Gb / 100–120 Gb per flow cell | High accuracy; high throughput; lowest cost | Very short reads; poor ability to resolve repetitive regions
    TruSeq Synthetic Long-Reads | Sequencing by synthesis | 150 bp ×2; ~10 kb synthetic reads |  | Dependent on Illumina system | Better alignment of short reads to repeats | Poor ability to resolve highly similar repeats
    Pacific Biosciences | Single Molecule Real-Time (SMRT) sequencing | ~20 kb average; >100 kb max |  | 5–10 Gb/SMRT cell | Longer reads | Lowest accuracy
    Oxford Nanopore |  | 10–20 kb average; ~900 kb max |  | 10–20 Gb/flow cell | Longest reads | Low accuracy

Copyright 2020 Science China Press Co., Ltd. All rights reserved.