SCIENCE CHINA Life Sciences, Volume 61, Issue 8: 871-884 (2018) https://doi.org/10.1007/s11427-018-9360-0

De novo assembly of a Chinese soybean genome

  • Received Jun 15, 2018
  • Accepted Jul 5, 2018
  • Published Jul 27, 2018

Abstract

Soybean was domesticated in China and has become one of the most important oilseed crops. Owing to bottlenecks during their introduction and dissemination, soybeans from different geographic areas exhibit extensive genetic diversity. Asia is the largest soybean market; therefore, a high-quality soybean reference genome from this region is critical for soybean research and breeding. Here, we report the de novo assembly and sequence analysis of the genome of the Chinese soybean cultivar “Zhonghuang 13”, using a combination of SMRT sequencing, Hi-C, and optical mapping data. The assembled genome is 1.025 Gb in size, with a contig N50 of 3.46 Mb and a scaffold N50 of 51.87 Mb. Comparison of this genome with the previously reported reference genome (cv. Williams 82) uncovered more than 250,000 structural variations. A total of 52,051 protein-coding genes and 36,429 transposable elements were annotated, and a gene co-expression network comprising 39,967 genes was established. This high-quality Chinese soybean genome and its sequence analysis will provide valuable information for future soybean improvement.
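The contig and scaffold N50 values above are length-weighted medians: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal Python sketch of that calculation, not the authors' pipeline; the function name `n50`, its `total` parameter, and the toy lengths are illustrative assumptions, and the actual Gmax_ZH13 contig lengths are not reproduced here.

```python
def n50(lengths, total=None):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L together
    cover at least half of `total`. When `total` is omitted, the sum of
    `lengths` is used.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if total is None:
        total = sum(lengths)
    half = total / 2
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    # Reached only if the sequences cover less than half of `total`.
    return 0


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 10]   # hypothetical lengths, sum = 30
    print(n50(toy_lengths))             # 6
    print(n50(toy_lengths, total=40))   # 5
```

Passing `total` explicitly lets the N50 be computed against a fixed reference size, such as the final assembled size, rather than against the sum of the listed sequences.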


Funded by

the National Natural Science Foundation of China (91531304, 31525018, 31370266, 31788103)

the “Strategic Priority Research Program” of the Chinese Academy of Sciences (XDA08000000)

and the State Key Laboratory of Plant Cell and Chromosome Engineering (PCCE-KF-2017-03)


Acknowledgment

This work was supported by the National Natural Science Foundation of China (91531304, 31525018, 31370266, and 31788103), the “Strategic Priority Research Program” of the Chinese Academy of Sciences (XDA08000000), and the State Key Laboratory of Plant Cell and Chromosome Engineering (PCCE-KF-2017-03).


Interest statement

The author(s) declare that they have no conflict of interest.


Supplement

SUPPORTING INFORMATION

Figure S1 PCR validation of insertion and deletion regions in Gmax_ZH13.

Figure S2 The gene controlling soybean flower color differs allelically between Zhonghuang 13 and Williams 82.

Figure S3 The high-quality ZH13 genome sequence can improve gene identification: an example from GWAS analysis.

Table S1 Data statistics for the different sequence types

Table S2 Correspondence between annotated genes in Gmax_ZH13 and Glycine_max_v2.0

Table S3 Assembly comparison between Gmax_ZH13 and three previously released soybean genomes

Table S4 Detailed chromosome comparison between Gmax_ZH13 and Glycine_max_v2.0

Table S5 Translocation events between the Gmax_ZH13 and Glycine_max_v2.0 genomes

Table S6 Inversion events between the Gmax_ZH13 and Glycine_max_v2.0 genomes

Table S7 Translocation & inversion events between the Gmax_ZH13 and Glycine_max_v2.0 genomes

Table S8 Genome regions present specifically in the Gmax_ZH13 genome

Table S9 Genome regions present specifically in the Glycine_max_v2.0 genome

Table S10 Small insertion regions in the Gmax_ZH13 and Glycine_max_v2.0 genomes

Table S11 Genes co-expressed with 9 known soybean flowering time-related genes

Supplemental File 1

Supplemental File 2

Supplemental File 3

The supporting information is available online at http://life.scichina.com and https://link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.


References

[1] Akdemir K.C., Chin L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol, 2015, 16: 198.

[2] Badouin H., Gouzy J., Grassa C.J., Murat F., Staton S.E., Cottret L., Lelandais-Brière C., Owens G.L., Carrère S., Mayjonade B., et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature, 2017, 546: 148-152.

[3] Besemer J., Borodovsky M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res, 2005, 33: W451-W454.

[4] Bickhart D.M., Rosen B.D., Koren S., Sayre B.L., Hastie A.R., Chan S., Lee J., Lam E.T., Liachko I., Sullivan S.T., et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet, 2017, 49: 643-650.

[5] Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014, 30: 2114-2120.

[6] Burton J.N., Adey A., Patwardhan R.P., Qiu R., Kitzman J.O., Shendure J. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol, 2013, 31: 1119-1125.

[7] Byrum J.R., Kinney A.J., Shoemaker R.C., Diers B.W. Mapping of the microsomal and plastid omega-3 fatty acid desaturases in soybean [Glycine max (L.) Merr.]. Soybean Genet Newslett, 1995, 22: 181-184.

[8] Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: architecture and applications. BMC BioInf, 2009, 10: 421.

[9] Carter T.E., Nelson R., Sneller C.H., Cui Z. Soybeans: improvement, production and uses, Third edition (agronomy). Madison, Wisconsin, USA, 2004.

[10] Chaisson M.J., Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC BioInf, 2012, 13: 238.

[11] Chan C., Qi X., Li M.W., Wong F.L., Lam H.M. Recent developments of genomic research in soybean. J Genet Genomics, 2012, 39: 317-324.

[12] Chen G., Shi T., Shi L. Characterizing and annotating the genome using RNA-seq data. Sci China Life Sci, 2017, 60: 116-125.

[13] Childs K.L., Davidson R.M., Buell C.R. Gene coexpression network analysis as a source of functional annotation for rice genes. PLoS ONE, 2011, 6: e22196.

[14] Clavijo B.J., Venturini L., Schudoma C., Accinelli G.G., Kaithakottil G., Wright J., Borrill P., Kettleborough G., Heavens D., Chapman H., et al. An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations. Genome Res, 2017, 27: 885-896.

[15] Contreras-Soto R.I., Mora F., Lazzari F., de Oliveira M.A.R., Scapim C.A., Schuster I. Genome-wide association mapping for flowering and maturity in tropical soybean: implications for breeding strategies. Breed Sci, 2017, 67: 435-449.

[16] Du H., Yu Y., Ma Y., Gao Q., Cao Y., Chen Z., Ma B., Qi M., Li Y., Zhao X., et al. Sequencing and de novo assembly of a near complete indica rice genome. Nat Commun, 2017, 8: 15324.

[17] Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 2013, 29: 15-21.

[18] Dooner H.K., He L. Maize genome structure variation: interplay between retrotransposon polymorphisms and genic recombination. Plant Cell, 2008, 20: 249-258.

[19] Du J., Grant D., Tian Z., Nelson R.T., Zhu L., Shoemaker R.C., Ma J. SoyTEdb: a comprehensive database of transposable elements in the soybean genome. BMC Genomics, 2010, 11: 113.

[20] Fang C., Ma Y., Wu S., Liu Z., Wang Z., Yang R., Hu G., Zhou Z., Yu H., Zhang M., et al. Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol, 2017, 18: 161.

[21] Foley J.A., Ramankutty N., Brauman K.A., Cassidy E.S., Gerber J.S., Johnston M., Mueller N.D., O’Connell C., Ray D.K., West P.C., et al. Solutions for a cultivated planet. Nature, 2011, 478: 337-342.

[22] Funatsuki H., Kawaguchi K., Matsuba S., Sato Y., Ishimoto M. Mapping of QTL associated with chilling tolerance during reproductive growth in soybean. Theor Appl Genet, 2005, 111: 851-861.

[23] Gai J., Wang Y., Wu X., Chen S. A comparative study on segregation analysis and QTL mapping of quantitative traits in plants—with a case in soybean. Front Agric China, 2007, 1: 1-7.

[24] Githiri S.M., Yang D., Khan N.A., Xu D., Komatsuda T., Takahashi R. QTL analysis of low temperature induced browning in soybean seed coats. J Heredity, 2007, 98: 360-366.

[25] Gizlice Z., Carter T.E., Burton J.W. Genetic base for North American public soybean cultivars released between 1947 and 1988. Crop Sci, 1994, 34: 1143-1151.

[26] Guo H., Liu J., Luo L., Wei X., Zhang J., Qi Y., Zhang B., Liu H., Xiao P. Complete chloroplast genome sequences of Schisandra chinensis: genome structure, comparative analysis, and phylogenetic relationship of basal angiosperms. Sci China Life Sci, 2017, 60: 1-5.

[27] Haas B.J. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res, 2003, 31: 5654-5666.

[28] Haas B.J., Salzberg S.L., Zhu W., Pertea M., Allen J.E., Orvis J., White O., Buell C.R., Wortman J.R. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol, 2008, 9: R7.

[29] Hirsch C.N., Hirsch C.D., Brohammer A.B., Bowman M.J., Soifer I., Barad O., Shem-Tov D., Baruch K., Lu F., Hernandez A.G., et al. Draft assembly of elite inbred line PH207 provides insights into genomic and transcriptome diversity in maize. Plant Cell, 2016, 28: 2700-2714.

[30] Holligan D., Zhang X., Jiang N., Pritham E.J., Wessler S.R. The transposable element landscape of the model legume Lotus japonicus. Genetics, 2006, 174: 2215-2228.

[31] Hoshino A., Jayakumar V., Nitasaka E., Toyoda A., Noguchi H., Itoh T., Shin-I T., Minakuchi Y., Koda Y., Nagano A.J., et al. Genome sequence and analysis of the Japanese morning glory Ipomoea nil. Nat Commun, 2016, 7: 13295.

[32] Hyten D.L., Song Q., Zhu Y., Choi I.Y., Nelson R.L., Costa J.M., Specht J.E., Shoemaker R.C., Cregan P.B. Impacts of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci USA, 2006, 103: 16666-16671.

[33] Jarvis D.E., Ho Y.S., Lightfoot D.J., Schmöckel S.M., Li B., Borm T.J.A., Ohyanagi H., Mineta K., Michell C.T., Saber N., et al. The genome of Chenopodium quinoa. Nature, 2017, 542: 307-312.

[34] Jiao Y., Peluso P., Shi J., Liang T., Stitzer M.C., Wang B., Campbell M.S., Stein J.C., Wei X., Chin C.S. Improved maize reference genome with single-molecule technologies. Nature, 2017, 546: 524-527.

[35] Jun T.H., Freewalt K., Michel A.P., Mian R. Identification of novel QTL for leaf traits in soybean. Plant Breed, 2014, 133: 61-66.

[36] Kawakatsu T., Huang S.S.C., Jupe F., Sasaki E., Schmitz R.J., Urich M.A., Castanon R., Nery J.R., Barragan C., He Y., et al. Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell, 2016, 166: 492-505.

[37] Keilwagen J., Hartung F., Paulini M., Twardziok S.O., Grau J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC BioInf, 2018, 19: 189.

[38] Keim P., Diers B.W., Olson T.C., Shoemaker R.C. RFLP mapping in soybean: association between marker loci and variation in quantitative traits. Genetics, 1990, 126: 735-742.

[39] Khan N.A., Githiri S.M., Benitez E.R., Abe J., Kawasaki S., Hayashi T., Takahashi R. QTL analysis of cleistogamy in soybean. Theor Appl Genet, 2008, 117: 479-487.

[40] Kim H.K., Kim Y.C., Kim S.T., Son B.G., Choi Y.W., Kang J.S., Park Y.H., Cho Y.S., Choi I.S. Analysis of quantitative trait loci (QTLs) for seed size and fatty acid composition using recombinant inbred lines in soybean. J Life Sci, 2010, 20: 1186-1192.

[41] Komatsu K., Okuda S., Takahashi M., Matsunaga R., Nakazawa Y. Quantitative trait loci mapping of pubescence density and flowering time of insect-resistant soybean (Glycine max L. Merr.). Genet Mol Biol, 2007, 30: 635-639.

[42] Kong F., Liu B., Xia Z., Sato S., Kim B.M., Watanabe S., Yamada T., Tabata S., Kanazawa A., Harada K., et al. Two coordinately regulated homologs of FLOWERING LOCUS T are involved in the control of photoperiodic flowering in soybean. Plant Physiol, 2010, 154: 1220-1231.

[43] Kong F., Nan H., Cao D., Li Y., Wu F., Wang J., Lu S., Yuan X., Cober E.R., Abe J., et al. A new dominant gene conditions early flowering and maturity in soybean. Crop Sci, 2014, 54: 2529-2535.

[44] Koo S.C., Bracko O., Park M.S., Schwab R., Chun H.J., Park K.M., Seo J.S., Grbic V., Balasubramanian S., Schmid M., et al. Control of lateral organ development and flowering time by the Arabidopsis thaliana MADS-box Gene AGAMOUS-LIKE6. Plant J, 2010, 62: 807-816.

[45] Koren S., Walenz B.P., Berlin K., Miller J.R., Bergman N.H., Phillippy A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res, 2017, 27: 722-736.

[46] Korf I. Gene finding in novel genomes. BMC BioInf, 2004, 5: 59.

[47] Krouk G., Mirowski P., LeCun Y., Shasha D.E., Coruzzi G.M. Predictive network modeling of the high-resolution dynamic plant transcriptome in response to nitrate. Genome Biol, 2010, 11: R123.

[48] Kuroda Y., Kaga A., Tomooka N., Yano H., Takada Y., Kato S., Vaughan D. QTL affecting fitness of hybrids between wild and cultivated soybeans in experimental fields. Ecol Evol, 2013, 3: 2150-2168.

[49] Kurtz S., Phillippy A., Delcher A.L., Smoot M., Shumway M., Antonescu C., Salzberg S.L. Versatile and open software for comparing large genomes. Genome Biol, 2004, 5: R12.

[50] Lam H.M., Xu X., Liu X., Chen W., Yang G., Wong F.L., Li M.W., He W., Qin N., Wang B., et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat Genet, 2010, 42: 1053-1059.

[51] Le B.H., Cheng C., Bui A.Q., Wagmaister J.A., Henry K.F., Pelletier J., Kwong L., Belmonte M., Kirkbride R., Horvath S., et al. Global analysis of gene activity during Arabidopsis seed development and identification of seed-specific transcription factors. Proc Natl Acad Sci USA, 2010, 107: 8063-8070.

[52] Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC BioInf, 2011, 12: 323.

[53] Li Y.H., Li W., Zhang C., Yang L., Chang R.Z., Gaut B.S., Qiu L.J. Genetic diversity in domesticated soybean (Glycine max) and its wild progenitor (Glycine soja) for simple sequence repeat and single-nucleotide polymorphism loci. New Phytol, 2010, 188: 242-253.

[54] Li Y., Zhao S., Ma J., Li D., Yan L., Li J., Qi X., Guo X., Zhang L., He W., et al. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing. BMC Genomics, 2013, 14: 579.

[55] Li Y., Zhou G., Ma J., Jiang W., Jin L., Zhang Z., Guo Y., Zhang J., Sui Y., Zheng L., et al. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat Biotechnol, 2014, 32: 1045-1052.

[56] Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., Telling A., Amit I., Lajoie B.R., Sabo P.J., Dorschner M.O., et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 2009, 326: 289-293.

[57] Liu C., Shi L., Zhu Y., Chen H., Zhang J., Lin X., Guan X. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics, 2012, 13: 715.

[58] Liu Z.X., Li H.H., Wen Z.X., Fan X.H., Li Y.H., Guan R.X., Guo Y., Wang S.M., Wang D.C., Qiu L.J. Comparison of genetic diversity between Chinese and American soybean (Glycine max (L.)) accessions revealed by high-density SNPs. Front Plant Sci, 2017, 8: 2014.

[59] Lupski J.R., de Oca-Luna R.M., Slaugenhaupt S., Pentao L., Guzzetta V., Trask B.J., Saucedo-Cardenas O., Barker D.F., Killian J.M., Garcia C.A., et al. DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell, 1991, 66: 219-232.

[60] Lu S., Zhao X., Hu Y., Liu S., Nan H., Li X., Fang C., Cao D., Shi X., Kong L., et al. Natural variation at the soybean J locus improves adaptation to the tropics and enhances yield. Nat Genet, 2017, 49: 773-779.

[61] Lv S., Wu W., Wang M., Meyer R.S., Ndjiondjop M.N., Tan L., Zhou H., Zhang J., Fu Y., Cai H., et al. Genetic control of seed shattering during African rice domestication. Nat Plants, 2018, 4: 331-337.

[62] Ma S.S., Bohnert H.J., Dinesh-Kumar S.P. AtGGM2014, an Arabidopsis gene co-expression network for functional studies. Sci China Life Sci, 2015, 58: 276-286.

[63] Ma S., Ding Z., Li P. Maize network analysis revealed gene modules involved in development, nutrients utilization, metabolism, and stress response. BMC Plant Biol, 2017, 17: 131.

[64] Ma S., Gong Q., Bohnert H.J. An Arabidopsis gene network based on the graphical Gaussian model. Genome Res, 2007, 17: 1614-1625.

[65] Mansur L., Lark K., Kross H., Oliveira A. Interval mapping of quantitative trait loci for reproductive, morphological, and seed traits of soybean (Glycine max L.). Theor Appl Genet, 1993, 86: 907-913.

[66] Mansur L.M., Orf J.H., Chase K., Jarvik T., Cregan P.B., Lark K.G. Genetic mapping of agronomic traits using recombinant inbred lines of soybean. Crop Sci, 1996, 36: 1327-1336.

[67] Mao T., Li J., Wen Z., Wu T., Wu C., Sun S., Jiang B., Hou W., Li W., Song Q., et al. Association mapping of loci controlling genetic and environmental interaction of soybean flowering time under various photo-thermal conditions. BMC Genomics, 2017, 18: 415.

[68] McCarthy E.M., McDonald J.F. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics, 2003, 19: 362-367.

[69] Oldham M.C., Horvath S., Geschwind D.H. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci USA, 2006, 103: 17973-17978.

[70] Orf J., Chase K., Jarvik T., Mansur L., Cregan P., Adler F., Lark K. Genetics of soybean agronomic traits: I. Comparison of three related recombinant inbred populations. Crop Sci, 1999, 39: 1642-1651.

[71] Oyoo M.E., Githiri S.M., Benitez E.R., Takahashi R. QTL analysis of net-like cracking in soybean seed coats. Breed Sci, 2010, 60: 28-33.

[72] Palomeque L., Li-Jun L., Li W., Hedges B., Cober E.R., Rajcan I. QTL in mega-environments: II. Agronomic trait QTL co-localized with seed yield QTL detected in a population derived from a cross of high-yielding adapted × high-yielding exotic soybean lines. Theor Appl Genet, 2009, 119: 429-436.

[73] Pooprompan P., Wasee S., Toojinda T., Abe J., Chanprame S., Srinives P. Molecular marker analysis of days to flowering in vegetable soybean (Glycine max (L.) Merrill). Kasetsart Journal, 2006, 40: 573-581.

[74] Ray D.K., Mueller N.D., West P.C., Foley J.A. Yield trends are insufficient to double global crop production by 2050. PLoS ONE, 2013, 8: e66428.

[75] Raymond O., Gouzy J., Just J., Badouin H., Verdenaud M., Lemainque A., Vergne P., Moja S., Choisne N., Pont C., et al. The Rosa genome provides new insights into the domestication of modern roses. Nat Genet, 2018, 50: 772-777.

[76] Reinprecht Y., Poysa V.W., Yu K., Rajcan I., Ablett G.R., Pauls K.P. Seed and agronomic QTL in low linolenic acid, lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm. Genome, 2006, 49: 1510-1527.

[77] Rhee S.Y., Mutwil M. Towards revealing the functions of all genes in plants. Trends Plant Sci, 2014, 19: 212-221.

[78] Samanfar B., Molnar S.J., Charette M., Schoenrock A., Dehne F., Golshani A., Belzile F., Cober E.R. Mapping and identification of a potential candidate gene for a novel maturity locus, E10, in soybean. Theor Appl Genet, 2017, 130: 377-390.

[79] Saski C., Lee S.B., Daniell H., Wood T.C., Tomkins J., Kim H.G., Jansen R.K. Complete chloroplast genome sequence of Glycine max and comparative analyses with other legume genomes. Plant Mol Biol, 2005, 59: 309-322.

[80] Schäfer J., Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol, 2005, 4: Article 32.

[81] Schmidt M.H.W., Vogel A., Denton A.K., Istace B., Wormit A., van de Geest H., Bolger M.E., Alseekh S., Maß J., Pfaff C., et al. De novo assembly of a new Solanum pennellii accession using nanopore sequencing. Plant Cell, 2017, 29: 2336-2348.

[82] Schmutz J., Cannon S.B., Schlueter J., Ma J., Mitros T., Nelson W., Hyten D.L., Song Q., Thelen J.J., Cheng J., et al. Genome sequence of the palaeopolyploid soybean. Nature, 2010, 463: 178-183.

[83] Seo J.S., Rhie A., Kim J., Lee S., Sohn M.H., Kim C.U., Hastie A., Cao H., Yun J.Y., Kim J., et al. De novo assembly and phasing of a Korean human genome. Nature, 2016, 538: 243-247.

[84] Serin E.A.R., Nijveen H., Hilhorst H.W.M., Ligterink W. Learning from co-expression networks: possibilities and challenges. Front Plant Sci, 2016, 7: 444.

[85] Servant N., Varoquaux N., Lajoie B.R., Viara E., Chen C.J., Vert J.P., Heard E., Dekker J., Barillot E. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol, 2015, 16: 259.

[86] Shi L., Guo Y., Dong C., Huddleston J., Yang H., Han X., Fu A., Li Q., Li N., Gong S., et al. Long-read sequencing and de novo assembly of a Chinese genome. Nat Commun, 2016, 7: 12065.

[87] Shimomura M., Kanamori H., Komatsu S., Namiki N., Mukai Y., Kurita K., Kamatsuki K., Ikawa H., Yano R., Ishimoto M. The Glycine max cv. Enrei genome for improvement of Japanese soybean cultivars. Int J Genomics, 2015, 2015: 358127.

[88] Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 2015, 31: 3210-3212.

[89] Stanke M., Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res, 2005, 33: W465-W467.

[90] Studer A., Zhao Q., Ross-Ibarra J., Doebley J. Identification of a functional transposon insertion in the maize domestication gene tb1. Nat Genet, 2011, 43: 1160-1163.

[91] Tasma I.M., Lorenzen L.L., Green D.E., Shoemaker R.C. Mapping genetic loci for flowering time, maturity, and photoperiod insensitivity in soybean. Mol Breeding, 2001, 8: 25-35.

[92] VanBuren R., Bryant D., Edger P.P., Tang H., Burgess D., Challabathula D., Spittle K., Hall R., Gu J., Lyons E., et al. Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum. Nature, 2015, 527: 508-511.

[93] Walker B.J., Abeel T., Shea T., Priest M., Abouelliel A., Sakthikumar S., Cuomo C.A., Zeng Q., Wortman J., Young S.K., et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE, 2014, 9: e112963.

[94] Wang K., Huang G., Zhu Y. Transposable elements play an important role during cotton genome evolution and fiber cell development. Sci China Life Sci, 2016, 59: 112-121.

[95] Wang Z., Tian Z.X. Genomics progress will facilitate molecular breeding in soybean. Sci China Life Sci, 2015, 58: 813-815.

[96] Watanabe S., Xia Z., Hideshima R., Tsubokura Y., Sato S., Yamanaka N., Takahashi R., Anai T., Tabata S., Kitamura K., et al. A map-based cloning strategy employing a residual heterozygous line reveals that the GIGANTEA gene is involved in soybean maturity and flowering. Genetics, 2011, 188: 395-407.

[97] Wei H., Yordanov Y.S., Georgieva T., Li X., Busov V. Nitrogen deprivation promotes Populus root growth through global transcriptome reprogramming and activation of hierarchical genetic networks. New Phytol, 2013, 200: 483-497.

[98] Wei L., Cao X. The effect of transposable elements on phenotypic variation: insights from plants to humans. Sci China Life Sci, 2016, 59: 24-37.

[99] Wilson R.F. Soybean: Market Driven Research Needs. In: Genetics and Genomics of Soybean, G. Stacey, ed. New York: Springer, 2008. pp. 3-16.

[100] Windram O., Madhou P., McHattie S., Hill C., Hickman R., Cooke E., Jenkins D.J., Penfold C.A., Baxter L., Breeze E., et al. Arabidopsis defense against Botrytis cinerea: chronology and regulation deciphered by high-resolution temporal transcriptomic analysis. Plant Cell, 2012, 24: 3530-3557.

[101] Wolfe C.J., Kohane I.S., Butte A.J. Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC BioInf, 2005, 6: 227.

[102] Xia Z., Watanabe S., Yamada T., Tsubokura Y., Nakashima H., Zhai H., Anai T., Sato S., Yamazaki T., Lü S., et al. Positional cloning and characterization reveal the molecular basis for soybean maturity locus E1 that regulates photoperiodic flowering. Proc Natl Acad Sci USA, 2012, 109: E2155-E2164.

[103] Yamanaka N., Nagamura Y., Tsubokura Y., Yamamoto K., Takahashi R., Kouchi H., Yano M., Sasaki T., Harada K. Quantitative trait locus analysis of flowering time in soybean using a RFLP linkage map. Breed Sci, 2000, 50: 109-115.

[104] Yamanaka N. An informative linkage map of soybean reveals QTLs for flowering time, leaflet morphology and regions of segregation distortion. DNA Res, 2001, 8: 61-72.

[105] Yue Y., Liu N., Jiang B., Li M., Wang H., Jiang Z., Pan H., Xia Q., Ma Q., Han T., et al. A single nucleotide deletion in J encoding GmELF3 confers long juvenility and is associated with adaption of tropic soybean. Mol Plant, 2017, 10: 656-658.

[106] Zabala G., Vodkin L.O. A rearrangement resulting in small tandem repeats in the F3′5′H gene of white flower genotypes is associated with the soybean W1 locus. Crop Sci, 2007, 47: S-113.

[107] Zhang J., Chen L.L., Xing F., Kudrna D.A., Yao W., Copetti D., Mu T., Li W., Song J.M., Xie W., et al. Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc Natl Acad Sci USA, 2016, 113: E5163-E5171.

[108] Zhang S.R., Wang H., Wang Z., Ren Y., Niu L., Liu J., Liu B. Photoperiodism dynamics during the domestication and improvement of soybean. Sci China Life Sci, 2017, 60: 1416-1427.

[109] Zhang W.K., Wang Y.J., Luo G.Z., Zhang J.S., He C.Y., Wu X.L., Gai J.Y., Chen S.Y. QTL mapping of ten agronomic traits on the soybean (Glycine max L. Merr.) genetic map and their association with EST markers. Theor Appl Genet, 2004, 108: 1131-1139.

[110] Zhao C., Takeshima R., Zhu J., Xu M., Sato M., Watanabe S., Kanazawa A., Liu B., Kong F., Yamada T., et al. A recessive allele for delayed flowering at the soybean maturity locus E9 is a leaky allele of FT2a, a FLOWERING LOCUS T ortholog. BMC Plant Biol, 2016, 16: 20.

[111] Zhou Z., Jiang Y., Wang Z., Gou Z., Lyu J., Li W., Yu Y., Shu L., Zhao Y., Ma Y., et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat Biotechnol, 2015, 33: 408-414.

  • Figure 1

    Whole-genome comparison between Gmax_ZH13 and Glycine_max_v2.0. A, Intra-chromosome comparisons. Gaps in the assembled chromosomes, specifically present regions, syntenic aligned regions, inversion regions, translocation regions, and translocation & inversion regions are shown. B, Inter-chromosome comparisons. Tracks from the outer to the inner circle indicate SNP number and small insertion number; lines between chromosomes show translocation or translocation & inversion events.

  • Figure 2

    Combining the gene co-expression network with QTL/GWAS regions to predict soybean flowering time-related genes. A, Genes co-expressed at the first level with 9 known soybean flowering time-related genes. Nodes represent genes, and edges represent connections between genes. Edge width reflects the similarity of the connected genes’ expression patterns: the thicker the edge, the higher the expression correlation between the genes it connects. B, The 26 soybean flowering time-related genes appearing in (A) that are predicted by GWAS and/or QTL regions. Ref1 is Fang et al., 2017.

  • Figure 3

    SoyZH13_16G177400 is a gene controlling soybean flowering time. A, Different haplotypes of SoyZH13_16G177400 show significantly different flowering times. Green blocks indicate the gene’s CDS region, and “H” is an abbreviation for “haplotype”. Nucleotides marked in red are the mutant forms relative to the Gmax_ZH13 genome. Different letters to the right of each column indicate significant differences by ANOVA test (P<0.01). B, The geographic distribution of accessions roughly corresponds to their haplotypes of SoyZH13_16G177400. The phylogenetic tree is modified from Figure 1b of Fang et al., 2017. “HL” is an abbreviation for “high latitude”, and “LL” is an abbreviation for “low latitude”.

  • Figure 4

    Combining the gene co-expression network with QTL/GWAS regions to predict linoleic acid content-related genes. A, Genes co-expressed with FAD3A at the first level. Nodes represent genes, and edges represent connections between genes. Edge width reflects the similarity of the connected genes’ expression patterns: the thicker the edge, the higher the expression correlation between the genes it connects. B, Linoleic acid content differs significantly between accessions carrying the two different haplotypes of SoyZH13_02G207800. *** denotes t-test P<0.001.

  • Table 1   Assembly statistics of the soybean Gmax_ZH13 genome

    | Assembly a) | Contigs b) | Scaffolds | Unplaced contigs c) | Contig N50 (Mb) d) | Scaffold N50 (Mb) d) | Assembly size (Gb) | Assembly in scaffolds (%) |
    |---|---|---|---|---|---|---|---|
    | PacBio | 1,559 | -- | -- | 2.6 | -- | 1.007 | -- |
    | BioNano-BspQI | -- | 518 | -- | -- | 3.79 | 1.012 | -- |
    | BioNano-BssSI | -- | 1,181 | -- | -- | 1.3 | 1.031 | -- |
    | PacBio+BioNano | 826 | 59 | 717 | 3.46 | 25.12 | 1.025 | 96.85 |
    | PacBio+BioNano+Hi-C | 836 | 21 | 549 | 3.46 | 51.87 | 1.025 | 97 |

    a) Assemblies are listed in the order in which the different sequence data types were combined for genome assembly. b) The number of continuous stretches of sequence, at least 100 bases long, within scaffolds that contain no gaps longer than 3 bases. c) Unplaced contigs are defined as input contigs that were not placed in a scaffold by the optical map or Hi-C. d) All N50 values are based on the Gmax_ZH13 assembled size.

  • Table 2   Transposable element and repeat sequence composition in the Gmax_ZH13 genome

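    | Repeat type | Classification | Intact/Solo number a) | DNA content (bp) | DNA content (%) |
    |---|---|---|---|---|
    | Class I: Retrotransposon; LTR-Retrotransposon | Ty1/copia | 12,641 | 106,803,505 | 10.42% |
    | Class I: Retrotransposon; LTR-Retrotransposon | Ty3/gypsy | 17,935 | 331,019,054 | 32.29% |
    | Class I: Retrotransposon; LTR-Retrotransposon | Others | 100 | 4,012,789 | 0.39% |
    | Class I: Retrotransposon; Non-LTR Retrotransposon | LINE | 330 | 9,300,199 | 0.91% |
    | Class I: Retrotransposon; Non-LTR Retrotransposon | SINE | | 399,615 | 0.04% |
    | Class II: DNA Transposon; Subclass I | Tc1/Mariner | 7 | 144,681 | 0.01% |
    | Class II: DNA Transposon; Subclass I | hAT | 42 | 219,006 | 0.02% |
    | Class II: DNA Transposon; Subclass I | Mutator | 2,242 | 22,699,372 | 2.22% |
    | Class II: DNA Transposon; Subclass I | PIF/Harbinger | 71 | 1,169,142 | 0.11% |
    | Class II: DNA Transposon; Subclass I | Pong | 10 | 352,522 | 0.03% |
    | Class II: DNA Transposon; Subclass I | CACTA | 59 | 6,404,368 | 0.62% |
    | Class II: DNA Transposon; Subclass I | MITE: Tourist | 1,356 | 1,212,517 | 0.11% |
    | Class II: DNA Transposon; Subclass I | MITE: Stowaway | 1,562 | 1,110,643 | 0.11% |
    | Class II: DNA Transposon; Subclass II | Helitron | 74 | 2,905,865 | 0.29% |
    | Tandem Repeat | | | 10,546,540 | 1.03% |
    | Unknown | | | 42,453,694 | 4.14% |
    | Total | | 36,429 | 540,753,512 | 52.75% |

    a) Number of transposable elements with clear boundaries and signatures of insertion sites.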
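The totals in Table 2 can be checked by simple arithmetic: the per-family element numbers and DNA contents should add up to the reported totals, and the repeat fraction follows from dividing the total repeat length by the assembly size. A minimal Python sketch of that check is given here. The row values are transcribed from Table 2; the variable names are illustrative, and the assembly size of 1.025 Gb is the rounded figure from Table 1, so the computed percentage is only approximate.

```python
# Rows transcribed from Table 2:
# (classification, intact/solo number or None if not reported, DNA content in bp)
ROWS = [
    ("Ty1/copia", 12641, 106803505),
    ("Ty3/gypsy", 17935, 331019054),
    ("Other LTR", 100, 4012789),
    ("LINE", 330, 9300199),
    ("SINE", None, 399615),
    ("Tc1/Mariner", 7, 144681),
    ("hAT", 42, 219006),
    ("Mutator", 2242, 22699372),
    ("PIF/Harbinger", 71, 1169142),
    ("Pong", 10, 352522),
    ("CACTA", 59, 6404368),
    ("Tourist", 1356, 1212517),
    ("Stowaway", 1562, 1110643),
    ("Helitron", 74, 2905865),
    ("Tandem Repeat", None, 10546540),
    ("Unknown", None, 42453694),
]

REPORTED_TOTAL_NUMBER = 36429
REPORTED_TOTAL_BP = 540753512
# Assumption: rounded assembly size (1.025 Gb) taken from Table 1.
ASSUMED_ASSEMBLY_BP = 1.025e9

# Sum element numbers (skipping rows without a reported number) and bp.
total_number = sum(n for _, n, _ in ROWS if n is not None)
total_bp = sum(bp for _, _, bp in ROWS)
repeat_percent = 100 * total_bp / ASSUMED_ASSEMBLY_BP

print("numbers match:", total_number == REPORTED_TOTAL_NUMBER)
print("bp match:", total_bp == REPORTED_TOTAL_BP)
print(f"approximate repeat content: {repeat_percent:.2f}%")
```

With these inputs, the element numbers and base-pair contents sum exactly to the reported totals of 36,429 and 540,753,512 bp, and the computed repeat fraction is close to the reported 52.75%; the small difference comes from using the rounded assembly size.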
