A detailed description of DNA sequence variation across the genome is a prerequisite for the systematic analysis of variants underlying trait variation in wheat, and is critical for understanding the role of various evolutionary factors in shaping genome diversity. Newly developed sequencing technologies make it possible to obtain a complete catalog of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), even for the complex wheat genome. The recent release of the wheat genome assembly allowed us to describe the chromosomal distribution of variants and their potential effects on gene function. The haplotype map developed here will be a valuable tool for imputing genotypes and transferring sequence-level variation data across multiple gene mapping projects, thereby increasing the power and precision of trait mapping in genome-wide association studies (GWAS) and helping to better understand the basis of complex phenotypic traits.
The data were generated by re-sequencing 62 diverse wheat lines using whole exome capture (WEC) and genotyping-by-sequencing (GBS) approaches. The panel was selected to capture the genetic diversity of the major global wheat-growing regions and includes landraces and cultivars (see Fig. 1). Of these lines, 32 are founders of the spring wheat nested association mapping (NAM) population, which comprises 2,400 recombinant inbred lines (RILs). We identified 1.57 million SNPs and 161,719 small indels distributed across all 21 chromosomes. In coding sequences, we identified 83,622 non-synonymous and 76,361 synonymous SNPs. Based on high-confidence gene models in the Chromosome Survey Sequence (CSS) contigs, only 1,600 SNPs are predicted to introduce premature termination codons and only 1,583 to disrupt splice sites.
Please refer to our paper for more details.
| Wheat line | Origin | Improvement status | Growth habit | Region | Large region |
|---|---|---|---|---|---|
| Opata | Mexico | cultivar | spring | North and Central America | The Americas |
| W7984 | Mexico | synthetic | spring | North and Central America | The Americas |
| Clear White | USA | cultivar | spring | North and Central America | The Americas |
| Vorobey | Mexico | cultivar | spring | North and Central America | The Americas |
| Klein Chamaco | Argentina | cultivar | spring | South America | The Americas |
| Pavon | Mexico | cultivar | spring | North and Central America | The Americas |
| acc2 | USA | breeding line | spring | North and Central America | The Americas |
| acc3 | USA | breeding line | spring | North and Central America | The Americas |
| acc4 | USA | breeding line | spring | North and Central America | The Americas |
| acc1 | USA | breeding line | spring | North and Central America | The Americas |
| acc5 | USA | breeding line | spring | North and Central America | The Americas |
| PI349512 | Switzerland | landrace | spring | Western and Northern Europe | Europe |
| PI477870 | Peru | landrace | spring | South America | The Americas |
| Dharwar Dry | India | cultivar | spring | South-central Asia | Asia |
| Cham 6 | Syria/Lebanon | cultivar | spring | Western Asia | Asia |
| Chakwal 86 | Pakistan | cultivar | spring | South-central Asia | Asia |
| Berkut | Mexico | cultivar | spring | North and Central America | The Americas |
| PI192569 | Sweden | landrace | spring | Western and Northern Europe | Europe |
| PI192147 | Ethiopia | landrace | spring | South and East Africa | Africa |
| PI565213 | Bolivia | landrace | spring | South America | The Americas |
| PI82469 | North Korea | landrace | spring | Eastern Asia | Asia |
| PI245368 | Guatemala | landrace | spring | North and Central America | The Americas |
| PI166180 | India | landrace | spring | South-central Asia | Asia |
| PI192001 | Angola | landrace | spring | South and East Africa | Africa |
| PI153785 | Brazil | landrace | spring | South America | The Americas |
| Marquis | Canada | cultivar | spring | North and Central America | The Americas |
| Neepawa | Canada | cultivar | spring | North and Central America | The Americas |
| AC Barrie | Canada | cultivar | spring | North and Central America | The Americas |
| Chinese Spring | China | cultivar | spring | Eastern Asia | Asia |
| Utmost | Canada | cultivar | spring | North and Central America | The Americas |
| Rialto | United Kingdom | cultivar | winter | Western and Northern Europe | Europe |
| Truman | USA | cultivar | winter | North and Central America | The Americas |
| 49-2914 H1096 | Argentina | breeding line | facultative | South America | The Americas |
| 102 | Chile | cultivar | facultative | South America | The Americas |
| 93 | Bulgaria | cultivar | facultative | Western and Northern Europe | Europe |
| Taxi | United Kingdom | cultivar | winter | Western and Northern Europe | Europe |
| PR267 | USA | cultivar | winter | North and Central America | The Americas |
| Roemer Winter | Germany | cultivar | winter | Western and Northern Europe | Europe |
| 407-IV/60 | Bosnia and Herzegovina | cultivar | facultative | Western and Northern Europe | Europe |
| 403 | Chile | cultivar | winter | South America | The Americas |
| Avalon | United Kingdom | cultivar | winter | Western and Northern Europe | Europe |

| Approach | SNP class | SNP subclass | Total | A genome | B genome | D genome |
|---|---|---|---|---|---|---|
The pairwise haplotype sharing (PHS) statistic was calculated as described by Toomajian et al. using a custom Perl script. This statistic detects genomic regions where haplotypes extend over relatively long portions of the genome, normalized by the overall length of haplotype blocks within the genome. Such extended haplotypes are characteristic of regions undergoing partial selective sweeps. Thresholds for the PHS statistic were set at the 97.5th percentile of the distribution of PHS values in each genome: 0.72, 0.67, and 1.23 for the A, B, and D genomes, respectively.
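A minimal Python sketch of the PHS computation for one focal SNP, following the general form of the statistic in Toomajian et al.: the mean standardized haplotype-sharing length among pairs of lines that carry a given allele, minus the mean over all pairs of lines. It is not the custom Perl script used here. Assumptions: inbred lines are treated as single haplotypes coded 0/1; the shared length is the genetic distance between the outermost identical SNPs flanking the focal SNP and is 0 for pairs that differ at the focal SNP; and the per-pair genome-wide means and standard deviations used for standardization (`pair_mean`, `pair_sd`, hypothetical inputs) are precomputed.

```python
import itertools

import numpy as np


def shared_length(hap, pos, i, j, focal):
    """Genetic length (cM) of the identical segment around `focal` for lines i and j.

    Assumption of this sketch: returns 0.0 when the lines differ at the focal SNP.
    """
    if hap[i, focal] != hap[j, focal]:
        return 0.0
    left = right = focal
    # Extend leftwards until the first mismatching SNP or the end of the array.
    while left > 0 and hap[i, left - 1] == hap[j, left - 1]:
        left -= 1
    # Extend rightwards in the same way.
    while right < hap.shape[1] - 1 and hap[i, right + 1] == hap[j, right + 1]:
        right += 1
    return float(pos[right] - pos[left])


def phs(hap, pos, focal, allele, pair_mean, pair_sd):
    """PHS for `allele` at SNP index `focal`.

    hap: (n_lines, n_snps) array of 0/1 alleles.
    pos: sorted genetic positions (cM) of the SNPs.
    pair_mean, pair_sd: (n_lines, n_lines) genome-wide mean and SD of the
        shared length for each pair, used to convert lengths to z-scores.
    """
    n = hap.shape[0]
    carriers = {int(k) for k in np.flatnonzero(hap[:, focal] == allele)}
    z_all, z_carrier = [], []
    for i, j in itertools.combinations(range(n), 2):
        z = (shared_length(hap, pos, i, j, focal) - pair_mean[i, j]) / pair_sd[i, j]
        z_all.append(z)
        if i in carriers and j in carriers:
            z_carrier.append(z)
    if not z_carrier:  # fewer than two carriers: PHS is undefined
        return float("nan")
    # Excess haplotype sharing among carriers relative to all pairs.
    return float(np.mean(z_carrier) - np.mean(z_all))


if __name__ == "__main__":
    hap = np.array([[1, 1, 1, 1, 1],
                    [1, 1, 1, 1, 0],
                    [0, 1, 0, 0, 1],
                    [0, 0, 0, 1, 1]])
    pos = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    # Zero means and unit SDs leave raw lengths unchanged, for illustration only.
    print(phs(hap, pos, 2, 1, np.zeros((4, 4)), np.ones((4, 4))))  # 2.5
```

In this toy example, the two carriers of allele 1 share a 3 cM segment while most other pairs share nothing, so the allele receives a positive PHS; genome-wide, values above the per-genome 97.5th percentile would be flagged.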
We performed a genome scan for selected regions using the cross-population composite likelihood ratio (XP-CLR) approach, which is robust to assumptions about recombination rates and demography. In this method, two populations are compared for allele frequency differentiation and the extent of linked variation to detect regions where allele frequencies changed too quickly to be explained by random drift. XP-CLR scores were estimated using publicly available XP-CLR code. Grid points were placed along the chromosome arms at 500-kb intervals, the window size was set to 0.1 cM, and the maximum number of SNPs per window was fixed at 500. Critical values for putative selection targets were set at the 97.5th percentile of the test statistic distribution for each wheat genome: 4.9, 4.5, and 5.0 for the A, B, and D genomes, respectively.
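The scan settings can be illustrated with a short Python sketch. It does not reimplement the XP-CLR likelihood; it only shows grid placement (500 kb), window assignment (0.1 cM, at most 500 SNPs) and per-genome 97.5th-percentile critical values, the same cutoff rule used for PHS. Assumptions: a genetic map is available to convert grid positions to cM by linear interpolation, each window is centred on its grid point, and surplus SNPs are trimmed by keeping those nearest the grid point. The XP-CLR program's own handling may differ, and all function names are hypothetical.

```python
import numpy as np


def grid_points(chrom_length_bp, spacing_bp=500_000):
    """Physical positions (bp) of grid points placed every `spacing_bp` along an arm."""
    return np.arange(0, chrom_length_bp + 1, spacing_bp)


def grid_to_cm(grid_bp, map_bp, map_cm):
    """Interpolate genetic positions (cM) of grid points from a genetic map."""
    return np.interp(grid_bp, map_bp, map_cm)


def window_snps(grid_cm, snp_cm, window_cm=0.1, max_snps=500):
    """Indices of SNPs in the window centred on one grid point.

    Assumptions of this sketch: the window is centred on the grid point, and
    if it holds more than `max_snps` SNPs, the ones nearest the grid point are kept.
    """
    dist = np.abs(np.asarray(snp_cm, dtype=float) - grid_cm)
    inside = np.flatnonzero(dist <= window_cm / 2)
    if inside.size > max_snps:
        nearest = np.argsort(dist[inside], kind="stable")[:max_snps]
        inside = np.sort(inside[nearest])
    return inside


def critical_values(scores_by_genome, q=97.5):
    """Critical value per genome: the q-th percentile of that genome's scores."""
    return {g: float(np.percentile(s, q)) for g, s in scores_by_genome.items()}


def candidate_grid_points(scores, cutoff):
    """Indices of grid points whose score exceeds the genome's critical value."""
    return np.flatnonzero(np.asarray(scores, dtype=float) > cutoff)


if __name__ == "__main__":
    print(grid_points(1_600_000).tolist())  # [0, 500000, 1000000, 1500000]
    print(window_snps(0.0, [0.0, 0.03, 0.049, 0.06, 0.2]).tolist())  # [0, 1, 2]
```

Computing the cutoff separately per genome, as in the text, keeps a genome with a generally higher score distribution (here D) from dominating the list of candidate regions.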
Genotype imputation was performed using Beagle v.4 [89] with the following parameters: window=5000 overlap=500 burnin-its=10 impute-its=10. To increase imputation accuracy, burnin-its and impute-its were raised from their default values (burnin-its=5, impute-its=5) to 10, as recommended in the Beagle user manual. Imputation accuracy assessed for the cultivars Avalon and Rialto using windows of 1,000 to 5,000 markers showed no significant differences. A setting of window=5000 was selected because of its
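As an illustration, the stated settings can be assembled into a Beagle 4 command line in Python. The jar name and VCF paths are hypothetical placeholders, the masked line and the 61 reference lines are assumed to sit together in one input VCF, and `gprobs=true` is included on the assumption that genotype probabilities are needed for the later filtering step. Parameter names should be checked against the manual of the exact Beagle 4 release used.

```python
import shlex

# Values stated in the text; parameter names follow Beagle 4.0 usage.
BEAGLE_PARAMS = {"window": 5000, "overlap": 500, "burnin-its": 10, "impute-its": 10}


def beagle_command(jar, gt_vcf, out_prefix, params=None, gprobs=True):
    """Build (but do not run) a Beagle 4 command line as an argument list.

    `jar`, `gt_vcf` and `out_prefix` are hypothetical paths supplied by the caller.
    """
    params = BEAGLE_PARAMS if params is None else params
    cmd = ["java", "-jar", jar, f"gt={gt_vcf}", f"out={out_prefix}"]
    if gprobs:
        # Request the GP (genotype probability) field (assumption, see lead-in).
        cmd.append("gprobs=true")
    cmd += [f"{key}={value}" for key, value in params.items()]
    return cmd


if __name__ == "__main__":
    cmd = beagle_command("beagle.jar", "panel_masked.vcf.gz", "panel_imputed")
    print(shlex.join(cmd))
    # To execute: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems and lets the same settings be reused unchanged across the 62 leave-one-out runs.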
To test imputation accuracy, we sequentially selected each line from the panel of 62 lines and masked all of its genotyped sites except the ~14,000 SNPs shared between the WEC dataset and the 90K SNP array; at these sites, at least 75% of accessions in both datasets had genotype calls. The remaining 61 lines were used as a reference panel for imputing 649,502 SNPs ordered along the wheat chromosomes. After imputation, genotypes were filtered using different thresholds of the genotype probability estimated by Beagle. The filtered imputed genotypes in each line were then compared with the actual genotype calls obtained by WEC sequencing to assess imputation accuracy.
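The validation can be sketched in Python as two steps: masking one line's calls except at the retained array-overlap sites, then measuring concordance between imputed and observed WEC genotypes at several genotype-probability cutoffs. Assumptions: genotypes are coded as integers with -1 for missing calls (0/2 for homozygous inbred lines in the example), each imputed genotype is filtered on its highest genotype probability, and the threshold values passed in are hypothetical, since the text does not list them. Repeating the procedure for each of the 62 lines gives the leave-one-out scheme described above.

```python
import numpy as np

MISSING = -1  # code for a missing genotype call (assumption of this sketch)


def mask_for_validation(geno, target, keep_sites):
    """Copy of `geno` (lines x sites) with the target line's calls masked,
    except at `keep_sites` (e.g. the ~14,000 WEC/90K overlap SNPs)."""
    masked = geno.copy()
    drop = np.ones(geno.shape[1], dtype=bool)
    drop[keep_sites] = False
    masked[target, drop] = MISSING
    return masked


def accuracy_by_threshold(true_gt, imputed_gt, gp_max, thresholds):
    """Concordance of imputed vs. observed genotypes after GP filtering.

    true_gt: observed WEC genotype codes at masked sites (MISSING if no call).
    imputed_gt: imputed genotype codes at the same sites.
    gp_max: highest genotype probability reported for each imputed genotype.
    Returns a list of (threshold, fraction of called sites retained, concordance).
    """
    true_gt = np.asarray(true_gt)
    imputed_gt = np.asarray(imputed_gt)
    gp_max = np.asarray(gp_max, dtype=float)
    called = true_gt != MISSING  # only sites with an observed call can be scored
    n_called = int(called.sum())
    results = []
    for t in thresholds:
        keep = called & (gp_max >= t)
        n_keep = int(keep.sum())
        retained = n_keep / n_called if n_called else float("nan")
        concordance = (float(np.mean(true_gt[keep] == imputed_gt[keep]))
                       if n_keep else float("nan"))
        results.append((t, retained, concordance))
    return results


if __name__ == "__main__":
    geno = np.array([[0, 2, 2, 0], [2, 2, 0, 0]])
    print(mask_for_validation(geno, target=0, keep_sites=[1]).tolist())
    print(accuracy_by_threshold([0, 2, 2, 0, -1], [0, 2, 0, 0, 2],
                                [0.99, 0.6, 0.7, 0.95, 0.99], [0.0, 0.9]))
```

Reporting the retained fraction alongside concordance makes the trade-off explicit: stricter probability cutoffs usually raise accuracy but discard more imputed genotypes.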
USDA National Institute of Food and Agriculture Grants 2011-68002-30029 (Triticeae-CAP) and 2012-67013-1940, Bill and Melinda Gates Foundation, Genome Prairie, Genome Canada, Saskatchewan Ministry of Agriculture, Western Grains Research Foundation, BBSRC, KSU Plant Biotechnology Center and Kansas Agricultural Experiment Station.