Explore the genetics of weedy traits using rice 3K database
Abstract
Background Weedy rice, a conspecific weedy counterpart of cultivated rice (Oryza sativa L.), is a persistent problem in rice-production areas worldwide. Although the origin of some weedy traits is beginning to be understood for some rice-growing regions, an overall assessment of weedy trait-related loci is not yet available. Meanwhile, advances in sequencing technologies, together with community efforts, have made a large amount of genomic data publicly available. Given the availability of public data and the need for “weedy” allele mining to better manage weedy rice, the objective of the present study was to explore the genetic architecture of weedy traits based on publicly available data, mainly from the 3000 Rice Genome Project (3K-RGP).
Results Based on the results of population structure analysis, we selected 1378 individuals without admixed genomic composition from four sub-populations (aus, indica, temperate japonica, tropical japonica) for genome-wide association study (GWAS). Five traits were investigated: awn color, seed shattering, seed threshability, seed coat color, and seedling height. GWAS was conducted for each sub-population × trait combination, and we identified 66 population-specific trait-associated SNPs. Eleven significant SNPs fell within an annotated gene and four other SNPs were close to a putative candidate gene (± 25 kb). SNPs located in or close to Rc were particularly predictive of seed coat color, and our results showed that different sub-populations required different SNPs for better prediction of seed coat color. We compared the 3K-RGP data with a publicly available weedy rice dataset. The allele frequency profiles, the phenotype-genotype segregation at the target SNP, and the GWAS results for the presence or absence of awns diverged between the two datasets.
Conclusions The trait-associated SNPs identified in this study, especially those located in or close to Rc, can be developed into diagnostic SNPs to trace the origin of weedy traits occurring in the field. The differences between the results from the two publicly available datasets used in this study emphasize the importance of laboratory experiments to confirm allele mining results based on publicly available data.