1000 genomes project
Recently Published Documents


TOTAL DOCUMENTS

164
(FIVE YEARS 56)

H-INDEX

25
(FIVE YEARS 6)

2022 ◽  
Vol 12 ◽  
Author(s):  
Lu Cao ◽  
Ruixue Zhang ◽  
Yirui Wang ◽  
Xia Hu ◽  
Liang Yong ◽  
...  

The important role of the MHC in the pathogenesis of vitiligo and SLE has been confirmed in various populations. To map the most significant MHC variants associated with the risk of vitiligo and SLE, we conducted fine-mapping analysis using 1117 vitiligo cases, 1046 SLE cases and 1693 healthy control subjects in the Han-MHC reference panel and 1000 Genomes Project phase 3. rs113465897 (P=1.03×10⁻¹³, OR=1.64, 95% CI=1.44–1.87) and rs3129898 (P=4.21×10⁻¹⁷, OR=1.93, 95% CI=1.66–2.25) were identified as being most strongly associated with vitiligo and SLE, respectively. Stepwise conditional analysis revealed additional independent signals at rs3130969 (P=1.48×10⁻⁷, OR=0.69, 95% CI=0.60–0.79) and HLA-DPB1*03:01 (P=1.07×10⁻⁶, OR=1.94, 95% CI=1.49–2.53) linked to vitiligo, and at HLA-DQB1*0301 (P=4.53×10⁻⁷, OR=0.62, 95% CI=0.52–0.75) linked to SLE. Considering that epidemiological studies have confirmed comorbidities of vitiligo and SLE, we used the GCTA tool to analyse the genetic correlation between these two diseases in the HLA region; the correlation coefficient was 0.79 (P=5.99×10⁻¹⁰, SE=0.07), confirming their similar genetic backgrounds. Our findings highlight the value of the MHC region in vitiligo and SLE and provide a new perspective for comorbidities among autoimmune diseases.
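As a rough consistency check on association statistics like those reported above, an odds ratio and its 95% confidence interval are enough to approximately recover the Wald z-score and two-sided p-value. The sketch below assumes the interval is symmetric on the log-odds scale and was built with a critical value of 1.96; the function name `z_and_p_from_or_ci` is made up for illustration, and this is not necessarily how the authors computed their p-values.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate the Wald z-score and two-sided p-value from an OR and its CI.

    Assumes the CI was built as log(OR) +/- z_crit * SE (an assumption,
    not something stated in the abstract).
    """
    # Standard error of log(OR), read off the width of the log-scale interval.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values for rs113465897 as quoted in the abstract.
z, p = z_and_p_from_or_ci(1.64, 1.44, 1.87)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Because the published OR and CI are rounded to two decimals, the recovered p-value should only be expected to match the reported one to within its order of magnitude.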


Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 44
Author(s):  
Iago Maceda ◽  
Oscar Lao

The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosted our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low-frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center on predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independent of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fadilla Wahyudi ◽  
Farhang Aghakhanian ◽  
Sadequr Rahman ◽  
Yik-Ying Teo ◽  
Michał Szpak ◽  
...  

Abstract Background: In population genomics, polymorphisms that are highly differentiated between geographically separated populations are often suggestive of Darwinian positive selection. Genomic scans have highlighted several such regions in African and non-African populations, but only a handful of these have functional data that clearly associates candidate variations driving the selection process. Fine-Mapping of Adaptive Variation (FineMAV) was developed to address this in a high-throughput manner using population-based whole-genome sequences generated by the 1000 Genomes Project. It pinpoints positively selected genetic variants in sequencing data by prioritizing high-frequency, population-specific and functional derived alleles. Results: We developed stand-alone software that implements the FineMAV statistic. To graphically visualise the FineMAV scores, it outputs the statistics as bigWig files, a common file format supported by many genome browsers. It is available with both a command-line and a graphical user interface. The software was tested by replicating the FineMAV scores obtained using 1000 Genomes Project African, European, East and South Asian populations and subsequently applied to whole-genome sequencing datasets from Singapore and China to highlight population-specific variants that can be subsequently modelled. The software tool is publicly available at https://github.com/fadilla-wahyudi/finemav. Conclusions: The software tool described here determines genome-wide FineMAV scores, using low- or high-coverage whole-genome sequencing datasets, that can be used to prioritize a list of population-specific, highly differentiated candidate variants for in vitro or in vivo functional screens. The tool displays these scores on human genome browsers for easy visualisation, annotation and comparison between different genomic regions in worldwide human populations.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12294
Author(s):  
Neeraj Bharti ◽  
Ruma Banerjee ◽  
Archana Achalere ◽  
Sunitha Manjari Kasibhatla ◽  
Rajendra Joshi

Objectives: Reliable identification of population-specific variants is important for building the single nucleotide polymorphism (SNP) profile. In this study, genomic variation using allele frequency differences of pharmacologically important genes for Gujarati Indians in Houston (GIH) and Indian Telugu in the U.K. (ITU) from the 1000 Genomes Project vis-à-vis global population data was studied to understand its role in drug response. Methods: A joint genotyping approach was used to derive variants of GIH and ITU independently. SNPs of both these populations with significant allele frequency variation (minor allele frequency ≥ 0.05) relative to super-populations from the 1000 Genomes Project and gnomAD were identified based on the Chi-square distribution with a p-value of ≤ 0.05 and Bonferroni’s multiple adjustment tests. Population stratification and fixation index analyses were carried out to understand genetic differentiation. Functional annotation of variants was carried out using SnpEff, VEP and CADD score. Results: Population stratification of VIP genes revealed four clusters, viz., a single cluster of GIH and ITU and one cluster each of East Asian, European and African populations; the Admixed American population was found to be admixed. A total of 13 SNPs belonging to ten pharmacogenes were identified to have significant allele frequency variation in both GIH and ITU populations as compared to one or more super-populations. These SNPs belong to VKORC1 (rs17708472, rs2359612, rs8050894) involved in the Vitamin K cycle, cytochrome P450 isoforms CYP2C9 (rs1057910), CYP2B6 (rs3211371), CYP2A2 (rs4646425) and CYP2A4 (rs4646440); ATP-binding cassette (ABC) transporter ABCB1 (rs12720067), DPYD1 (rs12119882, rs56160474) involved in pyrimidine metabolism, methyltransferase COMT (rs9332377) and transcriptional factor NR1I2 (rs6785049). SNPs rs1544410 (VDR), rs2725264 (ABCG2), rs5215 and rs5219 (KCNJ11) show a high fixation index (≥ 0.5) with either EAS or AFR populations. Missense variants rs1057910 (CYP2C9), rs1801028 (DRD2), rs1138272 (GSTP1) and rs116855232 (NUDT15), and intronic variants rs1131341 (NQO1) and rs115349832 (DPYD), are identified to be ‘deleterious’. Conclusions: Analysis of SNPs pertaining to pharmacogenes in GIH and ITU populations using population structure, fixation index and allele frequency variation provides a premise for understanding the role of genetic diversity in drug response in Asian Indians.
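To make the fixation index threshold (≥ 0.5) mentioned in this abstract concrete, here is a minimal sketch of Wright's F_ST for a single biallelic SNP in two populations, computed from allele frequencies with equal population weights. The abstract does not say which F_ST estimator the authors used, so this textbook form is an assumption, and the function name `fst_two_pops` and the example frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's F_ST for one biallelic SNP, given the frequency of the same
    allele in two populations (equal weights, no sample-size correction)."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Allele is fixed or absent in both populations: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies, not taken from the study.
for p1, p2 in [(0.30, 0.30), (0.10, 0.90), (0.00, 1.00)]:
    print(p1, p2, round(fst_two_pops(p1, p2), 3))
```

Identical frequencies give 0 and a fixed difference gives 1, so a cut-off of 0.5 selects SNPs whose allele frequencies differ strongly between the two groups.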


2021 ◽  
Author(s):  
Zhong Wang ◽  
Lei Sun ◽  
Andrew D Paterson

An unexpectedly high proportion of SNPs on the X chromosome in the 1000 Genomes Project phase 3 data were identified with significant sex differences in minor allele frequencies (sdMAF). sdMAF persisted for many of these SNPs in the recently released high-coverage whole genome sequence, and it was consistent between the five super-populations. Among the 245,825 common biallelic SNPs in phase 3 data presumed to be high quality, 2,039 have genome-wide significant sdMAF (p-value < 5e-8). sdMAF varied by location: non-pseudo-autosomal region (NPR)=0.83%, pseudo-autosomal region (PAR1)=0.29%, PAR2=13.1%, and PAR3=0.85% of SNPs had sdMAF, and they were clustered at the NPR-PAR boundaries, among others. sdMAF at the NPR-PAR boundaries is biologically expected due to sex-linkage, but has generally been ignored in association studies. For comparison, similar analyses found only 6, 1 and 0 SNPs with significant sdMAF on chromosomes 1, 7 and 22, respectively. Future X chromosome analyses need to take sdMAF into account.
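The idea of testing for a sex difference in allele frequency can be illustrated with a plain two-proportion z-test on allele counts, flagged at the genome-wide threshold of 5e-8 quoted above. This is a simplified stand-in and not the authors' sdMAF test, whose details the abstract does not give; the function name `sdmaf_z_test` and all counts are hypothetical. The example assumes a non-pseudo-autosomal SNP, where each male contributes one allele and each female two.

```python
import math


def sdmaf_z_test(alt_f, n_alleles_f, alt_m, n_alleles_m):
    """Two-proportion z-test comparing an allele's frequency in females vs males.

    Inputs are allele counts, so the caller decides how many alleles each
    sex contributes (e.g. one per male outside the pseudo-autosomal regions).
    """
    p_f = alt_f / n_alleles_f
    p_m = alt_m / n_alleles_m
    p_pool = (alt_f + alt_m) / (n_alleles_f + n_alleles_m)
    var = p_pool * (1 - p_pool) * (1 / n_alleles_f + 1 / n_alleles_m)
    if var == 0:
        # Monomorphic in the pooled sample: nothing to test.
        return 0.0, 1.0
    z = (p_f - p_m) / math.sqrt(var)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical sample: 1000 females (2000 alleles) and 1000 males (1000 alleles).
z, p = sdmaf_z_test(alt_f=600, n_alleles_f=2000, alt_m=450, n_alleles_m=1000)
print(f"z = {z:.2f}, p = {p:.2e}, genome-wide significant: {p < 5e-8}")
```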


2021 ◽  
Author(s):  
Hongyan Lu ◽  
Yuliang Wang ◽  
Zhanhao Zhang ◽  
Shishi Xing ◽  
Dandan Li ◽  
...  

Abstract Introduction: The specificity of drug therapy in individuals and races has promoted the development and improvement of pharmacogenomics and precision medicine. However, little is known about ethnic minorities in China, especially the Lisu nationality from Yunnan Province. Therefore, we performed this research to advance pharmacogenomics in the Lisu population from Yunnan Province, China. Materials and Methods: In our study, 54 variants of very important pharmacogenes (VIPs) selected from the PharmGKB database were genotyped in 199 unrelated and healthy Lisu adults from Yunnan Province, China, and the genotyping data were then analyzed with the χ2 test. Results: We compared our data with those of 26 other populations from the 1000 Genomes Project and found that the Lisu ethnicity is similar to the CDX (Chinese Dai in Xishuangbanna, China) and CHS (Southern Han Chinese, China). Furthermore, rs776746 (CYP3A5), rs1805123 (KCNH2), rs4291 (ACE), rs1051298 (SLC19A1) and rs1065852 (CYP2D6) were deemed the most varying loci. The MAF of “G” at rs1805123 (KCNH2) in the Lisu population was the largest, with a value of 51.0%. Conclusions: Our results show that there are significant differences in SNP (single nucleotide polymorphism) loci, supplementing the pharmacogenomic information of the Lisu population in Yunnan Province, China, and can provide a theoretical basis for individualized medication in the future.


2021 ◽  
Vol 12 ◽  
Author(s):  
Gang Shi ◽  
Qingmin Kuang

With the advance of sequencing technology, an increasing number of populations have been sequenced to study the histories of worldwide populations, including their divergence, admixtures, migration, and effective sizes. The variants detected in sequencing studies are largely rare and mostly population specific. Population-specific variants are often recent mutations and are informative for revealing substructures and admixtures in populations; however, computational methods and tools to analyze them are still lacking. In this work, we propose using reference populations and single nucleotide polymorphisms (SNPs) specific to the reference populations. Ancestral information, the best linear unbiased estimator (BLUE) of the ancestral proportion, is proposed, which can be used to infer ancestral proportions in recently admixed target populations and measure the extent to which reference populations serve as good proxies for the admixing sources. Based on the same panel of SNPs, the ancestral information is comparable across samples from different studies and is not affected by genetic outliers, related samples, or the sample sizes of the admixed target populations. In addition, ancestral spectrum is useful for detecting genetic outliers or exploring co-ancestry between study samples and the reference populations. The methods are implemented in a program, Ancestral Spectrum Analyzer (ASA), and are applied in analyzing high-coverage sequencing data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP). In the analyses of American populations from the 1000 Genomes Project, we demonstrate that recent admixtures can be dissected from ancient admixtures by comparing ancestral spectra with and without indigenous Americans being included in the reference populations.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yong Wang ◽  
Shiya Song ◽  
Joshua G. Schraiber ◽  
Alisa Sedghifar ◽  
Jake K. Byrnes ◽  
...  

Abstract Background: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. Results: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. Conclusions: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.


2021 ◽  
Author(s):  
Tamara Soledad Frontanilla ◽  
Guilherme Valle Silva ◽  
Jesus Ayala ◽  
Celso Teixeira Mendes

Accurate STR genotyping from next-generation sequencing (NGS) data has been challenging. Haplotype inference and phasing for STRs (HipSTR) was specifically developed to deal with genotyping errors and obtain reliable STR genotypes from whole-genome sequencing datasets. The objective of this investigation was to perform a comprehensive genotyping analysis of a set of STRs of broad forensic interest from the 1000 Genomes populations and release a reliable open-access STR database to the forensic genetics community. A set of 22 STR markers was analyzed using the CRAM files of the 1000 Genomes Project Phase 3 high-coverage (30x) dataset generated by the New York Genome Center (NYGC). HipSTR was used to call genotypes from 2,504 samples from 26 populations organized into five groups: African, East Asian, European, South Asian, and admixed American. The D21S11 marker could not be detected in the present study. Moreover, the Hardy-Weinberg equilibrium analysis, coupled with a comprehensive analysis of allele frequencies, revealed that HipSTR could not identify longer Penta E (and, to a lesser extent, Penta D) alleles. This issue is probably due to the limited length of sequencing reads available for genotype calling, resulting in heterozygote deficiency. Notwithstanding that, AMOVA, a clustering analysis using STRUCTURE, and a Principal Coordinates Analysis revealed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium (AFR, EUR, EAS, SAS). Meanwhile, the AMOVA results corroborated previous reports that most of the variance (97.12%) is observed within populations. This set of analyses revealed that, except for larger Penta D and Penta E alleles, allele frequencies and genotypes defined by HipSTR from the 1000 Genomes Project phase 3 data and offered as an open-access database are consistent and highly reliable.
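The heterozygote deficiency described above can be seen by comparing observed heterozygosity with the value expected under Hardy-Weinberg equilibrium at a multi-allelic locus. The sketch below is only an illustration of that comparison, not the authors' analysis pipeline; the function name `heterozygosity` and the toy genotypes (repeat counts 10 and 12) are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid sample.
    Expected heterozygosity under Hardy-Weinberg is 1 - sum(p_i ** 2).
    """
    n_samples = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    n_alleles = 2 * n_samples
    h_exp = 1.0 - sum((c / n_alleles) ** 2 for c in allele_counts.values())
    h_obs = sum(1 for a, b in genotypes if a != b) / n_samples
    return h_obs, h_exp


# Toy data with equal allele frequencies in both sets.
balanced = [(10, 12)] * 50 + [(10, 10)] * 25 + [(12, 12)] * 25
deficient = [(10, 10)] * 50 + [(12, 12)] * 50  # no heterozygotes called

print(heterozygosity(balanced))
print(heterozygosity(deficient))
```

If long alleles are systematically missed so that true heterozygotes are called as homozygotes, observed heterozygosity falls below the expected value, which is the pattern the abstract reports for Penta E and Penta D.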


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Prashant Pandey ◽  
Yinjie Gao ◽  
Carl Kingsford

Abstract Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and the 1000 Genomes Project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds using memory of only 10% of the size of the full representation.

