Genetic substructure and complex demographic history of South African Bantu speakers

2020
Author(s): Dhriti Sengupta, Ananyo Choudhury, Cesar Fortes-Lima, Shaun Aron, Gavin Whitelaw, ...

Abstract: South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ∼400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.

2021, Vol 12 (1)
Author(s): Dhriti Sengupta, Ananyo Choudhury, Cesar Fortes-Lima, Shaun Aron, ...

Abstract: South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.
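
Fine-scale structure between groups such as these is often summarized with a pairwise differentiation statistic. The abstract does not say which statistic the authors used, so the following is only an illustrative sketch of Hudson's F_ST, combined across SNPs as a ratio of averages. The function name `hudson_fst` and its inputs (per-SNP allele frequencies and the number of sampled chromosomes in each group) are assumptions for illustration.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST between two groups, combined across SNPs as a ratio of averages.

    p1, p2: per-SNP allele frequencies in each group (hypothetical inputs).
    n1, n2: number of sampled chromosomes per group (2 x individuals for diploids).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference minus the expected finite-sample noise
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that one allele drawn from each group differs
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum before dividing so that low-diversity SNPs do not dominate the estimate
    return num.sum() / den.sum()
```

For example, `hudson_fst([1.0], [0.0], 100, 100)` gives 1.0 (complete differentiation), while groups with identical frequencies give a value at or slightly below zero because of the sample-size correction.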


2020, Vol 29 (16), pp. 2803-2811
Author(s): James P Cook, Anubha Mahajan, Andrew P Morris

Abstract: The UK Biobank is a prospective study of more than 500 000 participants, which has aggregated data from questionnaires, physical measures, biomarkers, imaging and follow-up for a wide range of health-related outcomes, together with genome-wide genotyping supplemented with high-density imputation. Previous studies have highlighted fine-scale population structure in the UK on a North-West to South-East cline, but the impact of unmeasured geographical confounding on genome-wide association studies (GWAS) of complex human traits in the UK Biobank has not been investigated. We considered 368 325 white British individuals from the UK Biobank and performed GWAS of their birth location. We demonstrate that widely used approaches to adjust for population structure, including principal component analysis and mixed modelling with a random effect for a genetic relationship matrix, cannot fully account for the fine-scale geographical confounding in the UK Biobank. We observe significant genetic correlation of birth location with a range of lifestyle-related traits, including body-mass index and fat mass, hypertension and lung function, even after adjustment for population structure. Variants driving associations with birth location are also strongly associated with many of these lifestyle-related traits after correction for population structure, indicating that there could be environmental factors confounded with geography that have not been adequately accounted for. Our findings highlight the need for caution in the interpretation of lifestyle-related trait GWAS in UK Biobank, particularly in loci demonstrating strong residual association with birth location.
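
Principal component adjustment, one of the approaches the abstract evaluates, amounts to adding the top genetic PCs as covariates in each per-SNP regression. The sketch below is a minimal, assumed illustration on a small dense genotype matrix (0/1/2 dosages, no missing data, no monomorphic SNPs); `pca_adjusted_gwas` is a hypothetical name, and biobank-scale analyses rely on dedicated software rather than code like this.

```python
import numpy as np

def pca_adjusted_gwas(G, y, n_pcs=10):
    """Per-SNP OLS of trait y on genotype, with the top PCs of G as covariates.

    G: (n_individuals, n_snps) array of 0/1/2 genotype dosages.
    y: (n_individuals,) trait vector.
    Returns per-SNP effect estimates and t-statistics.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    sd = G.std(axis=0)
    if np.any(sd == 0):
        raise ValueError("remove monomorphic SNPs before computing PCs")
    # Standardize each SNP, then take individuals' PC scores from the SVD
    Z = (G - G.mean(axis=0)) / sd
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :n_pcs]
    n, m = G.shape
    betas = np.empty(m)
    tstats = np.empty(m)
    for j in range(m):
        # Design matrix: intercept, tested SNP, PC covariates
        X = np.column_stack([np.ones(n), G[:, j], pcs])
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        betas[j] = coef[1]
        tstats[j] = coef[1] / np.sqrt(cov[1, 1])
    return betas, tstats
```

The abstract's point is that this kind of adjustment captures only the axes of variation the PCs happen to span; confounding along finer geographic gradients can remain in the SNP effect estimates.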


2019
Author(s): Grazyella M. Yoshida, Agustín Barria, Katharina Correa, Giovanna Cáceres, Ana Jedlicki, ...

Abstract: Nile tilapia (Oreochromis niloticus) is one of the most widely produced farmed fish in the world and represents an important source of protein for human consumption. Farmed Nile tilapia populations are increasingly based on genetically improved stocks, which have been established from admixed populations. To date, there is scarce information about the population genomics of farmed Nile tilapia assessed with dense single nucleotide polymorphism (SNP) panels. Patterns of linkage disequilibrium (LD) may affect the success of genome-wide association studies (GWAS) and genomic selection, and can also provide key information about the demographic history of farmed Nile tilapia populations. The objectives of this study were to characterize the population structure and LD patterns, and to estimate the effective population size (Ne), of three farmed Nile tilapia populations: one from Brazil (POP A) and two from Costa Rica (POP B and POP C). A total of 55, 56 and 57 individuals from POP A, POP B and POP C, respectively, were genotyped using a 50K SNP panel selected from a whole-genome sequencing (WGS) experiment. Two principal components explained about 20% of the total variation and clearly discriminated among the three populations. Population structure analysis showed evidence of admixture, especially in POP C. Contemporary Ne values, estimated from LD, ranged from 71 to 141. LD decay did not differ among populations, with r² decreasing rapidly as inter-marker distance increased. Average r² between adjacent SNP pairs ranged from 0.03 to 0.18, 0.03 to 0.17 and 0.03 to 0.16 for POP A, POP B and POP C, respectively. Based on the number of independent chromosome segments in the Nile tilapia genome, at least 4.2K SNPs are required to implement GWAS and genomic selection in farmed Nile tilapia populations.
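
LD-based Ne estimates rest on the fact that expected r² between two loci falls as recombination and population size rise. The abstract does not state which estimator was used, so the following is only a sketch of one classic relation, Sved's (1971) approximation E[r²] ≈ 1 / (1 + 4·Ne·c), rearranged to solve for Ne. The optional 1/n term is one simple correction for sampling noise in r². Function names are hypothetical, and r² is computed from unphased 0/1/2 genotypes, which only approximates haplotype r².

```python
import numpy as np

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' 0/1/2 genotype dosages.

    For unphased data this approximates the haplotype-based LD measure r^2.
    """
    r = np.corrcoef(np.asarray(g1, float), np.asarray(g2, float))[0, 1]
    return r ** 2

def sved_ne(mean_r2, c_morgans, n_samples=None):
    """Ne from LD via Sved (1971): E[r^2] ~ 1 / (1 + 4 * Ne * c).

    mean_r2: average r^2 for marker pairs at recombination distance c (Morgans).
    n_samples: if given, subtract 1/n as a simple sample-size correction.
    """
    r2 = mean_r2 - (1.0 / n_samples if n_samples is not None else 0.0)
    if r2 <= 0:
        raise ValueError("corrected r^2 must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)
```

For instance, a mean r² of 0.2 for markers 1 cM apart (c = 0.01) gives Ne = (1/0.2 − 1) / 0.04 = 100 before any sample-size correction.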


2017
Author(s): Takafumi Katsumura, Shoji Oda, Mitani Hiroshi, Hiroki Oota

Abstract: Medaka is a model organism in medicine, genetics, developmental biology and population genetics. Lab stocks representing more than 100 local wild populations are available for research in these fields. Medaka is therefore a potentially excellent bioresource for screening disease-risk and adaptation-related genes in genome-wide association studies. Although the genetic population structure should be known before performing such an analysis, a comprehensive study of the genome-wide diversity of wild medaka populations has not been performed. Here, we performed genotyping-by-sequencing (GBS) on 81 medaka from a bioresource and 12 captured from the wild. Based on the GBS data, we evaluated the genetic population structure and estimated demographic parameters using an approximate Bayesian computation (ABC) framework. The autosomal data confirmed substantial differences between local populations and supported our previously proposed hypothesis of medaka dispersal based on mitochondrial genome (mtDNA) data. A new finding was that a local group thought to be a hybrid between the northern and southern Japanese groups is actually a sister group of the northern Japanese group. This paper thus presents the first population-genomic study of medaka and reveals its population structure and history based on autosomal diversity.
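
Approximate Bayesian computation replaces an intractable likelihood with simulation: draw parameters from the prior, simulate data, and keep the draws whose summary statistics land close to the observed ones. The sketch below shows the basic rejection variant on a deliberately toy, hypothetical model (a single proportion simulated from a binomial over 1,000 loci); it is not the demographic model or the ABC settings used in the study.

```python
import numpy as np

def abc_rejection(observed_stat, simulate, prior_sampler, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary is near the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        if abs(simulate(theta, rng) - observed_stat) < tolerance:
            accepted.append(theta)
    return np.array(accepted)

# Hypothetical toy model: the summary statistic is the fraction of n_loci
# "successes" given a per-locus probability theta.
def simulate_fraction(theta, rng, n_loci=1000):
    return rng.binomial(n_loci, theta) / n_loci

rng = np.random.default_rng(42)
posterior = abc_rejection(
    observed_stat=0.30,
    simulate=simulate_fraction,
    prior_sampler=lambda r: r.uniform(0.0, 1.0),  # flat prior on [0, 1)
    n_draws=20000,
    tolerance=0.02,
    rng=rng,
)
```

The accepted values approximate the posterior; shrinking `tolerance` sharpens the approximation at the cost of fewer accepted draws, which is the main trade-off when applying ABC to richer demographic models.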


Science, 2019, Vol 363 (6432), pp. 1230-1234
Author(s): Iñigo Olalde, Swapan Mallick, Nick Patterson, Nadin Rohland, Vanessa Villalba-Mouco, ...

We assembled genome-wide data from 271 ancient Iberians, of whom 176 are from the largely unsampled period after 2000 BCE, thereby providing a high-resolution time transect of the Iberian Peninsula. We document high genetic substructure between northwestern and southeastern hunter-gatherers before the spread of farming. We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non-Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia. Additionally, we document how, beginning at least in the Roman period, the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean.
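
Figures such as the replacement of "40% of Iberia's ancestry" are admixture proportions. Ancient-DNA studies in this field typically estimate them with f-statistic methods such as qpAdm, which the abstract does not detail. The sketch below swaps in a much simpler technique, a least-squares fit of the target's allele frequencies as a two-source mixture, purely to illustrate what such a proportion means. It assumes the source frequencies are known without drift or sampling error, which real methods are designed to avoid assuming; the function name is hypothetical.

```python
import numpy as np

def admixture_proportion(p_target, p_source_a, p_source_b):
    """Least-squares alpha in p_target ~ alpha * p_a + (1 - alpha) * p_b.

    All inputs are per-SNP allele frequencies (hypothetical inputs).
    The estimate is unconstrained, so it can fall outside [0, 1] on noisy data.
    """
    t = np.asarray(p_target, float) - np.asarray(p_source_b, float)
    d = np.asarray(p_source_a, float) - np.asarray(p_source_b, float)
    # Minimizing sum((t - alpha * d)^2) over alpha gives alpha = <t, d> / <d, d>
    return float(t @ d / (d @ d))
```

If a target's frequencies are exactly 0.4·p_a + 0.6·p_b at every SNP, the function returns 0.4.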


2014, Vol 618, pp. 278-282
Author(s): Tao Peng, Hao Wang, Yi Ran Wang, Wen Wen Xie, Jia Wei Luo

With the completion of the International HapMap Project and the development of high-throughput technologies, designing more effective epistasis detection algorithms for genome-wide data has become a significant challenge. This paper proposes a new method based on the Markov blanket to overcome limitations of existing algorithms, such as a high proportion of false positives and low accuracy. The algorithm uses the G² statistic to judge the strength of correlation between variables, together with a self-adaptive removal strategy and an SNP matching method. This effectively eliminates variables that are unrelated to the target as well as weakly correlated variables, significantly reduces the search space and running time, avoids unnecessary retrieval analysis, and improves the accuracy of the detection algorithm to a certain extent.
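
The G² statistic used to judge association strength is the likelihood-ratio test of independence on a contingency table, G² = 2 Σ O·ln(O/E), compared against a chi-square distribution with (rows − 1)(columns − 1) degrees of freedom. The sketch below computes only the statistic; how the paper embeds it in its Markov-blanket search and self-adaptive removal strategy is not specified in the abstract, and the function name is hypothetical.

```python
import math

def g2_statistic(table):
    """Likelihood-ratio G^2 test of independence for an r x c table of counts.

    Returns (G2, degrees_of_freedom). Cells with zero observed count contribute 0.
    Assumes no row or column is entirely zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                # Expected count under independence of rows and columns
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g2, df
```

For a single SNP, rows can be the three genotypes and columns case/control status, giving df = 2; a chi-square variable with 2 degrees of freedom has the closed-form p-value exp(−G²/2). For a pair of SNPs tested jointly, the nine genotype combinations against case/control give df = 8.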

