Genetic substructure and complex demographic history of South African Bantu speakers

Nature Communications, 2021, Vol 12 (1). DOI: 10.1038/s41467-021-22207-y

Author(s): Dhriti Sengupta, Ananyo Choudhury, Cesar Fortes-Lima, Shaun Aron, ...

Keyword(s): South Africa; Population Structure; Iron Age; Association Studies; Demographic History; Genome Wide Association Studies; Genome Wide; Scale Population; Genome Wide Data; Genetic Substructure

Abstract: South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.

Fine-scale population structure in the UK Biobank: implications for genome-wide association studies

Human Molecular Genetics, 2020, Vol 29 (16), pp. 2803-2811. DOI: 10.1093/hmg/ddaa157

Author(s): James P Cook, Anubha Mahajan, Andrew P Morris

Keyword(s): Population Structure; Association Studies; Genome Wide Association; Genome Wide Association Studies; Fine Scale; UK Biobank; Genome Wide; Scale Population; The UK; The Impact

Abstract: The UK Biobank is a prospective study of more than 500 000 participants, which has aggregated data from questionnaires, physical measures, biomarkers, imaging and follow-up for a wide range of health-related outcomes, together with genome-wide genotyping supplemented with high-density imputation. Previous studies have highlighted fine-scale population structure in the UK on a North-West to South-East cline, but the impact of unmeasured geographical confounding on genome-wide association studies (GWAS) of complex human traits in the UK Biobank has not been investigated. We considered 368 325 white British individuals from the UK Biobank and performed GWAS of their birth location. We demonstrate that widely used approaches to adjust for population structure, including principal component analysis and mixed modelling with a random effect for a genetic relationship matrix, cannot fully account for the fine-scale geographical confounding in the UK Biobank. We observe significant genetic correlation of birth location with a range of lifestyle-related traits, including body-mass index and fat mass, hypertension and lung function, even after adjustment for population structure. Variants driving associations with birth location are also strongly associated with many of these lifestyle-related traits after correction for population structure, indicating that there could be environmental factors that are confounded with geography that have not been adequately accounted for. Our findings highlight the need for caution in the interpretation of lifestyle-related trait GWAS in UK Biobank, particularly in loci demonstrating strong residual association with birth location.

Genome-wide patterns of population structure and linkage disequilibrium in farmed Nile tilapia (Oreochromis niloticus)

2019. DOI: 10.1101/519801. Cited By ~ 3

Author(s): Grazyella M. Yoshida, Agustín Barria, Katharina Correa, Giovanna Cáceres, Ana Jedlicki, ...

Keyword(s): Population Structure; Linkage Disequilibrium; Genomic Selection; Oreochromis Niloticus; Nile Tilapia; Population Genomics; Association Studies; Demographic History; Genome Wide Association Studies; Genome Wide

Abstract: Nile tilapia (Oreochromis niloticus) is one of the most produced farmed fish in the world and represents an important source of protein for human consumption. Farmed Nile tilapia populations are increasingly based on genetically improved stocks, which have been established from admixed populations. To date, there is scarce information about the population genomics of farmed Nile tilapia, assessed by dense single nucleotide polymorphism (SNP) panels. The patterns of linkage disequilibrium (LD) may affect the success of genome-wide association studies (GWAS) and genomic selection and can also provide key information about the demographic history of farmed Nile tilapia populations. The objectives of this study were to provide further knowledge about the population structure and LD patterns, as well as to estimate the effective population size (Ne), for three farmed Nile tilapia populations, one from Brazil (POP A) and two from Costa Rica (POP B and POP C). A total of 55, 56 and 57 individuals from POP A, POP B and POP C, respectively, were genotyped using a 50K SNP panel selected from a whole-genome sequencing (WGS) experiment. Two principal components explained about 20% of the total variation and clearly discriminated between the three populations. Population genetic structure analysis showed evidence of admixture, especially for POP C. The contemporary Ne values, calculated based on LD values, ranged from 71 to 141. No differences were observed in the LD decay among populations, with a rapid decrease of r² when increasing inter-marker distance. Average r² between adjacent SNP pairs ranged from 0.03 to 0.18, 0.03 to 0.17 and 0.03 to 0.16 for POP A, POP B and POP C, respectively. Based on the number of independent chromosome segments in the Nile tilapia genome, at least 4.2 K SNPs are required for the implementation of GWAS and genomic selection in farmed Nile tilapia populations.
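The LD measure r² reported in this abstract can be illustrated with a minimal sketch. It assumes unphased genotypes coded as 0/1/2 allele counts and uses the squared Pearson correlation of those counts, a common proxy for r² when haplotype phase is unknown. The abstract does not say which estimator the authors used, and the function name `genotype_r2` and the toy genotypes are hypothetical.

```python
import math

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    This is an assumed, simplified LD estimator for unphased data,
    not necessarily the one used in the study.
    """
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel in r^2).
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return math.nan  # r^2 is undefined when a SNP is monomorphic
    return (cov * cov) / (var1 * var2)

# Hypothetical toy genotypes for six individuals (not data from the study).
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]  # identical to snp_a, i.e. complete LD
snp_c = [2, 0, 1, 1, 2, 0]

print(genotype_r2(snp_a, snp_b))
print(genotype_r2(snp_a, snp_c))
```

Averaging such pairwise values within bins of inter-marker distance gives the kind of LD decay curve the abstract describes.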

Medaka population genome structure and demographic history described via genotyping-by-sequencing

2017. DOI: 10.1101/233411

Author(s): Takafumi Katsumura, Shoji Oda, Mitani Hiroshi, Hiroki Oota

Keyword(s): Population Structure; Disease Risk; Association Studies; Demographic History; Genotyping By Sequencing; Genetic Population Structure; Genome Wide Association Studies; Genetic Population; Genome Wide; Genomic Study

Abstract: Medaka is a model organism in medicine, genetics, developmental biology and population genetics. Lab stocks composed of more than 100 local wild populations are available for research in these fields. Thus, medaka represents a potentially excellent bioresource for screening disease-risk- and adaptation-related genes in genome-wide association studies. Although the genetic population structure should be known before performing such an analysis, a comprehensive study on the genome-wide diversity of wild medaka populations has not been performed. Here, we performed genotyping-by-sequencing (GBS) for 81 and 12 medakas captured from a bioresource and the wild, respectively. Based on the GBS data, we evaluated the genetic population structure and estimated the demographic parameters using an approximate Bayesian computation (ABC) framework. The autosomal data confirmed that there were substantial differences between local populations and supported our previously proposed hypothesis on medaka dispersal based on mitochondrial genome (mtDNA) data. A new finding was that a local group that was thought to be a hybrid between the northern and the southern Japanese groups was actually a sister group of the northern Japanese group. Thus, this paper presents the first population-genomic study of medaka and reveals its population structure and history based on autosomal diversity.

Effects of Population Structure in Genome-wide Association Studies

Analysis of Complex Disease Association Studies, 2011, pp. 123-156. DOI: 10.1016/b978-0-12-375142-3.10009-4. Cited By ~ 1

Author(s): Yurii S. Aulchenko

Keyword(s): Population Structure; Association Studies; Genome Wide Association; Genome Wide Association Studies; Genome Wide

The genomic history of the Iberian Peninsula over the past 8000 years

Science, 2019, Vol 363 (6432), pp. 1230-1234. DOI: 10.1126/science.aav4040. Cited By ~ 84

Author(s): Iñigo Olalde, Swapan Mallick, Nick Patterson, Nadin Rohland, Vanessa Villalba-Mouco, ...

Keyword(s): Iberian Peninsula; Iron Age; North Africa; Eastern Mediterranean; Hunter Gatherers; Y Chromosomes; Genome Wide; Genome Wide Data; History Of; Genetic Substructure

We assembled genome-wide data from 271 ancient Iberians, of whom 176 are from the largely unsampled period after 2000 BCE, thereby providing a high-resolution time transect of the Iberian Peninsula. We document high genetic substructure between northwestern and southeastern hunter-gatherers before the spread of farming. We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non-Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia. Additionally, we document how, beginning at least in the Roman period, the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean.

Impacts of Population Structure and Analytical Models in Genome-Wide Association Studies of Complex Traits in Forest Trees: A Case Study in Eucalyptus globulus

PLoS ONE, 2013, Vol 8 (11), e81267. DOI: 10.1371/journal.pone.0081267. Cited By ~ 40

Author(s): Eduardo P. Cappa, Yousry A. El-Kassaby, Martín N. Garcia, Cintia Acuña, Nuno M. G. Borralho, ...

Keyword(s): Population Structure; Complex Traits; Eucalyptus Globulus; Association Studies; Genome Wide Association; Analytical Models; Genome Wide Association Studies; Forest Trees; Genome Wide

A-MBED: Adaptive Markov Blanket Method for Epitasis Detection in Genome-Wide Association Studies

Applied Mechanics and Materials, 2014, Vol 618, pp. 278-282. DOI: 10.4028/www.scientific.net/amm.618.278

Author(s): Tao Peng, Hao Wang, Yi Ran Wang, Wen Wen Xie, Jia Wei Luo

Keyword(s): Association Studies; Search Space; Detection Algorithm; International Hapmap Project; Genome Wide Association Studies; Weak Correlation; Matching Method; Markov Blanket; Genome Wide; Genome Wide Data

With the completion of the international HapMap project and the development of high-throughput technologies, designing more effective epistasis detection algorithms for genome-wide data poses a significant challenge. This paper proposes a new method based on the Markov blanket to address the limitations of existing algorithms, such as a large false-positive proportion and low accuracy. The algorithm uses the G² statistic to judge the strength of correlation between variables, combined with a self-adaptive removal strategy and an SNP matching method. Together these eliminate variables that are unrelated to the target or only weakly correlated with it, significantly reduce the search space and running time, avoid unnecessary retrieval analysis, and improve the accuracy of the detection algorithm to a certain extent.
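The abstract names G2 only as the measure of correlation strength. Assuming it refers to the standard likelihood-ratio G² statistic for independence in a contingency table, the quantity can be sketched as follows. This is not the authors' A-MBED implementation; the function name `g2_statistic` and the genotype-by-status counts are hypothetical.

```python
import math

def g2_statistic(table):
    """Likelihood-ratio G^2 for independence in an r x c table of observed counts.

    G^2 = 2 * sum(O * ln(O / E)), with E = row_total * col_total / grand_total.
    Returns (G^2, degrees of freedom). Cells with O == 0 contribute nothing.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("table must contain at least one observation")
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / total
                g2 += 2.0 * observed * math.log(observed / expected)
    # Count only rows and columns that actually contain observations.
    rows = sum(1 for t in row_totals if t > 0)
    cols = sum(1 for t in col_totals if t > 0)
    return g2, (rows - 1) * (cols - 1)

# Hypothetical counts: rows are genotypes (AA, Aa, aa), columns are (case, control).
independent = [[10, 20], [20, 40], [30, 60]]   # genotype carries no information
associated = [[30, 10], [20, 20], [10, 30]]    # genotype shifts case frequency

print(g2_statistic(independent))
print(g2_statistic(associated))
```

A larger G² relative to its degrees of freedom indicates stronger dependence between the SNP and the target, which is the kind of signal a Markov blanket search could use to decide which variables to keep or remove.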
