A Joint Use of Pooling And Imputation For Genotyping SNPs

Mapping Intimacies ◽

10.21203/rs.3.rs-1131930/v1 ◽

2021 ◽

Author(s):

Camille Clouard ◽

Kristiina Ausmees ◽

Carl Nettelblad

Keyword(s):

Large Scale ◽

Group Testing ◽

Snp Genotyping ◽

Genotype Imputation ◽

Model Organisms ◽

Limiting Factor ◽

Pooling Design ◽

Pooled Data ◽

Human Data ◽

Genotype Frequencies

Abstract Background: Despite continuing technological advances, the cost of large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, for applications such as wildlife monitoring and plant and animal breeding, but it is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance from pooled data with the Beagle algorithm, and a local likelihood-aware phasing algorithm closely modeled on MaCH that we implemented. Results: We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that pooling impacts the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, thus outperforming the Beagle imputation model, which reaches 80% concordance. 
We observe that the pooling design gives higher concordance for the rare variants than traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. Conclusions: We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation, as demonstrated in simulations on human data, while using half the number of assays needed for sample-wise genotyping. These results could find potential applications in any context where the genotyping costs form a limiting factor on the study size, such as in marker-assisted selection in plant breeding.
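As an illustration of the pooling idea described in this abstract, the following is a minimal sketch rather than the authors' algorithm. It assumes a 4x4 block of 16 samples tested as 4 row pools and 4 column pools (8 assays, which matches the "half the number of assays" figure but is otherwise an assumption), and it assumes a pool reads as homozygous only when every member is homozygous for the same allele. The function names `pool_outcome`, `pool_block` and `decode_block` are hypothetical. Genotypes that cannot be read off directly are marked `-1`, which is where a statistical estimation and imputation step would take over.

```python
import numpy as np

def pool_outcome(genotypes):
    """Assumed pool readout: 0 or 2 if all members share that homozygous
    genotype, otherwise 1 (both alleles detected in the pool)."""
    g = np.asarray(genotypes)
    if np.all(g == 0):
        return 0
    if np.all(g == 2):
        return 2
    return 1

def pool_block(block):
    """Pool a square block of genotypes by rows and by columns."""
    block = np.asarray(block)
    rows = [pool_outcome(block[i, :]) for i in range(block.shape[0])]
    cols = [pool_outcome(block[:, j]) for j in range(block.shape[1])]
    return rows, cols

def decode_block(rows, cols):
    """Naive decoding: a sample in a homozygous pool must carry that
    genotype; everything else stays undetermined (-1)."""
    est = np.full((len(rows), len(cols)), -1, dtype=int)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            if r in (0, 2):
                est[i, j] = r
            elif c in (0, 2):
                est[i, j] = c
    return est

# Hypothetical block: 16 samples, one heterozygous carrier at row 1, column 2.
block = np.zeros((4, 4), dtype=int)
block[1, 2] = 1
rows, cols = pool_block(block)
estimate = decode_block(rows, cols)
```

In this toy block, 15 of the 16 genotypes are recovered from 8 pool readouts, and only the carrier's cell is left for a downstream estimation step. With several carriers in a block, more cells would stay undetermined.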

Cutting testing costs by the pooling design

Vojnotehnicki glasnik ◽

10.5937/vojtehg68-28078 ◽

2020 ◽

Vol 68 (4) ◽

pp. 743-759

Author(s):

Dimitrije Čvokić

Keyword(s):

Large Scale ◽

Group Testing ◽

Current Situation ◽

Resource Usage ◽

Pooling Design ◽

Splitting Algorithm ◽

The Matrix ◽

Binary Splitting ◽

Testing Algorithms ◽

Group Testing Algorithms

Introduction/purpose: The purpose of group testing algorithms is to provide a more rational resource usage. Therefore, it is expected to improve the efficiency of large-scale COVID-19 screening as well. Methods: Two variants of non-adaptive group testing approaches are presented: Hwang's generalized binary-splitting algorithm and the matrix strategy. Results: The positive and negative sides of both approaches are discussed. Also, the estimations of the maximum number of tests are given. The matrix strategy is presented with a particular modification which reduces the corresponding estimation of the maximum number of tests and which does not affect the complexity of the procedure. This modification can be interesting from the applicability viewpoint. Conclusion: Taking into account the current situation, it makes sense to consider these methods in order to achieve some resource cuts in testing, thus making the epidemiological measures more efficient than they are now.
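To make the matrix strategy mentioned in this abstract more concrete, here is a minimal sketch of a common row/column pooling scheme with a follow-up stage. It is an assumption about the general technique, not a reproduction of the paper's variant or of its modification. Samples sit in a grid, every row and every column is tested as one pool, and individual retests are run only when more than one row and more than one column are positive. The function name `matrix_strategy` and the example grids are hypothetical.

```python
def matrix_strategy(grid):
    """grid: list of rows of booleans (True = positive sample).
    Returns (list of positive positions, number of tests used)."""
    n_rows, n_cols = len(grid), len(grid[0])
    pos_rows = [i for i in range(n_rows) if any(grid[i])]
    pos_cols = [j for j in range(n_cols)
                if any(grid[i][j] for i in range(n_rows))]
    tests = n_rows + n_cols  # one pooled test per row and per column
    candidates = [(i, j) for i in pos_rows for j in pos_cols]
    if len(pos_rows) <= 1 or len(pos_cols) <= 1:
        # With at most one positive row (or column), every candidate
        # intersection must be positive, so no retest is needed.
        return candidates, tests
    # Otherwise the intersections are ambiguous: retest each one.
    tests += len(candidates)
    return [(i, j) for (i, j) in candidates if grid[i][j]], tests

def make_grid(size, positives):
    """Build a size x size grid with the given positive positions."""
    return [[(i, j) in positives for j in range(size)] for i in range(size)]

single = matrix_strategy(make_grid(4, {(2, 3)}))
double = matrix_strategy(make_grid(4, {(0, 0), (1, 1)}))
```

For these 16-sample grids, one positive is located with 8 tests, while two positives on different rows and columns need 8 pooled tests plus 4 retests, still fewer than 16 individual tests.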

An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

Comparative and Functional Genomics ◽

10.1155/2007/35604 ◽

2007 ◽

Vol 2007 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

B. Jayashree ◽

Manindra S. Hanspal ◽

Rajgopal Srinivasan ◽

R. Vigneshwaran ◽

Rajeev K. Varshney ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Open Source ◽

Open Source Software ◽

Large Scale ◽

Sequence Data ◽

Snp Genotyping ◽

Model Organisms ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Web Interfaces

The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips

Journal of Dairy Science ◽

10.3168/jds.2014-9213 ◽

2015 ◽

Vol 98 (7) ◽

pp. 4969-4989 ◽

Cited By ~ 20

Author(s):

S.A. Boison ◽

D.J.A. Santos ◽

A.H.T. Utsunomiya ◽

R. Carvalheiro ◽

H.H.R. Neves ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Dairy Cattle ◽

Bos Indicus ◽

Snp Genotyping ◽

Genotype Imputation ◽

Nucleotide Polymorphism ◽

Single Nucleotide

Current State of (Dis)Integration: Public Health and Fusion Centers

Journal of Homeland Security and Emergency Management ◽

10.1515/jhsem-2018-0076 ◽

2020 ◽

Vol 17 (2) ◽

Author(s):

Cody Minks ◽

Anke Richter

Keyword(s):

Public Health ◽

Law Enforcement ◽

Information Sharing ◽

Homeland Security ◽

Large Scale ◽

Ad Hoc ◽

Limiting Factor ◽

Fusion Centers ◽

Public Health Emergencies ◽

Public Health Information

Objective: Responding to large-scale public health emergencies relies heavily on planning and collaboration between law enforcement and public health officials. This study examines the current level of information sharing and integration between these domains by measuring the inclusion of public health in the law enforcement functions of fusion centers. Methods: Survey of all fusion centers, with a 29.9% response rate. Results: Only one of the 23 responding fusion centers had true public health inclusion, a decrease from research conducted in 2007. Information sharing is primarily limited to information flowing out of the fusion center, with little public health information coming in. Most of the collaboration is done on a personal, informal, ad hoc basis. There remains a large misunderstanding of roles, capabilities, and regulations by all parties (fusion centers and public health). The majority of the parties appear to be willing to work together, but there is no forward momentum to make these desires a reality. Funding and staffing issues seem to be the limiting factor for integration. Conclusion: These problems need to be urgently addressed to increase public health preparedness and enable a decisive and beneficial response to public health emergencies involving a homeland security response.

Towards A Multi-FPGA Infrared Simulator

The Journal of Defense Modeling and Simulation Applications Methodology Technology ◽

10.1177/154851290700400404 ◽

2007 ◽

Vol 4 (4) ◽

pp. 343-355 ◽

Cited By ~ 1

Author(s):

Vinay Sriram ◽

David Kearney

Keyword(s):

Homeland Security ◽

Reconfigurable Computing ◽

High Speed ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

Ccd Camera ◽

Hardware Acceleration ◽

Limiting Factor ◽

Scene Simulation

High speed infrared (IR) scene simulation is used extensively in defense and homeland security to test sensitivity of IR cameras and accuracy of IR threat detection and tracking algorithms used commonly in IR missile approach warning systems (MAWS). A typical MAWS requires an input scene rate of over 100 scenes/second. Infrared scene simulations typically take 32 minutes to simulate a single IR scene that accounts for effects of atmospheric turbulence, refraction, optical blurring and charge-coupled device (CCD) camera electronic noise on a Pentium 4 (2.8GHz) dual core processor [7]. Thus, in IR scene simulation, the processing power of modern computers is a limiting factor. In this paper we report our research to accelerate IR scene simulation using high performance reconfigurable computing. We constructed a multi Field Programmable Gate Array (FPGA) hardware acceleration platform and accelerated a key computationally intensive IR algorithm over the hardware acceleration platform. We were successful in reducing the computation time of IR scene simulation by over 36%. This research acts as a unique case study for accelerating large scale defense simulations using a high performance multi-FPGA reconfigurable computer.

MIC-Drop: A platform for large-scale in vivo CRISPR screens

Science ◽

10.1126/science.abi8870 ◽

2021 ◽

pp. eabi8870

Author(s):

Saba Parvez ◽

Chelsea Herdman ◽

Manu Beerens ◽

Korak Chakraborti ◽

Zachary P. Harmer ◽

...

Keyword(s):

Large Scale ◽

Cultured Cells ◽

Cardiac Development ◽

Droplet Microfluidics ◽

Model Organisms ◽

Genetic Screens ◽

Large Numbers ◽

And Function ◽

Genome Scale

CRISPR-Cas9 can be scaled up for large-scale screens in cultured cells, but CRISPR screens in animals have been challenging because generating, validating, and keeping track of large numbers of mutant animals is prohibitive. Here, we report Multiplexed Intermixed CRISPR Droplets (MIC-Drop), a platform combining droplet microfluidics, single-needle en masse CRISPR ribonucleoprotein injections, and DNA barcoding to enable large-scale functional genetic screens in zebrafish. The platform can efficiently identify genes responsible for morphological or behavioral phenotypes. In one application, we show MIC-Drop can identify small molecule targets. Furthermore, in a MIC-Drop screen of 188 poorly characterized genes, we discover several genes important for cardiac development and function. With the potential to scale to thousands of genes, MIC-Drop enables genome-scale reverse-genetic screens in model organisms.

The daunting polygenicity of mental illness: making a new map

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2017.0031 ◽

2018 ◽

Vol 373 (1742) ◽

pp. 20170031 ◽

Cited By ~ 18

Author(s):

Steven E. Hyman

Keyword(s):

Mental Health ◽

Large Scale ◽

Computational Models ◽

Model Organisms ◽

Human Biology ◽

Genetic Studies ◽

Computational Tools ◽

Experimental Systems ◽

Terra Incognita ◽

Phenotypic Complexity

An epochal opportunity to elucidate the pathogenic mechanisms of psychiatric disorders has emerged from advances in genomic technology, new computational tools and the growth of international consortia committed to data sharing. The resulting large-scale, unbiased genetic studies have begun to yield new biological insights and with them the hope that a half century of stasis in psychiatric therapeutics will come to an end. Yet a sobering picture is coming into view; it reveals daunting genetic and phenotypic complexity portending enormous challenges for neurobiology. Successful exploitation of results from genetics will require eschewal of long-successful reductionist approaches to investigation of gene function, a commitment to supplanting much research now conducted in model organisms with human biology, and development of new experimental systems and computational models to analyse polygenic causal influences. In short, psychiatric neuroscience must develop a new scientific map to guide investigation through a polygenic terra incognita. This article is part of a discussion meeting issue ‘Of mice and mental health: facilitating dialogue between basic and clinical neuroscientists’.

An International Campaign for Agricultural and Livestock Genomics (CALG)

Asia-Pacific Biotech News ◽

10.1142/s0219030302001970 ◽

2002 ◽

Vol 06 (24) ◽

pp. 958-965

Author(s):

Jun Yu ◽

Jian Wang ◽

Huanming Yang

Keyword(s):

Large Scale ◽

Cost Effective ◽

Model Organisms ◽

Environmental Biology ◽

Cdna Sequences ◽

Governmental Agencies ◽

Technology Innovations ◽

A Genome ◽

Starting Point ◽

The Cost

The time has come for a coordinated international effort to sequence agricultural and livestock genomes. While the human genome and the genomes of many model organisms (related to human health and basic biological interests) have been sequenced or placed in sequencing pipelines, agronomically important crop and livestock genomes have not been given high enough priority. Although we face many challenges in policy-making, grant funding, regional task emphasis, research community consensus and technology innovations, many initiatives are being announced and formulated based on the cost-effective and large-scale sequencing procedure known as whole genome shotgun (WGS) sequencing, which produces draft sequences covering 95 to 99 percent of a genome. Genes identified from such draft sequences, coupled with other resources such as molecular markers, large-insert clones and cDNA sequences, provide ample information and tools to further our knowledge of agricultural and environmental biology in a genome era that is just entering its accelerated period. If the campaign succeeds, molecular biologists, geneticists and field biologists from all countries, rich or poor, would be brought to the same starting point and could expect another astronomical increase in basic genomic information, ready to be converted effectively into knowledge that will ultimately change our lives and environment for the better. We call upon national and international governmental agencies and organizations as well as research foundations to support this unprecedented movement.

Energy Production Benefits by Wind and Wave Energies for the Autonomous System of Crete

Energies ◽

10.3390/en11102741 ◽

2018 ◽

Vol 11 (10) ◽

pp. 2741 ◽

Cited By ~ 8

Author(s):

George Lavidas ◽

Vengatesan Venugopal

Keyword(s):

Wind Turbine ◽

Wave Energy ◽

Energy Production ◽

Large Scale ◽

Capital Expenditure ◽

Temporal Correlation ◽

Clean Energy ◽

Limiting Factor ◽

Spatio Temporal ◽

Visual Impacts

In autonomous electricity grids, renewable energy (RE) contributes significantly to energy production. Offshore resources benefit from higher energy density, smaller visual impacts, and higher availability levels. Offshore locations to the west of Crete have wind availability of ≈80%; combined with the installation potential for large-scale modern wind turbines (rated power), the expected annual benefits are immense. Temporal variability of production is a limiting factor for wider adoption of large offshore farms. To this end, multi-generation with wave energy can alleviate issues of non-generation for wind. The spatio-temporal correlation of wind and wave energy production shows that hybrid wind and wave stations can contribute significant amounts of clean energy, while at the same time reducing spatial constraints and public acceptance issues. Offshore technologies can be combined as co-located or not, altering the contribution profiles of wave energy to non-operating wind turbine production. In this study a co-located option contributes up to 626 h per annum, while a non-co-located solution is found to complement over 4000 h of a non-operative wind turbine. Findings indicate the opportunities available not only in terms of capital expenditure reduction, but also for the ever-important issue of renewable variability and grid stability.

Influence of seed vernalization on the production, growth and development of lisianthus plants (Influência da vernalização de semente na produção, crescimento e desenvolvimento de plantas de lisianthus)

Semina Ciências Agrárias ◽

10.5433/1679-0359.2018v39n6p2325 ◽

2018 ◽

Vol 39 (6) ◽

pp. 2325 ◽

Cited By ~ 1

Author(s):

Maria Yumbla-Orbes ◽

José Geraldo Barbosa ◽

Wagner Campos Otoni ◽

Marcel Santos Montezano ◽

José Antônio Saraiva Grossi ◽

...

Keyword(s):

Large Scale ◽

Limiting Factor ◽

Scale Production ◽

Cut Flower ◽

Cut Flowers ◽

Flower Production ◽

Commercial Scale ◽

Large Scale Production ◽

And Control

Flowering induction and control is a limiting factor in the commercial production of lisianthus cut flowers. Seed exposure to low temperatures, a physiological event called vernalization, induces the differentiation of vegetative buds into reproductive buds, contributing to uniform, high-quality flowering. The objective of this study was to evaluate the influence of seed vernalization in three cultivars of lisianthus (Excalibur, Echo and Mariachi) for 12, 24, 36 and 48 days at temperatures of 5, 10 and 15°C on the production and quality of buds, with a view to making this technology feasible for large-scale production. During cultivation it was observed that the lower the temperature and the longer the vernalization period, the shorter the cycle and the greater the number of plants induced to flower for all three cultivars, which are important features for flower production on a commercial scale. The seeds subjected to vernalization gave rise to plants that produced flower stems within the standards required by the market, showing that vernalization was efficient at inducing flowering without affecting the quality of the buds. To produce lisianthus as a quality cut flower, seed vernalization is recommended for 24 days at 5°C for the Mariachi and Echo cultivars and for 36 days at 5°C for Excalibur.
