The Impact of Whole Genome Sequence Data on Drug Discovery—A Malaria Case Study

Abstract Insights into genetic loci which are under selection and their functional roles contribute to increased understanding of the patterns of phenotypic variation we observe today. The availability of whole-genome sequence data, for humans and other species, provides opportunities to investigate adaptation and evolution at unprecedented resolution. Many analytical methods have been developed to interrogate these large data sets and characterize signatures of selection in the genome. We review here recently developed methods and consider the impact of increased computing power and data availability on the detection of selection signatures. Consideration of demography, recombination and other confounding factors is important, and use of a range of methods in combination is a powerful route to resolving different forms of selection in genome sequence data. Overall, a substantial improvement in methods for application to whole-genome sequencing is evident, although further work is required to develop robust and computationally efficient approaches which may increase reproducibility across studies.

Download Full-text

An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study withSalmonella

PeerJ ◽

10.7717/peerj.620 ◽

2014 ◽

Vol 2 ◽

pp. e620 ◽

Cited By ~ 34

Author(s):

James B. Pettengill ◽

Yan Luo ◽

Steven Davis ◽

Yi Chen ◽

Narjol Gonzalez-Escalona ◽

...

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Alternative Methods ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Download Full-text

Peer Review #2 of "An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella (v0.2)"

10.7287/peerj.620v0.2/reviews/2 ◽

2014 ◽

Author(s):

NJ Loman

Keyword(s):

Peer Review ◽

Genome Sequence ◽

Sequence Data ◽

Alternative Methods ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Download Full-text

Peer Review #3 of "An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella (v0.2)"

10.7287/peerj.620v0.2/reviews/3 ◽

2014 ◽

Author(s):

K Holt

Keyword(s):

Peer Review ◽

Genome Sequence ◽

Sequence Data ◽

Alternative Methods ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Download Full-text

Evaluation of sequencing strategies for whole-genome imputation with hybrid peeling

10.1101/824631 ◽

2019 ◽

Cited By ~ 1

Author(s):

Roger Ros-Freixedes ◽

Andrew Whalen ◽

Gregor Gorjanc ◽

Alan J Mileham ◽

John M Hickey

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Imputation Accuracy ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data ◽

Uniform Coverage ◽

Sequencing Strategy ◽

Four Levels ◽

The Impact

AbstractBackgroundFor assembling large whole-genome sequence datasets to be used routinely in research and breeding, the sequencing strategy should be adapted to the methods that will later be used for variant discovery and imputation. In this study we used simulation to explore the impact that the sequencing strategy and level of sequencing investment have on the overall accuracy of imputation using hybrid peeling, a pedigree-based imputation method well-suited for large livestock populations.MethodsWe simulated marker array and whole-genome sequence data for fifteen populations with simulated or real pedigrees that had different structures. In these populations we evaluated the effect on imputation accuracy of seven methods for selecting which individuals to sequence, the generation of the pedigree to which the sequenced individuals belonged, the use of variable or uniform coverage, and the trade-off between the number of sequenced individuals and their sequencing coverage. For each population we considered four levels of investment in sequencing that were proportional to the size of the population.ResultsImputation accuracy largely depended on pedigree depth. The distribution of the sequenced individuals across the generations of the pedigree underlay the performance of the different methods used to select individuals to sequence. Additionally, it was critical to balance high imputation accuracy in early generations as well as in late generations. Imputation accuracy was highest with a uniform coverage across the sequenced individuals of around 2x rather than variable coverage. An investment equivalent to the cost of sequencing 2% of the population at 2x provided high imputation accuracy. The gain in imputation accuracy from additional investment diminished with larger populations and larger levels of investment. However, to achieve the same imputation accuracy, a proportionally greater investment must be used in the smaller populations compared to the larger ones.ConclusionsSuitable sequencing strategies for subsequent imputation with hybrid peeling involve sequencing around 2% of the population at a uniform coverage around 2x, distributed preferably from the third generation of the pedigree onwards. Such sequencing strategies are beneficial for generating whole-genome sequence data in populations with deep pedigrees of closely related individuals.

Download Full-text

Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly

Download Full-text

TIGER: inferring DNA replication timing from whole-genome sequence data

Bioinformatics ◽

10.1093/bioinformatics/btab166 ◽

2021 ◽

Cited By ~ 1

Author(s):

Amnon Koren ◽

Dashiell J Massey ◽

Alexa N Bracci

Keyword(s):

Dna Replication ◽

Genome Sequence ◽

Genomic Dna ◽

Sequence Data ◽

Replication Timing ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Whole Genome ◽

Genome Sequence Data ◽

Dna Replication Timing

Abstract Motivation Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information Supplementary data are available at Bioinformatics online

Download Full-text

Whole genome sequence data of Bacillus australimaris strain B28A, isolated from Marine Water in India

Data in Brief ◽

10.1016/j.dib.2021.107240 ◽

2021 ◽

pp. 107240

Author(s):

Wael Ali Mohammed Hadi ◽

Boby T Edwin ◽

A Jayakumaran Nair

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Marine Water ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data

Download Full-text