Ehapp2: Estimate haplotype frequencies from pooled sequencing data with prior database information

Chang-Chang Cao; Xiao Sun

doi:10.1142/s0219720016500177

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500177 ◽

2016 ◽

Vol 14 (04) ◽

pp. 1650017

Author(s):

Chang-Chang Cao ◽

Xiao Sun

Keyword(s):

Large Scale ◽

Linear Equations ◽

Cost Effective ◽

Relative Difference ◽

Sequencing Error ◽

Sequencing Data ◽

Sequencing Errors ◽

Pooled Sequencing ◽

Haplotype Frequencies ◽

The Cost

To reduce the cost of large-scale re-sequencing, multiple individuals can be pooled and sequenced together, an approach called pooled sequencing. Pooled sequencing can provide a cost-effective alternative to sequencing individuals separately. To facilitate the application of pooled sequencing in haplotype-based disease association analysis, the critical step is to accurately estimate haplotype frequencies from pooled samples. Here we present Ehapp2 for estimating haplotype frequencies from pooled sequencing data by utilizing a database that provides prior information on known haplotypes. We first translate the problem of estimating the frequency of each haplotype into finding a sparse solution to a system of linear equations, and the NNREG algorithm is employed to obtain the solution. Simulation experiments reveal that Ehapp2 is robust to sequencing errors and able to estimate haplotype frequencies with less than 3% average relative difference for pooled sequencing of a mixture of real Drosophila haplotypes at 50× total coverage, even when the sequencing error rate is as high as 0.05. Because proportions for local haplotypes spanning multiple SNPs are accurately calculated first, Ehapp2 retains excellent estimation for recombinant haplotypes resulting from chromosomal crossover. Comparisons with existing methods reveal that Ehapp2 is state-of-the-art for many sequencing study designs and more suitable for current massively parallel sequencing.
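The linear-system formulation in this abstract can be illustrated with a small sketch. It is not the authors' NNREG algorithm: it swaps in plain projected-gradient non-negative least squares, uses per-SNP allele frequencies instead of the local multi-SNP haplotype proportions the abstract mentions, and the function name, the toy haplotype panel and the pool frequencies are all hypothetical.

```python
def estimate_frequencies(haplotypes, allele_freqs, iters=5000):
    """Estimate non-negative haplotype frequencies f such that A f ~= y.

    haplotypes:   list of known haplotypes, each a list of 0/1 alleles per SNP
                  (stands in for the prior database; hypothetical data).
    allele_freqs: observed alternate-allele frequency at each SNP in the pool.
    """
    n_hap = len(haplotypes)
    n_snp = len(allele_freqs)
    # One equation per SNP: sum_j haplotypes[j][i] * f[j] = allele_freqs[i]
    rows = [[float(h[i]) for h in haplotypes] for i in range(n_snp)]
    # One extra equation: the frequencies sum to 1
    rows.append([1.0] * n_hap)
    target = [float(x) for x in allele_freqs] + [1.0]
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so 1 / that value is a safe gradient step size.
    step = 1.0 / sum(v * v for r in rows for v in r)
    f = [1.0 / n_hap] * n_hap
    for _ in range(iters):
        resid = [sum(a * x for a, x in zip(r, f)) - t
                 for r, t in zip(rows, target)]
        grad = [sum(r[j] * e for r, e in zip(rows, resid))
                for j in range(n_hap)]
        # Gradient step followed by projection onto f >= 0
        f = [max(0.0, x - step * g) for x, g in zip(f, grad)]
    return f


# Hypothetical panel of four known haplotypes over four SNPs.
panel = [
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
# Pool built from the first three haplotypes at 0.5, 0.3 and 0.2.
pool_allele_freqs = [0.5, 0.2, 0.5, 0.8]
estimated = estimate_frequencies(panel, pool_allele_freqs)
print([round(x, 4) for x in estimated])
```

In this toy case the fourth haplotype is absent from the pool, and the non-negativity constraint drives its estimate to zero, which is the sense in which the solution is sparse.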

Handling of targeted amplicon sequencing data focusing on index hopping and demultiplexing using a nested metabarcoding approach in ecology

Scientific Reports ◽

10.1038/s41598-021-98018-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yasemin Guenay-Greunke ◽

David A. Bohan ◽

Michael Traugott ◽

Corinna Wallinger

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Cost Effective ◽

Amplicon Sequencing ◽

Sequencing Depth ◽

Sequencing Error ◽

Sequencing Data ◽

Large Sample ◽

Sequencing Errors ◽

Plant Feeding

Abstract High-throughput sequencing platforms are increasingly being used for targeted amplicon sequencing because they enable cost-effective sequencing of large sample sets. For meaningful interpretation of targeted amplicon sequencing data and comparison between studies, it is critical that bioinformatic analyses do not introduce artefacts and rely on detailed protocols to ensure that all methods are properly performed and documented. The analysis of large sample sets and the use of predefined indexes create challenges, such as adjusting the sequencing depth across samples and taking sequencing errors or index hopping into account. However, the potential biases these factors introduce to high-throughput amplicon sequencing data sets and how they may be overcome have rarely been addressed. Using the example of a nested metabarcoding analysis of 1920 carabid beetle regurgitates to assess plant feeding, we investigated: (i) the variation in sequencing depth of individually tagged samples and the effect of library preparation on the data output; (ii) the influence of sequencing errors within index regions and its consequences for demultiplexing; and (iii) the effect of index hopping. Our results demonstrate that despite library quantification, large variation in read counts and sequencing depth occurred among samples and that the sequencing error rate in bioinformatic software is essential for accurate adapter/primer trimming and demultiplexing. Moreover, setting an index hopping threshold to avoid incorrect assignment of samples is highly recommended.

A Review on Terahertz Technologies Accelerated by Silicon Photonics

Nanomaterials ◽

10.3390/nano11071646 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1646

Author(s):

Jingya Xie ◽

Wangcheng Ye ◽

Linjie Zhou ◽

Xuguang Guo ◽

Xiaofei Zang ◽

...

Keyword(s):

Integrated Circuit ◽

Silicon Photonics ◽

Large Scale ◽

Cost Effective ◽

Single Chip ◽

Photonic Integrated Circuit ◽

Hybrid Silicon ◽

The Cost ◽

Silicon Based ◽

Thz Applications

In the last couple of decades, terahertz (THz) technologies, which lie in the frequency gap between the infrared and microwaves, have been greatly enhanced and investigated due to possible opportunities in a plethora of THz applications, such as imaging, security, and wireless communications. Photonics has led the way to the generation, modulation, and detection of THz waves through techniques such as photomixing. In tandem with these investigations, researchers have been exploring ways to use silicon photonics technologies for THz applications to leverage the cost-effective large-scale fabrication and integration opportunities that they would enable. Although silicon photonics has enabled the implementation of a large number of optical components for practical use, THz integrated systems still face several challenges associated with high-quality hybrid silicon lasers, conversion efficiency, device integration, and fabrication. This paper provides an overview of recent progress in THz technologies based on silicon photonics or hybrid silicon photonics, including THz generation, detection, phase modulation, intensity modulation, and passive components. As silicon-based electronic and photonic circuits further approach THz frequencies, one single chip with electronics, photonics, and THz functions seems inevitable, resulting in the ultimate dream of a THz electronic–photonic integrated circuit.

An International Campaign for Agricultural and Livestock Genomics (CALG)

Asia-Pacific Biotech News ◽

10.1142/s0219030302001970 ◽

2002 ◽

Vol 06 (24) ◽

pp. 958-965

Author(s):

Jun Yu ◽

Jian Wang ◽

Huanming Yang

Keyword(s):

Large Scale ◽

Cost Effective ◽

Model Organisms ◽

Environmental Biology ◽

Cdna Sequences ◽

Governmental Agencies ◽

Technology Innovations ◽

A Genome ◽

Starting Point ◽

The Cost

The time has come for a coordinated international effort to sequence agricultural and livestock genomes. While the human genome and the genomes of many model organisms (related to human health and basic biological interests) have been sequenced or have entered sequencing pipelines, agronomically important crop and livestock genomes have not been given high enough priority. Although we face many challenges in policy-making, grant funding, regional task emphasis, research community consensus and technology innovation, many initiatives are being announced and formulated based on the cost-effective, large-scale sequencing procedure known as whole genome shotgun (WGS) sequencing, which produces draft sequences covering 95 percent to 99 percent of a genome. Genes identified from such draft sequences, coupled with other resources such as molecular markers, large-insert clones and cDNA sequences, provide ample information and tools to further our knowledge of agricultural and environmental biology in the genome era, which is just entering its period of acceleration. If the campaign succeeds, molecular biologists, geneticists and field biologists from all countries, rich or poor, would be brought to the same starting point and could expect another astronomical increase in basic genomic information, ready to be converted effectively into knowledge that will ultimately change our lives and environment for the better. We call upon national and international governmental agencies and organizations as well as research foundations to support this unprecedented movement.

Cost-Effectiveness Analysis of a Large-Scale Crèche Intervention to Prevent Child Drowning in Rural Bangladesh

10.21203/rs.3.rs-402414/v1 ◽

2021 ◽

Author(s):

Y. Natalia Alfonso ◽

Adnan A Hyder ◽

Olakunle Alonge ◽

Shumona Sharmin Salam ◽

Kamran Baset ◽

...

Keyword(s):

Cost Effectiveness ◽

Societal Perspective ◽

Large Scale ◽

Operating Cost ◽

Cost Effective ◽

Economic Benefits ◽

Program Cost ◽

Rural Bangladesh ◽

Life Years ◽

The Cost

Abstract Drowning is the leading cause of death among children 12-59 months old in rural Bangladesh. This study evaluated the cost-effectiveness of a large-scale crèche intervention in preventing child drowning. Estimates of the effectiveness of the crèches were based on prior studies, and the program cost was assessed using monthly program expenditures captured prospectively throughout the study period from two different implementing agencies. The study evaluated the cost-effectiveness from both a program and a societal perspective. Results showed that, from the program perspective, the annual operating cost of a crèche was $416.35 (95% C.I.: $222 to $576), the annual cost per child was $16 (95% C.I.: $9 to $22) and the incremental cost-effectiveness ratio (ICER) per life saved with the crèches was $17,803 (95% C.I.: $9,051 to $27,625). From the societal perspective (including parents' time valued) the ICER per life saved was -$176,62 (95% C.I.: -$347,091 to -$67,684), meaning crèches generated net economic benefits per child enrolled. Based on the ICER per disability-adjusted life year averted from the societal perspective (excluding parents' time), $2,020, the crèche intervention was cost-effective even when the societal economic benefits were ignored. Based on the evidence, the crèche intervention has great potential for reducing child drowning at a cost that is reasonable.
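For readers unfamiliar with the ICER reported here, a minimal sketch of the standard definition (incremental cost divided by incremental effect) follows. The function name and every number in the usage lines are hypothetical illustrations, not values from the study.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect.

    Costs are in currency units; effects are in outcome units
    (for example lives saved or DALYs averted).
    """
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return (cost_new - cost_old) / delta_effect


# Hypothetical program: costs 50,000 more than doing nothing and saves 4 lives.
print(icer(50000.0, 0.0, 4.0, 0.0))   # 12500.0 per life saved

# Hypothetical societal view: net cost is negative once benefits are counted.
print(icer(-20000.0, 0.0, 4.0, 0.0))  # -5000.0, i.e. net savings per life saved
```

A negative ratio with a positive effect, as in the second call, corresponds to the situation the abstract describes as generating net economic benefits.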

A review of gasification of bio-oil for gas production

Sustainable Energy & Fuels ◽

10.1039/c8se00553b ◽

2019 ◽

Vol 3 (7) ◽

pp. 1600-1622 ◽

Cited By ~ 4

Author(s):

Ji-Lu Zheng ◽

Ya-Hong Zhu ◽

Ming-Qiang Zhu ◽

Kang Kang ◽

Run-Cang Sun

Keyword(s):

Large Scale ◽

Cost Effective ◽

Gas Production ◽

Commercial Production ◽

Advanced Fuels ◽

Bio Oil ◽

Effective Transport ◽

The Cost

The commercial production of advanced fuels based on bio-oil gasification could be promising because the cost-effective transport of bio-oil could promote large-scale implementation of this biomass technology.

Variant calling and quality control of large-scale human genome sequencing data

Emerging Topics in Life Sciences ◽

10.1042/etls20190007 ◽

2019 ◽

Vol 3 (4) ◽

pp. 399-409 ◽

Cited By ~ 1

Author(s):

Brandon Jew ◽

Jae Hoon Sul

Keyword(s):

Quality Control ◽

Genome Sequencing ◽

Genetic Variants ◽

Large Scale ◽

Variant Calling ◽

Sequencing Data ◽

Computational Approaches ◽

Sequencing Errors ◽

Human Genome Sequencing ◽

Number Of Individuals

Abstract Next-generation sequencing has allowed genetic studies to collect genome sequencing data from a large number of individuals. However, raw sequencing data are not usually interpretable due to fragmentation of the genome and technical biases; therefore, analysis of these data requires many computational approaches. First, for each sequenced individual, sequencing data are aligned and further processed to account for technical biases. Then, variant calling is performed to obtain information on the positions of genetic variants and their corresponding genotypes. Quality control (QC) is applied to identify individuals and genetic variants with sequencing errors. These procedures are necessary to generate accurate variant calls from sequencing data, and many computational approaches have been developed for these tasks. This review will focus on current widely used approaches for variant calling and QC.

Validation of variants using cost effective high-resolution melting (HRM) analysis predicted from target re-sequencing in Eucalyptus

Acta Botanica Croatica ◽

10.37427/botcro-2020-019 ◽

2020 ◽

Vol 79 (2) ◽

pp. 105-113

Author(s):

Abdul Bari Muneera Parveen ◽

Divya Lakshmanan ◽

Modhumita Ghosh Dasgupta

Keyword(s):

Next Generation Sequencing ◽

Large Scale ◽

Sequence Data ◽

Cost Effective ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Time Saving ◽

Hrm Analysis ◽

The Cost ◽

Generation Sequencing

The advent of next-generation sequencing has facilitated large-scale discovery and mapping of genomic variants for high-throughput genotyping. Several research groups working on tree species are presently employing next-generation sequencing (NGS) platforms for marker discovery, since it is a cost-effective and time-saving strategy. However, most trees lack a chromosome-level genome map, and validation of variants for downstream application becomes obligatory. The cost associated with identifying potential variants from the enormous amount of sequence data is a major limitation. In the present study, high-resolution melting (HRM) analysis was optimized for rapid validation of single nucleotide polymorphisms (SNPs), insertions or deletions (InDels) and simple sequence repeats (SSRs) predicted from exome sequencing of parents and hybrids of Eucalyptus tereticornis Sm. × Eucalyptus grandis Hill ex Maiden generated from controlled hybridization. The cost per data point was less than 0.5 USD, providing great flexibility in terms of cost and sensitivity when compared to other validation methods. The sensitivity of this technology in variant detection can be extended to other applications, including Bar-HRM for species authentication and TILLING for detection of mutants.

A cost effective approach to stormwater management? source control and distributed storage

Water Science & Technology ◽

10.2166/wst.1997.0684 ◽

1997 ◽

Vol 36 (8-9) ◽

pp. 307-311 ◽

Cited By ~ 8

Author(s):

R. Y. G. Andoh ◽

C. Declerck

Keyword(s):

Large Scale ◽

Traditional Approach ◽

Distributed Storage ◽

Cost Effective ◽

Source Control ◽

Slowing Down ◽

Surface Areas ◽

Cost Effective Approach ◽

Receiving Waters ◽

The Cost

Rapid urbanisation, with its consequent increase in impermeable surface areas and changes in land use, has generally resulted in problems of flooding and heavy pollution of urban streams and other receiving waters. This has often been coupled with groundwater depletion and a threat to water resources. The first part of this paper presents an alternative drainage philosophy and strategy which mimics nature's way by slowing down (attenuating) the movement of urban runoff. This approach results in cost-effective, affordable and sustainable drainage schemes. The alternative strategy can be described as one of prevention rather than cure: controls are effected closer to the source, in contrast to the traditional approach, which transfers problems downstream, where they accumulate and create the need for large-scale, centralised control. The second part describes a research project which has been launched to quantify the cost and operational benefits of source control and distributed storage. Details of the methodology of the modelling and simulation processes being followed to achieve this target are presented.

Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing

Bioinformatics ◽

10.1093/bioinformatics/btu670 ◽

2014 ◽

Vol 31 (4) ◽

pp. 515-522 ◽

Cited By ~ 8

Author(s):

Chang-Chang Cao ◽

Xiao Sun

Keyword(s):

Haplotype Frequency ◽

Cost Effective ◽

Accurate Estimation ◽

Sequencing Data ◽

Rare Haplotype ◽

Pooled Sequencing

Seismic Vulnerability Assessment of Liquid Storage Tanks Isolated by Sliding-Based Systems

Advances in Civil Engineering ◽

10.1155/2018/5304245 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14 ◽

Cited By ~ 4

Author(s):

Alexandros Tsipianitis ◽

Yiannis Tsompanakis

Keyword(s):

Large Scale ◽

Seismic Vulnerability ◽

Base Isolation ◽

Cost Effective ◽

Storage Tanks ◽

Modes Of Failure ◽

Liquid Storage ◽

Near Fault Earthquakes ◽

Friction Pendulum ◽

The Cost

Liquid-filled tanks are effective storage infrastructure for water, oil, and liquefied natural gas (LNG). Many such large-scale tanks are located in regions with high seismicity. Therefore, base isolation technology very frequently has to be adopted to reduce the dynamic distress of storage tanks, protecting the structure from typical modes of failure, such as elephant-foot buckling, diamond-shaped buckling, and roof damage caused by liquid sloshing. The cost-effective seismic design of base-isolated liquid storage tanks can be achieved by adopting performance-based design (PBD) principles. In this work, the focus is on sliding-based systems, namely single friction pendulum bearings (SFPBs), triple friction pendulum bearings (TFPBs), and mainly the recently developed quintuple friction pendulum bearings (QFPBs). More specifically, the study focuses on the fragility analysis of tanks isolated by sliding bearings, emphasizing isolators’ displacements due to near-fault earthquakes. In addition, a surrogate model has been developed for simulating the dynamic response of the superstructure (tank and liquid content) to achieve an optimal balance between computational efficiency and accuracy.
