doepipeline: a systematic approach to optimizing multi-level and multi-step data processing workflows

BMC Bioinformatics, 2019, Vol 20 (1). DOI: 10.1186/s12859-019-3091-z

Author(s): Daniel Svensson, Rickard Sjögren, David Sundell, Andreas Sjödin, Johan Trygg

Keyword(s): Systematic Approach, De Novo, Optimal Parameter, Variant Calling, Search Space, Potential Interaction, Novel Approach, Practice Parameters, Wide Range, Response Surface Designs

Background: Selecting the proper parameter settings for bioinformatic software tools is challenging. Not only will each parameter have an individual effect on the outcome, but there are also potential interaction effects between parameters. Both of these effects may be difficult to predict. To make the situation even more complex, multiple tools may be run in a sequential pipeline where the final output depends on the parameter configuration for each tool in the pipeline. Because of the complexity and difficulty of predicting outcomes, in practice parameters are often left at default settings or set based on personal or peer experience obtained in a trial-and-error fashion. To allow for the reliable and efficient selection of parameters for bioinformatic pipelines, a systematic approach is needed. Results: We present doepipeline, a novel approach to optimizing bioinformatic software parameters, based on core concepts of the Design of Experiments methodology and recent advances in subset designs. Optimal parameter settings are first approximated in a screening phase using a subset design that efficiently spans the entire search space, then optimized in the subsequent phase using response surface designs and ordinary least squares (OLS) modeling. doepipeline was used to optimize parameters in four use cases: 1) de novo assembly, 2) scaffolding of a fragmented genome assembly, 3) k-mer taxonomic classification of Oxford Nanopore Technologies MinION reads, and 4) genetic variant calling. In all four cases, doepipeline found parameter settings that produced a better outcome, with respect to the characteristic measured, than the default values. Our approach is implemented and available in the Python package doepipeline. Conclusions: Our proposed methodology provides a systematic and robust framework for optimizing software parameter settings, in contrast to labor- and time-intensive manual parameter tweaking. Implementation in doepipeline makes our methodology accessible and user-friendly, and allows for automatic optimization of tools in a wide range of cases. The source code of doepipeline is available at https://github.com/clicumu/doepipeline and it can be installed through conda-forge.
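The two-phase idea in the abstract (coarse screening, then a response surface design fitted with OLS) can be illustrated with a minimal sketch. This is not the doepipeline API: the `response` function is a hypothetical stand-in for running a real pipeline, a plain coarse grid replaces the subset design, and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np

def response(x):
    """Hypothetical pipeline score; stands in for running a real tool."""
    a, b = x
    return -(a - 3.0) ** 2 - 2.0 * (b - 1.5) ** 2 + 0.5 * (a - 3.0) * (b - 1.5)

def quadratic_terms(X):
    """Model matrix for a full quadratic model in two factors."""
    a, b = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), a, b, a * b, a ** 2, b ** 2])

def screen(levels_a, levels_b):
    """Screening phase (assumption: a coarse grid instead of a subset design)."""
    grid = np.array(list(itertools.product(levels_a, levels_b)), dtype=float)
    scores = np.array([response(p) for p in grid])
    return grid[np.argmax(scores)]

def central_composite(center, step):
    """Face-centred central composite design around `center` (9 runs)."""
    offsets = np.array(list(itertools.product([-1, 0, 1], repeat=2)), dtype=float)
    return center + step * offsets

def optimize(center, step):
    """Fit the quadratic model with OLS and return its stationary point."""
    X = central_composite(center, step)
    y = np.array([response(p) for p in X])
    beta, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    _, b1, b2, b12, b11, b22 = beta
    # Setting the gradient of the fitted surface to zero gives H @ x = -[b1, b2].
    H = np.array([[2 * b11, b12], [b12, 2 * b22]])
    return np.linalg.solve(H, -np.array([b1, b2]))

start = screen([0, 2, 4, 6], [0, 1, 2, 3])  # best point on the coarse grid
best = optimize(start, step=1.0)            # refined optimum from the OLS model
print("screening optimum:", start, "refined optimum:", best)
```

In a real setting the stationary point is only a maximum if the fitted matrix `H` is negative definite, and the response is noisy, so the design would typically be re-centred and the fit repeated rather than trusted after one iteration.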

Canvas SPW: calling de novo copy number variants in pedigrees

2017. DOI: 10.1101/121939

Author(s): Sergii Ivakhno, Eric Roller, Camilla Colombo, Philip Tedder, Anthony J. Cox

Keyword(s): Copy Number, De Novo, Late Onset, Genetic Diseases, Copy Number Variants, Variant Calling, Supplementary Information, Sequencing Data, Pedigree Structure, Wide Range

Motivation: Whole genome sequencing is becoming a diagnostic of choice for the identification of rare inherited and de novo copy number variants in families with various pediatric and late-onset genetic diseases. However, joint variant calling in pedigrees is hampered by the complexity of consensus breakpoint alignment across samples within an arbitrary pedigree structure. Results: We have developed a new tool, Canvas SPW, for the identification of inherited and de novo copy number variants from pedigree sequencing data. Canvas SPW supports a number of family structures and provides a wide range of scoring and filtering options to automate and streamline identification of de novo variants. Availability: Canvas SPW is available for download from https://github.com/Illumina/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Search for mutations of the interferon-induced transmembrane protein 5 (IFITM5) gene in patients with osteogenesis imperfecta

Nauchno-prakticheskii zhurnal «Medicinskaia genetika», 2019, pp. 21-29. DOI: 10.25557/2073-7998.2019.10.21-29

Author(s): А.Р. Зарипова, Л.Р. Нургалиева, А.В. Тюрин, И.Р. Минниахметов, Р.И. Хусаинова

Keyword(s): Osteogenesis Imperfecta, De Novo, Transmembrane Protein, Clinical Signs, Clinical Manifestations, Heterozygous Mutation, Type I, New Genes, Wide Range, Type V

The interferon-induced transmembrane protein 5 gene (IFITM5) was studied in 99 patients with osteogenesis imperfecta (OI) from 86 unrelated families, in a search for pathogenic gene variants involved in the formation of the disease phenotype. OI is a clinically and genetically heterogeneous hereditary disease of the connective tissue. Its main clinical manifestation is multiple fractures, starting from the neonatal period of life and often leading to disability from childhood. The main clinical signs of OI include blue sclerae, hearing loss, dentin anomalies, increased bone fragility, and impaired growth and posture, with the development of characteristic disabling bone deformities and associated problems, including respiratory, neurological, cardiac, and renal disorders. OI occurs in both men and women. The degree of genetic heterogeneity of the disease has not yet been determined. To date, 20 genes are known to be involved in the pathogenesis of OI, and researchers in different countries continue to search for new genes. In the last decade, it has become known that OI is caused by autosomal recessive, autosomal dominant and X-linked mutations in a wide range of genes encoding proteins that are involved in the synthesis of type I collagen, its processing, secretion and post-translational modification, as well as proteins that regulate the differentiation and activity of bone-forming cells. Mutations in the IFITM5 gene, also called BRIL (bone-restricted IFITM-like protein), which is involved in the formation of osteoblasts, lead to the development of OI type V. Up to 5% of patients have OI type V, which is characterized by the formation of a hyperplastic callus after fractures, calcification of the interosseous membrane of the forearm, and a mesh-like lamellation pattern observed on histological examination of the bone. In 2012, a heterozygous mutation (c.-14C>T) in the 5'-untranslated region (UTR) of the IFITM5 gene was identified as the main cause of OI type V. In the present work, the IFITM5 gene was analyzed and the de novo c.-14C>T mutation was identified in one patient with OI, who was subsequently diagnosed with type V of the disease. Three known polymorphic variants were also identified: rs57285449, c.80G>C (p.Gly27Ala); rs2293745, c.187-45C>T; and rs755971385, c.279G>A (p.Thr93=); together with one previously undescribed variant, c.128G>A (p.Ser43Asn), AGC>AAC (S/D). None of these variants is pathogenic. The article focuses on the features of the clinical manifestations of OI type V and recommends testing for the c.-14C>T mutation in the IFITM5 gene if this form of the disease is suspected.

A Plug-and-Play Approach for the De Novo Generation of Dually Functionalised Bispecifics

2019. DOI: 10.26434/chemrxiv.8068184.v1

Author(s): Antoine Maruani, Peter A. Szijj, Calise Bahou, João C. F. Nogueira, Stephen Caddick, ...

Keyword(s): De Novo, Therapeutic Index, Antibody Fragments, Bispecific Antibodies, Full Potential, Mechanisms Of Resistance, Large Excess, Chemical Methods, New Class, Novel Approach

Diseases are multifactorial, with redundancies and synergies between various pathways. However, most of the antibody-based therapeutics in clinical trials and on the market interact with only one target, thus limiting their efficacy. The targeting of multiple epitopes could improve the therapeutic index of treatment and counteract mechanisms of resistance. To this effect, a new class of therapeutics emerged: bispecific antibodies. Bispecific formation using chemical methods is rare and low yielding and/or requires a large excess of one of the two proteins to avoid homodimerisation. In order for chemically prepared bispecifics to deliver their full potential, high-yielding, modular and reliable cross-linking technologies are required. Herein, we describe a novel approach not only for the rapid and high-yielding chemical generation of bispecific antibodies from native antibody fragments, but also for the site-specific dual functionalisation of the resulting bioconjugates. Based on orthogonal clickable functional groups, this strategy enables the assembly of functionalised bispecifics with controlled loading in a modular and convergent manner.

Cerebellotrigeminal Dermal Dysplasia (Gómez-López-Hernández Syndrome)

Journal of Pediatric Neurology, 2018, Vol 16 (05), pp. 362-368. DOI: 10.1055/s-0038-1667021. Cited By ~ 1

Author(s): Federica Sullo, Agata Polizzi, Stefano Catanzaro, Selene Mantegna, Francesco Lacarrubba, ...

Keyword(s): De Novo, Chromosomal Rearrangements, Number Of Patients, Wide Range, Visual Problems, Neurodevelopmental Abnormalities, Clinical Triad, High Degree, Motor Handicap

Cerebellotrigeminal dermal (CTD) dysplasia is a rare neurocutaneous disorder characterized by a triad of symptoms: bilateral parieto-occipital alopecia, facial anesthesia in the trigeminal area, and rhombencephalosynapsis (RES), confirmed by cranial magnetic resonance imaging. CTD dysplasia is also known as Gómez-López-Hernández syndrome. So far, only 35 cases have been described, with varying symptomatology. The etiology remains unknown. Either spontaneous dominant mutations or de novo chromosomal rearrangements have been proposed as possible explanations. In addition to its clinical triad of RES, parietal alopecia, and trigeminal anesthesia, CTD dysplasia is associated with a wide range of phenotypic and neurodevelopmental abnormalities. Treatment is symptomatic and includes physical rehabilitation, special education, dental care, and ocular protection against self-induced corneal trauma that causes ulcers and, later, corneal opacification. The prognosis is correlated with mental development, motor handicap, corneal–facial anesthesia, and visual problems. Follow-up of a large number of patients with CTD dysplasia has never been reported, and experience is limited to a few cases to date. A high degree of suspicion in a child presenting with characteristic alopecia and RES is of great importance in the diagnosis of this syndrome.

Responsive and Personalized Web Layouts with Integer Programming

Proceedings of the ACM on Human-Computer Interaction, 2021, Vol 5 (EICS), pp. 1-23. DOI: 10.1145/3461735

Author(s): Markku Laine, Yu Zhang, Simo Santala, Jussi P. P. Jokinen, Antti Oulasvirta

Keyword(s): Integer Programming, Web Design, Web Pages, Web Page Design, Automated Generation, Novel Approach, Wide Range, Responsive Design, Responsive Web Design, Page Design

Over the past decade, responsive web design (RWD) has become the de facto standard for adapting web pages to a wide range of devices used for browsing. While RWD has improved the usability of web pages, it is not without drawbacks and limitations: designers and developers must manually design the web layouts for multiple screen sizes and implement associated adaptation rules, and its "one responsive design fits all" approach lacks support for personalization. This paper presents a novel approach for automated generation of responsive and personalized web layouts. Given an existing web page design and preferences related to design objectives, our integer programming-based optimizer generates a consistent set of web designs. Where relevant data is available, these can be further automatically personalized for the user and browsing device. The paper also presents techniques for runtime adaptation of the generated designs into a fully responsive grid layout for web browsing. Results from our ratings-based online studies with end users (N = 86) and designers (N = 64) show that the proposed approach can automatically create high-quality responsive web layouts for a variety of real-world websites.

R1352Q CACNA1A Variant in a Patient with Sporadic Hemiplegic Migraine, Ataxia, Seizures and Cerebral Oedema: A Case Report

Case Reports in Neurology, 2021, pp. 123-130. DOI: 10.1159/000512275

Author(s): Anker Stubberud, Emer O’Connor, Erling Tronvik, Henry Houlden, Manjit Matharu

Keyword(s): Mental Retardation, Case Report, Genetic Testing, Head Trauma, Cerebral Oedema, De Novo, Minor Head Trauma, Hemiplegic Migraine, Wide Range, History Of

Mutations in the CACNA1A gene show a wide range of neurological phenotypes including hemiplegic migraine, ataxia, mental retardation and epilepsy. In some cases, hemiplegic migraine attacks can be triggered by minor head trauma and culminate in encephalopathy and cerebral oedema. A 37-year-old male without a family history of complex migraine experienced hemiplegic migraine attacks from childhood. The attacks were usually triggered by minor head trauma, and on several occasions complicated with encephalopathy and cerebral oedema. Genetic testing of the proband and unaffected parents revealed a de novo heterozygous nucleotide missense mutation in exon 25 of the CACNA1A gene (c.4055G>A, p.R1352Q). The R1352Q CACNA1A variant shares the phenotype with other described CACNA1A mutations and highlights the interesting association of trauma as a precipitant for hemiplegic migraine. Subjects with early-onset sporadic hemiplegic migraine triggered by minor head injury or associated with seizures, ataxia or episodes of encephalopathy should be screened for mutations. These patients should also be advised to avoid activities that may result in head trauma, and anticonvulsants should be considered as prophylactic migraine therapy.

Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome

Microbiome, 2021, Vol 9 (1). DOI: 10.1186/s40168-020-00981-z

Author(s): Hannes Petruschke, Christian Schori, Sebastian Canzler, Sarah Riesbeck, Anja Poehlein, ...

Keyword(s): Microbial Communities, Intestinal Microbiota, De Novo, Bacterial Species, Intestinal Microbiome, Single Strain, Small Proteins, Human Intestinal Microbiota, Wide Range

Background: The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx), consisting of eight bacterial species, with a particular focus on the discovery of novel small proteins of fewer than 100 amino acids (sProteins), some of which may contribute to shaping the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. Results: We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches, including two earlier optimized sProtein enrichment strategies, were applied specifically to increase the chances of novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community, indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. Conclusions: We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins in the functionality of bacterial communities such as those of the human intestinal tract.

Practice Guidelines: How Good are Medicine's New Recipes?

The Journal of Law Medicine & Ethics, 1995, Vol 23 (1), pp. 47-48. DOI: 10.1111/j.1748-720x.1995.tb01329.x. Cited By ~ 1

Author(s): Alexander Morgan Capron

Keyword(s): Health Care, Nonprofit Organizations, Medical Knowledge, Medical Specialty, Therapeutic Interventions, Medical Interventions, Practice Parameters, Wide Range, Standard Of Practice

Over the last decade, standards for when and how to undertake a wide range of medical interventions have poured forth from medical specialty groups, commercial and nonprofit organizations, and state and federal panels. Known by a variety of names (from practice parameters to clinical guidelines) and intended for a range of purposes (from diminishing the incidence of maloccurrences in hospitals to cutting the costs of health care), these guidelines share one important feature: the intention of decreasing the range of variation in medical practice. Such standardization immediately appeals to anyone interested in improving the quality of health care and, in particular, reducing inappropriate medical interventions, in light of the difficulties a conscientious physician faces today in adhering to the best standard of practice amid ever-increasing medical knowledge and the growing number and complexity of diagnostic, preventive, and therapeutic interventions.

Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data, 2021, Vol 15 (5), pp. 1-32. DOI: 10.1145/3446668

Author(s): Quang-huy Duong, Heri Ramampiaro, Kjetil Nørvåg, Thu-lan Dam

Keyword(s): Lower Bound, State Of The Art, The State, The Other, Exact Methods, Practical Solution, Novel Approach, Wide Range, Real World Datasets, Tensor Data

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most of the existing works use the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can, on the other hand, estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we guarantee and prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
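For orientation, the greedy 2-approximation mentioned in the abstract is commonly realised as min-degree peeling on a graph. The sketch below assumes that variant and the average-degree density |E|/|V|; it is the well-known baseline for a single subgraph, not the multi-subtensor method the paper proposes, and the function name and example graph are illustrative.

```python
from collections import defaultdict

def densest_subgraph_greedy(edges):
    """Greedy peeling: repeatedly drop a minimum-degree vertex and keep
    the intermediate vertex set with the highest density |E| / |V|."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    nodes = set(adj)
    num_edges = sum(len(neigh) for neigh in adj.values()) // 2
    best_density, best_nodes = 0.0, set()
    while nodes:
        density = num_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        u = min(nodes, key=lambda n: len(adj[n]))  # current min-degree vertex
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        nodes.remove(u)
        del adj[u]
    return best_nodes, best_density

# Example graph (assumed): a 4-clique on {1, 2, 3, 4} with a tail 4-5-6.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
dense_nodes, dense_value = densest_subgraph_greedy(edges)
print(sorted(dense_nodes), dense_value)
```

The returned density is guaranteed to be at least half of the optimum, which is the loose bound the abstract refers to. This sketch scans for the minimum-degree vertex linearly at each step; a bucket or heap structure would be needed for large graphs.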
