technical variability
Recently Published Documents

TOTAL DOCUMENTS: 60 (FIVE YEARS: 23)
H-INDEX: 18 (FIVE YEARS: 2)

2022 ◽  
Author(s):  
Sofya Lipnitskaya ◽  
Yang Shen ◽  
Stefan Legewie ◽  
Holger Klein ◽  
Kolja Becker

Abstract Background: Recent transcriptomic studies at the single-cell and population levels reveal noticeable variability in gene expression measurements provided by different RNA sequencing technologies. Due to the increased noise and complexity of single-cell RNA-Seq (scRNA-Seq) data compared to bulk experiments, there is a substantial number of variably expressed genes and so-called dropouts, challenging the subsequent computational analysis and potentially leading to false-positive discoveries. To investigate the factors affecting technical variability between RNA sequencing experiments of different technologies, we performed a systematic assessment of single-cell and bulk RNA-Seq data that had undergone the same sample preparation and pre-processing procedures. Results: Our analysis indicates that variability between gene expression measurements, as well as dropout events, is not exclusively caused by biological variability, low expression levels, or random variation. Furthermore, we propose FAVSeq, a machine-learning-assisted pipeline for the detection of factors contributing to gene expression variability in matched RNA-Seq data provided by two technologies. Based on the analysis of the matched bulk and single-cell dataset, we found 3'-UTR and transcript lengths to be the most relevant effectors of the observed variation between RNA-Seq experiments, while the same factors, together with cellular compartments, were associated with dropouts. Conclusions: Here, we investigated the sources of variation in RNA-Seq profiles of matched single-cell and bulk experiments. In addition, we proposed the FAVSeq pipeline for analyzing multimodal RNA sequencing data, which allowed us to identify factors affecting both quantitative differences in gene expression measurements and the presence of dropouts.
The derived knowledge can be further employed to improve the interpretation of RNA-Seq data and to identify genes that may be affected by assay-based deviations. Source code is available under the MIT license at https://github.com/slipnitskaya/FAVSeq.
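The core idea behind such a pipeline (ranking gene-level annotation features by how well they predict the bulk-versus-single-cell expression gap) can be sketched as follows. This is an illustrative stand-in, not the actual FAVSeq implementation; the feature names, simulated effect sizes, and data are hypothetical.

```python
# Illustrative sketch of the FAVSeq idea, not the published implementation:
# train a regressor on gene-level features and rank them by importance for
# predicting the expression difference between matched bulk and scRNA-Seq.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_genes = 500

# Hypothetical gene-level annotation features.
features = {
    "utr3_length": rng.gamma(2.0, 400.0, n_genes),
    "transcript_length": rng.gamma(2.0, 1500.0, n_genes),
    "gc_content": rng.uniform(0.3, 0.7, n_genes),
}
X = np.column_stack(list(features.values()))

# Simulated bulk-minus-single-cell log-expression gap, driven here
# (by construction) by the two length features.
y = (0.002 * features["utr3_length"]
     + 0.0005 * features["transcript_length"]
     + rng.normal(0.0, 0.5, n_genes))

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda kv: -kv[1]):
    print(f"{name}: {imp:.2f}")
```

On data constructed this way, the two length features dominate the importance ranking, mirroring the kind of conclusion the abstract reports.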




2021 ◽  
Author(s):  
Giulia Maria Rita De Luca ◽  
Jan Habraken

Abstract Background: Parameters commonly used for the quantification of Positron Emission Tomography (PET) images include the Standardized Uptake Value (SUV) Max, Mean, and Peak. To assess the significance of an increase or decrease in these parameters for diagnostic purposes, it is important to know their standard deviation, whose sources can be divided into biological and technical components. In this study we present a method to determine the technical variation of the SUV in PET images. Results: The method was tested on images of a NEMA quality phantom with spheres of various diameters, acquired with a full-length acquisition time of 150 s per bed position and a foreground-to-background activity ratio of F18-2-fluoro-2-deoxy-D-glucose (FDG) of 10:1. Our method divides the full-length 150 s acquisition into subsets of shorter duration and reconstructs the images in each subset. The SUVMax, SUVMean, and SUVPeak were calculated for each reconstructed image in a subset. The coefficient of variation of the SUV parameters within each subset was then used to estimate the expected standard deviation between images at 150 s reconstruction length. We report the largest technical variation of the SUV parameters for the smallest sphere and the smallest variation for the largest sphere. The expected variation at 150 s reconstruction length does not exceed 6% for the smallest sphere and 2% for the largest sphere. Conclusions: With the presented method we are able to determine the technical variation of the SUV. The method enables us to evaluate the effect of parameter selection and lesion size on the technical variation, and thereby its contribution to the total variation of the SUV between studies.
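The extrapolation from subset scatter to full-length variability can be sketched as below, assuming image noise scales as one over the square root of acquisition time (Poisson counting statistics); the SUV values are hypothetical, not data from the study.

```python
# Sketch of the subset-based variance estimate. Assumes SUV noise scales as
# 1/sqrt(acquisition time), as for Poisson counting statistics.
import numpy as np

def expected_cv_at_full_length(suv_subset, t_subset, t_full=150.0):
    """Estimate the coefficient of variation expected at the full
    acquisition length from SUV values of shorter-subset reconstructions."""
    suv_subset = np.asarray(suv_subset, dtype=float)
    cv_subset = suv_subset.std(ddof=1) / suv_subset.mean()
    # Scale the subset CV down to the full-length acquisition.
    return cv_subset * np.sqrt(t_subset / t_full)

# Hypothetical SUVMax values from five 30 s reconstructions of one sphere.
suv_30s = [4.8, 5.1, 4.6, 5.0, 4.9]
print(f"expected CV at 150 s: {expected_cv_at_full_length(suv_30s, 30.0):.1%}")
```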


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258982
Author(s):  
Brian Li ◽  
Kristen L. Cotner ◽  
Nathaniel K. Liu ◽  
Stefan Hinz ◽  
Mark A. LaBarge ◽  
...  

Cellular mechanical properties can reveal physiologically relevant characteristics in many cell types, and several groups have developed microfluidics-based platforms to perform high-throughput single-cell mechanical testing. However, prior work has performed only limited characterization of these platforms’ technical variability and reproducibility. Here, we evaluate the repeatability performance of mechano-node-pore sensing, a single-cell mechanical phenotyping platform developed by our research group. We measured the degree to which device-to-device variability and semi-manual data processing affected this platform’s measurements of single-cell mechanical properties. We demonstrated high repeatability across the entire technology pipeline even for novice users. We then compared results from identical mechano-node-pore sensing experiments performed by researchers in two different laboratories with different analytical instruments, demonstrating that the mechanical testing results from these two locations are in agreement. Our findings quantify the expectation of technical variability in mechano-node-pore sensing even in minimally experienced hands. Most importantly, we find that the repeatability performance we measured is fully sufficient for interpreting biologically relevant single-cell mechanical measurements with high confidence.
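Repeatability studies of this kind are often summarized by comparing within-device to between-device scatter. A minimal sketch with hypothetical stiffness readings (not data from the paper):

```python
# Minimal repeatability summary with hypothetical numbers: compare the
# scatter of repeated runs on one device to the scatter across devices.
import numpy as np

def repeatability_summary(data):
    """data: (n_devices, n_runs) array of a per-run summary statistic,
    e.g. median normalized cell stiffness. Returns the mean within-device
    and the between-device coefficients of variation."""
    data = np.asarray(data, dtype=float)
    within_cv = (data.std(axis=1, ddof=1) / data.mean(axis=1)).mean()
    device_means = data.mean(axis=1)
    between_cv = device_means.std(ddof=1) / device_means.mean()
    return within_cv, between_cv

# Hypothetical readings from three devices, four runs each.
readings = [[1.02, 0.98, 1.01, 0.99],
            [1.05, 1.03, 1.06, 1.04],
            [0.97, 0.96, 0.99, 0.95]]
within, between = repeatability_summary(readings)
print(f"within-device CV {within:.1%}, between-device CV {between:.1%}")
```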


2021 ◽  
Author(s):  
Benjamin D Meyers ◽  
Vincent K Lee ◽  
Lauren G Dennis ◽  
Julia Wallace ◽  
Vince Schmithorst ◽  
...  

Advanced brain imaging of neonatal macrostructure and microstructure, which has prognostic importance, is increasingly being incorporated into multi-center trials of neonatal neuroprotection. Multi-center neuroimaging studies, designed to overcome the small sample sizes of single-center clinical cohorts, are essential but introduce increased technical variability. Few harmonization techniques have been developed for neonatal brain microstructural (diffusion tensor) analysis. The work presented here aims to remedy two common problems with current state-of-the-art approaches: 1) variance in scanner and protocol during data collection can limit the researcher's ability to harmonize data acquired under different conditions or from different clinical populations; 2) there is a general lack of objective guidelines for dealing with abnormal anatomy and pathology. Subjects are often excluded on subjective criteria, or because of pathology that could be informative to the final analysis, reducing reproducibility and statistical power. This is a barrier to the analysis of large multi-center studies and a particularly salient problem given the relative scarcity of neonatal imaging data. We provide an objective, data-driven, and semi-automated neonatal processing pipeline designed to harmonize compartmentalized variant data acquired under different parameters. The pipeline first applies a search-space reduction step, extracting the along-tract diffusivity values for each tract of interest rather than performing whole-brain harmonization. This is followed by a data-driven outlier detection step to remove unwanted noise and outliers from the final harmonization. We then apply an empirical Bayes harmonization algorithm at the along-tract level, yielding a lower-dimensional but still spatially informative output.
After applying our pipeline to a large multi-site dataset of neonates and infants with congenital heart disease (n = 398 subjects recruited across 4 centers, with a total of n = 763 pre-operative/post-operative MRI time points), we show that infants with single-ventricle cardiac physiology exhibit greater white matter microstructural alterations than infants with bi-ventricular heart disease, consistent with previous literature. Our method is an open-source pipeline for delineating white matter tracts in subject space, while providing the modular components needed for atlas-space analysis. As such, we validate and introduce Diffusion Imaging of Neonates by Group Organization (DINGO), a high-level, semi-automated framework that facilitates harmonization of subject-space tractography generated from diffusion tensor imaging acquired across varying scanners, institutions, and clinical populations. Datasets acquired using different protocols or cohorts are compartmentalized into subsets, and a cohort-specific template is generated for each, allowing propagation of the tractography mask set with higher spatial specificity. Taken together, this pipeline can reduce the multi-scanner technical variability that can confound important biological variability in neonatal brain microstructure.
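The along-tract harmonization step can be illustrated with a simplified location/scale batch adjustment applied per tract position. This sketch omits the empirical Bayes shrinkage the pipeline actually uses; the diffusivity profiles and site labels are synthetic.

```python
# Simplified location/scale harmonization of along-tract diffusivity
# profiles, applied per tract position. The actual pipeline uses an
# empirical Bayes (ComBat-like) variant; this sketch omits the shrinkage.
import numpy as np

def harmonize_along_tract(profiles, batches):
    """profiles: (n_subjects, n_positions) diffusivity values along one tract.
    batches: length-n_subjects sequence of scanner/site labels."""
    profiles = np.asarray(profiles, dtype=float)
    batches = np.asarray(batches)
    out = profiles.copy()
    grand_mean = profiles.mean(axis=0)
    grand_std = profiles.std(axis=0, ddof=1)
    for b in np.unique(batches):
        idx = batches == b
        mu = profiles[idx].mean(axis=0)
        sd = profiles[idx].std(axis=0, ddof=1)
        # Align this site's per-position mean and spread with the pooled data.
        out[idx] = (profiles[idx] - mu) / sd * grand_std + grand_mean
    return out

rng = np.random.default_rng(1)
site_a = rng.normal(1.0, 0.10, (20, 5))  # scanner A, 5 tract positions
site_b = rng.normal(1.3, 0.20, (20, 5))  # scanner B: offset mean, wider spread
harmonized = harmonize_along_tract(np.vstack([site_a, site_b]),
                                   ["A"] * 20 + ["B"] * 20)
```

After adjustment, both sites share the same per-position mean and spread, so downstream group comparisons are no longer confounded by the scanner offset.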


2021 ◽  
Author(s):  
Fiona Given ◽  
Tamsyn Stanborough ◽  
Mark Waterland ◽  
Deborah Crittenden

In this work, we introduce a novel joint experimental design and computational analysis procedure to reliably and reproducibly quantify protein analyte binding to DNA-aptamer-functionalised silver nanoparticles using slippery surface-enhanced Raman spectroscopy. We employ an indirect detection approach based upon monitoring spectral changes in the covalent bond-stretching region as intermolecular bonds form between the surface-immobilised probe biomolecule and its target analyte. Sample variability is minimised by preparing aptamer-only and aptamer-plus-analyte samples under the same conditions and then analysing difference spectra. To account for technical variability, multiple spectra are recorded from the same sample. Our new DeltaPCA analysis procedure takes into account technical variability within each spectral data set while also extracting statistically robust difference spectra between data sets. Proof-of-principle experiments using thiolated aptamers to detect the SARS-CoV-2 spike protein reveal that analyte binding is mediated through the formation of N-H...X and C-H...X hydrogen bonds between the aptamer (H-bond donor) and protein (H-bond acceptor). Our computational analysis code can be freely downloaded from https://github.com/dlc62/DeltaPCA.
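The underlying idea (estimate technical variability from within-group replicate scatter, then judge the between-group difference spectrum against it) can be sketched as follows. This is a minimal stand-in, not the published DeltaPCA algorithm, and the spectra are synthetic.

```python
# Minimal stand-in for the DeltaPCA idea (not the published algorithm):
# PCA of within-group replicate residuals estimates technical variability,
# against which the between-group difference spectrum is judged.
import numpy as np

def difference_vs_technical_noise(group_a, group_b, n_components=2):
    """group_a, group_b: (n_replicates, n_channels) spectra of the
    aptamer-only and aptamer-plus-analyte samples."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    diff = b.mean(axis=0) - a.mean(axis=0)          # difference spectrum
    residuals = np.vstack([a - a.mean(axis=0), b - b.mean(axis=0)])
    # Leading singular values of the residuals capture technical variability.
    s = np.linalg.svd(residuals, compute_uv=False)
    noise_scale = s[:n_components].mean() / np.sqrt(len(residuals))
    return diff, np.linalg.norm(diff) / noise_scale

rng = np.random.default_rng(0)
x = np.arange(100)
band = 0.5 * np.exp(-((x - 50) ** 2) / 20.0)      # analyte-induced band
aptamer_only = rng.normal(0.0, 0.01, (6, 100))
with_analyte = band + rng.normal(0.0, 0.01, (6, 100))
diff, snr = difference_vs_technical_noise(aptamer_only, with_analyte)
print(f"difference band peaks at channel {diff.argmax()}")
```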


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Lauren A. Vanderlinden ◽  
Randi K. Johnson ◽  
Patrick M. Carry ◽  
Fran Dong ◽  
Dawn L. DeMeo ◽  
...  

Abstract Objective: Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which combine data from the older 450K and newer EPIC platforms. The pre-processing pipeline for DNA methylation data is not trivial and influences downstream analyses. Incorporating different platforms adds a level of technical variability that has not yet been taken into account by recommended pipelines. Our study evaluated the performance of various tools for harmonizing data across platform versions at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and correction for genomic inflation. We illustrate our approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort. Results: We found that normalization and probe filtering had the largest effect on data harmonization. A meta-analysis was an effective and easily executed way of accounting for platform variability, and correcting for genomic inflation also aided harmonization. We present guidelines for studies seeking to harmonize data from the 450K and EPIC platforms, including the use of technical replicates for evaluating pre-processing steps and the use of meta-analysis.
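Two of the recommended steps can be sketched concretely: a fixed-effect inverse-variance meta-analysis combining per-platform effect estimates for one CpG, and the genomic inflation factor lambda. The effect sizes and standard errors below are hypothetical.

```python
# Sketch of two recommended steps with hypothetical numbers: inverse-variance
# meta-analysis of per-platform EWAS estimates, and genomic inflation lambda.
import numpy as np
from scipy import stats

def inverse_variance_meta(betas, ses):
    """Fixed-effect combination of per-platform effect estimates for one CpG."""
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    beta = (w * np.asarray(betas, dtype=float)).sum() / w.sum()
    return beta, np.sqrt(1.0 / w.sum())

def genomic_inflation(p_values):
    """Lambda: median observed 1-df chi-square over its expected median (~0.455)."""
    chisq = stats.chi2.isf(np.asarray(p_values, dtype=float), df=1)
    return np.median(chisq) / stats.chi2.isf(0.5, df=1)

beta, se = inverse_variance_meta([0.12, 0.08], [0.05, 0.04])  # 450K, EPIC
print(f"meta-analysis: beta={beta:.3f}, se={se:.3f}")
```

A lambda near 1 indicates no systematic inflation of the test statistics; values well above 1 suggest residual technical or structural confounding that the pipeline should correct before combining platforms.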


2021 ◽  
Author(s):  
Brian Li ◽  
Kristen L. Cotner ◽  
Nathaniel K. Liu ◽  
Stefan Hinz ◽  
Mark A. LaBarge ◽  
...  

Cellular mechanical properties can reveal physiologically relevant characteristics in many cell types, and several groups have developed microfluidics-based platforms to perform single-cell mechanical testing with high throughput. However, prior work has performed only limited characterization of these platforms' technical variability and reproducibility. Here, we evaluate the repeatability performance of mechano-node-pore sensing, which is a single-cell mechanical phenotyping platform developed by our research group. We measured the degree to which device-to-device variability and semi-manual data processing affected this platform's measurements of single-cell mechanical properties, and we demonstrated high repeatability across the entire technology pipeline even for novice users. We then compared results from identical mechano-node-pore sensing experiments performed by researchers in two different labs with different analytical instruments, demonstrating that the mechanical testing results from these two locations are in agreement. Our findings quantify the expectation of technical variability in mechano-node-pore sensing even in minimally experienced hands. Most importantly, we find that the repeatability performance we measured is fully sufficient for interpreting biologically relevant single-cell mechanical measurements with high confidence.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dana Vaknin ◽  
Guy Amit ◽  
Amir Bashan

Abstract Recent technological advances, such as single-cell RNA sequencing (scRNA-seq), allow the measurement of gene expression profiles of individual cells. These expression profiles typically exhibit substantial variation even across seemingly homogeneous populations of cells. Two main sources contribute to this measured variability: actual differences in the biological activity of the cells and technical measurement errors. Analysis of the biological variability may provide information about the underlying gene regulation of the cells, yet distinguishing it from the technical variability is a challenge. Here, we apply a recently developed computational method for measuring the global gene coordination level (GCL) to systematically study cell-to-cell variability in numerical models of gene regulation. We simulate ‘biological variability’ by introducing heterogeneity into the underlying regulatory dynamics of different cells, while ‘technical variability’ is represented by stochastic measurement noise. We show that the GCL decreases for cohorts of cells with increased ‘biological variability’ only when that variability originates from the interactions between genes. Moreover, we find that the GCL can evaluate and compare, for cohorts with the same cell-to-cell variability, the ratio between the introduced biological and technical variability. Finally, we show that the GCL is robust against spurious correlations arising from small sample sizes or from the compositionality of the data. The presented methodology can be useful for future analysis of high-dimensional ecological and biochemical dynamics.
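The qualitative effect of technical noise on measured coordination can be illustrated with a toy simulation. Note that this uses mean absolute pairwise correlation as a crude coordination proxy, not the actual GCL statistic, and the regulatory model (a single shared latent factor) is a deliberate simplification.

```python
# Toy simulation in the spirit of the study, using mean absolute pairwise
# correlation as a crude coordination proxy (NOT the actual GCL statistic):
# stronger 'technical' measurement noise lowers the measured coordination.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 30

# Coordinated expression: genes driven by a shared latent regulatory factor.
latent = rng.normal(size=(n_cells, 1))
weights = rng.normal(size=(1, n_genes))
signal = latent @ weights

def coordination(expr):
    c = np.abs(np.corrcoef(expr, rowvar=False))
    return c[np.triu_indices_from(c, k=1)].mean()

low_noise = coordination(signal + 0.1 * rng.normal(size=signal.shape))
high_noise = coordination(signal + 2.0 * rng.normal(size=signal.shape))
print(f"coordination: low noise {low_noise:.2f}, high noise {high_noise:.2f}")
```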


2021 ◽  
Author(s):  
Ping-Han Hsieh ◽  
Camila Miranda Lopes-Ramos ◽  
Geir Kjetil Sandve ◽  
Kimberly Glass ◽  
Marieke Lydia Kuijjer

Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples, which may indicate that these genes are controlled by the same transcriptional regulatory program or are involved in common biological processes. Gene co-expression is generally estimated from RNA-Seq data that have been normalized to remove technical variability. Here, we demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes, and that this can hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful: in particular, when preprocessing large-scale heterogeneous data, it can remove technical variability while maintaining global differences in expression for samples with different biological attributes. We therefore developed CAIMAN, a method to correct for false-positive associations that may arise from normalization of RNA-Seq data. CAIMAN utilizes a Gaussian mixture model to fit the distribution of gene expression and to adaptively select the threshold that defines lowly expressed genes, which are prone to forming false-positive associations. Thereafter, CAIMAN corrects the normalized expression of these genes by removing the variability across samples that might lead to false-positive associations. Moreover, CAIMAN avoids arbitrary gene filtering and retains associations to genes that are expressed in only small subgroups of samples, highlighting its potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data.
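The adaptive thresholding idea can be sketched with scikit-learn's Gaussian mixture model on simulated log-expression values. This illustrates only the component-assignment step, not the full CAIMAN correction; the simulated modes are hypothetical.

```python
# Sketch of the thresholding step (not the full CAIMAN method): fit a
# two-component Gaussian mixture to log-expression and flag genes assigned
# to the low-mean component as lowly expressed.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated log2 expression: a lowly expressed mode and an expressed mode.
log_expr = np.concatenate([rng.normal(0.5, 0.4, 300),
                           rng.normal(6.0, 1.5, 700)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(log_expr)
low_component = int(np.argmin(gmm.means_.ravel()))
is_low = gmm.predict(log_expr) == low_component
print(f"flagged {int(is_low.sum())} of {log_expr.size} genes as lowly expressed")
```

Fitting the mixture rather than using a fixed cutoff lets the threshold adapt to each dataset's expression distribution, which is the motivation the abstract gives for avoiding arbitrary gene filtering.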

