Data-adaptive multi-locus association testing in subjects with arbitrary genealogical relationships

Author(s):  
Gail Gong ◽  
Wei Wang ◽  
Chih-Lin Hsieh ◽  
David J. Van Den Berg ◽  
Christopher Haiman ◽  
...  

Abstract Genome-wide sequencing enables evaluation of associations between traits and combinations of variants in genes and pathways. But such evaluation requires multi-locus association tests with good power, regardless of the variant and trait characteristics. And since analyzing families may yield more power than analyzing unrelated individuals, we need multi-locus tests applicable to both related and unrelated individuals. Here we describe such tests, and we introduce SKAT-X, a new test statistic that uses genome-wide data obtained from related or unrelated subjects to optimize power for the specific data at hand. Simulations show that: a) SKAT-X performs well regardless of variant and trait characteristics; and b) for binary traits, analyzing affected relatives brings more power than analyzing unrelated individuals, consistent with previous findings for single-locus tests. We illustrate the methods by application to rare unclassified missense variants in the tumor suppressor gene BRCA2, as applied to combined data from prostate cancer families and unrelated prostate cancer cases and controls in the Multi-ethnic Cohort (MEC). The methods can be implemented using open-source code for public use as the R-package GATARS (Genetic Association Tests for Arbitrarily Related Subjects) <https://gailg.github.io/gatars/>.

2020 ◽  
Vol 6 ◽  
pp. e251 ◽  
Author(s):  
Zhaodong Hao ◽  
Dekang Lv ◽  
Ying Ge ◽  
Jisen Shi ◽  
Dolf Weijers ◽  
...  

Background Owing to rapid advances in DNA sequencing technologies, whole genomes from more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a very popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution, genomic synteny, etc. However, most available software programs and web servers support only a few model species, such as human, mouse and fly, or have limited application scenarios. As more and more non-model species are sequenced and chromosome-level assemblies become available, tools that can generate idiograms for a broad range of species and visualize more data types are needed to help better understand fundamental genome characteristics. Results The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them as a heat map and track labels, respectively. Conclusion The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researcher working with omics.


Author(s):  
Zhaodong Hao ◽  
Dekang Lv ◽  
Ying Ge ◽  
Jisen Shi ◽  
Dolf Weijers ◽  
...  

Background: Owing to rapid advances in DNA sequencing technologies, whole genomes from more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a very popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution, etc. However, most available software programs and web servers support only a few model species, such as human, mouse and fly. As boundaries between model and non-model species are shifting, tools that can generate idiograms for a broad range of species are urgently needed to help better understand fundamental genome characteristics. Results: The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them as a heat map and track labels, respectively. Conclusion: The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researcher working with omics.


2020 ◽  
Author(s):  
Cristian Yones ◽  
Natalia Macchiaroli ◽  
Laura Kamenetzky ◽  
Georgina Stegmayer ◽  
Diego Milone

Abstract Extracting stem-loop sequences (hairpins) from genome-wide data is an important step in several bioinformatics data mining tasks. Genome preprocessing matters because it strongly influences the later steps and the final results. For example, for novel miRNA prediction, all well-known hairpins must be properly located. Although some existing scripts can be adapted and combined to achieve this task, they are outdated, none of them guarantees finding correspondence to well-known structures in the genome under analysis, and they do not take advantage of the latest advances in secondary structure prediction. We present here an R package for automatic extraction of hairpins from genome-wide data (HextractoR). HextractoR performs an exhaustive and smart analysis of the genome in order to obtain a very good set of short sequences for further processing. Moreover, genomes can be processed in parallel and with low memory requirements. The results obtained showed that HextractoR effectively outperformed other methods. HextractoR is freely available at CRAN and SourceForge.


2020 ◽  
Author(s):  
Gabriel Jimenez-Dominguez ◽  
Patrice Ravel ◽  
Stéphan Jalaguier ◽  
Vincent Cavaillès ◽  
Jacques Colinge

Abstract Modular response analysis (MRA) is a widely used modeling technique to uncover coupling strengths in molecular networks under a steady-state condition by means of perturbation experiments. We propose an extension of this methodology to search genomic data for new associations with a network modeled by MRA and to improve the predictive accuracy of MRA models. These extensions are illustrated by exploring the crosstalk between estrogen and retinoic acid receptors, two nuclear receptors implicated in several hormone-driven cancers such as breast cancer. We also present a novel, rigorous and elegant mathematical derivation of MRA equations, which is the foundation of this work and of an R package that is freely available at https://github.com/bioinfo-ircm/aiMeRA/. This mathematical analysis should facilitate MRA understanding by newcomers. Author summary Estrogen and retinoic acid receptors play an important role in several hormone-driven cancers and share co-regulators and co-repressors that modulate their transcription factor activity. The literature shows evidence for crosstalk between these two receptors and suggests that spatial competition on the promoters could be a mechanism. We used MRA to explore the possibility that key co-repressors, i.e., NRIP1 (RIP140) and LCoR, could also mediate crosstalk by exploiting new quantitative (qPCR) and RNA sequencing data. The transcription factor role of the receptors and the availability of genome-wide data enabled us to develop extensions of the MRA methodology for exploring genome-wide data sets a posteriori, searching for genes associated with a molecular network that was sampled by perturbation experiments. Despite nearly two decades of use, we felt that MRA lacked a systematic mathematical derivation. We present here an elegant and rather simple analysis that should greatly facilitate newcomers’ understanding of MRA details. Moreover, an easy-to-use R package is released that should make MRA accessible to biology labs without mathematical expertise. Quantitative data are embedded in the R package and RNA sequencing data are available from GEO.


2015 ◽  
Vol 107 (7) ◽  
pp. 232-244 ◽  
Author(s):  
Mohammed Alshalalfa ◽  
Mark Schliekelman ◽  
Heesun Shin ◽  
Nicholas Erho ◽  
Elai Davicioni
