The Spike-and-Slab Lasso regression modeling with compositional covariates: An application on Brazilian children malnutrition data

2019 ◽  
Vol 29 (5) ◽  
pp. 1434-1446
Author(s):  
Francisco Louzada ◽  
Taciana KO Shimizu ◽  
Adriano K Suzuki

There are considerable challenges in analyzing large-scale compositional data. In this paper, we introduce the Spike-and-Slab Lasso linear regression in the presence of compositional covariates for parameter estimation and variable selection. We use the well-known isometric log-ratio (ilr) coordinates to avoid misleading statistical inference. The separable and non-separable (adaptive) Spike-and-Slab Lasso penalties are compared to assess the advantages of each approach. The proposed method is illustrated on simulated data and on real Brazilian child malnutrition data.
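As a rough illustration of the ilr coordinates mentioned above, the following sketch maps a D-part composition to D-1 real coordinates. It assumes the pivot (sequential binary partition) basis, which is one common choice; the abstract does not say which basis the authors used, and the function names here are hypothetical.

```python
import math


def clr(parts):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(v) for v in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def ilr_pivot(parts):
    """ilr coordinates under the pivot basis (an assumed choice of basis).

    Coordinate i contrasts part i with the geometric mean of the
    parts that follow it. All parts must be strictly positive.
    """
    logs = [math.log(v) for v in parts]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        k = len(rest)  # number of remaining parts
        mean_rest = sum(rest) / k  # log of their geometric mean
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return coords


if __name__ == "__main__":
    composition = [0.2, 0.3, 0.5]  # illustrative shares summing to 1
    print(ilr_pivot(composition))
```

Because the coordinates depend only on log-ratios, rescaling the composition leaves them unchanged, and the Euclidean norm of the ilr vector matches that of the clr vector. Zero parts are not handled here and would need a replacement strategy before the transform.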

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Prasanna Date ◽  
Davis Arthur ◽  
Lauren Pusey-Nazzaro

Abstract: Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore’s law era. To be solved on adiabatic quantum computers, problems must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models, namely linear regression, support vector machine (SVM), and balanced k-means clustering, as QUBO problems, making them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to the corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better than (in the case of SVM and balanced k-means clustering) or equivalent to (in the case of linear regression) those of their classical counterparts.
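To make the idea of casting regression as a QUBO concrete, here is a minimal sketch. It assumes each weight is written as a sum of fixed precision values switched on by binary variables, which may differ in detail from the authors' formulation, and it replaces the quantum annealer with a classical brute-force search. All function names and the toy data are hypothetical.

```python
from itertools import product


def build_qubo(X, y, precision):
    """Upper-triangular QUBO matrix for min ||X w - y||^2.

    Each weight is encoded as w_j = sum_k precision[k] * b_(j,k)
    with binary b. The constant term y.y is dropped.
    """
    # One scaled column of X per binary variable (j, k).
    cols = [[p * row[j] for row in X]
            for j in range(len(X[0])) for p in precision]
    m = len(cols)
    Q = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a, m):
            dot = sum(u * v for u, v in zip(cols[a], cols[b]))
            if a == b:
                # b^2 == b for binary variables, so the linear
                # term -2 c_a.y folds into the diagonal.
                Q[a][a] = dot - 2 * sum(u * t for u, t in zip(cols[a], y))
            else:
                Q[a][b] = 2 * dot
    return Q


def energy(Q, bits):
    m = len(Q)
    return sum(Q[a][b] * bits[a] * bits[b]
               for a in range(m) for b in range(a, m))


def solve_brute_force(Q):
    """Classical stand-in for the annealer: try every bit string."""
    return min(product((0, 1), repeat=len(Q)), key=lambda s: energy(Q, s))


def decode(bits, n_features, precision):
    k = len(precision)
    return [sum(p * bits[j * k + i] for i, p in enumerate(precision))
            for j in range(n_features)]


if __name__ == "__main__":
    X = [[1, 1], [1, 2], [1, 3]]  # intercept column plus one feature
    y = [2.0, 3.5, 5.0]           # generated from y = 0.5 + 1.5 x
    precision = [0.5, 1.0]        # weights restricted to {0, 0.5, 1, 1.5}
    bits = solve_brute_force(build_qubo(X, y, precision))
    print(decode(bits, 2, precision))  # -> [0.5, 1.5]
```

The brute-force search grows exponentially with the number of binary variables and is only usable for toy sizes. This encoding also covers non-negative weights only; negative precision values would be needed to represent signed weights.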


mSphere ◽  
2017 ◽  
Vol 2 (5) ◽  
Author(s):  
Gaorui Bian ◽  
Gregory B. Gloor ◽  
Aihua Gong ◽  
Changsheng Jia ◽  
Wei Zhang ◽  
...  

ABSTRACT We report the large-scale use of compositional data analysis to establish a baseline microbiota composition in an extremely healthy cohort of the Chinese population. This baseline will serve for comparison for future cohorts with chronic or acute disease. In addition to the expected difference in the microbiota of children and adults, we found that the microbiota of the elderly in this population was similar in almost all respects to that of healthy people in the same population who are scores of years younger. We speculate that this similarity is a consequence of an active healthy lifestyle and diet, although cause and effect cannot be ascribed in this (or any other) cross-sectional design. One surprising result was that the gut microbiota of persons in their 20s was distinct from those of other age cohorts, and this result was replicated, suggesting that it is a reproducible finding and distinct from those of other populations. The microbiota of the aged is variously described as being more or less diverse than that of younger cohorts, but the comparison groups used and the definitions of the aged population differ between experiments. The differences are often described by null hypothesis statistical tests, which are notoriously irreproducible when dealing with large multivariate samples. We collected and examined the gut microbiota of a cross-sectional cohort of more than 1,000 very healthy Chinese individuals who spanned ages from 3 to over 100 years. The analysis of 16S rRNA gene sequencing results used a compositional data analysis paradigm coupled with measures of effect size, where ordination, differential abundance, and correlation can be explored and analyzed in a unified and reproducible framework. Our analysis showed several surprising results compared to other cohorts. First, the overall microbiota composition of the healthy aged group was similar to that of people decades younger. 
Second, the major differences between groups in the gut microbiota profiles were found before age 20. Third, the gut microbiota differed little between individuals from the ages of 30 to >100. Fourth, the gut microbiota of males appeared to be more variable than that of females. Taken together, the present findings suggest that the microbiota of the healthy aged in this cross-sectional study differ little from that of the healthy young in the same population, although the minor variations that do exist depend upon the comparison cohort.


2014 ◽  
Vol 83 ◽  
pp. 104-115 ◽  
Author(s):  
Jimena Di Maggio ◽  
Cecilia Paulo ◽  
Vanina Estrada ◽  
Nora Perotti ◽  
Juan C. Diaz Ricci ◽  
...  

Author(s):  
Jitender Singh Virk ◽  
Syed Azmal Ali ◽  
Gurjeet Kaur

Abstract: Background: India has the second-largest population in the world and is not yet well equipped for a global pandemic, so SARS-CoV-2 could have a devastating impact on the Indian population. The only way to respond to this critical situation is to practice large-scale social distancing. India locked down for 21 days; however, as of 7 April 2020, SARS-CoV-2 positive cases were still growing exponentially, which raises the question of whether the numbers of reported and actual cases are similar. Methods: We use Lasso regression with α = 0.12 and polynomial features of degree 2 to predict the growth factor. We also fit a logistic curve using the Prophet Python library. Further, applying the growth rate to the logistic model with a carrying capacity of 20,000 allowed us to calculate the maximum cases and new cases per day. Results: We predicted the growth factor for the upcoming days with a standard deviation of 0.3443. When the growth factor becomes 1.0, which is known as the inflection point, it will be safe to state that growth is no longer exponential. The estimated time to reach the inflection point is between 15 and 20 April. At that time, the estimated number of total positive cases will be over 12,500 if the lockdown continues. Conclusions: Our analysis suggests that there is an urgent need to extend the period of lockdown and allocate enough resources, including personnel, beds, and intensive care facilities, to manage the situation in the next few days and weeks. Otherwise, the outbreak in India could reach the level of the USA or Italy, or become worse than in these countries, within a few days or weeks, given the size of the population and the lack of resources.
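The link between the growth factor and the inflection point of a logistic curve can be sketched as follows. This is an illustration only: the carrying capacity of 20,000 comes from the abstract, while the growth rate and midpoint day are hypothetical values, and the growth factor is assumed to mean the ratio of new cases on one day to new cases on the previous day, a definition the abstract does not spell out.

```python
import math

CARRYING_CAPACITY = 20000.0  # stated in the abstract
GROWTH_RATE = 0.25           # hypothetical, for illustration only
MIDPOINT_DAY = 30            # hypothetical day of the inflection point


def logistic_cases(day):
    """Cumulative cases under a logistic growth curve."""
    exponent = -GROWTH_RATE * (day - MIDPOINT_DAY)
    return CARRYING_CAPACITY / (1.0 + math.exp(exponent))


def new_cases(day):
    """Cases added between the previous day and this day."""
    return logistic_cases(day) - logistic_cases(day - 1)


def growth_factor(day):
    """Assumed definition: new cases today / new cases yesterday."""
    return new_cases(day) / new_cases(day - 1)


if __name__ == "__main__":
    for day in (20, 31, 40):
        print(day, round(logistic_cases(day)), growth_factor(day))
```

Under these assumptions the growth factor stays above 1 while the curve is still accelerating, passes through 1 around the inflection point, where cumulative cases equal half the carrying capacity, and falls below 1 afterwards.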


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Nicolas Pröllochs ◽  
Dominik Bär ◽  
Stefan Feuerriegel

Abstract: Emotions are regarded as a dominant driver of human behavior, and yet their role in online rumor diffusion is largely unexplored. In this study, we empirically study the extent to which emotions explain the diffusion of online rumors. We analyze a large-scale sample of 107,014 online rumors from Twitter, as well as their cascades. For each rumor, the embedded emotions were measured based on eight so-called basic emotions from Plutchik’s wheel of emotions (i.e., anticipation–surprise, anger–fear, trust–disgust, joy–sadness). We then used a generalized linear regression model to estimate how emotions are associated with the spread of online rumors in terms of (1) cascade size, (2) cascade lifetime, and (3) structural virality. Our results suggest that rumors conveying anticipation, anger, and trust generate more reshares, spread over longer time horizons, and become more viral. In contrast, smaller size, lifetime, and virality are found for surprise, fear, and disgust. We further study how the presence of 24 dyadic emotional interactions (i.e., feelings composed of two emotions) is associated with diffusion dynamics. Here, we find that rumor cascades with high degrees of aggressiveness are larger in size, longer-lived, and more viral. Altogether, emotions embedded in online rumors are important determinants of the spreading dynamics.


2020 ◽  
Author(s):  
Jacob Bien ◽  
Xiaohan Yan ◽  
Léo Simpson ◽  
Christian L. Müller

Abstract: Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven, parameter-free, and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, making user-defined aggregation obsolete while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human-gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbial ecologists gain insights into the structure and functioning of the underlying ecosystem of interest.


2016 ◽  
Vol 275 ◽  
pp. 411-421 ◽  
Author(s):  
Alvaro Frank ◽  
Diego Fabregat-Traver ◽  
Paolo Bientinesi
