generative modeling
Recently Published Documents


TOTAL DOCUMENTS

200
(FIVE YEARS 125)

H-INDEX

13
(FIVE YEARS 8)

2022 ◽  
Vol 72 ◽  
pp. 226-236
Author(s):  
Alexey Strokach ◽  
Philip M. Kim

2022 ◽  
Author(s):  
Alexandre Perez-Lebel ◽  
Gaël Varoquaux ◽  
Marine Le Morvan ◽  
Julie Josse ◽  
Jean-Baptiste Poline

BACKGROUND As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative (rather than generative) modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain-imaging dataset, a health survey, and two intensive-care datasets. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-value imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values (with the missing-incorporated-attribute strategy) leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
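The "imputation plus indicator columns" strategy recommended in the conclusions can be sketched in a few lines. This is a minimal numpy illustration, not the benchmark's code; in practice the abstract's compared approaches map to, e.g., scikit-learn's `SimpleImputer(add_indicator=True)` and the native NaN handling of its histogram-based gradient-boosted trees:

```python
import numpy as np

def impute_with_indicator(X):
    """Mean-impute each column and append binary indicator columns
    marking which entries were originally missing."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                      # True where a value was missing
    col_means = np.nanmean(X, axis=0)       # per-column means over observed values
    X_imp = np.where(mask, col_means, X)    # fill missing entries with the column mean
    return np.hstack([X_imp, mask.astype(float)])  # features + missingness indicators

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 6.0]])
Z = impute_with_indicator(X)
# Z has 4 columns: the two imputed features plus two indicator columns,
# letting a downstream model exploit missingness that is not at random.
```

The indicator columns are what allows a discriminative model to learn from the missingness pattern itself, which is the abstract's explanation for why they matter.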


2022 ◽  
Author(s):  
Shrijit Singh ◽  
Shreyansh Daftry ◽  
Roberto Capobianco

Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 59
Author(s):  
Baihan Lin

Inspired by the adaptation phenomenon of neuronal firing, we propose the regularity normalization (RN) as an unsupervised attention mechanism (UAM) which computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle. Treating the neural network optimization process as a partially observable model selection problem, the regularity normalization constrains the implicit space by a normalization factor, the universal code length. We compute this universal code incrementally across neural network layers and demonstrate the flexibility to include data priors such as top-down attention and other oracle information. Empirically, our approach outperforms existing normalization methods in tackling limited, imbalanced, and non-stationary input distributions in image classification, classic control, procedurally-generated reinforcement learning, generative modeling, handwriting generation, and question answering tasks with various neural network architectures. Lastly, the unsupervised attention mechanism is a useful probing tool for neural networks, tracking the dependency and critical learning stages across layers and recurrent time steps of deep networks.
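The core idea, scaling activations by an incrementally updated code-length factor, can be illustrated with a toy layer. This sketch uses a running Gaussian model of the activations as a stand-in for the paper's universal (normalized maximum likelihood) code; it shows the shape of the mechanism, not the authors' exact computation:

```python
import numpy as np

class RegularityNormSketch:
    """Toy MDL-style normalization: activations are divided by a running
    'code length' (negative log-likelihood per unit) under an incrementally
    updated Gaussian model. Illustrative only, not the paper's algorithm."""

    def __init__(self):
        self.n, self.mean, self.var = 0, 0.0, 1.0

    def __call__(self, a):
        a = np.asarray(a, dtype=float)
        # Incrementally update the running Gaussian model of activations.
        self.n += a.size
        self.mean += (a.mean() - self.mean) * a.size / self.n
        self.var = 0.9 * self.var + 0.1 * a.var() + 1e-8
        # Average code length of the batch under the running model.
        code_len = 0.5 * np.log(2 * np.pi * self.var) \
                   + ((a - self.mean) ** 2).mean() / (2 * self.var)
        # Regular (well-predicted) inputs get a short code and are scaled up
        # less; surprising inputs get a long code and are damped.
        return a / max(code_len, 1e-8)

layer = RegularityNormSketch()
out = layer(np.array([1.0, 2.0, 3.0]))
```

Stacking one such layer per network layer gives the "incremental across layers" computation the abstract describes.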


2021 ◽  
Vol 12 ◽  
Author(s):  
Johannes Ostner ◽  
Salomé Carcy ◽  
Christian L. Müller

Accurate generative statistical modeling of count data is of critical relevance for the analysis of biological datasets from high-throughput sequencing technologies. Important instances include the modeling of microbiome compositions from amplicon sequencing surveys and the analysis of cell type compositions derived from single-cell RNA sequencing. Microbial and cell type abundance data share remarkably similar statistical features, including their inherent compositionality and a natural hierarchical ordering of the individual components from taxonomic or cell lineage tree information, respectively. To this end, we introduce a Bayesian model for tree-aggregated amplicon and single-cell compositional data analysis (tascCODA) that seamlessly integrates hierarchical information and experimental covariate data into the generative modeling of compositional count data. By combining latent parameters based on the tree structure with spike-and-slab Lasso penalization, tascCODA can determine covariate effects across different levels of the population hierarchy in a data-driven parsimonious way. In the context of differential abundance testing, we validate tascCODA’s excellent performance on a comprehensive set of synthetic benchmark scenarios. Our analyses on human single-cell RNA-seq data from ulcerative colitis patients and amplicon data from patients with irritable bowel syndrome, respectively, identified aggregated cell type and taxon compositional changes that were more predictive and parsimonious than those proposed by other schemes. We posit that tascCODA constitutes a valuable addition to the growing statistical toolbox for generative modeling and analysis of compositional changes in microbial or cell population data.
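The tree aggregation that tascCODA builds on can be illustrated without the Bayesian machinery: leaf counts (taxa or cell types) are summed up to internal nodes of the tree, and effects can then be expressed at whatever level is most parsimonious. A minimal sketch with a hypothetical two-level cell-lineage tree:

```python
# Hypothetical two-level lineage tree: each leaf cell type maps to a parent node.
parents = {"B_naive": "B", "B_memory": "B", "T_cd4": "T", "T_cd8": "T"}
counts = {"B_naive": 30, "B_memory": 10, "T_cd4": 40, "T_cd8": 20}

# Aggregate leaf counts up to each internal node of the tree.
agg = {}
for leaf, c in counts.items():
    agg[parents[leaf]] = agg.get(parents[leaf], 0) + c

# Compositional view: proportions at the aggregated (internal-node) level.
total = sum(counts.values())
proportions = {node: c / total for node, c in agg.items()}
# A single covariate effect on node "B" now stands in for coordinated
# effects on both of its leaves - the parsimony the model exploits.
```

tascCODA's spike-and-slab Lasso then decides, per node, whether such an aggregated effect is active; this sketch only shows the aggregation step.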


2021 ◽  
Author(s):  
Tim Kucera ◽  
Matteo Togninalli ◽  
Laetitia Meng-Papaxanthos

Motivation: Protein Design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, Machine Learning has made it possible to solve complex problems by leveraging the large amounts of available data, most recently with great improvements in the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design. Results: Here we approach the problem of general-purpose Protein Design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep learning baselines for protein sequence generation. We further give insights into the model by analysing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could create proteins with novel functions by combining labels, and provide first steps in this direction of research.
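The conditioning mechanism of a conditional GAN generator, concatenating a label encoding to the noise vector before mapping to sequence logits, can be sketched minimally. Here an untrained linear map stands in for ProteoGAN's generator network, and the label/dimension sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues
N_LABELS, NOISE_DIM, SEQ_LEN = 3, 8, 12     # hypothetical sizes

# An untrained linear 'generator' standing in for the trained network:
# it maps [noise ; one-hot label] to per-position residue logits.
W = rng.normal(size=(NOISE_DIM + N_LABELS, SEQ_LEN * len(AMINO_ACIDS)))

def generate(label_idx):
    z = rng.normal(size=NOISE_DIM)            # latent noise
    onehot = np.eye(N_LABELS)[label_idx]      # functional-label conditioning signal
    logits = np.concatenate([z, onehot]) @ W
    logits = logits.reshape(SEQ_LEN, len(AMINO_ACIDS))
    return "".join(AMINO_ACIDS[i] for i in logits.argmax(axis=1))

seq = generate(label_idx=1)  # sample a sequence 'conditioned' on label 1
```

The hypothesized label-combination idea amounts to activating several label entries at once in the conditioning vector; whether that yields novel functions is exactly the open question the abstract raises.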


Author(s):  
Tarek M. Kamel

The passive system technique is dynamically used as an alternative to the active system, in order to minimize the peak loads and the total EUI in kWh/m2 for any building prototype. The sun breaker, or shading device, is a basic traditional method; the Mashrabiya was previously used for privacy and to reduce the heat gained from the sun's rays, aided by its fabricated wood, a poor heat conductor. The study aims to investigate the effect of rotating shading devices around the y-axis: will the rotation have a significant impact on the EUI or not? The research methodology is built upon a generative parametric-design modeling tool, Rhinoceros Version 6.0, together with Grasshopper, Ladybug & Honeybee, and Toolbox. Five hundred simulation runs are carried out to determine the optimal angle of rotation with the maximum reduction in cooling loads, which is found to be 30°. Two linear regression equations are derived from this study to deduce the correlation between independent and dependent variables when the sun-breaker material is matt or reflective, and to show how the total EUI in kWh/m2 can be minimized.
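Fitting a linear regression of EUI against rotation angle, the final step of the study, is a standard ordinary-least-squares problem. The sketch below uses hypothetical (angle, EUI) pairs standing in for the 500 simulation runs; the study's actual coefficients are in the paper and are not reproduced here:

```python
import numpy as np

# Hypothetical (rotation angle, EUI) pairs - placeholders, not the study's data.
angle = np.array([0, 10, 20, 30, 40, 50], dtype=float)        # degrees
eui = np.array([180, 172, 166, 162, 165, 170], dtype=float)   # kWh/m2

# Ordinary least squares for the model  EUI ~ b0 + b1 * angle
A = np.column_stack([np.ones_like(angle), angle])
b0, b1 = np.linalg.lstsq(A, eui, rcond=None)[0]

def predict(a):
    """Predicted EUI (kWh/m2) at rotation angle a (degrees)."""
    return b0 + b1 * a
```

The study derives one such equation per sun-breaker material (matt and reflective), i.e., two fits of this form on the corresponding subsets of simulation runs.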


2021 ◽  
Author(s):  
Jingjing Jiang ◽  
Ziyi Liu ◽  
Yifan Liu ◽  
Zhixiong Nan ◽  
Nanning Zheng

2021 ◽  
Author(s):  
Brendon R Lutnick ◽  
Pinaki Sarder

Segmentation of histology tissue whole slide images is an important step for tissue analysis. Given enough annotated training data, modern neural networks are capable of accurate, reproducible segmentation; however, the annotation of training datasets is time-consuming. Techniques such as human-in-the-loop annotation attempt to reduce this annotation burden, but still require a large amount of initial annotation. Semi-supervised learning, a technique which leverages both labeled and unlabeled data to learn features, has shown promise for easing the burden of annotation. Towards this goal, we employ a recently published semi-supervised method, datasetGAN, for the segmentation of glomeruli from renal biopsy images. We compare the performance of models trained using datasetGAN and traditional annotation and show that datasetGAN significantly reduces the amount of annotation required to develop a high-performing segmentation model. We also explore the usefulness of datasetGAN for transfer learning and find that this greatly enhances performance when a limited number of whole slide images are used for training.


2021 ◽  
Author(s):  
Holly Sullivan-Toole ◽  
Nathaniel Haines ◽  
Thomas M Olino

The current study examined whether generative modeling could improve the psychometric properties of IGT metrics compared to the traditional two-stage summary approach. Across four models, we examined how different assumptions at the person-level and the group-level affected inference. More specifically, two person-level modeling approaches (summary score vs. ORL computational model) were “crossed” against two group-level modeling approaches (two-stage approach vs. full generative modeling across both testing sessions) to create four models of increasing complexity (see Fig 1). Model 1 relies on the two-stage summary approach that is conventionally applied in studies of the IGT. Model 2 estimates a generative model version of Model 1 that jointly estimates the person-level summary score (probabilities of choosing good versus bad decks) across both testing sessions while simultaneously estimating the test-retest correlation. Thus, Model 2 accounts for uncertainty in person-level estimates that Model 1 ignores but estimates a person-level metric analogous to that of Model 1. Model 3 estimates the person-level ORL parameters independently within each testing session and then estimates the test-retest correlation for each model parameter using a two-stage approach. Model 4 estimates the person-level ORL parameters jointly across both testing sessions while simultaneously estimating the test-retest correlations for each parameter. Thus, Model 4 estimates the same person-level metrics (ORL parameters) as Model 3 but accounts for uncertainty in the person-level estimates. Our overarching hypothesis was that both the use of a more theoretically informative person-level model (i.e., going from Model 1 to Model 3, and from Model 2 to Model 4) and the use of generative models to jointly estimate person-level parameters and their test-retest correlations (i.e., going from Model 1 to Model 2, and from Model 3 to Model 4) would yield behavioral estimates with increased utility for use in individual differences research. More specifically, we predicted that the behavioral estimates from Model 4 would have the highest test-retest reliability. Further, we had a general prediction that the Model 4 estimates would show improved construct validity in relation to an a priori set of trait and state self-report measures commonly associated with IGT performance, as well as measures of internalizing symptoms; however, this set of analyses was largely exploratory, as no particular associations between the ORL parameters and self-report measures were specified.
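The two-stage baseline (Model 1) is easy to make concrete: compute a point estimate per person per session, then correlate the estimates, ignoring their uncertainty. The sketch below simulates hypothetical session scores; the generative alternatives (Models 2 and 4) would instead estimate the correlation as a model parameter (e.g., in Stan or PyMC), which is what propagates person-level uncertainty into the reliability estimate:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50  # hypothetical number of participants

# Simulated 'true' person-level abilities, shared across two test sessions,
# plus independent session-specific measurement noise.
true = rng.normal(size=n)
s1 = true + rng.normal(scale=0.5, size=n)  # session-1 summary scores
s2 = true + rng.normal(scale=0.5, size=n)  # session-2 summary scores

# Model 1-style two-stage approach: correlate per-person point estimates.
# Measurement noise attenuates this estimate toward zero, which is the
# uncertainty the joint generative models are designed to account for.
r_two_stage = np.corrcoef(s1, s2)[0, 1]
```

With these simulation settings the true-score variance is 1 and the noise variance 0.25 per session, so the attenuated two-stage correlation sits around 0.8 rather than 1.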

