Black Box Models and Sociological Explanations: Predicting GPA Using Neural Networks

2017 ◽  
Author(s):  
Thomas Davidson

The Fragile Families Challenge provided an opportunity to empirically assess the applicability of black box machine learning models to sociological questions and the extent to which interpretable explanations can be extracted from these models. In this paper I use neural network models to predict high school grade-point average and examine how variations of basic network parameters affect predictive performance. Using a recently proposed technique, I identify the most important predictive variables used by the best-performing model, finding that they relate to parenting and the child’s cognitive and behavioral development, consistent with prior work. I conclude by discussing the implications of these findings for the relationship between prediction and explanation in sociological analyses.

2019 ◽  
Vol 5 ◽  
pp. 237802311881770 ◽  
Author(s):  
Thomas Davidson

The Fragile Families Challenge provided an opportunity to empirically assess the applicability of black-box machine learning models to sociological questions and the extent to which interpretable explanations can be extracted from these models. In this article the author uses neural network models to predict high school grade point average and examines how variations of basic network parameters affect predictive performance. Using a recently proposed technique, the author identifies the most important predictive variables used by the best-performing model, finding that they relate to parenting and the child’s cognitive and behavioral development, consistent with prior work. The author concludes by discussing the implications of these findings for the relationship between prediction and explanation in sociological analyses.
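The abstract does not name the importance technique it applies, so the following is only a hedged NumPy sketch of one common approach, permutation importance: shuffle a single input column and measure how much predictive performance degrades. A fixed linear function stands in for the trained neural network, and all data and names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two informative predictors, one pure-noise predictor.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

def model(X):
    """Stand-in "black box"; in the paper this would be the trained network."""
    return 2.0 * X[:, 0] + 1.0 * X[:, 1]

def permutation_importance(model, X, y, n_repeats=20, seed=1):
    """MSE increase when one input column is shuffled, averaged over repeats."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((model(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break only feature j
            scores.append(np.mean((model(Xp) - y) ** 2) - base_mse)
        importances[j] = np.mean(scores)
    return importances

imp = permutation_importance(model, X, y)
print(imp)  # feature 0 dominates; the noise feature scores near zero
```

The ranking, not the raw scores, is what carries the sociological interpretation: variables whose shuffling hurts prediction most are the model's key predictors.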


2018 ◽  
Vol 6 (11) ◽  
pp. 216-216 ◽  
Author(s):  
Zhongheng Zhang ◽  
Marcus W. Beck ◽  
David A. Winkler ◽  
Bin Huang ◽  
...  

Author(s):  
Luca Pasa ◽  
Nicolò Navarin ◽  
Alessandro Sperduti

Graph property prediction is becoming increasingly popular due to the increasing availability of scientific and social data naturally represented in graph form. As a result, many researchers are focusing on the development of improved graph neural network models. One of the main components of a graph neural network is the aggregation operator, needed to generate a graph-level representation from a set of node-level embeddings. The aggregation operator is critical since it should, in principle, provide a representation of the graph that is isomorphism invariant, i.e. the graph representation should be a function of the graph's nodes treated as a set. DeepSets (in: Advances in neural information processing systems, pp 3391–3401, 2017) provides a framework for constructing a set-aggregation operator with universal approximation properties. In this paper, we propose a DeepSets aggregation operator, based on Self-Organizing Maps (SOM), to transform a set of node-level representations into a single graph-level one. The adoption of SOMs makes it possible to compute node representations that embed information about their mutual similarity. Experimental results on several real-world datasets show that our proposed approach achieves improved predictive performance compared to the commonly adopted sum aggregation and many state-of-the-art graph neural network architectures in the literature.
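As a rough illustration of the DeepSets idea this abstract builds on (not the authors' SOM-based operator), the NumPy sketch below implements the readout rho(sum_i phi(h_i)) and verifies that it is invariant to node ordering. The weight matrices are arbitrary placeholders, not learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for phi (node-level map) and rho (graph-level map).
W_phi = rng.normal(size=(4, 8))
W_rho = rng.normal(size=(8, 2))

def phi(H):
    """Node-wise map: applied to each node embedding independently."""
    return np.tanh(H @ W_phi)

def rho(s):
    """Post-aggregation map producing the graph-level representation."""
    return s @ W_rho

def deepsets_readout(H):
    """DeepSets readout rho(sum_i phi(h_i)); a function of the node *set*."""
    return rho(phi(H).sum(axis=0))

H = rng.normal(size=(5, 4))      # 5 nodes with 4-dim embeddings
perm = rng.permutation(5)
g1 = deepsets_readout(H)
g2 = deepsets_readout(H[perm])   # same nodes, different order
print(np.allclose(g1, g2))  # True
```

The sum in the middle is exactly the "commonly adopted sum aggregation" the abstract compares against; the paper's contribution replaces this stage with a SOM-based variant.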


2020 ◽  
Vol 172 ◽  
pp. 02010
Author(s):  
Louise Rævdal Lund Christensen ◽  
Thea Hauge Broholt ◽  
Michael Dahl Knudsen ◽  
Rasmus Elbæk Hedegaard ◽  
Steffen Petersen

Previous studies have identified significant potential in using economic model predictive control for space heating. This type of control requires a thermodynamic model of the controlled building that maps certain controllable inputs (heat power) and measured disturbances (ambient temperature and solar irradiation) to the controlled output variable (room temperature). Occupancy-related disturbances, such as people heat gains and venting through windows, are often completely ignored or assumed to be fully known (measured) in these studies. However, this assumption is usually not fulfilled in practice, and the current simulation study investigated the consequences thereof. The results indicate that the predictive performance (root mean square errors) of a black-box state-space model is not significantly affected by ignoring people heat gains. On the other hand, the predictive performance was significantly improved by including window opening status as a model input. The performance of black-box models for MPC of space heating could therefore benefit from inputs from sensors that track window opening.
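A minimal sketch of the kind of discrete-time black-box state-space model described here, x[k+1] = A x[k] + B u[k], y[k] = C x[k], where u stacks heat power, ambient temperature, solar irradiation, and a 0/1 window-opening status. The matrices are arbitrary placeholders (not identified from data), chosen only so that opening the window visibly shifts the rollout.

```python
import numpy as np

# Illustrative model matrices: one thermal state, four inputs.
A = np.array([[0.95]])                     # slow thermal dynamics
B = np.array([[0.02, 0.01, 0.005, -0.3]])  # window opening cools the state
C = np.array([[1.0]])

def simulate(x0, U):
    """One-step-ahead rollout over an input sequence U (steps x 4)."""
    x, ys = x0.copy(), []
    for u in U:
        ys.append(float((C @ x)[0, 0]))
        x = A @ x + B @ u.reshape(-1, 1)
    return np.array(ys)

x0 = np.array([[21.0]])                    # initial room-temperature state
base = np.array([500.0, 5.0, 100.0, 0.0])  # heat, ambient, solar, window closed
U_closed = np.tile(base, (10, 1))
U_open = U_closed.copy()
U_open[5:, 3] = 1.0                        # window opened from step 5 onward

y_closed = simulate(x0, U_closed)
y_open = simulate(x0, U_open)
print(y_closed[-1] > y_open[-1])  # True
```

A model lacking the fourth input column would be forced to explain the window-opening dip as noise, which is the degradation in predictive performance the study measures.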


2020 ◽  
Vol 34 (04) ◽  
pp. 4264-4271
Author(s):  
Siddhartha Jain ◽  
Ge Liu ◽  
Jonas Mueller ◽  
David Gifford

The inaccuracy of neural network models on inputs that do not stem from the distribution underlying the training data is problematic and at times unrecognized. Uncertainty estimates of model predictions are often based on the variation in predictions produced by a diverse ensemble of models applied to the same input. Here we describe Maximize Overall Diversity (MOD), an approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in ensemble predictions across all possible inputs. We apply MOD to regression tasks including 38 Protein-DNA binding datasets, 9 UCI datasets, and the IMDB-Wiki image dataset. We also explore variants that utilize adversarial training techniques and data density estimation. For out-of-distribution test examples, MOD significantly improves predictive performance and uncertainty calibration without sacrificing performance on test data drawn from the same distribution as the training data. We also find that in Bayesian optimization tasks, the performance of UCB acquisition is improved via MOD uncertainty estimates.
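The ensemble-based uncertainty estimation that MOD improves upon (this sketch shows the baseline, not MOD itself) can be illustrated with a bootstrap ensemble of toy linear models: the ensemble mean is the prediction and the spread of member predictions is the uncertainty. The data and models are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data in [-1, 1].
X = rng.uniform(-1, 1, size=60)
y = 3.0 * X + 0.1 * rng.normal(size=60)

def fit_member(seed):
    """Fit one ensemble member on a bootstrap resample (line through origin)."""
    r = np.random.default_rng(seed)
    idx = r.integers(0, len(X), len(X))
    xb, yb = X[idx], y[idx]
    return (xb * yb).sum() / (xb * xb).sum()  # least-squares slope

slopes = np.array([fit_member(s) for s in range(10)])

def predict_with_uncertainty(x):
    """Ensemble mean as the prediction, ensemble std as the uncertainty."""
    preds = slopes * x
    return preds.mean(), preds.std()

_, u_in = predict_with_uncertainty(0.5)   # inside the training region
_, u_out = predict_with_uncertainty(5.0)  # far outside it
print(u_out > u_in)  # True
```

MOD's contribution is to train the members so this spread is large across *all* possible inputs, not just where bootstrap noise happens to make the members disagree.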


2003 ◽  
Vol 3 ◽  
pp. 455-476 ◽  
Author(s):  
Wun Wong ◽  
Peter J. Fos ◽  
Frederick E. Petry

The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of statistical (i.e., logistic regression) and machine learning (i.e., neural network) technologies in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.
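As a sketch of the interpretable half of this pairing, here is a plain gradient-descent logistic regression whose fitted coefficients recover which variable drives a synthetic binary outcome. The data, coefficients, and function names are invented for illustration; real clinical work would use a vetted statistics package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic outcome data: feature 0 truly raises the odds, feature 1 is noise.
X = rng.normal(size=(300, 2))
true_logits = 1.5 * X[:, 0] - 0.2
y = (rng.uniform(size=300) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient descent on the logistic log-loss; returns (weights, bias)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w, b

w, b = fit_logistic(X, y)
print(w)  # w[0] clearly nonzero, w[1] near zero: the "explanation"
```

Each coefficient is a log-odds ratio, which is exactly the kind of variable-level explanation the abstract says a neural network alone does not provide.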


2018 ◽  
Vol 27 (3) ◽  
pp. 413-431
Author(s):  
M.A. Jayaram ◽  
T.M. Kiran Kumar ◽  
H.V. Raghavendra

Software project effort estimation is one of the important aspects of software engineering, and producing the best predictive model remains a great challenge for researchers in this area. In this work, the effort estimation for small-scale visualization projects covering engineering, general science, and other allied areas, developed by 60 postgraduate students in a supervised academic setting, is modeled by three approaches, namely, linear regression, quadratic regression, and neural network. Seven unique parameters, namely, number of lines of code (LOC), new and change code (N&C), reuse code (R), cumulative grade point average (CGPA), cyclomatic complexity (CC), algorithmic complexity (AC), and function points (FP), which are considered to be influential in software development effort, are elicited along with actual effort. The three models are compared with respect to their prediction accuracy via the magnitude of error relative to the estimate (MER) for each project and also its mean MER (MMER) over all the projects in both the verification and validation phases. Evaluations of the models have shown MMER of 0.002, 0.006, and 0.009 during verification and 0.006, 0.002, and 0.002 during validation for the multiple linear regression, nonlinear regression, and neural network models, respectively. Thus, the marginal differences in the error estimates have indicated that the three models can be used interchangeably for effort computation specific to visualization projects. Results have also suggested that parameters such as LOC, N&C, R, CC, and AC have a direct influence on effort prediction, whereas CGPA has an inverse relationship. FP seems to be neutral as far as visualization projects are concerned.
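Assuming the standard definition of MER as the absolute error divided by the estimate (the abstract names the metric but does not restate the formula), the MMER figures above can be computed as follows; the effort values are invented for illustration.

```python
import numpy as np

def mer(actual, predicted):
    """Magnitude of error relative to the estimate: |actual - predicted| / predicted."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / predicted

def mmer(actual, predicted):
    """Mean MER across all projects."""
    return mer(actual, predicted).mean()

# Illustrative effort values (e.g., person-hours) for three projects.
actual = [100.0, 80.0, 120.0]
predicted = [90.0, 80.0, 132.0]
print(round(mmer(actual, predicted), 4))  # 0.0673
```

Because the denominator is the estimate rather than the actual effort, MER penalizes over- and under-estimation asymmetrically, which is worth keeping in mind when comparing the per-model MMER values above.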

