Assessment of feature selection for student academic performance through machine learning classification

2019 ◽  
Vol 22 (4) ◽  
pp. 729-739 ◽  
Author(s):  
R. Suguna ◽  
M. Shyamala Devi ◽  
Rupali Amit Bagate ◽  
Aparna Shashikant Joshi
Author(s):  
M. Nirmala

Abstract: The use of data mining in educational systems has grown tremendously and continues to grow. This study takes an academic standpoint: student performance is evaluated using scholastic, demographic, and emotional features. Machine learning methods are applied to extract hidden knowledge from the educational data set, identifying the features that have the greatest impact on student academic performance; knowing these features in turn supports deeper insight into, and prediction of, that performance. The full machine learning workflow, from problem definition to model prediction, is carried out. A supervised learning approach is adopted, and several feature engineering methods are applied to make the data suitable for model training and evaluation. The task is treated as a classification problem, and Logistic Regression, Random Forest, SVM, KNN, XGBoost, and Decision Tree models are fitted to the student data. Keywords: Scholastic, Demographic, Emotional, Logistic Regression, Random Forest, SVM, KNN, XGBoost, Decision Tree.
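
The supervised workflow the abstract describes (prepare features, split the data, fit a classifier, evaluate it) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the two features (prior grade and weekly study hours) and the pass/fail rule are hypothetical, and a small hand-written KNN stands in for the library implementations a real study would likely use.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Hypothetical synthetic data (an assumption, not the study's data set):
# features = (prior grade 0-100, weekly study hours 0-20), label = pass (1) / fail (0).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    grade = rng.uniform(30, 100)
    hours = rng.uniform(0, 20)
    label = 1 if grade + 2 * hours + rng.gauss(0, 8) > 85 else 0
    X.append((grade / 100, hours / 20))  # scale both features to [0, 1]
    y.append(label)

# Simple 80/20 hold-out split (the rows are already in random order).
split = int(0.8 * len(X))
train_X, test_X = X[:split], X[split:]
train_y, test_y = y[:split], y[split:]

acc = accuracy(train_X, train_y, test_X, test_y, k=5)
print(f"hold-out accuracy: {acc:.2f}")
```

Scaling the features before computing distances matters for KNN, since otherwise the feature with the larger numeric range would dominate the neighbour search.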


2015 ◽  
Vol 8 (7) ◽  
pp. 5419-5435 ◽  
Author(s):  
W. Paja ◽  
M. Wrzesień ◽  
R. Niemiec ◽  
W. R. Rudnicki

Abstract. Climate models are extremely complex pieces of software. They reflect the best available knowledge of the physical components of the climate; nevertheless, they contain several parameters that are too weakly constrained by observations and can potentially lead to a simulation crash. A recent study by Lucas et al. (2013) showed that machine learning methods can be used to predict which combinations of parameters can lead to a simulation crash, and hence which processes described by these parameters need refined analysis. In the current study we reanalyse the dataset used in that research using a different methodology. We confirm the main conclusion of the original study concerning the suitability of machine learning for predicting crashes. We show that only three of the eight parameters indicated in the original study as relevant for crash prediction are indeed strongly relevant, three others are relevant but redundant, and two are not relevant at all. We also show that the variance due to the split of data between training and validation sets has a large influence on both the accuracy of predictions and the relative importance of variables; hence only a cross-validated approach can deliver a robust estimate of predictive performance and of variable relevance.
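
The point about split-induced variance can be illustrated with a small sketch that scores the same classifier on many random train/validation splits and compares the spread with a k-fold cross-validated estimate. Everything here is an assumption made for illustration: the data are synthetic two-class Gaussian points rather than the climate-simulation dataset, and a nearest-centroid classifier stands in for the methods used in the study.

```python
import math
import random
import statistics

def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(statistics.fmean(col) for col in zip(*rows))
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def score(X, y, train_idx, test_idx):
    """Accuracy on test_idx of a model fitted on train_idx."""
    centroids = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
    hits = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)

def holdout_scores(X, y, n_repeats, test_fraction, rng):
    """Accuracy of repeated random train/validation splits."""
    n_test = int(len(X) * test_fraction)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        scores.append(score(X, y, idx[n_test:], idx[:n_test]))
    return scores

def kfold_score(X, y, k, rng):
    """Mean accuracy over k folds; every sample is validated exactly once."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(score(X, y, train_idx, test_idx))
    return statistics.fmean(scores)

# Hypothetical synthetic data: two overlapping Gaussian classes in 2-D.
rng = random.Random(42)
X, y = [], []
for _ in range(120):
    label = rng.randint(0, 1)
    X.append((rng.gauss(label, 1.0), rng.gauss(label, 1.0)))
    y.append(label)

single = holdout_scores(X, y, n_repeats=20, test_fraction=0.25, rng=rng)
cv = kfold_score(X, y, k=5, rng=rng)
print(f"single splits: min={min(single):.2f} max={max(single):.2f}")
print(f"5-fold CV mean: {cv:.2f}")
```

The gap between the minimum and maximum single-split scores shows how much one arbitrary split can move the reported accuracy, whereas the cross-validated figure averages over folds that together cover every sample.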

