Employee's Attrition Prediction Using Machine Learning Approaches

Author(s):  
Krishna Kumar Mohbey

In any industry, attrition is a big problem, whether it is about employee attrition of an organization or customer attrition of an e-commerce site. If we can accurately predict which customer or employee will leave their current company or organization, then it will save much time, effort, and cost of the employer and help them to hire or acquire substitutes in advance, and it would not create a problem in the ongoing progress of an organization. In this chapter, a comparative analysis between various machine learning approaches such as Naïve Bayes, SVM, decision tree, random forest, and logistic regression is presented. The presented result will help us in identifying the behavior of employees who can be attired over the next time. Experimental results reveal that the logistic regression approach can reach up to 86% accuracy over other machine learning approaches.

2021 ◽  
Vol 2096 (1) ◽  
pp. 012190
Author(s):  
E V Bunyaeva ◽  
I V Kuznetsov ◽  
Y V Ponomarchuk ◽  
P S Timosh

Abstract The paper considers comparative analysis results of the machine learning methods used for the gesture recognition based on the surface single-channel electromyography (sEMG) data. The data were processed using multilayer perceptron, support vector machine, decision tree ensemble (Random Forest) and logistic regression for the chosen four gesture types. The conclusion was derived on the analysis efficiency of these methods using commonly recommended accuracy metrics.


2021 ◽  
Vol 2076 (1) ◽  
pp. 012045
Author(s):  
Aimin Li ◽  
Meng Fan ◽  
Guangduo Qin

Abstract There are many traditional methods available for water body extraction based on remote sensing images, such as normalised difference water index (NDWI), modified NDWI (MNDWI), and the multi-band spectrum method, but the accuracy of these methods is limited. In recent years, machine learning algorithms have developed rapidly and been applied widely. Using Landsat-8 images, models such as decision tree, logistic regression, a random forest, neural network, support vector method (SVM), and Xgboost were adopted in the present research within machine learning algorithms. Based on this, through cross validation and a grid search method, parameters were determined for each model.Moreover, the merits and demerits of several models in water body extraction were discussed and a comparative analysis was performed with three methods for determining thresholds in the traditional NDWI. The results show that the neural network has excellent performances and is a stable model, followed by the SVM and the logistic regression algorithm. Furthermore, the ensemble algorithms including the random forest and Xgboost were affected by sample distribution and the model of the decision tree returned the poorest performance.


2007 ◽  
Vol 16 (06) ◽  
pp. 1001-1014 ◽  
Author(s):  
PANAGIOTIS ZERVAS ◽  
IOSIF MPORAS ◽  
NIKOS FAKOTAKIS ◽  
GEORGE KOKKINAKIS

This paper presents and discusses the problem of emotion recognition from speech signals with the utilization of features bearing intonational information. In particular parameters extracted from Fujisaki's model of intonation are presented and evaluated. Machine learning models were build with the utilization of C4.5 decision tree inducer, instance based learner and Bayesian learning. The datasets utilized for the purpose of training machine learning models were extracted from two emotional databases of acted speech. Experimental results showed the effectiveness of Fujisaki's model attributes since they enhanced the recognition process for most of the emotion categories and learning approaches helping to the segregation of emotion categories.


2018 ◽  
Vol 7 (3.34) ◽  
pp. 291
Author(s):  
M Malleswari ◽  
R.J Manira ◽  
Praveen Kumar ◽  
Murugan .

 Big data analytics has been the focus for large scale data processing. Machine learning and Big data has great future in prediction. Churn prediction is one of the sub domain of big data. Preventing customer attrition especially in telecom is the advantage of churn prediction.  Churn prediction is a day-to-day affair involving millions. So a solution to prevent customer attrition can save a lot. This paper propose to do comparison of three machine learning techniques Decision tree algorithm, Random Forest algorithm and Gradient Boosted tree algorithm using Apache Spark. Apache Spark is a data processing engine used in big data which provides in-memory processing so that the processing speed is higher. The analysis is made by extracting the features of the data set and training the model. Scala is a programming language that combines both object oriented and functional programming and so a powerful programming language. The analysis is implemented using Apache Spark and modelling is done using scala ML. The accuracy of Decision tree model came out as 86%, Random Forest model is 87% and Gradient Boosted tree is 85%. 


Cardiovascular diseases are one of the main causes of mortality in the world. A proper prediction mechanism system with reasonable cost can significantly reduce this death toll in the low-income countries like Bangladesh. For those countries we propose machine learning backed embedded system that can predict possible cardiac attack effectively by excluding the high cost angiogram and incorporating only twelve (12) low cost features which are age, sex, chest pain, blood pressure, cholesterol, blood sugar, ECG results, heart rate, exercise induced angina, old peak, slope, and history of heart disease. Here, two heart disease datasets of own built NICVD (National Institute of Cardiovascular Disease, Bangladesh) patients’, and UCI (University of California Irvin) are used. The overall process comprises into four phases: Comprehensive literature review, collection of stable angina patients’ data through survey questionnaires from NICVD, feature vector dimensionality is reduced manually (from 14 to 12 dimensions), and the reduced feature vector is fed to machine learning based classifiers to obtain a prediction model for the heart disease. From the experiments, it is observed that the proposed investigation using NICVD patient’s data with 12 features without incorporating angiographic disease status to Artificial Neural Network (ANN) shows better classification accuracy of 92.80% compared to the other classifiers Decision Tree (82.50%), Naïve Bayes (85%), Support Vector Machine (SVM) (75%), Logistic Regression (77.50%), and Random Forest (75%) using the 10-fold cross validation. To accommodate small scale training and test data in our experimental environment we have observed the accuracy of ANN, Decision Tree, Naïve Bayes, SVM, Logistic Regression and Random Forest using Jackknife method, which are 84.80%, 71%, 75.10%, 75%, 75.33% and 71.42% respectively. On the other hand, the classification accuracies of the corresponding classifiers are 91.7%, 76.90%, 86.50%, 76.3%, 67.0% and 67.3%, respectively for the UCI dataset with 12 attributes. Whereas the same dataset with 14 attributes including angiographic status shows the accuracies 93.5%, 76.7%, 86.50%, 76.8%, 67.7% and 69.6% for the respective classifiers


Author(s):  
M. Nirmala

Abstract: Data Mining in Educational System has increased tremendously in the past and still increasing in present era. This study focusses on the academic stand point and the performance of the student is evaluated by various parameters such as Scholastic Features, Demographic Features and Emotional Features are carried out. Various Machine learning methodologies are adopted to extract the masked knowledge from the educational data set provided, which helps in identifying the features giving more impact to the student academic performance and there by knowing the impacting features, helps us to predict deeper insights about student performance in academics. Various Machine learning workflow starting from problem definition to Model Prediction has been carried out in this study. The supervised learning methodology has been adopted and various Feature engineering methods has been adopted to make the ML model appropriate for training and evaluation. It is a prediction problem and various Classification algorithms such as Logistic Regression, Random Forest, SVM, KNN, XGBOOST, Decision Tree modelling has been done to fit the student data appropriately. Keywords: Scholastic, Demographic, Emotional, Logistic Regression, Random Forest, SVM, KNN, XGBOOST, Decision Tree.


MATICS ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 21-27
Author(s):  
Via Ardianto Nugroho ◽  
Derry Pramono Adi ◽  
Achmad Teguh Wibowo ◽  
MY Teguh Sulistyono ◽  
Agustinus Bimo Gumelar

Pada industri jasa pelayanan peti kemas, Terminal Nilam merupakan pelanggan dari PT. BIMA, yang secara khusus bergerak dibidang jasa perbaikan dan perawatan alat berat. Terminal ini menjadi sentral tempat untuk melakukan aktifitas bongkar muat peti kemas domestik yang memiliki empat buah container crane untuk melayani dua kapal. Proses perawatan alat berat seperti container crane yang selama ini beroperasi, agaknya kurang memperhatikan data pengelompokkan atau klasifikasi jenis perawatan yang dibutuhkan oleh alat berat tersebut. Di kemudian hari, alat berat dapat menunjukkan kinerja yang tidak maksimal bahkan dapat berujung pada kecelakaan kerja. Selain itu, kelalaian perawatan container crane juga dapat menyebabkan pembengkakan biaya perawatan lanjut. Target produksi bongkar muat dapat berkurang dan juga keterlambatan jadwal kapal sandar sangat mungkin terjadi. Metode pembelajaran menggunakan mesin atau biasa disebut dengan Machine Learning (ML), dengan mudah dapat melenyapkan kemungkinan-kemungkinan tersebut. ML dalam penelitian ini, kami rancang agar bekerja dengan mengidentifikasi lalu mengelompokkan jenis perawatan container crane yang sesuai, yaitu ringan atau berat. Metode ML yang pilih untuk digunakan dalam penelitian ini yaitu Random Forest, Support Vector Machine, k-Nearest Neighbor, Naïve Bayes, Logistic Regression, J48, dan Decision Tree. Penelitian ini menunjukkan keberhasilan ML model tree dalam melakukan pembelajaran jenis data perawatan container crane (numerik dan kategoris), dengan J48 menunjukkan performa terbaik dengan nilai akurasi dan nilai ROC-AUC mencapai 99,1%. Pertimbangan klasifikasi kami lakukan dengan mengacu kepada tanggal terakhir perawatan, hour meter, breakdown, shutdown, dan sparepart.


2019 ◽  
Vol 8 (4) ◽  
pp. 1477-1483

With the fast moving technological advancement, the internet usage has been increased rapidly in all the fields. The money transactions for all the applications like online shopping, banking transactions, bill settlement in any industries, online ticket booking for travel and hotels, Fees payment for educational organization, Payment for treatment to hospitals, Payment for super market and variety of applications are using online credit card transactions. This leads to the fraud usage of other accounts and transaction that result in the loss of service and profit to the institution. With this background, this paper focuses on predicting the fraudulent credit card transaction. The Credit Card Transaction dataset from KAGGLE machine learning Repository is used for prediction analysis. The analysis of fraudulent credit card transaction is achieved in four ways. Firstly, the relationship between the variables of the dataset is identified and represented by the graphical notations. Secondly, the feature importance of the dataset is identified using Random Forest, Ada boost, Logistic Regression, Decision Tree, Extra Tree, Gradient Boosting and Naive Bayes classifiers. Thirdly, the extracted feature importance if the credit card transaction dataset is fitted to Random Forest classifier, Ada boost classifier, Logistic Regression classifier, Decision Tree classifier, Extra Tree classifier, Gradient Boosting classifier and Naive Bayes classifier. Fourth, the Performance Analysis is done by analyzing the performance metrics like Accuracy, FScore, AUC Score, Precision and Recall. The implementation is done by python in Anaconda Spyder Navigator Integrated Development Environment. Experimental Results shows that the Decision Tree classifier have achieved the effective prediction with the precision of 1.0, recall of 1.0, FScore of 1.0 , AUC Score of 89.09 and Accuracy of 99.92%.


2021 ◽  
Vol 5 (1) ◽  
pp. 35
Author(s):  
Uttam Narendra Thakur ◽  
Radha Bhardwaj ◽  
Arnab Hazra

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic components (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease marker VOCs using an array of multiple sensors are highly desirable in the current scenario. The use of efficient sensors and the use of suitable classification algorithms is essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (Chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2 were tested under exposure to different VOCs, such as acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. As compared to the PCA, LDA was able to discriminate well between the seven VOCs. Four different machine learning algorithms such as k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression were used to further identify those VOCs. The classification accuracy of those seven VOCs using KNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results authenticated that multinomial logistic regression performed best between the four machine learning algorithms to discriminate and differentiate the multiple VOCs that generally exist in human breath.


Author(s):  
Joshua J. Levy ◽  
A. James O’Malley

AbstractBackgroundMachine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at its solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each.MethodsWe present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-word biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions.ResultsPreliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output.ConclusionsWhen a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.


Sign in / Sign up

Export Citation Format

Share Document