Hyperparameter Tuning for Machine Learning Algorithms Used for Arabic Sentiment Analysis

Enas Elgeldawi; Awny Sayed; Ahmed R. Galal; Alaa M. Zaki

doi:10.3390/informatics8040079

Hyperparameter Tuning for Machine Learning Algorithms Used for Arabic Sentiment Analysis

Informatics ◽

10.3390/informatics8040079 ◽

2021 ◽

Vol 8 (4) ◽

pp. 79

Author(s):

Enas Elgeldawi ◽

Awny Sayed ◽

Ahmed R. Galal ◽

Alaa M. Zaki

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Bayesian Optimization ◽

Support Vector ◽

Learning Models ◽

Before And After ◽

Tuning Process ◽

Machine Learning Models

Machine learning models are used today to solve problems within a broad span of disciplines. If the proper hyperparameter tuning of a machine learning classifier is performed, significantly higher accuracy can be obtained. In this paper, a comprehensive comparative analysis of various hyperparameter tuning techniques is performed; these are Grid Search, Random Search, Bayesian Optimization, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). They are used to optimize the accuracy of six machine learning algorithms, namely, Logistic Regression (LR), Ridge Classifier (RC), Support Vector Machine Classifier (SVC), Decision Tree (DT), Random Forest (RF), and Naive Bayes (NB) classifiers. To test the performance of each hyperparameter tuning technique, the machine learning models are used to solve an Arabic sentiment classification problem. Sentiment analysis is the process of detecting whether a text carries a positive, negative, or neutral sentiment. However, extracting such sentiment from a complex derivational morphology language such as Arabic has been always very challenging. The performance of all classifiers is tested using our constructed dataset both before and after the hyperparameter tuning process. A detailed analysis is described, along with the strengths and limitations of each hyperparameter tuning technique. The results show that the highest accuracy was given by SVC both before and after the hyperparameter tuning process, with a score of 95.6208 obtained when using Bayesian Optimization.

Download Full-text

Machine Learning Models for Finger Bend Evaluation using Implemented Low cost Flex Sensor

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35742 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 3605-3611

Author(s):

Pratyush Kaware

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Low Cost ◽

Learning Algorithms ◽

Cost Effective ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

In this paper a cost-effective sensor has been implemented to read finger bend signals, by attaching the sensor to a finger, so as to classify them based on the degree of bent as well as the joint about which the finger was being bent. This was done by testing with various machine learning algorithms to get the most accurate and consistent classifier. Finally, we found that Support Vector Machine was the best algorithm suited to classify our data, using we were able predict live state of a finger, i.e., the degree of bent and the joints involved. The live voltage values from the sensor were transmitted using a NodeMCU micro-controller which were converted to digital and uploaded on a database for analysis.

Download Full-text

The Tabu_Genetic Algorithm: A Novel Method for Hyper-Parameter Optimization of Learning Algorithms

Electronics ◽

10.3390/electronics8050579 ◽

2019 ◽

Vol 8 (5) ◽

pp. 579 ◽

Cited By ~ 6

Author(s):

Baosu Guo ◽

Jingwen Hu ◽

Wenwen Wu ◽

Qingjin Peng ◽

Fenghe Wu

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Parameter Optimization ◽

Random Search ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Parameters Optimization ◽

Bayesian Optimization ◽

Learning Models ◽

Machine Learning Models

Machine learning algorithms have been widely used to deal with a variety of practical problems such as computer vision and speech processing. But the performance of machine learning algorithms is primarily affected by their hyper-parameters, as without good hyper-parameter values the performance of these algorithms will be very poor. Unfortunately, for complex machine learning models like deep neural networks, it is very difficult to determine their hyper-parameters. Therefore, it is of great significance to develop an efficient algorithm for hyper-parameter automatic optimization. In this paper, a novel hyper-parameter optimization methodology is presented to combine the advantages of a Genetic Algorithm and Tabu Search to achieve the efficient search for hyper-parameters of learning algorithms. This method is defined as the Tabu_Genetic Algorithm. In order to verify the performance of the proposed algorithm, two sets of contrast experiments are conducted. The Tabu_Genetic Algorithm and other four methods are simultaneously used to search for good values of hyper-parameters of deep convolutional neural networks. Experimental results show that, compared to Random Search and Bayesian optimization methods, the proposed Tabu_Genetic Algorithm finds a better model in less time. Whether in a low-dimensional or high-dimensional space, the Tabu_Genetic Algorithm has better search capabilities as an effective method for finding the hyper-parameters of learning algorithms. The presented method in this paper provides a new solution for solving the hyper-parameters optimization problem of complex machine learning models, which will provide machine learning algorithms with better performance when solving practical problems.

Download Full-text

Short-Term Electricity Generation Forecasting Using Machine Learning Algorithms: A Case Study of the Benin Electricity Community (C.E.B)

TH Wildau Engineering and Natural Sciences Proceedings ◽

10.52825/thwildauensp.v1i.25 ◽

2021 ◽

Vol 1 ◽

Author(s):

Agbassou Guenoupkati ◽

Adekunlé Akim Salami ◽

Mawugno Koffi Kodjo ◽

Kossi Napo

Keyword(s):

Machine Learning ◽

Time Series ◽

Linear Regression ◽

Performance Metrics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Models ◽

Short Term ◽

Machine Learning Models

Time series forecasting in the energy sector is important to power utilities for decision making to ensure the sustainability and quality of electricity supply, and the stability of the power grid. Unfortunately, the presence of certain exogenous factors such as weather conditions, electricity price complicate the task using linear regression models that are becoming unsuitable. The search for a robust predictor would be an invaluable asset for electricity companies. To overcome this difficulty, Artificial Intelligence differs from these prediction methods through the Machine Learning algorithms which have been performing over the last decades in predicting time series on several levels. This work proposes the deployment of three univariate Machine Learning models: Support Vector Regression, Multi-Layer Perceptron, and the Long Short-Term Memory Recurrent Neural Network to predict the electricity production of Benin Electricity Community. In order to validate the performance of these different methods, against the Autoregressive Integrated Mobile Average and Multiple Regression model, performance metrics were used. Overall, the results show that the Machine Learning models outperform the linear regression methods. Consequently, Machine Learning methods offer a perspective for short-term electric power generation forecasting of Benin Electricity Community sources.

Download Full-text

Execution Assessment of Machine Learning Algorithms for Spam Profile Detection on Instagram

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/561032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1889-1894

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Tools ◽

Learning Models ◽

K Nearest Neighbor

Witheverypassingsecondsocialnetworkcommunityisgrowingrapidly,becauseofthat,attackershaveshownkeeninterestinthesekindsofplatformsandwanttodistributemischievouscontentsontheseplatforms.Withthefocus on introducing new set of characteristics and features forcounteractivemeasures,agreatdealofstudieshasresearchedthe possibility of lessening the malicious activities on social medianetworks. This research was to highlight features for identifyingspammers on Instagram and additional features were presentedto improve the performance of different machine learning algorithms. Performance of different machine learning algorithmsnamely, Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)were evaluated on machine learning tools named, RapidMinerand WEKA. The results from this research tells us that RandomForest (RF) outperformed all other selected machine learningalgorithmsonbothselectedmachinelearningtools.OverallRandom Forest (RF) provided best results on RapidMiner. Theseresultsareusefulfortheresearcherswhoarekeentobuildmachine learning models to find out the spamming activities onsocialnetworkcommunities.

Download Full-text

Imputation of PaO2 from SpO2 values from the MIMIC-III Critical Care Database Using Machine-Learning Based Algorithms

10.1101/2021.04.21.21255877 ◽

2021 ◽

Cited By ~ 1

Author(s):

Shuangxia Ren ◽

Jill Zupetic ◽

Mehdi Nouraie ◽

Xinghua Lu ◽

Richard D. Boyce ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Learning Models ◽

Mechanically Ventilated ◽

Mechanically Ventilated Patients ◽

Mimic Iii ◽

Ventilated Patients ◽

Machine Learning Models

AbstractBackgroundThe partial pressure of oxygen (PaO2)/fraction of oxygen delivered (FIO2) ratio is the reference standard for assessment of hypoxemia in mechanically ventilated patients. Non-invasive monitoring with the peripheral saturation of oxygen (SpO2) is increasingly utilized to estimate PaO2 because it does not require invasive sampling. Several equations have been reported to impute PaO2/FIO2 from SpO2 /FIO2. However, machine-learning algorithms to impute the PaO2 from the SpO2 has not been compared to published equations.Research QuestionHow do machine learning algorithms perform at predicting the PaO2 from SpO2 compared to previously published equations?MethodsThree machine learning algorithms (neural network, regression, and kernel-based methods) were developed using 7 clinical variable features (n=9,900 ICU events) and subsequently 3 features (n=20,198 ICU events) as input into the models from data available in mechanically ventilated patients from the Medical Information Mart for Intensive Care (MIMIC) III database. As a regression task, the machine learning models were used to impute PaO2 values. As a classification task, the models were used to predict patients with moderate-to-severe hypoxemic respiratory failure based on a clinically relevant cut-off of PaO2/FIO2 ≤ 150. The accuracy of the machine learning models was compared to published log-linear and non-linear equations. An online imputation calculator was created.ResultsCompared to seven features, three features (SpO2, FiO2 and PEEP) were sufficient to impute PaO2/FIO2 ratio using a large dataset. Any of the tested machine learning models enabled imputation of PaO2/FIO2 from the SpO2/FIO2 with lower error and had greater accuracy in predicting PaO2/FIO2 ≤ 150 compared to published equations. Using three features, the machine learning models showed superior performance in imputing PaO2 across the entire span of SpO2 values, including those ≥ 97%.InterpretationThe improved performance shown for the machine learning algorithms suggests a promising framework for future use in large datasets.

Download Full-text

Validation of Machine Learning Models for Structural Dam Behaviour Interpretation and Prediction

Water ◽

10.3390/w13192717 ◽

2021 ◽

Vol 13 (19) ◽

pp. 2717

Author(s):

Juan Mata ◽

Fernando Salazar ◽

José Barateiro ◽

António Antunes

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Arch Dam ◽

Boosted Regression Trees ◽

Structural Safety ◽

Support Vector ◽

Learning Models ◽

Structural Behaviour ◽

Safety Control ◽

Machine Learning Models

The main aim of structural safety control is the multiple assessments of the expected dam behaviour based on models and the measurements and parameters that characterise the dam’s response and condition. In recent years, there is an increase in the use of data-based models for the analysis and interpretation of the structural behaviour of dams. Multiple Linear Regression is the conventional, widely used approach in dam engineering, although interesting results have been published based on machine learning algorithms such as artificial neural networks, support vector machines, random forest, and boosted regression trees. However, these models need to be carefully developed and properly assessed before their application in practice. This is even more relevant when an increase in users of machine learning models is expected. For this reason, this paper presents extensive work regarding the verification and validation of data-based models for the analysis and interpretation of observed dam’s behaviour. This is presented by means of the development of several machine learning models to interpret horizontal displacements in an arch dam in operation. Several validation techniques are applied, including historical data validation, sensitivity analysis, and predictive validation. The results are discussed and conclusions are drawn regarding the practical application of data-based models.

Download Full-text

Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

Applied Sciences ◽

10.3390/app11188438 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8438

Author(s):

Muhammad Mujahid ◽

Ernesto Lee ◽

Furqan Rustam ◽

Patrick Bernard Washington ◽

Saleem Ullah ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Online Education ◽

Sentiment Analysis ◽

Topic Modeling ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

E Learning ◽

Machine Learning Models

Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes leads to an unprecedented rise in online learning. For limiting the impact of COVID-19 and obstructing its widespread, educational institutions closed their campuses immediately and academic activities are moved to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability to students and teachers and its technical feasibility with respect to different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a larger scale. This study endeavors to investigate the effectiveness of e-learning by analyzing the sentiments of people about e-learning. Due to the rise of social media as an important mode of communication recently, people’s views can be found on platforms such as Twitter, Instagram, Facebook, etc. This study uses a Twitter dataset containing 17,155 tweets about e-learning. Machine learning and deep learning approaches have shown their suitability, capability, and potential for image processing, object detection, and natural language processing tasks and text analysis is no exception. Machine learning approaches have been largely used both for annotation and text and sentiment analysis. Keeping in view the adequacy and efficacy of machine learning models, this study adopts TextBlob, VADER (Valence Aware Dictionary for Sentiment Reasoning), and SentiWordNet to analyze the polarity and subjectivity score of tweets’ text. Furthermore, bearing in mind the fact that machine learning models display high classification accuracy, various machine learning models have been used for sentiment classification. Two feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) and BoW (Bag of Words) have been used to effectively build and evaluate the models. All the models have been evaluated in terms of various important performance metrics such as accuracy, precision, recall, and F1 score. The results reveal that the random forest and support vector machine classifier achieve the highest accuracy of 0.95 when used with Bow features. Performance comparison is carried out for results of TextBlob, VADER, and SentiWordNet, as well as classification results of machine learning models and deep learning models such as CNN (Convolutional Neural Network), LSTM (Long Short Term Memory), CNN-LSTM, and Bi-LSTM (Bidirectional-LSTM). Additionally, topic modeling is performed to find the problems associated with e-learning which indicates that uncertainty of campus opening date, children’s disabilities to grasp online education, and lagging efficient networks for online education are the top three problems.

Download Full-text

A Comparative Analysis of Machine Learning Algorithms to Predict Alzheimer’s Disease

Journal of Healthcare Engineering ◽

10.1155/2021/9917919 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Morshedul Bari Antor ◽

A. H. M. Shafayet Jamil ◽

Maliha Mamtaz ◽

Mohammad Monirujjaman Khan ◽

Sultan Aljahdali ◽

...

Keyword(s):

Machine Learning ◽

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Support Vector Machine ◽

Machine Learning Algorithms ◽

Learning System ◽

Fine Tuning ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Alzheimer’s disease has been one of the major concerns recently. Around 45 million people are suffering from this disease. Alzheimer’s is a degenerative brain disease with an unspecified cause and pathogenesis which primarily affects older people. The main cause of Alzheimer’s disease is Dementia, which progressively damages the brain cells. People lost their thinking ability, reading ability, and many more from this disease. A machine learning system can reduce this problem by predicting the disease. The main aim is to recognize Dementia among various patients. This paper represents the result and analysis regarding detecting Dementia from various machine learning models. The Open Access Series of Imaging Studies (OASIS) dataset has been used for the development of the system. The dataset is small, but it has some significant values. The dataset has been analyzed and applied in several machine learning models. Support vector machine, logistic regression, decision tree, and random forest have been used for prediction. First, the system has been run without fine-tuning and then with fine-tuning. Comparing the results, it is found that the support vector machine provides the best results among the models. It has the best accuracy in detecting Dementia among numerous patients. The system is simple and can easily help people by detecting Dementia among them.

Download Full-text

Detection of Online Fake News Using Blending Ensemble Learning

Scientific Programming ◽

10.1155/2021/3434458 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Arvin Hansrajh ◽

Timothy T. Adeliyi ◽

Jeanette Wing

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Fake News ◽

Learning Models ◽

Linear Discriminant ◽

Proposed Model ◽

Machine Learning Models

The exponential growth in fake news and its inherent threat to democracy, public trust, and justice has escalated the necessity for fake news detection and mitigation. Detecting fake news is a complex challenge as it is intentionally written to mislead and hoodwink. Humans are not good at identifying fake news. The detection of fake news by humans is reported to be at a rate of 54% and an additional 4% is reported in the literature as being speculative. The significance of fighting fake news is exemplified during the present pandemic. Consequently, social networks are ramping up the usage of detection tools and educating the public in recognising fake news. In the literature, it was observed that several machine learning algorithms have been applied to the detection of fake news with limited and mixed success. However, several advanced machine learning models are not being applied, although recent studies are demonstrating the efﬁcacy of the ensemble machine learning approach; hence, the purpose of this study is to assist in the automated detection of fake news. An ensemble approach is adopted to help resolve the identified gap. This study proposed a blended machine learning ensemble model developed from logistic regression, support vector machine, linear discriminant analysis, stochastic gradient descent, and ridge regression, which is then used on a publicly available dataset to predict if a news report is true or not. The proposed model will be appraised with the popular classical machine learning models, while performance metrics such as AUC, ROC, recall, accuracy, precision, and f1-score will be used to measure the performance of the proposed model. Results presented showed that the proposed model outperformed other popular classical machine learning models.

Download Full-text

Surface Motion Prediction and Mapping for Road Infrastructures Management by PS-InSAR Measurements and Machine Learning Algorithms

Remote Sensing ◽

10.3390/rs12233976 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3976

Author(s):

Nicholas Fiorentini ◽

Mehdi Maboudi ◽

Pietro Leandri ◽

Massimo Losa ◽

Markus Gerke

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Central Italy ◽

Machine Learning Algorithms ◽

Boosted Regression Trees ◽

Bayesian Optimization ◽

Support Vector ◽

Surface Motion ◽

Persistent Scatterers ◽

Taylor Diagram

This paper introduces a methodology for predicting and mapping surface motion beneath road pavement structures caused by environmental factors. Persistent Scatterer Interferometric Synthetic Aperture Radar (PS-InSAR) measurements, geospatial analyses, and Machine Learning Algorithms (MLAs) are employed for achieving the purpose. Two single learners, i.e., Regression Tree (RT) and Support Vector Machine (SVM), and two ensemble learners, i.e., Boosted Regression Trees (BRT) and Random Forest (RF) are utilized for estimating the surface motion ratio in terms of mm/year over the Province of Pistoia (Tuscany Region, central Italy, 964 km2), in which strong subsidence phenomena have occurred. The interferometric process of 210 Sentinel-1 images from 2014 to 2019 allows exploiting the average displacements of 52,257 Persistent Scatterers as output targets to predict. A set of 29 environmental-related factors are preprocessed by SAGA-GIS, version 2.3.2, and ESRI ArcGIS, version 10.5, and employed as input features. Once the dataset has been prepared, three wrapper feature selection approaches (backward, forward, and bi-directional) are used for recognizing the set of most relevant features to be used in the modeling. A random splitting of the dataset in 70% and 30% is implemented to identify the training and test set. Through a Bayesian Optimization Algorithm (BOA) and a 10-Fold Cross-Validation (CV), the algorithms are trained and validated. Therefore, the Predictive Performance of MLAs is evaluated and compared by plotting the Taylor Diagram. Outcomes show that SVM and BRT are the most suitable algorithms; in the test phase, BRT has the highest Correlation Coefficient (0.96) and the lowest Root Mean Square Error (0.44 mm/year), while the SVM has the lowest difference between the standard deviation of its predictions (2.05 mm/year) and that of the reference samples (2.09 mm/year). Finally, algorithms are used for mapping surface motion over the study area. We propose three case studies on critical stretches of two-lane rural roads for evaluating the reliability of the procedure. Road authorities could consider the proposed methodology for their monitoring, management, and planning activities.

Download Full-text