A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

Author(s):  
B. Venkatesh ◽  
J. Anuradha

In Microarray Data, it is complicated to achieve more classification accuracy due to the presence of high dimensions, irrelevant and noisy data. And also It had more gene expression data and fewer samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features need to extract, this can be achieved by applying the feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, filter and wrapper phase in filter phase ensemble technique is used for aggregating the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses the Fuzzy Gaussian membership function ordering for aggregating the ranks. In wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF Kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model are compared with state of art feature selection methods using five benchmark datasets. For evaluation various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. Furthermore, the experimental results show that the performance of the proposed method outperforms the other feature selection methods.

Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant feature and improve classification accuracy. First, local correlation between features and overall correlation is calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated with analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined and the influence degree of input variables on output variables for SVM network based on MIV is calculated. The importance weights of the features described with MIV are sorted by descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combination which takes MIV order of feature as a reference. The simulation experiments are carried out with three standard data sets of UCI, and the results show that this method can not only effectively reduce the feature dimension and high classification accuracy, but also ensure good robustness.


2018 ◽  
Vol 29 (1) ◽  
pp. 1122-1134
Author(s):  
H. M. Keerthi Kumar ◽  
B. S. Harish

Abstract In recent internet era, micro-blogging sites produce enormous amount of short textual information, which appears in the form of opinions or sentiments of users. Sentiment analysis is a challenging task in short text, due to use of formal language, misspellings, and shortened forms of words, which leads to high dimensionality and sparsity. In order to deal with these challenges, this paper proposes a novel, simple, and yet effective feature selection method, to select frequently distributed features related to each class. In this paper, the feature selection method is based on class-wise information, to identify the relevant feature related to each class. We evaluate the proposed feature selection method by comparing with existing feature selection methods like chi-square ( χ2), entropy, information gain, and mutual information. The performances are evaluated using classification accuracy obtained from support vector machine, K nearest neighbors, and random forest classifiers on two publically available datasets viz., Stanford Twitter dataset and Ravikiran Janardhana dataset. In order to demonstrate the effectiveness of the proposed feature selection method, we conducted extensive experimentation by selecting different feature sets. The proposed feature selection method outperforms the existing feature selection methods in terms of classification accuracy on the Stanford Twitter dataset. Similarly, the proposed method performs competently equally in terms of classification accuracy compared to other feature selection methods in most of the feature subsets on Ravikiran Janardhana dataset.


2021 ◽  
Vol 40 (1) ◽  
pp. 535-550
Author(s):  
Ashis Kumar Mandal ◽  
Rikta Sen ◽  
Basabi Chakraborty

The fundamental aim of feature selection is to reduce the dimensionality of data by removing irrelevant and redundant features. As finding out the best subset of features from all possible subsets is computationally expensive, especially for high dimensional data sets, meta-heuristic algorithms are often used as a promising method for addressing the task. In this paper, a variant of recent meta-heuristic approach Owl Search Optimization algorithm (OSA) has been proposed for solving the feature selection problem within a wrapper-based framework. Several strategies are incorporated with an aim to strengthen BOSA (binary version of OSA) in searching the global best solution. The meta-parameter of BOSA is initialized dynamically and then adjusted using a self-adaptive mechanism during the search process. Besides, elitism and mutation operations are combined with BOSA to control the exploitation and exploration better. This improved BOSA is named in this paper as Modified Binary Owl Search Algorithm (MBOSA). Decision Tree (DT) classifier is used for wrapper based fitness function, and the final classification performance of the selected feature subset is evaluated by Support Vector Machine (SVM) classifier. Simulation experiments are conducted on twenty well-known benchmark datasets from UCI for the evaluation of the proposed algorithm, and the results are reported based on classification accuracy, the number of selected features, and execution time. In addition, BOSA along with three common meta-heuristic algorithms Binary Bat Algorithm (BBA), Binary Particle Swarm Optimization (BPSO), and Binary Genetic Algorithm (BGA) are used for comparison. Simulation results show that the proposed approach outperforms similar methods by reducing the number of features significantly while maintaining a comparable level of classification accuracy.


2022 ◽  
Vol 65 (1) ◽  
pp. 75-86
Author(s):  
Parth C. Upadhyay ◽  
John A. Lory ◽  
Guilherme N. DeSouza ◽  
Timotius A. P. Lagaunne ◽  
Christine M. Spinka

HighlightsA machine learning framework estimated residue cover in RGB images taken at three resolutions from 88 locations.The best results primarily used texture features, the RFE-SVM feature selection method, and the SVM classifier.Accounting for shadows and plants plus modifying and optimizing the texture features may improve performance.An automated system developed using machine learning is a viable strategy to estimate residue cover from RGB images obtained with handheld or UAV platforms.Abstract. Maintaining plant residue on the soil surface contributes to sustainable cultivation of arable land. Applying machine learning methods to RGB images of residue could overcome the subjectivity of manual methods. The objectives of this study were to use supervised machine learning while identifying the best feature selection method, the best classifier, and the most effective image feature types for classifying residue levels in RGB imagery. Imagery was collected from 88 locations in 40 row-crop fields in five Missouri counties between early May and late June in 2018 and 2019 using a tripod-mounted camera (0.014 cm pixel-1 ground sampling distance, GSD) and an unmanned aerial vehicle (UAV, 0.05 and 0.14 GSD). At each field location, 50 contiguous 0.3 × 0.2 m region of interest (ROI) images were extracted from the imagery, resulting in a dataset of 4,400 ROI images at each GSD. Residue percentages for ground truth were estimated using a bullseye grid method (n = 100 points) based on the 0.014 GSD images. Representative color, texture, and shape features were extracted and evaluated using four feature selection methods and two classifiers. Recursive feature elimination using support vector machine (RFE-SVM) was the best feature selection method, and the SVM classifier performed best for classifying the amount of residue as a three-class problem. The best features for this application were associated with texture, with local binary pattern (LBP) features being the most prevalent for all three GSDs. Shape features were irrelevant. The three residue classes were correctly identified with 88%, 84%, and 81% 10-fold cross-validation scores for the 2018 training data and 81%, 69%, and 65% accuracy for the 2019 testing data in decreasing resolution order. Converting image-wise data (0.014 GSD) to location residue estimates using a Bayesian model showed good agreement with the location-based ground truth (r2 = 0.90). This initial assessment documents the use of RGB images to match other methods of estimating residue, with potential to replace or be used as a quality control for line-transect assessments. Keywords: Feature selection, Soil erosion, Support vector machine, Texture features, Unmanned aerial vehicle.


2021 ◽  
Author(s):  
Chunyuan Wang ◽  
Yatao Zhang ◽  
Xinge Jiang ◽  
Feifei Liu ◽  
Zhimin Zhang ◽  
...  

Abstract This paper proposed a feature selection method combined with multi-time-scales analysis and heart rate variability (HRV) analysis for middle and early diagnosis of congestive heart failure (CHF). In previous studies regarding the diagnosis of CHF, researchers have tended to increase the variety of HRV features by searching for new ones or to use different machine learning algorithms to optimize the classification of CHF and normal sinus rhythms subject (NSR). In fact, the full utilization of traditional HRV features can also improve classification accuracy. The proposed method constructs a multi-time-scales feature matrix according to traditional HRV features that exhibit good stability in multiple time-scales and differences in different time-scales. The multi-scales features yield better performance than the traditional single-time-scales features when the features are fed into a support vector machine (SVM) classifier, and the results of the SVM classifier exhibit a sensitivity, a specificity, and an accuracy of 99.52%, 100.00%, and 99.83%, respectively. These results indicate that the proposed feature selection method can effectively reduce redundant features and computational load when used for automatic diagnosis of CHF.


2021 ◽  
Vol 16 ◽  
pp. 121-132
Author(s):  
Ghada AL-Rawashdeh ◽  
Rabiei Bin Mamat ◽  
Jawad Hammad Rawashdeh

Email is one of the most economical and fast communication means in recent years; however, there has been a high increase in the rate of spam emails in recent times due to the increased number of email users. Emails are mainly classified into spam and non-spam categories using data mining classification techniques. This paper provides a description and comparative for the evaluation of effective classifiers using three algorithms - namely k-nearest neighbor, Naive Bayesian, and support vector machine. Seven spam email datasets were used to conducted experiment in the MATLAB environment without using any feature selection method. The simulation results showed SVM classifier to achieve a better classification accuracy compared to the K-NN and NB.


2020 ◽  
pp. 407-421
Author(s):  
Noria Bidi ◽  
Zakaria Elberrichi

This article presents a new adaptive algorithm called FS-SLOA (Feature Selection-Seven Spot Ladybird Optimization Algorithm) which is a meta-heuristic feature selection method based on the foraging behavior of a seven spot ladybird. The new efficient technique has been applied to find the best subset features, which achieves the highest accuracy in classification using three classifiers: the Naive Bayes (NB), the Nearest Neighbors (KNN) and the Support Vector Machine (SVM). The authors' proposed approach has been experimented on four well-known benchmark datasets (Wisconsin Breast cancer, Pima Diabetes, Mammographic Mass, and Dermatology datasets) taken from the UCI machine learning repository. Experimental results prove that the classification accuracy of FS-SLOA is the best performing for different datasets.


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 114
Author(s):  
Fitriani Muttakin ◽  
Jui-Tang Wang ◽  
Mulyanto Mulyanto ◽  
Jenq-Shiou Leu

Artificial intelligence, particularly machine learning, is the fastest-growing research trend in educational fields. Machine learning shows an impressive performance in many prediction models, including psychosocial education. The capability of machine learning to discover hidden patterns in large datasets encourages researchers to invent data with high-dimensional features. In contrast, not all features are needed by machine learning, and in many cases, high-dimensional features decrease the performance of machine learning. The feature selection method is one of the appropriate approaches to reducing the features to ensure machine learning works efficiently. Various selection methods have been proposed, but research to determine the essential subset feature in psychosocial education has not been established thus far. This research investigated and proposed methods to determine the best feature selection method in the domain of psychosocial education. We used a multi-criteria decision system (MCDM) approach with Additive Ratio Assessment (ARAS) to rank seven feature selection methods. The proposed model evaluated the best feature selection method using nine criteria from the performance metrics provided by machine learning. The experimental results showed that the ARAS is promising for evaluating and recommending the best feature selection method for psychosocial education data using the teacher’s psychosocial risk levels dataset.


2020 ◽  
Vol 4 (1) ◽  
pp. 29
Author(s):  
Sasan Sarbast Abdulkhaliq ◽  
Aso Mohammad Darwesh

Nowadays, people from every part of the world use social media and social networks to express their feelings toward different topics and aspects. One of the trendiest social media is Twitter, which is a microblogging website that provides a platform for its users to share their views and feelings about products, services, events, etc., in public. Which makes Twitter one of the most valuable sources for collecting and analyzing data by researchers and developers to reveal people sentiment about different topics and services, such as products of commercial companies, services, well-known people such as politicians and athletes, through classifying those sentiments into positive and negative. Classification of people sentiment could be automated through using machine learning algorithms and could be enhanced through using appropriate feature selection methods. We collected most recent tweets about (Amazon, Trump, Chelsea FC, CR7) using Twitter-Application Programming Interface and assigned sentiment score using lexicon rule-based approach, then proposed a machine learning model to improve classification accuracy through using hybrid feature selection method, namely, filter-based feature selection method Chi-square (Chi-2) plus wrapper-based binary coordinate ascent (Chi-2 + BCA) to select optimal subset of features from term frequency-inverse document frequency (TF-IDF) generated features for classification through support vector machine (SVM), and Bag of words generated features for logistic regression (LR) classifiers using different n-gram ranges. After comparing the hybrid (Chi-2+BCA) method with (Chi-2) selected features, and also with the classifiers without feature subset selection, results show that the hybrid feature selection method increases classification accuracy in all cases. The maximum attained accuracy with LR is 86.55% using (1 + 2 + 3-g) range, with SVM is 85.575% using the unigram range, both in the CR7 dataset.


Sign in / Sign up

Export Citation Format

Share Document