Automated classification of tropical shrub species: a hybrid of leaf shape and machine learning approach

Plants play a crucial role in food, medicine, industry, and environmental protection. The ability to recognise plants is important in applications such as the conservation of endangered species and the rehabilitation of land after mining activities. However, identifying plant species is difficult because it requires specialized knowledge. An automated classification system for plant species is therefore valuable, since it can help specialists as well as the public identify plant species easily. Shape descriptors were applied to the myDAUN dataset, which contains 45 tropical shrub species collected from the University of Malaya (UM), Malaysia. Based on a literature review, this is the first study to develop a tropical shrub species image dataset and to classify it using a hybrid of leaf shape and machine learning approaches. Four types of shape descriptors were used in this study: morphological shape descriptors (MSD), Histogram of Oriented Gradients (HOG), Hu invariant moments (Hu) and Zernike moments (ZM). Single descriptors as well as hybrid combinations of descriptors were tested and compared. The tropical shrub species were classified using six different classifiers: artificial neural network (ANN), random forest (RF), support vector machine (SVM), k-nearest neighbour (k-NN), linear discriminant analysis (LDA) and directed acyclic graph multiclass least squares twin support vector machine (DAG MLSTSVM). In addition, three feature selection methods were tested on the myDAUN dataset: Relief, correlation-based feature selection (CFS) and Pearson’s correlation coefficient (PCC). The well-known Flavia and Swedish Leaf datasets were used to validate the proposed methods.
The results showed that ANN with the hybrid of all descriptors outperformed the other classifiers, with an average classification accuracy of 98.23% for the myDAUN dataset, 95.25% for the Flavia dataset and 99.89% for the Swedish Leaf dataset. In addition, the Relief feature selection method achieved the highest classification accuracy of 98.13% after 80 (or 60%) of the original features were removed, reducing the descriptors from 133 to 53 in the myDAUN dataset and shortening the computational time. The hybridisation of all four descriptors gave the best results compared to the others. The results indicate that the combination of MSD and HOG was good enough for tropical shrub species classification, while the Hu and ZM descriptors further improved accuracy because they are invariant to translation, rotation and scale. ANN outperformed the other classifiers for tropical shrub species classification in this study. Feature selection methods can be used in the classification of tropical shrub species, as comparable results can be obtained with fewer descriptors and with reduced computational time and cost.
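The invariance property attributed to the Hu descriptors can be illustrated with a small sketch. The code below is not the authors' implementation; it assumes a binary leaf mask stored as a list of rows and computes only the first two of the seven Hu invariants from normalized central moments. The function names `hu_first_two` and `rotate90` are hypothetical helpers introduced for this illustration.

```python
def hu_first_two(image):
    """Return (phi1, phi2), the first two Hu invariants of a binary image.

    `image` is a list of rows; any truthy value counts as a foreground pixel.
    """
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    m00 = float(len(pixels))
    if m00 == 0:
        raise ValueError("image contains no foreground pixels")

    # Centroid of the shape; central moments are taken around it,
    # which is what makes the result independent of translation.
    cx = sum(x for x, _ in pixels) / m00
    cy = sum(y for _, y in pixels) / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / m00^(1 + (p + q) / 2).
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2


def rotate90(image):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]
```

Calling `hu_first_two` on a mask, on a zero-padded (translated) copy and on `rotate90(mask)` should give the same pair of values up to floating-point error. Scale invariance holds only approximately on a pixel grid.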


Hybrid Harmony Search–Artificial Intelligence Models in Credit Scoring

Entropy ◽

10.3390/e22090989 ◽

2020 ◽

Vol 22 (9) ◽

pp. 989

Author(s):

Rui Ying Goh ◽

Lai Soon Lee ◽

Hsin-Vonn Seow ◽

Kathiresan Gopal

Keyword(s):

Artificial Intelligence ◽

Feature Selection ◽

Credit Scoring ◽

Harmony Search ◽

Model Performance ◽

Hybrid Models ◽

Performance Model ◽

Computational Time ◽

Support Vector ◽

Artificial Intelligence Models

Credit scoring is an important tool used by financial institutions to correctly identify defaulters and non-defaulters. Support Vector Machines (SVM) and Random Forest (RF) are Artificial Intelligence techniques that have been attracting interest due to their flexibility to account for various data patterns. Both are black-box models which are sensitive to hyperparameter settings. Feature selection can be performed on SVM to enable explanation with the reduced features, whereas feature importance computed by RF can be used for model explanation. The benefits of accuracy and interpretation allow for significant improvement in the area of credit risk and credit scoring. This paper proposes the use of Harmony Search (HS) to form a hybrid HS-SVM that performs feature selection and hyperparameter tuning simultaneously, and a hybrid HS-RF that tunes the hyperparameters. A Modified HS (MHS) is also proposed, with the main objective of achieving results comparable to those of the standard HS in a shorter computational time. MHS consists of four main modifications to the standard HS: (i) elitism selection during memory consideration instead of random selection, (ii) dynamic exploration and exploitation operators in place of the original static operators, (iii) a self-adjusted bandwidth operator, and (iv) inclusion of additional termination criteria to reach faster convergence. Along with parallel computing, MHS effectively reduces the computational time of the proposed hybrid models. The proposed hybrid models are compared with standard statistical models across three different datasets commonly used in credit scoring studies. The computational results show that MHS-RF is the most robust in terms of model performance, model explainability and computational time.
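For orientation, the standard HS that the paper modifies can be sketched as follows. This is a generic textbook-style outline, not the authors' HS-SVM, HS-RF or MHS code: the function name `harmony_search`, the parameter defaults and the continuous search space are all assumptions made for illustration. It uses random selection during memory consideration and static `hmcr`, `par` and `bandwidth` values, which are the components that modifications (i) to (iii) replace.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.1, iterations=500, seed=0):
    """Minimize `objective` over box `bounds` with a basic Harmony Search.

    Returns (best_solution, best_score, initial_best_score).
    """
    rng = random.Random(seed)
    # Harmony memory: `hms` random candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    initial_best = min(scores)

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: copy this variable from a random harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small perturbation within the bandwidth.
                    value += rng.uniform(-1.0, 1.0) * bandwidth
            else:
                # Random consideration: sample the variable afresh.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))

        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            # Replace the worst harmony only if the new one is better.
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best], initial_best
```

In a hybrid such as the one described above, `objective` would presumably wrap a cross-validated classifier score over hyperparameters (and a feature mask for SVM); that wiring is omitted here.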


Feature Selection for Text and Image Data Using Differential Evolution with SVM and Naïve Bayes Classifiers

Engineering Journal ◽

10.4186/ej.2020.24.5.161 ◽

2020 ◽

Vol 24 (5) ◽

pp. 161-172

Author(s):

Abhishek Dixit ◽

Ashish Mani ◽

Rohit Bansal

Keyword(s):

Feature Selection ◽

Differential Evolution ◽

Naive Bayes ◽

Image Data ◽

Naïve Bayes ◽

Experimental Result ◽

Computational Time ◽

Compact Representation ◽

Support Vector ◽

Text And Image

Classification problems are increasingly common in important applications such as text categorization, image classification, medical imaging diagnosis and biomolecular analysis, owing to their large attribute sets. For large datasets, feature extraction methods play an important role in removing irrelevant features and thereby improving the performance of the classifier. Various machine learning methods exist for text and image classification. These approaches are used for dimensionality reduction, which aims to filter out less informative and outlier data. They therefore provide a compact representation and computationally more tractable accuracy. At the same time, these methods can be challenging when the search space doubles multiple times. To address such challenges, a hybrid approach is suggested in this paper. The proposed approach uses differential evolution (DE) for feature selection with naïve Bayes (NB) and support vector machine (SVM) classifiers to enhance the performance of the selected classifier. The results are verified using text and image data and show improved accuracy compared with other conventional techniques. Twenty-five benchmark datasets (UCI) from different domains are used to test the proposed algorithms. A comparative study of the proposed hybrid classification algorithms is presented in this work. Finally, the experimental results show that differential evolution with the NB classifier performs best and produces better estimates of the probability terms. The proposed technique is also feasible in terms of computational time.


EXPLOITATION OF SPECTRAL AND TEMPORAL INFORMATION FOR MAPPING PLANT SPECIES IN A FORMER INDUSTRIAL SITE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b3-2021-559-2021 ◽

2021 ◽

Vol XLIII-B3-2021 ◽

pp. 559-566

Author(s):

R. Gimenez ◽

G. Lassalle ◽

R. Hédacq ◽

A. Elger ◽

D. Dubucq ◽

...

Keyword(s):

Time Series ◽

Feature Selection ◽

Plant Species ◽

Vegetation Indices ◽

Principal Component ◽

Temporal Information ◽

Support Vector ◽

Species Discrimination ◽

High Temporal Resolution ◽

Periodic Monitoring

Abstract. Characterization and seasonal (periodic) monitoring of plant species distribution in the context of former industrial activity is crucial to assess the long-term anthropogenic footprint on vegetated areas. Species discrimination has shown promising results using both HyperSpectral (HS) and MultiSpectral (MS) images. Airborne HS instruments enable high spatial and spectral resolution imagery, while time series of satellite MS images provide high temporal resolution and phenological information. This paper aims to compare supervised classification results obtained with non-parametric (Random Forest, RF; Support Vector Machine, SVM) and parametric (Regularized Logistic Regression, RLR) methods applied to both kinds of images acquired over an industrial brownfield. The studied site is a complex vegetated environment due to species diversity: 8 dominant species are retained. The performance obtained with preliminary feature selection based on principal component analysis and vegetation indices, intended to improve the separability of spectral or temporal information according to species, is analysed. The best performance is obtained by the RLR method applied to HS data without feature selection (global accuracy of 93%). Feature selection is found to be a necessary step to perform classification with time series of MS images. Species that are difficult to distinguish in the HS image, namely Salix and Populus, are well separated using Sentinel-2 images (precision around 70%).


Leaf Recognition for Plant Classification Using Direct Acyclic Graph Based Multi-Class Least Squares Twin Support Vector Machine

International Journal of Image and Graphics ◽

10.1142/s0219467816500121 ◽

2016 ◽

Vol 16 (03) ◽

pp. 1650012 ◽

Cited By ~ 6

Author(s):

Divya Tomar ◽

Sonali Agarwal

Keyword(s):

Support Vector Machine ◽

Least Squares ◽

Plant Species ◽

Recognition System ◽

Twin Support Vector Machine ◽

Support Vector ◽

Individual Plant ◽

Species Classification ◽

Acyclic Graph ◽

Direct Acyclic Graph

As most plant species are at risk of extinction, plant identification has become a challenging task and an active area of research. In this paper, we propose a leaf recognition system for plant species classification using leaf image data through a novel direct acyclic graph based multi-class least squares twin support vector machine (DAG-MLSTSVM) classifier. A hybrid feature selection (HFS) approach is used to obtain the best discriminant features for the recognition of individual plant species. Leaves are recognized on the basis of shape and texture features. The experimental results indicate that the proposed DAG-MLSTSVM based plant leaf recognition system is highly accurate and has a faster processing speed than the artificial neural network and the direct acyclic graph based support vector machine.


Plant Species Classification Using Leaf Shape and Texture

2012 International Conference on Industrial Control and Electronics Engineering ◽

10.1109/icicee.2012.538 ◽

2012 ◽

Cited By ~ 3

Author(s):

Hang Zhang ◽

Paul Yanne ◽

Shangsong Liang

Keyword(s):

Plant Species ◽

Leaf Shape ◽

Species Classification


An Aggregated Mutual Information Based Feature Selection with Machine Learning Methods for Enhancing IoT Botnet Attack Detection

Sensors ◽

10.3390/s22010185 ◽

2021 ◽

Vol 22 (1) ◽

pp. 185

Author(s):

Mohammed Al-Sarem ◽

Faisal Saeed ◽

Eman H. Alkhammash ◽

Norah Saleh Alghamdi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Mutual Information ◽

Intrusion Detection ◽

Intrusion Detection Systems ◽

Computational Time ◽

Support Vector ◽

Learning Methods ◽

Detection Systems ◽

Machine Learning Methods

Due to the wide availability and usage of connected devices in Internet of Things (IoT) networks, the number of attacks on these networks is continually increasing. A particularly serious and dangerous type of attack in the IoT environment is the botnet attack, where the attackers can control the IoT systems to generate enormous networks of “bot” devices for generating malicious activities. To detect this type of attack, several Intrusion Detection Systems (IDSs) have been proposed for IoT networks based on machine learning and deep learning methods. As the main characteristics of IoT systems include their limited battery power and processor capacity, maximizing the efficiency of intrusion detection systems for IoT networks is still a research challenge. It is important to provide efficient and effective methods that use lower computational time and have high detection rates. This paper proposes an aggregated mutual information-based feature selection approach with machine learning methods to enhance detection of IoT botnet attacks. In this study, the N-BaIoT benchmark dataset was used to detect botnet attack types using real traffic data gathered from nine commercial IoT devices. The dataset includes binary and multi-class classifications. The feature selection method incorporates Mutual Information (MI) technique, Principal Component Analysis (PCA) and ANOVA f-test at finely-granulated detection level to select the relevant features for improving the performance of IoT Botnet classifiers. In the classification step, several ensemble and individual classifiers were used, including Random Forest (RF), XGBoost (XGB), Gaussian Naïve Bayes (GNB), k-Nearest Neighbor (k-NN), Logistic Regression (LR) and Support Vector Machine (SVM). The experimental results showed the efficiency and effectiveness of the proposed approach, which outperformed other techniques using various evaluation metrics.
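As a rough illustration of the mutual information component only, the sketch below scores discrete features against class labels and ranks them. It does not reproduce the paper's aggregation with PCA and the ANOVA f-test, and it assumes features that are already discretized; `mutual_information` and `rank_features` are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    if len(feature) != len(labels) or not feature:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
        mi += p_xy * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted from most to least informative."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that is independent of the labels scores zero, while a feature that determines a balanced binary label scores ln 2, so keeping the top-ranked features discards those that carry little information about the attack class.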


Enhancing Big Data Feature Selection Using a Hybrid Correlation-Based Feature Selection

Electronics ◽

10.3390/electronics10232984 ◽

2021 ◽

Vol 10 (23) ◽

pp. 2984

Author(s):

Masurah Mohamad ◽

Ali Selamat ◽

Ondrej Krejcar ◽

Ruben Gonzalez Crespo ◽

Enrique Herrera-Viedma ◽

...

Keyword(s):

Feature Selection ◽

Big Data ◽

Data Analysis ◽

Selection Process ◽

Data Extraction ◽

Computational Time ◽

Support Vector ◽

The Neural Network ◽

Correlation Based Feature Selection ◽

The One

This study proposes an alternative data extraction method that combines three well-known feature selection methods for handling large and problematic datasets: the correlation-based feature selection (CFS), best first search (BFS), and dominance-based rough set approach (DRSA) methods. This study aims to enhance the classifier’s performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, with the main phases incorporating two crucial feature extraction tasks. The first is data reduction, which implements a CFS method with a BFS algorithm. The second is a data selection process that applies a DRSA to generate the optimized dataset. In this way, the study aims to reduce the computational time complexity and increase the classification accuracy. Several datasets with various characteristics and volumes were used in the experimental process to evaluate the proposed method’s credibility. The method’s performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed work showed that it could assist the classifier in returning a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) statistical result indicates that the proposed method is an alternative extraction tool for those with difficulties acquiring expensive big data analysis tools and those who are new to the data analysis field.
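The CFS step is usually driven by a subset "merit" heuristic that rewards correlation with the class and penalizes redundancy among features. The sketch below assumes the commonly cited form of that heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff); the abstract does not state which variant the authors use, and `cfs_merit` is a hypothetical helper name. The search strategy (BFS) and the DRSA phase are not shown.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a k-feature subset under the usual CFS heuristic.

    mean_feature_class_corr: average correlation between each feature and the class.
    mean_feature_feature_corr: average pairwise correlation among the features.
    """
    if k < 1:
        raise ValueError("subset must contain at least one feature")
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator
```

With two uncorrelated features of class correlation 0.5 the merit rises above 0.5, whereas two perfectly correlated features give the same merit as one of them alone, which is why a search over subsets tends to drop redundant features.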


A novel approach for selective feature mechanism for two-phase intrusion detection system

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v14.i1.pp101-112 ◽

2019 ◽

Vol 14 (1) ◽

pp. 101

Author(s):

B Narendra Kumar ◽

M S V Sivarama Bhadri Raju ◽

B Vishnu Vardhan

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Performance Metrics ◽

Detection System ◽

Computational Time ◽

Support Vector ◽

Phase System ◽

Second Phase ◽

Two Phase ◽

Linear Dependency

Intrusion detection is an important means of securing computing systems against different intrusions. To improve accuracy and reduce computational time, this paper proposes a two-phase hybrid method based on the SVM and RNN. In addition, the paper proposes a feature selection technique that obtains a small set of features with which detection performance increases. For the two-phase system, two different feature selection techniques are proposed, which address both the linear and the non-linear dependency between features. In the first phase, the RNN is combined with the proposed Joint Mutual Information Maximization (JMIM) based feature selection, and in the second phase, the Support Vector Machine (SVM) is combined with correlation-based feature selection. Extensive simulations are carried out on the proposed system using two different datasets, NSL-KDD and Kyoto2006+. Performance is measured with metrics such as Detection Rate (DR), Precision, False Alarm Rate (FAR), Accuracy and F-Score. Furthermore, a comparative analysis with a few recent hybrid frameworks is also presented. The results obtained demonstrate the effectiveness of the proposed method.
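The listed metrics can all be derived from a binary confusion matrix. The sketch below uses the conventional definitions (DR as recall on the attack class, FAR as the false positive rate); the abstract does not spell out its formulas, so these definitions and the function name `detection_metrics` are assumptions.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common intrusion-detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal traffic flagged as attacks
    tn: normal traffic passed as normal fn: attacks passed as normal
    """
    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    detection_rate = ratio(tp, tp + fn)          # also called recall
    precision = ratio(tp, tp + fp)
    false_alarm_rate = ratio(fp, fp + tn)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_score = ratio(2 * precision * detection_rate, precision + detection_rate)
    return {
        "DR": detection_rate,
        "Precision": precision,
        "FAR": false_alarm_rate,
        "Accuracy": accuracy,
        "F-Score": f_score,
    }
```

Reporting FAR alongside DR matters on datasets such as NSL-KDD because accuracy alone can look high when one traffic class dominates.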


CLASSIFICATION OF DIABETIC RETINOPATHY USING FEATURE SELECTION AND SUPPORT VECTOR MACHINE

Jurnal RESISTOR (Rekayasa Sistem Komputer) ◽

10.31598/jurnalresistor.v1i2.312 ◽

2018 ◽

Vol 1 (2) ◽

pp. 109-117

Author(s):

Muhammad Imron Rosadi ◽

Cahya Bagus Sanjaya ◽

Lukman Hakim

Keyword(s):

Support Vector Machine ◽

Feature Extraction ◽

Feature Selection ◽

Diabetic Retinopathy ◽

Selection Process ◽

Computational Time ◽

Support Vector ◽

Features Selection ◽

Kernel Parameters ◽

Selection Of

Diabetic retinopathy is a common complication of diabetes mellitus. The complication takes the form of damage to the retina of the eye. High levels of glucose in the blood cause small capillaries to rupture, which can lead to blindness. The symptoms shown by sufferers of diabetic retinopathy (DR) include microaneurysms, hemorrhages, soft and hard exudates, and neovascularization. At a certain intensity, these symptoms can indicate the phase (level of severity) of DR. The pattern recognition process has four stages: preprocessing, feature extraction, feature selection and classification. Preprocessing converts the RGB image to the green channel and applies adaptive histogram equalization, removal of blood vessels, removal of the optic disk and detection of exudates. The preprocessing results are collected into a feature vector using GLCM feature extraction, consisting of first- and second-order features, which is then used as input to a Support Vector Machine (SVM). Three issues arise with SVM: how to select the kernel function, what the optimal number of input features is, and how to determine the best kernel parameters. These issues are important because the number of features affects the required kernel parameter values and vice versa, so feature selection is required in building the classification system. This research presents GLCM feature extraction, feature selection and SVM for detecting diabetic retinopathy. The feature selection process uses the F-Score to select among the extracted features. The selected features are then used as input to the classifier. The dataset comprises 50 samples divided into 2 classes: 25 taken from normal retinal scans and the remaining 25 from scans of retinas with diabetic retinopathy.
SVM classification with feature selection improved accuracy and computational time compared with classification without feature selection, achieving 90% accuracy with a computational time of 0.010 seconds.
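F-Score feature ranking for a two-class problem can be sketched as below. The abstract does not define its F-Score, so this example assumes the widely used filter criterion that divides the squared distances of the class means from the overall mean by the sum of the within-class sample variances. The function names `f_score` and `select_top_features` are hypothetical, and the GLCM extraction and SVM stages are not shown.

```python
from statistics import mean, variance


def f_score(positive, negative):
    """F-score of one feature given its values in the two classes."""
    if len(positive) < 2 or len(negative) < 2:
        raise ValueError("each class needs at least two samples")
    overall = mean(list(positive) + list(negative))
    # Between-class separation of the class means around the overall mean.
    numerator = (mean(positive) - overall) ** 2 + (mean(negative) - overall) ** 2
    # Within-class spread, using sample variances (n - 1 in the denominator).
    denominator = variance(positive) + variance(negative)
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def select_top_features(features, k):
    """Rank features by F-score and keep the k best.

    `features` maps a feature name to a (positive_values, negative_values) pair.
    """
    ranked = sorted(features, key=lambda name: f_score(*features[name]), reverse=True)
    return ranked[:k]
```

A higher score means the feature separates normal and DR scans more cleanly, so the top-ranked GLCM features would be the ones passed on to the classifier.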


Detection of white blood cells using optimized qGWO

Intelligent Decision Technologies ◽

10.3233/idt-200055 ◽

2021 ◽

Vol 15 (1) ◽

pp. 141-149

Author(s):

Prerna Sharma ◽

Moolchand Sharma ◽

Divij Gupta ◽

Nimisha Mittal

Keyword(s):

Feature Selection ◽

Human Body ◽

Optimization Algorithm ◽

Blood Cells ◽

White Blood Cells ◽

Computational Time ◽

Support Vector ◽

Grey Wolf ◽

Grey Wolf Optimization ◽

Minimal Set

This paper presents an optimized quantum Grey Wolf Optimization algorithm (qGWO), an enhanced version of the Grey Wolf Optimization algorithm, for feature selection of blood cells, which can further be used for the detection of white blood cells (WBCs). The white blood cell count is an indicator of the immune system of the human body. A deviation of the WBC count from the normal cell count may indicate an abnormal condition. The proposed model uses a quantum grey wolf optimization algorithm to detect white blood cells in a dataset of various types of blood cells. The quantum Grey Wolf algorithm is used to find the minimal set of optimal features from the set of available features to detect the white blood cells optimally. The ordinary Grey Wolf Optimization algorithm can also be used to find the minimal set of optimal features, but the features selected by qGWO are better in terms of computational time. Further, several classification algorithms, such as Support Vector Machine (SVM), Random Forest and K Nearest Neighbor (KNN), are applied to the model to evaluate its accuracy on the subset of features chosen by feature selection. The performance of the classifiers is compared, and the model attained a maximum accuracy of 97.8% using KNN with minimum computational time. The results show that the proposed algorithm is capable of finding an optimal subset of features and maximizing the accuracy.
