unbalanced classification
Recently Published Documents

Total documents: 19 (five years: 11)
H-index: 4 (five years: 1)

2021, pp. 1-26
Author(s): Wenbin Pei, Bing Xue, Lin Shang, Mengjie Zhang

High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification because of its built-in capability to select informative features. However, when data are not evenly distributed, GP tends to develop biased classifiers that achieve high accuracy on the majority class but low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class, so it is important to investigate how GP can be used effectively for high-dimensional unbalanced classification. In this paper, to address the performance bias of GP, a new two-criterion fitness function is developed that considers (i) an approximation of the area under the curve (AUC) and (ii) classification clarity, i.e. how well a program can separate the two classes. The values obtained on the two criteria are combined in pairs instead of being summed. Furthermore, this paper designs a three-criterion tournament selection to effectively identify and select good programs for the genetic operators to use when generating better offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than the other compared methods.
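The abstract does not define either criterion precisely. As a rough sketch only, assuming the common Wilcoxon-Mann-Whitney approximation of AUC over a program's raw outputs and a hypothetical `clarity_score` (the balanced fraction of outputs on the correct side of a 0 threshold, not the paper's definition), a two-criterion fitness can be kept as a pair and compared by Pareto dominance instead of being summed. The function names and the dominance rule are illustrative assumptions.

```python
from typing import Sequence, Tuple

Fitness = Tuple[float, float]  # (AUC approximation, clarity), kept as a pair


def wmw_auc(minority_out: Sequence[float], majority_out: Sequence[float]) -> float:
    """Wilcoxon-Mann-Whitney estimate of AUC: the fraction of
    (minority, majority) pairs ranked correctly by the program output."""
    correct = 0.0
    for p in minority_out:
        for n in majority_out:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # a tie counts as half a correct ranking
    return correct / (len(minority_out) * len(majority_out))


def clarity_score(minority_out: Sequence[float], majority_out: Sequence[float]) -> float:
    """HYPOTHETICAL separation measure (not the paper's definition):
    mean of the per-class fractions on the correct side of 0,
    where an output > 0 predicts the minority class."""
    pos_ok = sum(1 for v in minority_out if v > 0) / len(minority_out)
    neg_ok = sum(1 for v in majority_out if v <= 0) / len(majority_out)
    return (pos_ok + neg_ok) / 2


def fitness(minority_out: Sequence[float], majority_out: Sequence[float]) -> Fitness:
    # The two criteria are returned together instead of being summed.
    return (wmw_auc(minority_out, majority_out),
            clarity_score(minority_out, majority_out))


def dominates(a: Fitness, b: Fitness) -> bool:
    """ASSUMED comparison rule: a is at least as good as b on both
    criteria and strictly better on at least one (Pareto dominance)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
```

For example, `fitness([2.0, 1.0, -0.5], [-1.0, 0.5, -2.0])` ranks 8 of the 9 minority/majority pairs correctly, giving an AUC estimate of 8/9. Keeping the pair intact lets a selection scheme reward a program that improves one criterion without hurting the other, which a weighted sum can hide.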


2021, Vol 5 (1), pp. 105-116
Author(s): Qorry Meidianingsih, Debby Agustine

Imbalanced class classification problems arise in many real applications. Class imbalance can cause minority-class instances to be classified into the majority class. This study examined the performance of applying bagging to safe-level SMOTE with a Support Vector Machine classifier. The data used consisted of three types, distinguished by the proportion of observations in the majority and minority classes. Each type of data has three variables: two independent variables and one dependent variable. The observations of the independent variables were generated from a multivariate normal distribution, while the dependent variable is binary. The results showed that the classifier has high accuracy and sensitivity for all types of data, both in the imbalanced case and in the balanced case (obtained by safe-level SMOTE and safe-level SMOTEBagging). Nevertheless, specificity was the main measure in assessing the performance of the classifier because it reflects accuracy in classifying the minority-class observations. Specificity increased when safe-level SMOTE made the numbers of observations in the two classes approximately balanced. The best performance of the Support Vector Machine in predicting minority-class observations was achieved when bagging was applied to safe-level SMOTE, with specificity rates of 77.93 percent, 78.46 percent, and 85.69 percent for the three data types, respectively.
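Safe-level SMOTE, as it is usually described, gives each minority instance a "safe level" (the number of minority instances among its k nearest neighbours) and uses the ratio of the safe levels of a seed point and a randomly chosen minority neighbour to decide where on the segment between them a synthetic point may be placed. The following is a minimal pure-Python sketch of one oversampling pass under that description; the helper names, the default `k=5` and the seeding are assumptions, and the bagging step of safe-level SMOTEBagging (drawing bootstrap samples and training one SVM per sample) is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(i: int, X: Sequence[Point], k: int, pool: Sequence[int]) -> List[int]:
    """Indices of the k points in `pool` nearest to X[i], excluding i itself."""
    ranked = sorted((math.dist(X[i], X[j]), j) for j in pool if j != i)
    return [j for _, j in ranked[:k]]


def safe_level_smote(X: Sequence[Point], y: Sequence[int], minority: int,
                     k: int = 5, seed: int = 0) -> List[Point]:
    """One pass of safe-level SMOTE: at most one synthetic point per
    minority instance (hypothetical helper, not a library API)."""
    rng = random.Random(seed)
    everyone = range(len(X))
    min_idx = [i for i in everyone if y[i] == minority]

    def safe_level(i: int) -> int:
        # number of minority instances among the k nearest neighbours of X[i]
        return sum(1 for j in knn(i, X, k, everyone) if y[j] == minority)

    synthetic: List[Point] = []
    for p in min_idx:
        neighbours = knn(p, X, k, min_idx)  # minority-class neighbours only
        if not neighbours:
            continue
        n = rng.choice(neighbours)
        slp, sln = safe_level(p), safe_level(n)
        if sln == 0:
            if slp == 0:
                continue  # both points look like noise: generate nothing
            gap = 0.0     # only p is safe: duplicate p
        else:
            ratio = slp / sln
            if ratio == 1:
                gap = rng.random()                 # anywhere on the segment
            elif ratio > 1:
                gap = rng.uniform(0.0, 1 / ratio)  # stay closer to p
            else:
                gap = rng.uniform(1 - ratio, 1.0)  # stay closer to n
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[p], X[n])))
    return synthetic
```

Isolated minority points surrounded by majority instances produce no synthetic data, which is how the method avoids amplifying noise near the class boundary.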


2020, Vol 16 (1), pp. 32-48
Author(s): Wei Cong

Using ensemble learning to mine valuable information from the large volume of financial data accumulated in the securities market is an important topic in data processing. Based on financial data from A-share companies listed on the Shanghai stock market, this article treats the construction of a financial early-warning model for listed companies as an unbalanced classification problem over ST stocks. In our experiment, HDRF (Hellinger Distance based Random Forest); the Bagging, AdaBoost, and Rotation Forest ensemble classification models that take the Hellinger distance decision tree (HDDT) as the base classifier; and the ensemble classification model that takes the C4.5 decision tree as the base classifier are compared in terms of both the area under the ROC curve and the F-measure. The experimental results show that HDRF and the HDDT-based ensemble classifiers are effective for the financial data of listed companies.
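The HDDT base classifier chooses splits by the Hellinger distance between the class-conditional distributions over a split's branches. Because only within-class proportions enter the formula, the score does not depend on the class prior, which is what makes it suited to skewed data. A minimal sketch of that split criterion, assuming a binary problem with per-branch class counts already computed (the function name is hypothetical), is:

```python
import math
from typing import Sequence


def hellinger_split_score(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hellinger distance between the positive- and negative-class
    distributions over the branches of a candidate split.

    pos_counts[j] / neg_counts[j] = number of positive / negative training
    instances sent to branch j. Both classes must be present (non-zero totals)."""
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    s = 0.0
    for p, n in zip(pos_counts, neg_counts):
        # compare within-class proportions, so class priors drop out
        s += (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
    return math.sqrt(s)


# A tree builder would evaluate every candidate split and keep the one with
# the largest score; a perfect two-way split scores sqrt(2), a useless one 0.
```

Multiplying the majority-class counts by a constant leaves the score unchanged, so a split that isolates the few ST stocks is valued the same whether the non-ST class is ten or a thousand times larger.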

