resampling methods
Recently Published Documents

Total documents: 312 (five years: 52)
H-index: 35 (five years: 2)

Electronics, 2022, Vol 11 (2), pp. 228
Author(s): Ahmad B. Hassanat, Ahmad S. Tarawneh, Samer Subhi Abed, Ghada Awad Altarawneh, Malek Alrashidi, ...

Since most classifiers are biased toward the dominant class, class imbalance is a challenging problem in machine learning. The most popular approaches to solving this problem include oversampling minority examples and undersampling majority examples. Oversampling may increase the probability of overfitting, whereas undersampling eliminates examples that may be crucial to the learning process. To address both concerns, we present a linear-time resampling method based on random data partitioning and a majority voting rule: an imbalanced dataset is partitioned into a number of small subdatasets, each of which must be class balanced. A specific classifier is then trained for each subdataset, and the final classification result is established by applying the majority voting rule to the results of all of the trained models. We compared the performance of the proposed method to some of the most well-known oversampling and undersampling methods, employing a range of classifiers, on 33 benchmark machine learning class-imbalanced datasets. The classifiers trained on the data generated by the proposed method produced results comparable to those of most of the resampling methods tested, with the exception of SMOTEFUNA, which is an oversampling method that increases the probability of overfitting. The proposed method also produced results comparable to those of the Easy Ensemble (EE) undersampling method. As a result, for learning from class-imbalanced datasets, we advocate using either EE or our method.
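
The partition-and-vote idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the balanced subdatasets are built, so the sketch assumes each one pairs a random slice of the majority class with all minority rows, and it uses a toy nearest-centroid classifier as the base model. The names `balanced_partitions`, `NearestCentroid`, `fit_ensemble`, and `predict_majority_vote` are hypothetical.

```python
import random
from collections import Counter


def balanced_partitions(rows, minority_label, seed=0):
    """Pair each random slice of the majority class with all minority rows.

    `rows` is a list of (features, label) tuples with at least one minority
    row. How the paper builds its subdatasets is not stated in the abstract;
    this pairing is an assumption made for illustration.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == minority_label]
    majority = [r for r in rows if r[1] != minority_label]
    rng.shuffle(majority)
    # Slices differ in size by at most one row, so the balance is only
    # approximate when the class counts do not divide evenly.
    n_parts = max(1, len(majority) // len(minority))
    return [majority[i::n_parts] + minority for i in range(n_parts)]


class NearestCentroid:
    """Toy base classifier: predict the label of the closest class mean."""

    def fit(self, rows):
        sums, counts = {}, Counter()
        for features, label in rows:
            counts[label] += 1
            acc = sums.setdefault(label, [0.0] * len(features))
            for j, value in enumerate(features):
                acc[j] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[label]))

        return min(self.centroids, key=squared_distance)


def fit_ensemble(rows, minority_label, seed=0):
    """Train one base classifier per balanced subdataset."""
    parts = balanced_partitions(rows, minority_label, seed)
    return [NearestCentroid().fit(part) for part in parts]


def predict_majority_vote(models, features):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model.predict(features) for model in models)
    return votes.most_common(1)[0][0]


# 12 majority rows (label 0) near the origin, 4 minority rows (label 1) near (5, 5).
rows = [((0.1 * i, -0.1 * i), 0) for i in range(12)]
rows += [((5 + 0.1 * i, 5.0), 1) for i in range(4)]
models = fit_ensemble(rows, minority_label=1)
print(len(models),
      predict_majority_vote(models, (5.0, 5.0)),
      predict_majority_vote(models, (0.0, 0.0)))  # prints: 3 1 0
```

Each pass over the data touches every row a constant number of times, which is consistent with the linear-time claim for the partitioning step; the cost of training depends on the base classifier chosen.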


2021, pp. 0193841X2110656
Author(s): Zachary K. Collier, Haobai Zhang, Bridgette Johnson

Background: Finite mixture models cluster individuals into latent subgroups based on observed traits. However, inaccurate enumeration of clusters can have lasting implications for policy decisions and the allocation of resources. Applied and methodological researchers have not agreed on a single best model fit statistic, and different measures can suggest different numbers of latent clusters. Objectives: The purpose of this article is to evaluate and compare different cluster enumeration techniques. Research Design: Study I demonstrates that recently proposed resampling methods do not yield a precise number of clusters on which all fit statistics agree. We recommend the pre-processing method in Study II as an alternative. Both studies used nationally representative data on working memory, cognitive flexibility, and inhibitory control. Conclusions: The data plus priors method shows promise for addressing inconsistencies among fit measures and for helping applied researchers who use finite mixture models in the future.


Stats, 2021, Vol 4 (4), pp. 1091-1115
Author(s): Bradley Efron

This article was prepared for the Special Issue on Resampling methods for statistical inference of the 2020s. Modern algorithms such as random forests and deep learning are automatic machines for producing prediction rules from training data. Resampling plans have been the key technology for evaluating a rule’s prediction accuracy. After a careful description of how prediction error is measured, the article discusses the advantages and disadvantages of the principal methods: cross-validation, the nonparametric bootstrap, covariance penalties (Mallows’ Cp and the Akaike Information Criterion), and conformal inference. The emphasis is on a broad overview of a large subject, featuring examples, simulations, and a minimum of technical detail.
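
As a concrete instance of one of the methods named above, here is a minimal sketch of K-fold cross-validation for estimating prediction error. It is not taken from the article: the prediction rule (a least-squares line), the squared-error loss, the fold layout, and the toy data are all assumptions chosen to keep the example short, and the function names are hypothetical.

```python
def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mse(points):
    """Mean squared error on the same data used to fit the rule."""
    slope, intercept = fit_line(points)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


def cross_validated_mse(points, n_folds=5):
    """Mean squared error on held-out folds, each predicted by a rule
    fitted to the remaining folds."""
    squared_errors = []
    for fold in range(n_folds):
        held_out = points[fold::n_folds]
        training = [p for i, p in enumerate(points) if i % n_folds != fold]
        slope, intercept = fit_line(training)
        squared_errors += [(y - (slope * x + intercept)) ** 2 for x, y in held_out]
    return sum(squared_errors) / len(squared_errors)


# Toy data: y = 2x + 1 with a deterministic +/-0.1 perturbation.
points = [(float(i), 2.0 * i + 1.0 + (0.1 if i % 2 == 0 else -0.1)) for i in range(20)]
print(apparent_mse(points), cross_validated_mse(points))
```

The apparent error reuses the training data and so tends to be optimistic; the cross-validated figure is computed only on points the fitted rule has not seen.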


2021
Author(s): Qi Ye, Tomohiro Kuroda, Tong Ruan, Wenlong Zhang, Xiaoling Ge

2021, Vol 15
Author(s): Shuhao Shi, Kai Qiao, Shuai Yang, Linyuan Wang, Jian Chen, ...

The graph neural network (GNN) has been widely used for graph data representation. However, existing research mostly assumes ideally balanced datasets and rarely considers imbalanced ones. Traditional methods for dealing with imbalanced datasets, such as resampling, reweighting, and synthetic samples, are no longer applicable to GNNs. This study proposes an ensemble model called Boosting-GNN, which uses GNNs as the base classifiers during boosting. In Boosting-GNN, higher weights are set for the training samples that are not correctly classified by the previous classifiers, thus achieving higher classification accuracy and better reliability. In addition, transfer learning is used to reduce computational cost and increase fitting ability. Experimental results indicate that the proposed Boosting-GNN model achieves better performance than graph convolutional network (GCN), GraphSAGE, graph attention network (GAT), simplifying graph convolutional networks (SGC), multi-scale graph convolution networks (N-GCN), and most advanced reweighting and resampling methods on synthetic imbalanced datasets, with an average performance improvement of 4.5%.
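
The reweighting step described here, in which misclassified training samples receive higher weights for the next base classifier, can be illustrated with the classic two-class AdaBoost update. This is a generic sketch, not the Boosting-GNN update rule, which the abstract does not specify; the function name `reweight` and the toy inputs are hypothetical.

```python
import math


def reweight(weights, is_correct):
    """One AdaBoost-style round: raise weights of misclassified samples.

    Assumes `weights` sums to 1 and that the weighted error is strictly
    between 0 and 1. Returns the normalized new weights and the vote
    weight (alpha) given to the classifier that produced `is_correct`.
    """
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    alpha = 0.5 * math.log((1.0 - error) / error)
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, is_correct)]
    total = sum(scaled)
    return [w / total for w in scaled], alpha


# Five equally weighted samples; the base classifier misses the third one.
weights = [0.2] * 5
is_correct = [True, True, False, True, True]
new_weights, alpha = reweight(weights, is_correct)
print(new_weights, alpha)
```

After this update the misclassified samples together carry half of the total weight, so the next base classifier is pushed toward the cases the previous one got wrong.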


Author(s): Alexander Mielke, Bridget M. Waller, Claire Pérez, Alan V. Rincon, Julie Duboscq, ...

Understanding facial signals in humans and other species is crucial for understanding the evolution, complexity, and function of the face as a communication tool. The Facial Action Coding System (FACS) enables researchers to measure facial movements accurately, but we currently lack tools to reliably analyse data and efficiently communicate results. Network analysis can provide a way to use the information encoded in FACS datasets: by treating individual AUs (the smallest units of facial movements) as nodes in a network and their co-occurrence as connections, we can analyse and visualise differences in the use of combinations of AUs in different conditions. Here, we present ‘NetFACS’, a statistical package that uses occurrence probabilities and resampling methods to answer questions about the use of AUs, AU combinations, and the facial communication system as a whole in humans and non-human animals. Using highly stereotyped facial signals as an example, we illustrate some of the current functionalities of NetFACS. We show that very few AUs are specific to certain stereotypical contexts; that AUs are not used independently from each other; that graph-level properties of stereotypical signals differ; and that clusters of AUs allow us to reconstruct facial signals, even when blind to the underlying conditions. The flexibility and widespread use of network analysis allow us to move away from studying facial signals as stereotyped expressions, and towards a dynamic and differentiated approach to facial communication.


2021, Vol 2123 (1), pp. 012034
Author(s): H Khusna, M Mashuri, Wibawati

White crystal sugar, a widely consumed sugar, has two critical-to-quality characteristics, namely the colour index of the solution and the level of sulphur dioxide. These quality characteristics exhibit small shifts in mean and variability, as well as an autocorrelation pattern. This research proposes a residual-based Maximum Multivariate Cumulative Sum (Max-MCUSUM) control chart, a single control chart that monitors small shifts of mean and variability simultaneously, for monitoring the quality of white crystal sugar. A vector autoregressive (VAR) model is used to model the daily solution colour index and the daily sulphur dioxide level, and the residuals are then monitored using the Max-MCUSUM chart. The VAR-based Max-MCUSUM chart employs the bootstrap, one of the nonparametric resampling methods, to estimate the control limit. The results of white crystal sugar quality control show that the processes in the last week of August 2020 need to be improved. Monitoring the white crystal sugar data using a conventional control chart leads to many false alarm signals. Furthermore, the proposed control chart is more sensitive than the residual-based MEWMA and residual-based Hotelling’s T² charts for monitoring the quality of white crystal sugar.
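
To show what estimating a control limit by bootstrap can look like, here is a deliberately simplified sketch. It uses a univariate one-sided CUSUM on model residuals rather than the multivariate Max-MCUSUM statistic of the paper, whose exact procedure is not given in the abstract. The reference value `k`, the false-alarm level `alpha`, the residual values, and the function names are all assumptions made for illustration.

```python
import random


def cusum_max(residuals, k=0.5):
    """Largest value reached by a one-sided upper CUSUM with reference value k."""
    statistic, peak = 0.0, 0.0
    for r in residuals:
        statistic = max(0.0, statistic + r - k)
        peak = max(peak, statistic)
    return peak


def bootstrap_control_limit(residuals, n_boot=1000, alpha=0.05, k=0.5, seed=0):
    """Upper (1 - alpha) quantile of the CUSUM peak over bootstrap resamples.

    Each resample draws len(residuals) values with replacement from the
    in-control residuals, which treats them as independent (the role the
    VAR model plays in the paper is to remove autocorrelation first).
    """
    rng = random.Random(seed)
    n = len(residuals)
    peaks = sorted(
        cusum_max([rng.choice(residuals) for _ in range(n)], k)
        for _ in range(n_boot)
    )
    index = min(n_boot - 1, int((1 - alpha) * n_boot))
    return peaks[index]


# Hypothetical in-control residuals, for illustration only.
residuals = [-1.2, 0.4, 0.9, -0.3, 1.1, -0.8, 0.2, -0.5, 0.7, -0.6, 1.3, -1.0]
limit = bootstrap_control_limit(residuals)
print(limit)
```

A new stretch of residuals would then signal an out-of-control condition when its CUSUM statistic exceeds `limit`.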

