feature reduction
Recently Published Documents


TOTAL DOCUMENTS

757
(FIVE YEARS 328)

H-INDEX

29
(FIVE YEARS 8)

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 575
Author(s):  
Prabhjot Kaur ◽  
Shilpi Harnal ◽  
Rajeev Tiwari ◽  
Shuchi Upadhyay ◽  
Surbhi Bhatia ◽  
...  

Agriculture is crucial to the economic prosperity and development of India. Plant diseases can have a devastating influence on food safety and cause considerable losses in agricultural production. Disease identification on the plant is essential for long-term agricultural sustainability. Manually monitoring plant diseases is difficult because of time limitations and the diversity of diseases, so automatic characterization of plant diseases is widely required in the realm of agricultural inputs. Judged on performance, deep learning is better suited to this task than other image-processing methods. This work investigates plant diseases in grapevines, whose leaves fall into four categories: Leaf blight, Black rot, Black measles, and stable (healthy). Several earlier proposals used machine learning algorithms to detect one or two diseases in grape leaves; none offers complete detection across all four categories. Images from the PlantVillage dataset are used to retrain the EfficientNet B7 deep architecture through transfer learning. After transfer learning, the extracted features are down-sampled using a logistic regression technique. Finally, the most discriminant features are identified, and state-of-the-art classifiers reach a highest consistent accuracy of 98.7% after 92 epochs. Based on the simulation findings, an appropriate classifier for this application is also suggested. A fair comparison with existing procedures confirms the effectiveness of the proposed technique.
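The abstract does not say how the logistic regression down-sampling works. One common way to use logistic regression for feature reduction is an L1 (lasso) penalty, which drives the weights of uninformative features to exactly zero; the minimal pure-Python sketch below assumes that reading. The function names, the penalty strength `lam`, and the toy matrix standing in for EfficientNet B7 features are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l1_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                           lam: float = 0.1, lr: float = 0.5,
                           epochs: int = 500) -> Tuple[List[float], float]:
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - target
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(d):
            # Gradient step on the mean log-loss, then soft-thresholding (the L1 prox).
            v = w[j] - lr * grad_w[j] / n
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
        b -= lr * grad_b / n  # the bias is not penalized
    return w, b


def select_features(X, y, lam: float = 0.1) -> List[int]:
    """Return the indices of features whose L1-penalized weight is non-zero."""
    w, _ = l1_logistic_regression(X, y, lam=lam)
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Hypothetical stand-in for CNN features: column 0 is informative,
# column 1 is constant, column 2 is label-independent noise.
X = [[1.0, 0.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, -1.0]]
y = [1, 1, 0, 0]
print(select_features(X, y))  # [0]
```

Raising `lam` shrinks more weights to zero, so it acts as the knob that controls how aggressively the feature set is reduced.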


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar ◽  
Jared M. Peterson

Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as instances from the data exfiltration and keylogging subcategories. Our contribution centers on evaluating how ensemble feature selection techniques (FSTs) affect classification performance for these specific attack instances; a group, or ensemble, of FSTs often performs better than the best individual technique. The classifiers we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). Classification performance is evaluated with the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). For the most part, we determined that our ensemble FSTs do not affect classification performance, but they are beneficial because feature reduction eases the computational burden and provides insight through improved data visualization.
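The abstract does not say how the individual FSTs are combined. One simple way to build an ensemble of feature rankings is mean-rank (Borda-style) aggregation followed by a top-k cut; the sketch below assumes that scheme, and the feature names and the three input rankings are hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate_rankings(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Combine several feature rankings by mean rank (a Borda-style ensemble).

    Each ranking lists feature names from best to worst; all rankings are
    assumed to contain the same features. Ties are broken alphabetically.
    """
    total_rank: Dict[str, int] = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            total_rank[feature] = total_rank.get(feature, 0) + position
    # With equal-length rankings, sorting by rank sum equals sorting by mean rank.
    ordered = sorted(total_rank, key=lambda f: (total_rank[f], f))
    return ordered[:k]


# Hypothetical rankings produced by three individual FSTs over four features.
rankings = [
    ["f_a", "f_b", "f_c", "f_d"],
    ["f_b", "f_a", "f_d", "f_c"],
    ["f_a", "f_c", "f_b", "f_d"],
]
print(aggregate_rankings(rankings, k=2))  # ['f_a', 'f_b']
```

Other combination rules, such as intersecting each technique's top-k set, are equally plausible; mean rank is used here only because it is short and deterministic.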


2022 ◽  
Vol 2022 ◽  
pp. 1-17
Author(s):  
Zhihui Hu ◽  
Xiaoran Wei ◽  
Xiaoxu Han ◽  
Guang Kou ◽  
Haoyu Zhang ◽  
...  

Density peaks clustering (DPC) is a well-known density-based clustering algorithm that handles nonspherical clusters well. However, DPC has high time and space complexity when calculating the local density ρ and the distance δ, which makes it suitable only for small-scale data sets. In addition, its performance on high-dimensional data still needs to be improved: high-dimensional data not only make the data distribution more complex but also increase the computational overhead. To address these issues, we propose an improved density peaks clustering algorithm that combines feature reduction with a data sampling strategy. Specifically, features of the high-dimensional data are automatically extracted by principal component analysis (PCA), an auto-encoder (AE), and t-distributed stochastic neighbor embedding (t-SNE). Next, to reduce the computational overhead, we propose a novel data sampling method for the low-dimensional feature data. First, the data distribution in the low-dimensional feature space is estimated with a Quasi-Monte Carlo (QMC) sequence, which has low-discrepancy characteristics. Then, representative QMC points are selected according to their cell densities, and these selected points, rather than the original data points, are used to calculate ρ and δ. In general, the number of selected QMC points is much smaller than the size of the initial data set. Finally, a two-stage classification strategy based on the clustering results of the QMC points is proposed to classify the original data set. Compared with current works, the proposed algorithm reduces the computational complexity from $O(n^2)$ to $O(Nn)$, where $N$ denotes the number of selected QMC points and $n$ is the size of the original data set, with typically $N \ll n$. Experimental results demonstrate that the proposed algorithm effectively reduces the computational overhead and improves model performance.
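For reference, the two quantities DPC computes can be sketched as follows. The sketch assumes a Gaussian kernel for ρ (a common variant of the original cutoff count) and the usual convention that the densest point takes its largest distance as δ; cluster centers are the points with the largest product γ = ρ · δ. The function names, the cutoff `dc`, and the toy points are hypothetical. The all-pairs distance matrix is exactly the O(n²) cost the abstract targets; the QMC sampling and the two-stage assignment are not shown.

```python
import math
from typing import List, Sequence, Tuple


def dpc_rho_delta(points: Sequence[Sequence[float]],
                  dc: float) -> Tuple[List[float], List[float]]:
    """Local density rho (Gaussian kernel) and distance delta for every point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]  # O(n^2) pairs
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density...
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # ...or, for the densest point, its largest distance to any point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(points, dc: float, k: int) -> List[int]:
    """Indices of the k points with the largest gamma = rho * delta."""
    rho, delta = dpc_rho_delta(points, dc)
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:k]


# Two hypothetical blobs, centered near (0, 0) and (10, 0).
points = [(0, 0), (1, 0), (-1, 0), (10, 0), (11, 0), (9, 0), (10, 1)]
print(pick_centers(points, dc=1.0, k=2))  # [3, 0]
```

In standard DPC, every non-center point is then assigned to the cluster of its nearest higher-density neighbor.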


2022 ◽  
Author(s):  
Ira S. Hofer ◽  
Marina Kupina ◽  
Lori Laddaran ◽  
Eran Halperin

Introduction: Manuscripts that have successfully used machine learning (ML) to predict a variety of perioperative outcomes often use only a limited number of features selected by a clinician. We hypothesized that techniques leveraging a broad set of features for patient laboratory results, medications, and the surgical procedure name would improve performance compared with a more limited set of features chosen by clinicians. Methods: Feature vectors for laboratory results included 702 features in total, derived from 39 laboratory tests; medications were encoded as a binary flag for each of 126 commonly used medications; and the procedure name was converted into a vector of length 100 with the Word2Vec package. Nine models were trained: baseline features only; one model for each of the three data types; baseline plus each data type; all features; and all features with a feature reduction algorithm. Results: Across both outcomes, the model that contained all features (model 8) (mortality ROC-AUC 94.42, PR-AUC 31.0; AKI ROC-AUC 92.47, PR-AUC 76.73) was superior to models with only subsets of features. Conclusion: Featurization techniques leveraging a broad array of clinical data can improve the performance of perioperative prediction models.
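The featurization in the Methods amounts to concatenating three blocks. The sketch below assumes the medications become a multi-hot vector and that the length-100 procedure vector is the average of per-word Word2Vec embeddings; the abstract does not say how words are combined or how the 702 laboratory features are derived, so the lab features are passed in precomputed. The medication names and the two-dimensional embedding table are hypothetical. With the study's sizes, the concatenated vector would have 702 + 126 + 100 = 928 entries.

```python
from typing import Dict, List, Sequence


def multi_hot(given: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """One binary flag per medication in a fixed vocabulary."""
    present = set(given)
    return [1.0 if med in present else 0.0 for med in vocabulary]


def average_word_vectors(text: str, embeddings: Dict[str, List[float]],
                         dim: int) -> List[float]:
    """Average the embeddings of known words; zeros if no word is known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_feature_vector(lab_features, medications, procedure_name,
                         med_vocabulary, embeddings, dim) -> List[float]:
    """Concatenate lab features, medication flags, and a procedure-name vector."""
    return (list(lab_features)
            + multi_hot(medications, med_vocabulary)
            + average_word_vectors(procedure_name, embeddings, dim))


med_vocabulary = ["aspirin", "heparin", "insulin"]  # hypothetical; the study used 126
embeddings = {"knee": [1.0, 0.0], "replacement": [0.0, 1.0]}  # hypothetical 2-d vectors
print(build_feature_vector([0.5, 1.2], ["heparin"], "Total knee replacement",
                           med_vocabulary, embeddings, dim=2))
# [0.5, 1.2, 0.0, 1.0, 0.0, 0.5, 0.5]
```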


Big data analysis applications in medical image processing have recently increased rapidly. Feature reduction plays a significant role in eliminating irrelevant features and in building successful research models for Big Data applications. In this work, fuzzy clustering is used to segment the nucleus, and various shape, texture, and color-based features are extracted to describe the segmented nucleus. This paper proposes the Modified Dominance Soft Set Feature Selection Algorithm (MDSSA) to determine the most important features for classifying leukaemia images. The results of the MDSSA are evaluated using analysis of variance (ANOVA). From the extracted feature set, the MDSSA selected 17 percent of the features, a subset more promising than those chosen by existing reduction algorithms. The proposed approach also reduces the time needed for further analysis of Big Data. The experimental findings confirm that the proposed reduction approach outperforms other approaches.
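The abstract does not specify exactly how ANOVA is applied to the MDSSA output. One common use is a one-way ANOVA F-statistic per feature, comparing between-class variance with within-class variance; the sketch below computes that statistic for a single feature, using hypothetical values grouped by class. A larger F suggests the feature separates the classes better.

```python
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F-statistic for one feature, grouped by class label."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-class sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-class sum of squares: spread of values around their own class mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        # No within-class variance: F is unbounded unless the class means also agree.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical values of one nucleus feature for two classes.
print(anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```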


2022 ◽  
Vol 12 (1) ◽  

This paper proposes a novel hybrid framework with a BWO-based feature reduction technique that combines the merits of machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant, and unique features in the feature set extracted by the proposed approach, and an effective feature reduction technique can eliminate them. With the proposed BWO approach, the feature-set size is reduced by up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms the commonly used PSO- and GA-based feature selection techniques, with a reduced computation time of 21 s. The sentiment analysis approach is also evaluated with performance metrics such as precision, recall, F-measure, and computation time. Organizations can use these online reviews to make well-informed decisions about users’ interests and preferences, to enhance customer satisfaction and product quality, and to identify aspects of their products to improve, thereby generating more profit.
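Precision, recall, and F-measure, three of the metrics listed in the abstract, follow directly from counts of true positives, false positives, and false negatives. A minimal sketch for one sentiment class treated as positive, with hypothetical labels:

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                        positive: Hashable) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["pos", "pos", "neg", "neg", "pos"]  # hypothetical review labels
y_pred = ["pos", "neg", "neg", "pos", "pos"]
print(precision_recall_f1(y_true, y_pred, positive="pos"))
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so all three scores equal 2/3.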


2022 ◽  
Vol 2161 (1) ◽  
pp. 012074
Author(s):  
Hemavati ◽  
V Susheela Devi ◽  
R Aparna

Multi-label classification, in which instances are assigned more than one class label, is one of the important challenges in classification. Ensemble learning is a supervised learning process in which several classifiers are trained to obtain a better solution for a given problem. Feature reduction with principal component analysis (PCA) can improve classification accuracy when it takes the class label information into account. In this paper, a stacked ensemble learning method for multi-label data (SEMML) with class-information-augmented PCA (CA PCA) is proposed. First, the dimensionality reduction step is applied; next, the number of classifiers to apply to the original training dataset is chosen; finally, the stacking method is applied. The experimental results show that the proposed method performs better than existing methods.
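The stacking step can be sketched in pure Python, assuming the CA PCA reduction has already been applied, so the inputs are reduced feature vectors. Out-of-fold predictions of the base classifiers become the meta-features for a meta-learner. Handling multiple labels with one stacked model per label (binary relevance) is an assumption, as are the nearest-centroid and 1-NN base learners and the toy data; this is not the SEMML implementation.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Base learner: predict the class whose mean vector is closest."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(X: List[Vector], y: List[int]) -> Predictor:
    """Base learner: copy the label of the closest training point."""
    return lambda x: y[min(range(len(X)), key=lambda i: sq_dist(x, X[i]))]


def fit_stack(X, y, base_learners, meta_learner, folds: int = 2) -> Predictor:
    """Stack base learners: out-of-fold predictions become meta-features."""
    n = len(X)
    meta_X = [[0.0] * len(base_learners) for _ in range(n)]
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        for j, learn in enumerate(base_learners):
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in range(n):
                if i % folds == f:  # predict only on the held-out fold
                    meta_X[i][j] = float(model(X[i]))
    bases = [learn(list(X), list(y)) for learn in base_learners]
    meta = meta_learner(meta_X, list(y))
    return lambda x: meta([float(m(x)) for m in bases])


def fit_multilabel_stack(X, Y, base_learners, meta_learner) -> Callable[[Vector], List[int]]:
    """Binary relevance: one stacked model per label column of Y."""
    models = [fit_stack(X, [row[k] for row in Y], base_learners, meta_learner)
              for k in range(len(Y[0]))]
    return lambda x: [m(x) for m in models]


# Hypothetical reduced features and a two-label target matrix.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
Y = [[0, 1], [0, 1], [0, 1], [1, 0], [1, 0], [1, 0]]
predict = fit_multilabel_stack(X, Y, [nearest_centroid, one_nn], nearest_centroid)
print(predict([0.5, 0.5]), predict([5.5, 5.5]))  # [0, 1] [1, 0]
```

Using out-of-fold predictions keeps the meta-learner from training on base-model outputs that have already seen the same rows.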


2022 ◽  
Vol 10 (1) ◽  

Parkinson’s disease, the second most common neurodegenerative disorder after Alzheimer’s disease, adversely affects the patient’s nervous system. In its nascent stage the symptoms are mild and sometimes go unnoticed; they become severe only as the disease progresses, so diagnosis at an early stage is not easy. Recent research has shown that changes in speech or distortion in the voice can be used effectively for early Parkinson’s detection. In this work, the authors propose a system for Parkinson's disease detection using speech signals. Because feature selection plays an important role in classification, the authors propose a hybrid MIRFE feature selection approach, and its result is compared with five standard feature selection methods using an XGBoost classifier. The proposed MIRFE approach selects 40 of 754 features, a feature reduction ratio of 94.69%. The proposed system obtains an accuracy of 93.88% and an area under the curve (AUC) of 0.978.
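The reported feature reduction ratio follows from the share of the 754 features that MIRFE discards, assuming the ratio is defined as removed features over original features:

```latex
\text{feature reduction ratio}
  = \frac{754 - 40}{754} \times 100\%
  = \frac{714}{754} \times 100\%
  \approx 94.69\%
```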


2022 ◽  
pp. 703-727
Author(s):  
Audu Musa Mabu ◽  
Rajesh Prasad ◽  
Raghav Yadav

With the progression of bioinformatics, applications of gene expression (GE) profiles to cancer diagnosis and classification have become an intriguing subject in the field. GE data contain numerous genes but few samples, which makes them arduous to examine and process. This paper proposes a novel strategy for classifying GE datasets with clustering-centered feature selection. The proposed technique first preprocesses the dataset by normalization, and feature selection is then accomplished with a feature clustering support vector machine (FCSVM), which has two phases: gene clustering and gene representation. To make the selected top-ranked features suitable for classification, feature reduction is performed with the SVM-recursive feature elimination (SVM-RFE) algorithm. Finally, the feature-reduced data set is classified using an artificial neural network (ANN) classifier. Compared with recent swarm-intelligence feature reduction approaches, FCSVM-ANN performed favorably.
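The SVM-RFE reduction step can be sketched as: train a linear SVM, drop the feature with the smallest squared weight, and repeat until the desired number remains. The sketch below swaps in plain full-batch subgradient descent on the hinge loss in place of a proper SVM solver, removes one feature per round, and uses a toy matrix instead of gene expression data; the hyperparameters and names are hypothetical, and the FCSVM clustering phase and ANN classifier are not shown.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            margin = t * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    grad_w[j] += t * x[j]
                grad_b += t
        w = [(1 - lr * lam) * wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def svm_rfe(X, y, n_keep: int, **svm_kwargs) -> List[int]:
    """Recursive feature elimination: retrain, then drop the feature with the smallest w_j^2."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y, **svm_kwargs)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        remaining.pop(weakest)
    return remaining


# Hypothetical data: feature 0 separates the classes, 1 is constant, 2 is noise.
X = [[1.0, 0.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, -1.0]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y, n_keep=1))  # [0]
```

Retraining after every removal is what distinguishes RFE from ranking features once by a single model's weights.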

