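Implementasi teknik seleksi fitur pada klasifikasi malware Android menggunakan support vector machine (SVM) [Implementation of feature selection techniques for Android malware classification using a support vector machine (SVM)]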

Repositor ◽  
2019 ◽  
Vol 1 (1) ◽  
pp. 1
Author(s):  
Hendra Saputra ◽  
Setio Basuki ◽  
Mahar Faiqurahman

Abstract. Android malware has grown significantly as the platform has matured and the variety of Android development techniques has increased. Machine learning can be used to model patterns in the static and dynamic features of Android malware. To classify malware types accurately, the researchers relate the features of an application to the features required by each malware category. The malware categories used are types that are currently in wide circulation. A support vector machine (SVM) is used to classify the malware type, specifically a multi-class one-against-one SVM with an RBF kernel. The features used for classification are Permission and Broadcast Receiver. To improve classification accuracy, the study applies feature selection using Correlation-based Feature Selection (CFS), Gain Ratio (GR), and Chi-Square (CHI). Results obtained with feature selection are evaluated alongside results obtained without it. Classification accuracy was 90.83% with CFS, 91.25% with both GR and CHI, and 91.67% without feature selection. The tests indicate that Permission and Broadcast Receiver can be used to classify malware types, but the feature selection methods used yield accuracy slightly below that of the data without feature selection. Keywords: Android malware classification, feature selection, SVM, multi-class one-against-one SVM
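As a rough illustration of how a chi-square filter ranks binary features such as requested permissions, the sketch below computes the Pearson chi-square statistic between one feature column and the class labels. It is not the authors' implementation: the function name `chi_square_score`, the class names, and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a discrete feature and class labels."""
    n = len(feature)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_val, f_count in feature_counts.items():
        for l_val, l_count in label_counts.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f_val, l_val)] - expected) ** 2 / expected
    return score


# Hypothetical toy data: 1 = the app requests the permission, 0 = it does not.
labels = ["trojan", "trojan", "adware", "adware", "spyware", "spyware"]
send_sms = [1, 1, 0, 0, 0, 0]  # requested only by the "trojan" samples
internet = [1, 1, 1, 1, 1, 0]  # requested by almost every sample

scores = {
    "SEND_SMS": chi_square_score(send_sms, labels),
    "INTERNET": chi_square_score(internet, labels),
}
# Features with a higher score are more strongly associated with the class.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A filter of this kind would keep only the top-ranked features before training the classifier. How many features to keep is a separate choice that the abstract does not specify.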

Author(s):  
Akshya Yadav ◽  
Imlikumla Jamir ◽  
Raj Rajeshwari Jain ◽  
Mayank Sohani

Cancer has been characterized as one of the leading causes of death in humans. Breast cancer, a subtype of cancer, causes death in one out of every eight women worldwide. The way to counter this is early and accurate diagnosis that enables faster treatment. Achieving such accuracy in a short span of time is difficult with existing techniques. In addition, the medical tests conducted in hospitals to detect cancer are expensive and difficult for many people to afford. To address these problems, in this paper we apply the support vector machine, a machine learning algorithm, to predict whether a person is prone to breast cancer. We evaluate the performance of this algorithm by calculating its accuracy, and we apply min-max scaling to counter the problems of overfitting and outliers. After scaling the dataset, we apply a feature selection method called principal component analysis to improve the algorithm's accuracy by decreasing the number of parameters. The final algorithm has improved accuracy without overfitting or outliers, so it can be used to build systems that can be deployed in clinics, hospitals, and medical centers for early and quick diagnosis of breast cancer. The training dataset is the University of Wisconsin breast cancer dataset from the UCI Machine Learning Repository, which is used to evaluate the performance of the support vector machine by calculating its accuracy.
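The min-max scaling step mentioned in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the paper's code. The function name `min_max_scale` and the sample values are assumptions.

```python
def min_max_scale(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# Hypothetical measurements: two features on very different scales.
samples = [
    [10.0, 200.0],
    [20.0, 400.0],
    [15.0, 1000.0],
]
scaled = min_max_scale(samples)
print(scaled)
```

Because the minimum and maximum are taken from the data, one extreme value compresses the remaining values of that column into a narrow band. In practice the scaling parameters are usually computed on the training split only and then reused for the test split.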


Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and the support vector machine (SVM) is proposed in order to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated using mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated by analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated based on MIV. The importance weights of the features, as described by MIV, are sorted in descending order. Finally, the SVM classifier is used to perform feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out on three standard UCI data sets, and the results show that this method can not only effectively reduce the feature dimension and achieve high classification accuracy, but also ensure good robustness.
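The redundancy check described above relies on mutual information between features. The sketch below computes mutual information for two discrete variables. It illustrates only that one quantity, under the assumption of discrete feature values, and does not reproduce the paper's local and overall correlation measures or the MIV step. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in joint_counts.items():
        p_joint = count / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        mi += p_joint * log2(p_joint / (p_x * p_y))
    return mi


# Hypothetical feature columns.
feature_a = [0, 0, 1, 1]
feature_b = [0, 0, 1, 1]  # exact copy of feature_a, hence fully redundant
feature_c = [0, 1, 0, 1]  # statistically independent of feature_a

mi_ab = mutual_information(feature_a, feature_b)
mi_ac = mutual_information(feature_a, feature_c)
print(mi_ab, mi_ac)
```

A pair with high mutual information, such as `feature_a` and `feature_b` here, carries largely the same information, so one of the two is a candidate for removal.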


2010 ◽  
Vol 30 (4) ◽  
pp. 993-996 ◽  
Author(s):  
Juan-ying XIE ◽  
Chun-xia WANG ◽  
Shuai JIANG ◽  
Yan ZHANG

Author(s):  
Nina Zhou ◽  
Lipo Wang

This chapter introduces an approach to class-dependent feature selection and a novel support vector machine (SVM). The relevant background and theory are presented to describe the proposed method, and real applications of the method to several biomedical datasets are demonstrated at the end. The authors hope this chapter gives readers a different view of both the feature selection method and the classifier, and thereby encourages more promising methods and applications.


2022 ◽  
Vol 65 (1) ◽  
pp. 75-86
Author(s):  
Parth C. Upadhyay ◽  
John A. Lory ◽  
Guilherme N. DeSouza ◽  
Timotius A. P. Lagaunne ◽  
Christine M. Spinka

Highlights
A machine learning framework estimated residue cover in RGB images taken at three resolutions from 88 locations.
The best results primarily used texture features, the RFE-SVM feature selection method, and the SVM classifier.
Accounting for shadows and plants plus modifying and optimizing the texture features may improve performance.
An automated system developed using machine learning is a viable strategy to estimate residue cover from RGB images obtained with handheld or UAV platforms.

Abstract. Maintaining plant residue on the soil surface contributes to sustainable cultivation of arable land. Applying machine learning methods to RGB images of residue could overcome the subjectivity of manual methods. The objectives of this study were to use supervised machine learning while identifying the best feature selection method, the best classifier, and the most effective image feature types for classifying residue levels in RGB imagery. Imagery was collected from 88 locations in 40 row-crop fields in five Missouri counties between early May and late June in 2018 and 2019 using a tripod-mounted camera (0.014 cm pixel⁻¹ ground sampling distance, GSD) and an unmanned aerial vehicle (UAV, 0.05 and 0.14 GSD). At each field location, 50 contiguous 0.3 × 0.2 m region of interest (ROI) images were extracted from the imagery, resulting in a dataset of 4,400 ROI images at each GSD. Residue percentages for ground truth were estimated using a bullseye grid method (n = 100 points) based on the 0.014 GSD images. Representative color, texture, and shape features were extracted and evaluated using four feature selection methods and two classifiers. Recursive feature elimination using support vector machine (RFE-SVM) was the best feature selection method, and the SVM classifier performed best for classifying the amount of residue as a three-class problem. The best features for this application were associated with texture, with local binary pattern (LBP) features being the most prevalent for all three GSDs. Shape features were irrelevant. The three residue classes were correctly identified with 88%, 84%, and 81% 10-fold cross-validation scores for the 2018 training data and 81%, 69%, and 65% accuracy for the 2019 testing data in decreasing resolution order. Converting image-wise data (0.014 GSD) to location residue estimates using a Bayesian model showed good agreement with the location-based ground truth (r² = 0.90). This initial assessment documents the use of RGB images to match other methods of estimating residue, with potential to replace or be used as a quality control for line-transect assessments.

Keywords: Feature selection, Soil erosion, Support vector machine, Texture features, Unmanned aerial vehicle.

