Multi-Class Support Vector Machine via Maximizing Multi-Class Margins

Author(s): Jie Xu, Xianglong Liu, Zhouyuan Huo, Cheng Deng, Feiping Nie, ...

Support Vector Machine (SVM) was originally proposed as a binary classification model, and it has achieved great success in many applications. In practice, however, problems more often involve more than two classes, so it is natural to extend SVM to a multi-class classifier. Many works construct a multi-class classifier from binary SVMs, such as the one-versus-all strategy, the one-versus-one strategy, and Weston's multi-class SVM. The one-versus-all and one-versus-one strategies split the multi-class problem into multiple binary classification subproblems, so multiple binary classifiers must be trained. Weston's multi-class SVM is formed by enforcing risk constraints and imposing a specific regularization, such as the Frobenius norm. It is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation behind SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between the training points and the hyperplane, and we analyze the relation between our model and other related methods. Experiments show that our model achieves better or comparable results compared with other related methods.
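
For reference, the baselines contrasted above can be sketched in plain Python. This illustrates only the baselines, not the margin-maximizing model the paper proposes. The helper names are hypothetical. The multi-class hinge term follows a common textbook form of Weston's formulation, with a unit margin and no bias terms; its constants may differ from the original paper. Summing the squared weights over all classes gives the squared Frobenius norm mentioned above.

```python
from itertools import combinations


def ovr_subproblems(classes):
    """One-versus-all: one binary problem per class (that class vs. the rest)."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def ovo_subproblems(classes):
    """One-versus-one: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def weston_watkins_loss(W, x, y, margin=1.0):
    """Multi-class hinge loss for one sample (unit-margin textbook form).

    W is a list of per-class weight vectors (rows of the weight matrix),
    x is a feature vector and y is the index of the true class. Each wrong
    class m contributes max(0, margin - (w_y . x - w_m . x)).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return sum(
        max(0.0, margin - (scores[y] - scores[m]))
        for m in range(len(W))
        if m != y
    )


def frobenius_sq(W):
    """Squared Frobenius norm of W: the regularizer summed over all classes."""
    return sum(wi * wi for w in W for wi in w)
```

With K classes, one-versus-all trains K binary classifiers, while one-versus-one trains K(K-1)/2. For four classes that is 4 versus 6, and the gap widens as K grows.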

Author(s): Noviah Dwi Putranti, Edi Winarko

Abstract: Sentiment analysis in this study is the process of classifying textual documents into two classes: positive and negative sentiment. Opinion data were collected from the social network Twitter using Indonesian-language queries. The study aims to determine public sentiment toward particular objects as expressed on Twitter in Indonesian, thereby supporting market research on public opinion. The collected data were preprocessed and POS-tagged, and a classification model was then produced through training. Sentiment-bearing words were collected with a dictionary-based approach, which in this study yielded 18,069 words. The Maximum Entropy algorithm is used for POS tagging, and the Support Vector Machine is used to build the classification model from the training data. The features are unigrams weighted by TF-IDF. The classifier achieved an accuracy of 86.81% under 7-fold cross-validation with the sigmoid kernel. Manual class labeling with the POS tagger yielded an accuracy of 81.67%. Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, twitter.
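
The unigram TF-IDF weighting described above can be sketched without any library. The sketch assumes tweets have already been preprocessed into lowercase tokens. It uses the plain tf * log(N / df) variant, because the paper does not state which TF-IDF variant it used. The resulting vectors would still have to be passed to an SVM (with a sigmoid kernel in the reported setup) for classification. The example tokens in the test are hypothetical.

```python
import math
from collections import Counter


def tfidf_unigrams(docs):
    """Return one {term: weight} dict per document.

    Assumes docs are already preprocessed into lists of lowercase tokens.
    Weight = raw term frequency * log(N / document frequency), so a term
    that appears in every document gets weight 0.
    """
    n_docs = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors
```

Words that occur in every tweet carry no discriminating weight. Rarer sentiment words such as a negative adjective receive the largest weights.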


Molecules, 2020, Vol 25 (6), pp. 1442
Author(s): Tao Shen, Hong Yu, Yuan-Zhong Wang

Gentiana is one of the largest genera of Gentianoideae. Many of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical composition and bioactive compounds differ among species, it is crucial to accurately identify authentic Gentiana species. In this paper, we studied the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm−1) and Fourier transform mid-infrared (MIR: 4000–600 cm−1) spectroscopy. First, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Second, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model achieved higher classification accuracy than the other models. Five feature selection strategies were used to reduce the dimensionality of the data and the number of variables used for modeling: VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation. As a result, 101 NIR and 73 FT-MIR bands were selected as feature variables, respectively. Third, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectra models. Using RF and SVM as base learners, combined with an SVM meta-classifier, was the optimal stacked generalization strategy.
For the SG-Ven-MIR-SVM model, the accuracy (ACC) of both the calibration set and the validation set was 100%. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, indicating optimal authenticity identification performance. These parameters indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safe and effective clinical application of medicinal Gentiana.
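
The reported metrics can be computed from a confusion matrix. The sketch below treats one species versus the rest as a binary problem. That is an assumption, since the study covers 18 species. Efficiency (EFF) is left out because its definition is not given here. With a perfect confusion matrix every returned value is 1, consistent with the SG-Ven-MIR-SVM result.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, MCC and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    se = tp / (tp + fn)  # true positive rate
    sp = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    p_obs = (tp + tn) / n
    # chance agreement from predicted and actual class marginals
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 1.0
    return {"SE": se, "SP": sp, "MCC": mcc, "K": kappa}
```

For example, 8 true positives, 2 false negatives, 1 false positive and 9 true negatives give SE = 0.8, SP = 0.9 and kappa = 0.7.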


2011, Vol 3, pp. BECB.S7503
Author(s): Sangeetha Subramaniam, Monica Mehrotra, Dinesh Gupta

There is an urgent need to develop novel antimalarials in view of the increasing disease burden and the growing resistance of malarial parasites to currently used drugs. Proliferation inhibitors targeting the P. falciparum intraerythrocytic cycle are one of the important classes of compounds being explored for their potential as novel antimalarials. The Support Vector Machine (SVM)-based model we developed can facilitate rapid screening of large and diverse chemical libraries by reducing false hits and prioritising compounds before expensive High Throughput Screening experiments are set up. The SVM model, trained with molecular descriptors of proliferation inhibitors and non-inhibitors, performed satisfactorily in cross-validation and on an independent data set, with an average accuracy of 83% and an AUC of 0.88. Intriguingly, the method displayed remarkable accuracy on the recently submitted P. falciparum whole-cell screening datasets. The method also predicted several inhibitors in the National Cancer Institute diversity set, most of them similar to known inhibitors.
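
The AUC quoted above equals the probability that a randomly chosen inhibitor receives a higher SVM score than a randomly chosen non-inhibitor (the Mann-Whitney formulation). A direct pairwise sketch follows. It runs in O(n·m) time, so a rank-based version would suit large screening libraries better. The scores in the test stand in for SVM decision values and are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    labels are 1 (inhibitor) or 0 (non-inhibitor); tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.88 therefore means that about 88% of inhibitor/non-inhibitor pairs are ranked in the right order.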


2017, Vol 2017, pp. 1-8
Author(s): Yun-xiao Lou, Xian-shu Fu, Xiao-ping Yu, Zi-hong Ye, Hai-feng Cui, ...

This paper focused on an effective method for discriminating the geographical origin of Wuyi-Rock tea using stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea samples (n=99) collected from nine producing areas and non-Wuyi-Rock tea samples (n=33) from eleven non-producing areas were analysed for SIR and MEP by established methods. The SVM model based on the coupled SIR and MEP data produced the best prediction accuracy (0.9773). This result shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, every feature variable in the stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ²H, δ¹⁸O, Cs, Cu, Ca, and Rb contents are significant indicators for provenance discrimination, and that not all of the metallic elements improve the prediction accuracy of the SVM model.
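
The abstract does not say how each feature's contribution was measured. One common model-agnostic option is permutation importance: shuffle one feature column and record the resulting drop in accuracy. The sketch below assumes that approach. The `predict` function in the test is hypothetical and stands in for the trained SVM. This is not necessarily the authors' ranking procedure.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    X is a list of feature rows (lists); predict maps a row to a class label.
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # rebuild rows with column j replaced by its shuffled values
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Sorting the features by the returned values gives a ranking. A feature the model never reads always scores exactly zero, which matches the finding that some elements add nothing to prediction accuracy.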


Author(s): Zhenhua Li, Junjie Cheng, A. Abu-Siada

Background: Winding deformation is one of the most common faults that a power transformer experiences over its operational life. It is therefore essential to detect and rectify such faults at early stages to avoid potentially catastrophic consequences for the transformer. Methods published in the literature for transformer winding fault diagnosis currently focus mainly on identifying the fault type and quantifying its extent, with little attention to identifying the fault location. Methods: This paper presents a method based on a genetic algorithm and support vector machine (GA-SVM) to improve the classification of power transformer faults in terms of type and location. A sinusoidal sweep signal in the frequency range of 600 kHz to 1 MHz is applied to one terminal of the transformer winding. A mathematical index of the induced current at the head and end of the transformer winding under various fault conditions is used to extract unique features, which are fed to a support vector machine (SVM) model for training. The parameters of the SVM model are optimized using a genetic algorithm (GA). Results: Extensive simulation analysis of various transformer winding faults at different locations demonstrates two things: the mathematical indicators are effective at extracting fault type characteristics, and the proposed fault classification model is effective for fault diagnosis. Conclusion: The proposed model can effectively identify different fault types and determine their location within the transformer winding, with diagnostic rates of 100% for fault type and 90% for fault location.
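
The abstract does not give the GA's encoding or operators, so the following is only a minimal sketch of GA-based SVM parameter tuning. The operators are assumptions: truncation selection, uniform crossover, Gaussian mutation, and elitism. The fitness function in the test is a hypothetical stand-in for the cross-validated accuracy of an SVM trained with the candidate parameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30, mutation_sd=0.1, seed=0):
    """Maximize fitness(params) over box bounds with a simple elitist GA.

    bounds: list of (low, high) per parameter, e.g. log10(C) and log10(gamma).
    Returns the best individual and the best fitness at each generation.
    """
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # uniform crossover per gene, then Gaussian mutation clipped to bounds
            child = [
                clip(rng.choice((ga, gb)) + rng.gauss(0.0, mutation_sd * (hi - lo)), lo, hi)
                for ga, gb, (lo, hi) in zip(a, b, bounds)
            ]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is always kept, the best fitness never decreases from one generation to the next. In a real tuning run, the fitness would train an SVM with C = 10**p[0] and gamma = 10**p[1] on the extracted winding features and return its validation accuracy.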


Author(s): Zida Ziyan Azkiya, Fatma Indriani, Heru Kartika Chandra

Abstract: A two-class classification problem maps inputs into two target classes. In some cases, however, training data are available for only one class. In detecting Dengue Hemorrhagic Fever (DHF), for example, the available training data generally cover only positive patients, and data from healthy people (negative data) are not specifically collected. In this paper, we report our experiment in building a classification model for detecting DHF infection using the One-Class Classification (OCC) approach. The data used in this study are laboratory blood test results of patients with dengue fever. The OCC methods compared are One-Class Support Vector Machine and One-Class K-Means. The SVM method obtained precision = 1.0, recall = 0.993, F1 score = 0.997, and accuracy of 99.7%, while the K-Means method obtained precision = 0.901, recall = 0.973, F1 score = 0.936, and accuracy of 93.3%. This indicates that the SVM method is slightly superior to K-Means for one-class classification of DHF patients. Keywords: dengue fever, Dengue Hemorrhagic Fever, K-Means, One Class Classification, OSVM
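
One-class K-Means, one of the two methods compared here, can be sketched as follows. Centroids are fitted on positive (DHF) samples only. A new sample is accepted as positive if it lies within a distance threshold of its nearest centroid. The first-k initialization, the value of k, and the max-distance threshold are all assumptions, since the paper's exact settings are not given. A percentile threshold would be less sensitive to outliers.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def fit_one_class_kmeans(X, k=2, iters=20):
    """Fit k centroids on positive-only data and derive a distance threshold."""
    centroids = [list(x) for x in X[:k]]  # naive init: first k samples (assumption)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda c: dist(x, centroids[c]))
            clusters[nearest].append(x)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # threshold: largest training distance to the nearest centroid (assumption)
    threshold = max(min(dist(x, c) for c in centroids) for x in X)
    return centroids, threshold


def predict_positive(x, centroids, threshold):
    """Accept x as positive if it lies within the threshold of some centroid."""
    return min(dist(x, c) for c in centroids) <= threshold
```

The reported F1 scores follow from precision P and recall R as 2PR/(P+R). For K-Means, 2(0.901)(0.973)/(0.901+0.973) ≈ 0.936.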


2020, Vol 10 (11), pp. 2628-2633
Author(s): A. Sheryl Oliver, M. Anuradha, J. Jean Justus, Kiranmai Bellam, T. Jayasankar

Lung cancer is a serious illness that affects people all over the globe. To increase the survival rate of patients with lung cancer, early recognition of the disease followed by effective treatment is important. This study introduces a new deep learning (DL) based feature extraction and classification technique for lung CT images. A DL model using a Coding Network (CN) is presented to extract both high-level features and classical features. First, a convolutional neural network is trained as a coding network, and the raw pixels are coded into feature vectors that represent high-level concepts for classification. Next, selected classical features are extracted based on background knowledge of lung CT images. An automatic feature fusion step then avoids tedious manual parameter selection. Finally, a support vector machine (SVM) model is employed to classify the lung CT images effectively. For experimentation, the presented CN-SVM model is evaluated on a benchmark dataset and validated along several dimensions.
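
The abstract does not describe how the automatic fusion works. A minimal parameter-free option, assumed here purely for illustration, is to scale each feature block to unit L2 norm and concatenate the blocks. That way neither the CNN features nor the classical features dominate because of their length or scale. The fused vector would then be the SVM input.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def fuse_features(deep_features, classical_features):
    """Concatenate per-image feature blocks after scaling each to unit L2 norm.

    Normalizing each block separately keeps a long CNN feature vector from
    swamping a short hand-crafted one, with no weighting parameter to tune.
    """
    return l2_normalize(deep_features) + l2_normalize(classical_features)
```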


2018, Vol 1 (1), pp. 120-130
Author(s): Chunxiang Qian, Wence Kang, Hao Ling, Hua Dong, Chengyao Liang, ...

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complex marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which makes the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results showed that the optimized SVM model appeared to perform better than the other models in predicting concrete strength. Based on the SVM model, a variable-limitation simulation method was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
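
K-fold cross-validation for choosing SVM settings can be sketched in plain Python. The scoring callback is hypothetical. In this study it would train an SVM regressor on the training indices and return, for example, the negative prediction error on the held-out fold. The folds here are contiguous; shuffling the samples first is usual in practice.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_score(train_and_score, n, k):
    """Average held-out score; train_and_score(train_idx, test_idx) is user supplied."""
    scores = []
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k


def select_parameter(candidates, make_scorer, n, k=5):
    """Pick the candidate (e.g. an SVM's C) with the best cross-validated score."""
    return max(candidates, key=lambda c: cross_val_score(make_scorer(c), n, k))
```

Every sample is held out exactly once. The selected parameter is therefore judged on data the model did not see during training in that fold.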


2020, Vol 15
Author(s): Chun Qiu, Sai Li, Shenghui Yang, Lin Wang, Aihui Zeng, ...

Aim: To search for genes related to the mechanisms underlying the occurrence of glioma and to build a prediction model for glioblastomas. Background: The morbidity and mortality of glioblastomas are very high, which seriously endangers human health. At present, many investigations of gliomas mainly aim to understand the cause and mechanism of these tumors at the molecular level and to explore clinical diagnosis and treatment methods. However, there is still no effective method for early diagnosis, and there are no effective prevention, diagnosis, or treatment measures. Methods: First, gene expression profiles were downloaded from GEO. Then, differentially expressed genes (DEGs) between the disease samples and the control samples were identified. After that, GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key DEGs. In addition, a classification model distinguishing the glioblastoma samples from the controls was built with a Support Vector Machine (SVM) based on the selected key genes. Results and Discussion: The CFS method selected thirty-six DEGs, including 17 upregulated and 19 downregulated genes, as feature genes for building the classification model between the glioma samples and the control samples. The accuracy of the classification model was 76.25% in 10-fold cross-validation and 70.3% on the independent test set. In addition, PPP2R2B and CYBB were also among the top 5 hub genes screened from the protein–protein interaction (PPI) network. Conclusions: This study indicates that the CFS method is a useful tool for identifying key genes in glioblastomas. We also predict that genes such as PPP2R2B and CYBB might be potential biomarkers for the diagnosis of glioblastomas.
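
The CFS step scores a candidate gene subset by how strongly its genes correlate with the class while correlating weakly with each other. The sketch below uses Hall's merit formula, Merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), with absolute Pearson correlations. Here k is the number of genes in the subset, r_cf is each gene's correlation with the class, and r_ff is the correlation between pairs of genes. CFS can also use other correlation measures. The search over subsets (for example, best-first) is omitted. Gene columns must not be constant, otherwise the correlation is undefined.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cfs_merit(features, target):
    """Merit of a feature subset: high feature-class, low feature-feature correlation.

    features: list of columns (one list of expression values per gene);
    target: class labels coded 0/1 (e.g. control / glioblastoma).
    """
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        return r_cf  # no feature pairs; the formula reduces to r_cf
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Duplicating a perfectly predictive gene leaves the merit unchanged, which is how CFS penalizes redundant genes. Adding an uncorrelated, uninformative gene lowers the merit.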

