Binary Spectrum Feature for Improved Classifier Performance

2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification, and numerous practices, such as feature selection (i.e., selecting a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider a given supervised classification task that must be performed using continuous-valued features. We assume that an optimal subset of features has already been selected, so no further feature reduction or feature addition is carried out. We then attempt to improve classification performance by passing the given feature set through a transformation that produces a new feature set, which we have named the “Binary Spectrum”. Through a case study on Pulsed Eddy Current sensor data captured during an infrastructure monitoring task, we demonstrate that the classification accuracy of a Support Vector Machine (SVM) classifier increases when this Binary Spectrum feature is used, indicating the transformation’s potential for broader use.
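The abstract does not define how the Binary Spectrum is computed, so the sketch below is only a hypothetical stand-in for the general idea of re-expressing continuous features as binary ones; it is not the authors' transformation. It assumes a thermometer-style encoding: each feature is compared against a few cut points taken from training-set quantiles, producing one 0/1 indicator per cut point. The function names `quantile_thresholds` and `binary_threshold_features`, the number of levels, and the toy data are illustrative assumptions.

```python
from typing import List, Sequence


def quantile_thresholds(column: Sequence[float], levels: int) -> List[float]:
    """Pick `levels` interior cut points from the sorted training column (assumed scheme)."""
    ordered = sorted(column)
    n = len(ordered)
    return [ordered[(k * n) // (levels + 1)] for k in range(1, levels + 1)]


def binary_threshold_features(
    x: Sequence[float], thresholds: Sequence[Sequence[float]]
) -> List[int]:
    """Expand each continuous feature into 0/1 indicators.

    Hypothetical stand-in, not the paper's Binary Spectrum: feature i
    contributes one bit per cut point t in thresholds[i], set when x[i] >= t.
    """
    if len(x) != len(thresholds):
        raise ValueError("need one list of cut points per feature")
    bits: List[int] = []
    for value, cuts in zip(x, thresholds):
        bits.extend(1 if value >= t else 0 for t in cuts)
    return bits


if __name__ == "__main__":
    # Toy training rows with two continuous features (hypothetical values).
    train = [[0.1, 5.0], [0.4, 3.0], [0.7, 1.0], [0.9, 4.0]]
    cuts = [quantile_thresholds([row[i] for row in train], 3) for i in range(2)]
    print(cuts)  # [[0.4, 0.7, 0.9], [3.0, 4.0, 5.0]]
    print(binary_threshold_features([0.5, 2.0], cuts))  # [1, 0, 0, 0, 0, 0]
```

In a pipeline like the one described, the expanded bit vectors would replace the original features as the classifier's input (the abstract uses an SVM). The cut points should be fitted on training data only, so that test samples are encoded without leaking information.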


Online discussion forums and blogs are vibrant platforms where cancer patients express their views in the form of stories. These stories sometimes inspire patients who are anxiously searching for similar cases. This paper proposes a method that uses natural language processing and machine learning to analyze unstructured text accumulated from patients’ reviews and stories. The proposed methodology aims to identify the behavior, emotions, side effects, decisions, and demographics associated with cancer victims. The pre-processing phase extracts web text, cleans it by omitting certain special characters and symbols, and finally tags it using the POS (Parts of Speech) tagger of NLTK (Natural Language Toolkit). The post-processing phase trains seven machine learning classifiers (see Table 6). The Decision Tree classifier shows the highest precision (0.83) among the classifiers, while the Support Vector Machine (SVM) classifier has the highest area under the receiver operating characteristic curve (AUC), at 0.98.
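The two figures quoted above measure different things: precision is the share of predicted positives that are correct at a fixed decision threshold, while AUC summarizes how well a classifier's scores rank positives above negatives across all thresholds. The minimal sketch below computes both for a binary task, using the pairwise (Mann-Whitney) form of AUC. The labels, scores, and the 0.5 threshold are hypothetical toy values, not data from the study.

```python
from typing import Sequence


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted positives (label 1) that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    Counts how often a positive example outscores a negative one; ties count 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 1]
    scores = [0.9, 0.3, 0.6, 0.7, 0.2]  # hypothetical classifier scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(precision(y_true, y_pred))  # 2 of 3 predicted positives are correct
    print(roc_auc(y_true, scores))
```

This helps explain how one classifier can lead on precision while another leads on AUC: precision depends on a single threshold, whereas AUC depends only on the ordering of the scores.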


Author(s):  
Mokhtar Al-Suhaiqi ◽  
Muneer A. S. Hazaa ◽  
Mohammed Albared

Due to the rapid growth of research articles in various languages, the cross-lingual plagiarism detection problem has received increasing interest in recent years. Cross-lingual plagiarism detection is a more challenging task than monolingual plagiarism detection. This paper addresses the problem of cross-lingual plagiarism detection (CLPD) by proposing a method that combines keyphrase extraction, monolingual detection methods, and a machine learning approach. The research methodology used in this study made it possible to design, develop, and implement an efficient Arabic-English cross-lingual plagiarism detection method. This paper empirically evaluates five monolingual plagiarism detection methods, namely i) N-Gram Similarity, ii) Longest Common Subsequence, iii) Dice Coefficient, iv) Fingerprint-based Jaccard Similarity, and v) Fingerprint-based Containment Similarity. In addition, three machine learning classifiers, namely i) naïve Bayes, ii) Support Vector Machine, and iii) linear logistic regression, are used for Arabic-English cross-language plagiarism detection. Several experiments are conducted to evaluate the performance of the keyphrase extraction methods, and several more investigate the performance of the machine learning techniques to find the best method for Arabic-English cross-language plagiarism detection. In these experiments, the highest result, an f-measure of 92%, was obtained using the SVM classifier. In addition, all classifiers achieved their highest results when most of the monolingual plagiarism detection methods were used.
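The five monolingual measures named above each have several variants in the literature. The sketch below implements one common form of each over word-token lists; the normalization choices (Jaccard for n-gram overlap, LCS divided by the longer text, containment relative to the suspicious text) and the fingerprint selection rule (keep CRC32 hashes of word trigrams divisible by a modulus) are assumptions, not necessarily those used in the paper. How Arabic and English text are made comparable is not detailed here, so the example simply compares two English token lists.

```python
import zlib
from typing import List, Set, Tuple


def word_ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: Set, b: Set) -> float:
    """Share of a's elements that also occur in b (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0


def dice(a: Set, b: Set) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def ngram_similarity(x: List[str], y: List[str], n: int = 3) -> float:
    """Jaccard overlap of word n-grams (one common variant, assumed here)."""
    return jaccard(word_ngrams(x, n), word_ngrams(y, n))


def lcs_ratio(x: List[str], y: List[str]) -> float:
    """Longest common subsequence length, normalized by the longer text (assumed)."""
    if not x or not y:
        return 0.0
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(x), len(y))


def fingerprint(tokens: List[str], n: int = 3, modulus: int = 4) -> Set[int]:
    """CRC32 hashes of word n-grams, keeping only those divisible by `modulus`."""
    hashes = {zlib.crc32(" ".join(g).encode("utf-8")) for g in word_ngrams(tokens, n)}
    return {h for h in hashes if h % modulus == 0}


if __name__ == "__main__":
    suspicious = "the cat sat on the mat".split()
    source = "the cat sat on a mat".split()
    print("n-gram similarity:", ngram_similarity(suspicious, source))
    print("LCS ratio:", lcs_ratio(suspicious, source))
    print("Dice:", dice(set(suspicious), set(source)))
    # modulus=1 keeps every hash, which suits such short texts.
    fp_s, fp_t = fingerprint(suspicious, modulus=1), fingerprint(source, modulus=1)
    print("fingerprint Jaccard:", jaccard(fp_s, fp_t))
    print("fingerprint containment:", containment(fp_s, fp_t))
```

One plausible reading of the abstract is that scores like these serve as input features for the naïve Bayes, SVM, and logistic regression classifiers; that mapping is an assumption here.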


2022 ◽  
Vol 2022 ◽  
pp. 1-9
Author(s):  
Channabasava Chola ◽  
J. V. Bibal Benifa ◽  
D. S. Guru ◽  
Abdullah Y. Muaad ◽  
J. Hanumanthappa ◽  
...  

Drosophila melanogaster is an important genetic model organism used extensively in medical and biological studies. About 61% of known human genes have a recognizable match in the genetic code of Drosophila flies, and 50% of fly protein sequences have mammalian analogues. Recently, several investigations have been conducted in Drosophila to study the functions of specific genes that exist in the central nervous system, heart, liver, and kidney. The outcomes of Drosophila research are also used as a unique tool to study human diseases. This article presents a novel automated system that classifies the gender of Drosophila flies from microscopic images (ventral view). The proposed system takes an image as input and converts it to grayscale to extract texture features. Machine learning (ML) classifiers such as support vector machines (SVM), Naive Bayes (NB), and K-nearest neighbour (KNN) are then used to classify each fly as male or female. The proposed model is evaluated on a real microscopic image dataset, and the results show that KNN achieves an accuracy of 90%, which is higher than that of the SVM classifier.
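The pipeline described above (grayscale conversion, texture features, then a classifier) can be sketched in plain Python. The abstract does not name the texture descriptor, so `simple_texture_features` below is a hypothetical placeholder (intensity mean and standard deviation) rather than the paper's features. The grayscale step assumes the common ITU-R BT.601 luma weights, and the KNN classifier uses Euclidean distance with a majority vote. The training vectors and the `"male"`/`"female"` labels are toy values.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_grayscale(image: List[List[Pixel]]) -> List[List[float]]:
    """Convert RGB rows to luma using ITU-R BT.601 weights (an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def simple_texture_features(gray: List[List[float]]) -> List[float]:
    """Hypothetical placeholder descriptor: mean and standard deviation of intensity."""
    values = [v for row in gray for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [mean, math.sqrt(variance)]


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy (mean, std) feature vectors with hypothetical labels.
    train_x = [[10.0, 1.0], [12.0, 2.0], [11.0, 1.5], [50.0, 9.0], [52.0, 10.0], [49.0, 8.0]]
    train_y = ["male", "male", "male", "female", "female", "female"]
    image = [[(20, 10, 5), (15, 12, 8)], [(9, 11, 10), (14, 10, 6)]]
    features = simple_texture_features(to_grayscale(image))
    print(features, knn_predict(train_x, train_y, features, k=3))
```

A real texture descriptor (for example, statistics from a gray-level co-occurrence matrix) would replace the placeholder; the KNN step stays the same.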


Advances in cyber-attack technologies have ushered in various new attacks that are difficult to detect using traditional intrusion detection systems (IDS). Existing IDS are trained to detect known patterns, so newer attacks bypass them and go undetected. In this paper, a two-level framework is proposed that uses machine learning techniques to detect unknown new attacks. In the first level, the known attack classes are determined using supervised machine learning algorithms such as Support Vector Machine (SVM) and Neural Networks (NN). The second level uses unsupervised machine learning algorithms such as K-means. Experiments are carried out with four models on the NSL-KDD dataset in an OpenStack cloud environment. The model combining Support Vector Machine for supervised learning, Gradual Feature Reduction (GFR) for feature selection, and K-means for unsupervised learning provided the best efficiency, 94.56%.
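One plausible reading of the two-level framework is sketched below: a supervised model first assigns records to known attack classes, and records it cannot place are grouped with K-means so that clusters of previously unseen behaviour can be inspected. The `supervised` callable is a hypothetical stand-in for a trained SVM or neural network that returns `None` for unmatched records, and K-means here is plain Lloyd's algorithm seeded with the first k points. None of this reproduces the paper's GFR feature selection or its NSL-KDD setup.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def kmeans(points: List[Vector], k: int, iters: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's algorithm, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def two_level_detect(
    records: List[Vector],
    supervised: Callable[[Vector], Optional[str]],
    k: int,
) -> Tuple[Dict[int, str], Dict[int, int]]:
    """Level 1 labels known attack classes; level 2 clusters whatever is left."""
    known: Dict[int, str] = {}
    unknown_idx: List[int] = []
    for i, record in enumerate(records):
        label = supervised(record)
        if label is None:
            unknown_idx.append(i)
        else:
            known[i] = label
    clusters: Dict[int, int] = {}
    if unknown_idx:
        _, labels = kmeans([records[i] for i in unknown_idx], min(k, len(unknown_idx)))
        clusters = dict(zip(unknown_idx, labels))
    return known, clusters


if __name__ == "__main__":
    def known_rule(record: Vector) -> Optional[str]:
        # Hypothetical stand-in for a trained classifier: negative first feature = "dos".
        return "dos" if record[0] < 0 else None

    records = [[-5.0, 0.0], [0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(two_level_detect(records, known_rule, k=2))  # ({0: 'dos'}, {1: 0, 2: 0, 3: 1, 4: 1})
```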


Author(s):  
Abdulrahman A. Alshdadi ◽  
Ahmed S. Alghamdi ◽  
Ali Daud ◽  
Saqib Hussain

Web spam consists of unwanted requests on websites, low-quality backlinks, emails, and reviews generated by automated programs. It is a major threat to website owners because it can cause them to lose their top keyword rankings in search engines, resulting in huge financial losses to the business. Over the years, researchers have tried to identify malicious domains based on specific features; however, features from the Lighthouse plugin, the Ahrefs tool, and social media platforms have been ignored. In this paper, the authors focus on detecting spam domain names in a dataset that mixes legitimate and spam domain names. The dataset is taken from Google Webmaster Tools. Machine learning models are applied to individual, distributed, and hybrid features, which significantly improves on the performance of existing machine learning techniques for malicious domain detection. The support vector machine (SVM) classifier achieves better accuracy than Naïve Bayes, C4.5, AdaBoost, and LogitBoost.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tom Elliot ◽  
Robert Morse ◽  
Duane Smythe ◽  
Ashley Norris

It is 50 years since Sieveking et al. published their pioneering research in Nature on the geochemical analysis of artefacts from Neolithic flint mines in southern Britain. In the decades since, geochemical techniques for sourcing stone artefacts have flourished globally, with a renaissance in recent years driven by new instrumentation, data analysis, and machine learning techniques. Despite the interest in these latter approaches, the quality with which they have been applied has varied. Using the case study of flint artefacts and geological samples from England, we present a robust and objective evaluation of three popular techniques (Random Forest, K-Nearest-Neighbour, and Support Vector Machines) and present a pipeline for their appropriate use. When evaluated correctly, the models show high classification performance, with Random Forest leading with an average accuracy of 85% (measured through F1 scores) and Support Vector Machines following closely. The methodology developed in this paper demonstrates the potential to significantly improve on previous approaches, particularly in removing bias and in providing greater means of evaluation than previously used.
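The abstract stresses correct evaluation and removing bias but does not spell out the pipeline. As an assumed illustration of one standard safeguard, the sketch below builds stratified k-fold splits, so that each test fold keeps roughly the class proportions of the whole dataset; this matters when some geological sources are represented by far fewer samples than others. Labels such as `"source_A"` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[Hashable], k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs with class-balanced test folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin across folds, continuing where the last class stopped.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    everything = set(range(len(labels)))
    return [(sorted(everything - set(fold)), sorted(fold)) for fold in folds]


if __name__ == "__main__":
    labels = ["source_A"] * 6 + ["source_B"] * 3  # hypothetical provenance labels
    for train, test in stratified_k_fold(labels, k=3):
        print(test, [labels[i] for i in test])
```

Each model (Random Forest, KNN, SVM) would then be fitted on the train indices and scored, for instance with F1, on the matching test indices, with the scores averaged across folds.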


Author(s):  
Seyma Kiziltas Koc ◽  
Mustafa Yeniad

Technologies used in the healthcare industry are changing rapidly as technology constantly evolves to improve people's lifestyles. For instance, various technological devices are used to diagnose and treat diseases, and it has been shown that, with developing technology, diseases can be diagnosed by computer systems. Machine learning algorithms are frequently used in health, as in many other fields, because of their high performance. The aim of this study is to investigate machine learning classification algorithms that can be used to diagnose diabetes and to compare them using metrics from the literature. Seven classification algorithms from the literature were used in the study: Logistic Regression, K-Nearest Neighbor, Multilayer Perceptron, Random Forest, Decision Trees, Support Vector Machine, and Naive Bayes. First, the classification performance of the algorithms was compared based on accuracy, sensitivity, precision, and F1-score. The results showed that the support vector machine algorithm had the highest accuracy, at 78.65%.
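The four comparison metrics listed above all derive from the binary confusion matrix. A minimal sketch, assuming the coding 1 = diabetic and 0 = non-diabetic and using toy labels rather than the study's data; the class name `BinaryConfusion` is illustrative:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryConfusion:
    """Counts for a binary task; 1 = diabetic, 0 = non-diabetic (assumed coding)."""
    tp: int
    fp: int
    tn: int
    fn: int

    @classmethod
    def from_labels(cls, y_true: Sequence[int], y_pred: Sequence[int]) -> "BinaryConfusion":
        pairs = list(zip(y_true, y_pred))
        return cls(
            tp=sum(1 for t, p in pairs if t == 1 and p == 1),
            fp=sum(1 for t, p in pairs if t == 0 and p == 1),
            tn=sum(1 for t, p in pairs if t == 0 and p == 0),
            fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        )

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:  # also called recall
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def f1(self) -> float:
        """Harmonic mean of precision and sensitivity."""
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    c = BinaryConfusion.from_labels([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
    print(c, c.accuracy(), c.sensitivity(), c.precision(), c.f1())
```

Because accuracy counts both classes while sensitivity and precision focus on the positive class, a model can lead on accuracy without leading on the other three metrics.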


2020 ◽  
Author(s):  
Hao Li ◽  
Liqian Cui ◽  
Liping Cao ◽  
Yizhi Zhang ◽  
Yueheng Liu ◽  
...  

Background: Bipolar disorder (BPD) is a common mood disorder that often goes misdiagnosed or undiagnosed. Recently, machine learning techniques have been combined with neuroimaging methods to aid in the diagnosis of BPD. However, most studies have focused on constructing classifiers based on single-modality MRI. Hence, in this study, we aimed to construct a support vector machine (SVM) model using a combination of structural and functional MRI that could be used to accurately identify patients with BPD. Methods: In total, 44 patients with BPD and 36 healthy controls were enrolled in the study. Clinical evaluation and MRI scans were performed for each subject. Next, image pre-processing, voxel-based morphometry (VBM), and regional homogeneity (ReHo) analyses were performed. The ReHo values of each subject in the clusters showing significant differences were extracted. Further, the LASSO approach was used to screen features. Based on the selected features, the SVM model was established, and discriminant analysis was performed. Results: After two-sample t-tests with multiple comparisons, a total of 8 clusters were extracted from the data (VBM = 6; ReHo = 2). Next, we used both VBM and ReHo data to construct the new SVM classifier, which identified patients with BPD in the test data with an accuracy of 87.5% (95% CI: 72.5-95.3%), sensitivity of 86.4% (95% CI: 64.0-96.4%), and specificity of 88.9% (95% CI: 63.9-98.0%) (p = 0.0022). Conclusions: Combining structural and functional MRI can add value in constructing SVM classifiers to aid in the accurate identification of BPD in the clinic.
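The abstract reports 95% confidence intervals for accuracy, sensitivity, and specificity but does not state how they were computed. As one common choice for a binomial proportion, the sketch below implements the Wilson score interval; it is an assumption and is not expected to reproduce the reported intervals exactly (a continuity-corrected variant, for example, gives wider bounds). Sensitivity would use correctly identified patients out of all patients in the test data, and specificity correctly identified controls out of all controls; the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives about 95%."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical test-set counts, not the study's.
    correct, total = 18, 20
    low, high = wilson_interval(correct, total)
    print(f"accuracy {correct / total:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With small test sets like those typical of neuroimaging studies, intervals this wide are expected, which is why the abstract's sensitivity and specificity bounds span more than 30 percentage points.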


2020 ◽  
Vol 17 (9) ◽  
pp. 4219-4222
Author(s):  
ManjulaSri Rayudu ◽  
Srujana Pendam ◽  
Srilaxmi Dasari

All patients with Type 1 diabetes and more than 60% of patients with Type 2 diabetes suffer from Diabetic Retinopathy (DR). Diabetic retinopathy damages the retina of the eye and slowly leads to complete vision loss. The longer a patient has had diabetes, the higher the probability that DR is present. Hence, diabetic retinopathy must be identified at an early stage to avoid blindness. The objective of this research work is to predict the severity of non-proliferative diabetic retinopathy using machine learning techniques. Proliferative diabetic retinopathy (the later, final stage) is characterized by neovascularization in the retinal veins, whereas non-proliferative DR (the earlier stage) is identified by any of the following abnormalities: microaneurysms, hard exudates, and hemorrhages. Machine learning techniques are then employed to identify the class of DR. The following classification and regression techniques are employed to categorize DR: the Gini Diversity Index method, linear discriminant analysis, ensemble methods with bagged and boosted trees, K-Nearest Neighbor, and Support Vector Machine classification. A total of 89 images from the DRIVE database (DiaRet DB1) are classified using these techniques. The maximum accuracy, 88.8%, is achieved with the linear SVM classifier.
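The abstract does not say how its linear SVM was trained; a library solver is the usual choice. As a self-contained teaching sketch, the code below minimizes the primal soft-margin objective (λ/2)·||w||² + (1/n)·Σ max(0, 1 − y_i(w·x_i + b)) with full-batch subgradient descent. The two features stand for hypothetical standardized lesion measurements (for example, microaneurysm count and exudate area), the labels +1/−1 for two hypothetical severity classes, and the learning rate, λ, and epoch count are untuned assumptions.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimize (lam/2)*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b))) by
    full-batch subgradient descent. Labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the decision function, mapped to +1 or -1."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical standardized lesion features; +1 = more severe, -1 = less severe.
    X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print(w, b, predict(w, b, [1.5, 2.5]))
```

A linear kernel keeps the decision function a weighted sum of the input features, so the learned weights can be read as the relative influence of each lesion measurement; kernel SVMs trade that interpretability for more flexible boundaries.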

