scholarly journals KLASIFIKASI TEKS MENGGUNAKAN CHI SQUARE FEATURE SELECTION UNTUK MENENTUKAN KOMIK BERDASARKAN PERIODE, MATERI DAN FISIKDENGAN ALGORITMA NAIVEBAYES

Compiler ◽  
2016 ◽  
Vol 5 (2) ◽  
Author(s):  
Siti Anisah ◽  
Anton Setiawan Honggowibowo ◽  
Asih Pujiastuti

A comic has its own characteristics compared the other types of books. The difference between comic and other books can be seen from the category o f period, material and physical. Comicand other booksneeded an application o f classification system. Looking for the problem, classification system was made using Chi Square Feature Selection and Naive Bayes algorithm to determine the comic based on the period, material and physical. Delphi programming language and Oracle Database are used to build the Classification System. Chi Square Feature Selection acquired trait a comic is in 0.10347 and which not comic is in 1.9531. Furthermore, data is classified by the Naive Bayes algorithm. From 120 titles o f comic that consists 60 titles o f comic and non comicused to build classesfor trainand 60 titles o f comic and non comic used to test. The results o f Naive Bayesalgorithm for comic is 96,67%with 3.33% error rate, and non comic is 90% with 10% error rate. The classification to determine comic is good.

2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.


2021 ◽  
Vol 5 (3) ◽  
pp. 527-533
Author(s):  
Yoga Religia ◽  
Amali Amali

The quality of an airline's services cannot be measured from the company's point of view, but must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but currently independent assumptions are rarely discussed. Some literature suggests the use of attribute weighting to reduce independent assumptions, which can be done using particle swarm optimization (PSO) and genetic algorithm (GA) through feature selection. This study conducted a comparison of PSO and GA optimization on Naïve Bayes for the classification of Airline Passenger Satisfaction data taken from www.kaggle.com. After testing, the best performance is obtained from the model formed, namely the classification of Airline Passenger Satisfaction data using the Naïve Bayes algorithm with PSO optimization, where the accuracy value is 86.13%, the precision value is 87.90%, the recall value is 87.29%, and the value is AUC of 0.923.


2020 ◽  
Vol 4 (1) ◽  
pp. 76-85
Author(s):  
Dwi Yuni Utami ◽  
Elah Nurlelah ◽  
Noer Hikmah

Liver disease is an inflammatory disease of the liver and can cause the liver to be unable to function as usual and even cause death. According to WHO (World Health Organization) data, almost 1.2 million people per year, especially in Southeast Asia and Africa, have died from liver disease. The problem that usually occurs is the difficulty of recognizing liver disease early on, even when the disease has spread. This study aims to compare and evaluate Naive Bayes algorithm as a selected algorithm and Naive Bayes algorithm based on Genetic Algorithm (GA) and Bagging to find out which algorithm has a higher accuracy in predicting liver disease by processing a dataset taken from the UCI Machine Learning Repository database (GA). University of California Invene). From the results of testing by evaluating both the confusion matrix and the ROC curve, it was proven that the testing carried out by the Naive Bayes Optimization algorithm using Algortima Genetics and Bagging has a higher accuracy value than only using the Naive Bayes algorithm. The accuracy value for the Naive Bayes algorithm model is 66.66% and the accuracy value for the Naive Bayes model with attribute selection using Genetic Algorithms and Bagging is 72.02%. Based on this value, the difference in accuracy is 5.36%.Keywords: Liver Disease, Naïve Bayes, Genetic Agorithms, Bagging.


Author(s):  
Son Doan ◽  
◽  
Susumu Horiguchi ◽  

Text categorization involves assigning a natural language document to one or more predefined classes. One of the most interesting issues is feature selection. We propose an approach using multicriteria ranking of eatures, a new procedure for feature selection, and apply these to text categorization. Experimental results dealing with Reuters-21578 and 20Newsgroups benchmark data and the naive Bayes algorithm show that our proposal outperforms conventional feature selection in text categorization performance.


Author(s):  
Irfan Santiko ◽  
Ikhsan Honggo

Chronic kidney disease is a disease that can cause death, because the pathophysiological etiology resulting in a progressive decline in renal function, and ends in kidney failure. Chronic Kidney Disease (CKD) has now become a serious problem in the world. Kidney and urinary tract diseases have caused the death of 850,000 people each year. This suggests that the disease was ranked the 12th highest mortality rate. Some studies in the field of health including one with chronic kidney disease have been carried out to detect the disease early, In this study, testing the Naive Bayes algorithm to detect the disease on patients who tested positive for negative CKD and CKD. From the results of the test algorithm accuracy value will be compared against the results of the algorithm accuracy before use and after feature selection using feature selection Featured Correlation Based Selection (CFS), it is known that Naive Bayes algorithm after feature selection that is 93.58%, while the naive Bayes without feature selection the result is 93.54% accuracy. Seeing the value of a second accuracy testing Naive Bayes algorithm without using the feature selection and feature selection, testing both these algorithms including the classification is very good, because the accuracy value above 0.90 to 1.00. Included in the excellent classification. higher accuracy results.


Author(s):  
Oman Somantri ◽  
Dyah Apriliani

<p>Conducting an assessment of consumer sentiments taken from social media in assessing a culinary food gives useful information for everyone who wants to get this information especially for migrants and tourists, in th other hand that information is very valuable for food stall and restaurant owners as information in improvinf food quality. Overcoming this problem, a sentiment analysis classification model using naïve bayes algorithm (NB) was applied to get this information. This problem occurs is the level of accuracy of classification of consumer ratings of culinary food is still not optimal because the weight of values in the data preprocessing process are not optimal. In this paper proposed a hybrid feature selection models to overcome the problems in the process of selecting the feature attributes that have not been optimal by using a combination of information gain (IG) and genetic algorithm (GA) algorithms. The result of this research showed that after the experiment and compared to using others algorithms produce the best of the level occuracy is 93%.</p>


2018 ◽  
Vol 2 (3) ◽  
pp. 153 ◽  
Author(s):  
Muhammad Firman Aji Saputra ◽  
Triyanna Widiyaningtyas ◽  
Aji Prasetya Wibawa

Illiteracy is an inability to recognize characters, both in order to read and write. It is a significant problem for countries all around the world including Indonesia. In Indonesia, illiteracy rate is generally set as an indicator to see whether or not education in Indonesia is successful. If this problem is not going to be overcome, it will affect people’s prosperity. One system that has been used to overcome this problem is prioritizing the treatment from areas with the highest illiteracy rate and followed by areas with lower illiteracy rate. The method is going to be a way easier to be applied if it is supported by classification process. Since the classification process needs a class, and there has not been any fine classification of illiteracy rate, there is needed a clustering process before classification process. This research is aimed to get optimal number of classes through clustering process and know the result of illiteracy classification process. The clustering process is conducted by using k means algorithm, and for the classification process is conducted by using Naïve Bayes algorithm. The testing method used to assess the success of classification process is 10-fold method. Based on the research result, it can be concluded that the optimal illiteracy classes are three classes with the classification accuracy value of 96.4912% and error rate value of 3.5088%. Whereas the classification with two classes get the accuracy value of 93.8596% and error rate value of 6.1404%. And for the classification with five classes get the accuracy value of 90.3509% and error rate value of 9.6491%.


Author(s):  
Abi Rafdi ◽  
Herman Mawengkang Herman ◽  
Syahril Efendi

This study analyzes Sentiment to see opinions, points of view, judgments, attitudes, and emotions towards creatures and aspects expressed through texts. One of Social Media is like Twitter is one of the most widely used means of communication as a research topic. The main problem with sentiment analysis is voting and using the best feature options for maximum results. Either, the most widely known classification method is Naive Bayes. However, Naive Bayes is very sensitive to significant features. That way, in this test, a comparison of feature selection is carried out using Particle Swarm Optimization and Genetic Algorithm to improve the accuracy performance of the Naive Bayes algorithm. Analyses are performed by comparing before and after testing using feature selection. Validation uses a cross-validation technique, while the confusion matrix ??is appealed to measure accuracy. The results showed the highest increase for Naïve Bayes algorithm accuracy when using the feature selection of the Particle Swarm Optimization Algorithm from 60.26% to 77.50%, while the genetic algorithm from 60.26% to 70.71%. Therefore, the choice of the best characteristics is Particle Swarm Optimization which is superior with an increase in accuracy of 17.24%.


Sign in / Sign up

Export Citation Format

Share Document