Comparison of Random Forest and Support Vector Machine for Indonesian Tweet Complaint Classification
In digital business, the managerial commonly need to process text so that it can be used to support decision-making. The number of text documents contained ideas and opinions is progressing and challenging to understand one by one. Whereas if the data are processed and correctly rendered using machine learning, it can present a general overview of a particular case, organization, or object quickly. Numerous researches have been accomplished in this research area, nevertheless, most of the studies concentrated on English text classification. Every language has various techniques or methods to classify text depending on the characteristics of its grammar. The result of classification among languages may be different even though it used the same algorithm. Given the greatness of text classification, text classification algorithms that can be implemented is the support vector machine (SVM) and Random Forest (RF). Based on the background above, this research is aimed to find out the performance of support vector machine algorithm and random forest in classification of Indonesian text. 1. Result of SVM classifier with cross validation k-10 is derived the best accuracy with value 0.9648, however, it spends computational time as long as 40.118 second. Then, result of RF classifier with values, i.e. 'bootstrap': False, 'min_samples_leaf': 1, 'n_estimators': 10, 'min_samples_split': 3, 'criterion': 'entropy', 'max_features': 3, 'max_depth': None is achieved accuracy is 0.9561 and computational time 109.399 second.