Visualization Tool for Interpreting User Needs From User-Generated Content via Text Mining and Classification

Author(s):  
Thomas Stone ◽  
Seung-Kyum Choi

The amount of user-generated content related to consumer products continues to grow as users increasingly take advantage of forums, product review sites, and social media platforms. This content is a promising source of insight into users’ needs and experiences. However, the challenge remains of how to extract concise, useful insights from large quantities of unstructured data. We propose a visualization tool that allows designers to quickly and intuitively sift through large amounts of user-generated content and derive useful insights regarding users’ perceptions of product features. The tool leverages machine learning algorithms to automate labor-intensive portions of the process, and no manual labeling by the designer is required. Language processing techniques are arranged in a novel way to guide the designer in selecting appropriate inputs, and multidimensional scaling enables presentation of the results in concise 2D plots. To demonstrate the efficacy of the tool, a case study is performed on action cameras, with product reviews from Amazon.com analyzed as the user-generated content. Results from the case study show that the tool is helpful in condensing large amounts of user-generated content into useful insights, such as the key differentiations that users perceive among similar products.
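As an illustrative sketch of the step that feeds multidimensional scaling, the snippet below builds a pairwise dissimilarity matrix from hypothetical per-product feature-sentiment vectors (the product names and scores are placeholders, not the paper's data). MDS would then embed this matrix in a 2D plot; only the dissimilarity computation is shown here.

```python
from math import sqrt

# Hypothetical sentiment scores per product for three features
# (zoom, battery, durability) -- placeholders, not the study's data.
profiles = {
    "cam_a": [0.8, 0.2, 0.5],
    "cam_b": [0.7, 0.3, 0.6],
    "cam_c": [0.1, 0.9, 0.4],
}

def euclidean(u, v):
    """Euclidean distance between two feature-sentiment vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Pairwise dissimilarity matrix: the input an MDS routine would embed in 2D,
# placing products that users perceive similarly close together.
names = sorted(profiles)
dissim = {(p, q): euclidean(profiles[p], profiles[q])
          for p in names for q in names}
```

Products with similar perceived feature profiles (here `cam_a` and `cam_b`) end up with small dissimilarity and would cluster together in the resulting 2D plot.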

2018 ◽  
Vol 7 (2) ◽  
pp. 20-34
Author(s):  
Eya Boukchina ◽  
Sehl Mellouli ◽  
Emna Menif

Citizens' participation is a form of democracy in which citizens are part of the decision-making process regarding the development of their society. With today's Information and Communication Technologies, citizens can participate in these processes by submitting input through digital media such as social media platforms or dedicated websites. These channels generate large quantities of data in different forms (text, image, video). This data needs to be processed in order to extract valuable information that can be used by a city's decision-makers. This paper presents natural language processing techniques to extract valuable information from comments posted by citizens. It applies Latent Semantic Analysis to a corpus of citizens' comments to automatically identify the subjects raised by citizens.
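A minimal sketch of the Latent Semantic Analysis idea, using toy comments (not the paper's corpus): LSA keeps the leading singular directions of the term-document matrix, and the terms with the largest weights in the top direction characterize the dominant subject. Here the leading left singular vector is approximated by power iteration, standing in for a full SVD.

```python
from collections import Counter

# Toy citizen comments -- placeholders, not the study's corpus.
comments = [
    "the park needs more trees",
    "plant trees in the park",
    "trees make the park beautiful",
    "bus service is late",
]

# Term-document matrix of raw counts: rows = terms, columns = documents.
docs = [Counter(c.split()) for c in comments]
vocab = sorted({t for d in docs for t in d})
A = [[d[t] for d in docs] for t in vocab]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_term_direction(A, iters=50):
    """Approximate the leading left singular vector of A by
    power iteration on A @ A.T (a stand-in for a full SVD)."""
    AT = list(map(list, zip(*A)))
    u = [1.0] * len(A)
    for _ in range(iters):
        u = matvec(A, matvec(AT, u))
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
    return u

u = top_term_direction(A)
# Terms weighted most heavily in the top direction name the dominant subject.
top_terms = [t for _, t in
             sorted(zip(u, vocab), key=lambda p: -abs(p[0]))[:3]]
```

With three of the four toy comments about the park, the dominant latent subject is the park/trees theme, and its terms surface at the top of the ranking.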


IoT ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 218-239 ◽  
Author(s):  
Ravikumar Patel ◽  
Kalpdrum Passi

In the derived approach, an analysis is performed on Twitter data for the 2014 World Cup soccer tournament held in Brazil to detect the sentiment of people throughout the world using machine learning techniques. By filtering and analyzing the data with natural language processing techniques, sentiment polarity is calculated based on the emotion words detected in user tweets. The dataset is normalized for use by machine learning algorithms and prepared with natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the text of each tweet. The approach is implemented in the Python programming language using the Natural Language Toolkit (NLTK). A derived algorithm extracts emotion words using WordNet, together with the POS of each word that carries meaning in the current context, and assigns sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resulting polarity assignments are further analyzed using naïve Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17%, whereas random forest gives the best area under the receiver operating characteristic curve (AUC) of 0.97.
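A minimal sketch of the lexicon-based polarity step, with a tiny hand-made lexicon standing in for SentiWordNet scores and a crude negation rule (the tweets, words, and scores are illustrative assumptions, not the study's resources):

```python
import re

# A tiny hand-made polarity lexicon standing in for SentiWordNet scores.
LEXICON = {"amazing": 0.8, "great": 0.6, "boring": -0.5, "terrible": -0.8}
NEGATORS = {"not", "never", "no"}

def tweet_polarity(tweet):
    """Tokenize a tweet, sum lexicon scores for its emotion words,
    and flip the sign of the next opinion word after a negator."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score, flip = 0.0, 1
    for tok in tokens:
        if tok in NEGATORS:
            flip = -1
        elif tok in LEXICON:
            score += flip * LEXICON[tok]
            flip = 1
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

The resulting labels are what a downstream classifier (naïve Bayes, SVM, KNN, or random forest in the study) would learn to predict from tweet features.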


Author(s):  
Thomas Stone ◽  
Seung-Kyum Choi

The use of online, user-generated content for consumer preference modeling has been a recent topic of interest among the engineering and marketing communities. With the rapid growth of many different types of user-generated content sources, the tasks of reliable opinion extraction and data interpretation are critical challenges. This research investigates one of the largest and most active content sources, Twitter, and its viability as a content source for preference modeling. A support vector machine (SVM) is used for sentiment classification of the messages, and a Twitter query strategy is developed to categorize messages according to product attributes and attribute levels. Over 7,000 messages are collected for a smartphone design case study. The preference modeling results are compared with those from a typical product review study including over 2,500 product reviews. Overall, the results demonstrate that consumers do express their product opinions through Twitter; thus, this content source could potentially facilitate product design and decision-making via preference modeling.
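A minimal sketch of the attribute-level categorization idea: messages are bucketed by (attribute, level) pairs using keyword queries. The queries and tweets below are hypothetical placeholders; the paper's actual query strategy and SVM sentiment step are not reproduced here.

```python
# Hypothetical (attribute, level) keyword queries -- illustrative only.
ATTRIBUTE_QUERIES = {
    ("screen", "large"): ["big screen", "large display"],
    ("screen", "small"): ["small screen", "compact display"],
    ("battery", "long"): ["long battery", "battery lasts"],
}

def categorize(tweet):
    """Assign a tweet to every (attribute, level) whose query phrases it contains."""
    text = tweet.lower()
    return [key for key, phrases in ATTRIBUTE_QUERIES.items()
            if any(p in text for p in phrases)]

tweets = [
    "Love the big screen on this phone!",
    "The long battery life is a game changer",
    "Wish it had a small screen, easier to hold",
]
buckets = {}
for t in tweets:
    for key in categorize(t):
        buckets.setdefault(key, []).append(t)
```

Each bucket would then be passed through sentiment classification, so that preference weights can be estimated per attribute level.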


E-commerce is evolving at such a rapid pace that new doors have opened for people to express their emotions toward products. Customer opinions play an important role on e-commerce sites, yet it is practically a tedious job to analyze users' opinions and compile the pros and cons of each product. This paper develops a solution through machine learning algorithms by pre-processing reviews based on the features of mobile products. The work focuses mainly on aspect-level opinions, using SentiWordNet, natural language processing, and aggregate scores to analyze the text reviews. Using the Naive Bayes algorithm, the experimental results provide a visual representation of each product's strengths and weaknesses, which gives a better understanding of the reviews than reading through long texts. These results also help e-commerce vendors address the weaknesses of their products and meet customer expectations.
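A minimal sketch of aspect-level score aggregation, assuming placeholder aspect keywords and a tiny polarity lexicon in place of SentiWordNet (the reviews are invented examples): each sentence's opinion score is added to the total of every aspect the sentence mentions.

```python
import re
from collections import defaultdict

# Placeholder aspect keywords and polarity lexicon
# (the lexicon stands in for SentiWordNet scores).
ASPECTS = {"battery": ["battery", "charge"], "camera": ["camera", "photo"]}
LEXICON = {"great": 1.0, "good": 0.5, "poor": -0.5, "awful": -1.0}

def aspect_scores(reviews):
    """Aggregate opinion scores per aspect across review sentences."""
    totals = defaultdict(float)
    for review in reviews:
        for sentence in re.split(r"[.!?]", review.lower()):
            words = sentence.split()
            score = sum(LEXICON.get(w, 0.0) for w in words)
            for aspect, keys in ASPECTS.items():
                if any(k in words for k in keys):
                    totals[aspect] += score
    return dict(totals)

scores = aspect_scores([
    "Great battery. Awful camera!",
    "The camera takes poor photos. Battery charge is good.",
])
```

The signed totals per aspect are what the visual pros-and-cons summary would display: positive aggregates as strengths, negative ones as weaknesses.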


Author(s):  
Anurag Langan

Grading student answers is a tedious and time-consuming task. A study found that, on average, around 25% of a teacher's time is spent scoring students' answer sheets. This time could be put to much better use if computer technology could score the answers instead. This system aims to grade student answers using the various natural language processing techniques and machine learning algorithms available today.
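As a deliberately simplistic sketch of automated grading, the function below scores a student answer by token overlap with a reference answer. This is only a stand-in for the NLP and machine learning pipeline such a system would actually use; the answers and mark scale are invented examples.

```python
def grade(student_answer, reference_answer, max_marks=5):
    """Score by token overlap with the reference answer -- a crude
    stand-in for a real NLP/ML grading pipeline."""
    ref = set(reference_answer.lower().split())
    stu = set(student_answer.lower().split())
    overlap = len(ref & stu) / len(ref)
    return round(overlap * max_marks)

reference = "photosynthesis converts light energy into chemical energy"
full = grade("photosynthesis converts light energy into chemical energy", reference)
partial = grade("plants use light energy", reference)
```

A production system would replace raw token overlap with semantic similarity (embeddings, paraphrase detection) so that correct answers phrased differently are not penalized.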


Author(s):  
Pawar A B ◽  
Jawale M A ◽  
Kyatanavar D N

The use of Natural Language Processing techniques for the detection of fake news is analyzed in this research paper. Fake news consists of misleading content spread by unreliable sources and can cause damage to individuals and society. To carry out this analysis, a dataset obtained from the web resource OpenSources.co, which is mainly part of Signal Media, is used. TF-IDF of bi-grams is used in combination with PCFG (Probabilistic Context-Free Grammar) features on a set of 11,000 documents extracted as news articles. This set is tested on several classification algorithms, namely SVM (Support Vector Machines), Stochastic Gradient Descent, Bounded Decision Trees, and Gradient Boosting with Random Forests. The experimental analysis found that the combination of Stochastic Gradient Descent with TF-IDF of bi-grams achieves an accuracy of 77.2% in detecting fake content, while the PCFG features exhibit slight recall deficiencies.
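A minimal sketch of the TF-IDF-of-bi-grams feature step, on invented toy documents (the classifier stage, Stochastic Gradient Descent in the paper, is omitted): each document becomes a sparse vector in which rarer bi-grams carry more weight.

```python
import math
from collections import Counter

def bigrams(text):
    """Word bi-grams of a lowercased, whitespace-tokenized text."""
    toks = text.lower().split()
    return list(zip(toks, toks[1:]))

def tfidf_bigrams(docs):
    """TF-IDF weights over word bi-grams, one sparse vector per document."""
    grams = [Counter(bigrams(d)) for d in docs]
    df = Counter(g for c in grams for g in c)  # document frequency per bi-gram
    n = len(docs)
    return [{g: tf * math.log(n / df[g]) for g, tf in c.items()}
            for c in grams]

# Toy news snippets -- placeholders, not the OpenSources.co dataset.
docs = [
    "officials confirmed the report today",
    "sources say the report is fake",
    "officials confirmed the story",
]
vectors = tfidf_bigrams(docs)
```

A bi-gram appearing in only one document (such as "sources say") gets a higher weight than one shared across documents, which is exactly the discriminative signal the downstream classifier exploits.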


2020 ◽  
Author(s):  
Sohini Sengupta ◽  
Sareeta Mugde ◽  
Garima Sharma

Twitter is one of the world's biggest social media platforms, hosting an abundant number of user-generated posts, and is considered a gold mine of data. Unlike on other social media platforms, the majority of tweets are public and therefore retrievable. In this paper we analyze the topics related to mental health that were recently (June 2020) discussed on Twitter. Amidst the ongoing pandemic, we also investigate whether COVID-19 emerges as one of the factors impacting mental health. Finally, we perform an overall sentiment analysis to better understand the emotions of users.
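A crude sketch of the topic-spotting idea, using invented tweets rather than the June 2020 dataset: after stopword removal, the most frequent terms act as a rough proxy for the topics under discussion, and one can check whether pandemic-related terms surface among them.

```python
import re
from collections import Counter

# Invented example tweets -- not the study's June 2020 dataset.
tweets = [
    "Feeling so anxious lately, covid lockdown is hard on mental health",
    "Anyone else struggling with anxiety during covid?",
    "Therapy and exercise help my mental health a lot",
]

STOPWORDS = {"so", "is", "on", "with", "a", "my", "and", "else",
             "during", "anyone", "lately", "hard", "help", "lot"}

def top_terms(tweets, k=5):
    """Most frequent non-stopword tokens -- a crude proxy for topic discovery."""
    counts = Counter()
    for t in tweets:
        counts.update(w for w in re.findall(r"[a-z]+", t.lower())
                      if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

terms = top_terms(tweets)
```

In a real analysis, term frequency would be replaced by a proper topic model, but the same question can be asked of its output: do COVID-related terms rank among the dominant topics?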


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 525
Author(s):  
Henrique Lopes-Cardoso ◽  
Tomás Freitas Osório ◽  
Luís Vilar Barbosa ◽  
Gil Rocha ◽  
Luís Paulo Reis ◽  
...  

The Natural Language Processing (NLP) community has witnessed huge improvements in recent years. However, most achievements are evaluated on curated benchmark corpora, with little attention devoted to user-generated content and less-resourced languages. Although recent approaches target the development of multi-lingual tools and models, they still underperform in languages such as Portuguese, for which linguistic resources do not abound. This paper exposes a set of challenges encountered when dealing with a real-world complex NLP problem, based on user-generated complaint data in Portuguese. This case study meets the needs of a country-wide governmental institution responsible for food safety and economic surveillance, and its responsibility for handling a high number of citizen complaints. Beyond looking at the problem from an exclusively academic point of view, we adopt application-level concerns when analyzing the progress obtained through different techniques, including the need to obtain explainable decision support. We discuss modeling choices and provide useful insights for researchers working on similar problems or data.


Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Pablo Chamoso ◽  
Zakie Alizadehsani ◽  
Juan M. Corchado

Social media platforms have been an undeniable part of our lifestyle for the past decade. Analyzing the information shared on them is a crucial step toward understanding human behavior. Social media analysis aims to guarantee a better experience for users and to raise user satisfaction, but first it is necessary to know how, and along which aspects, to compare users with each other. In this paper, an intelligent system is proposed to measure the similarity of Twitter profiles. First, the timeline of each profile is extracted using the official Twitter API, and all of this information is given to the proposed system. Then, three aspects of a profile are derived in parallel. Behavioral ratios are time-series information showing the consistency and habits of the user; dynamic time warping is utilized to compare the behavioral ratios of two profiles. Next, graph network analysis is used to monitor the interactions between the user and their audience, with Jaccard similarity used to estimate the similarity of the graphs. Finally, for content similarity measurement, natural language processing techniques are employed for preprocessing and TF-IDF for feature extraction, and the resulting vectors are compared using the cosine similarity method. The results present the similarity level of different profiles; in the case study, people with the same interests show higher similarity. This method of comparison is helpful in many other areas, and it also makes it possible to find duplicate profiles, that is, profiles with almost the same behavior and content.
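A minimal sketch of two of the three comparisons the paper describes: dynamic time warping over behavioral ratios and Jaccard similarity over audience sets (the TF-IDF/cosine content comparison is omitted). The profile series and sets below are hypothetical placeholders.

```python
def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance
    between two numeric time series."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def jaccard(s, t):
    """|intersection| / |union| of two audience sets."""
    return len(s & t) / len(s | t) if s | t else 1.0

# Hypothetical daily behavioral ratios for three profiles.
p1 = [0.5, 0.6, 0.55, 0.7]
p2 = [0.5, 0.58, 0.56, 0.69]   # habits similar to p1
p3 = [3.0, 0.1, 2.5, 0.2]      # erratic posting pattern
```

Profiles with similar habits yield a small DTW distance, and profiles sharing much of their audience yield a high Jaccard score; combining these with content similarity gives the overall profile-similarity measure.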

