A real-time big data sentiment analysis for Iraqi tweets using Spark Streaming

2020 ◽  
Vol 9 (4) ◽  
pp. 1411-1419
Author(s):  
Nashwan Dheyaa Zaki ◽  
Nada Yousif Hashim ◽  
Yasmin Makki Mohialden ◽  
Mostafa Abdulghafoor Mohammed ◽  
Tole Sutikno ◽  
...  

The scale of data streaming in social networks such as Twitter is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research on analysis, prediction, knowledge extraction, and opinion mining. People use the Twitter platform daily to express their opinions, which influence their behavior. In recent years, the volume of content in the Iraqi dialect has increased, especially on Twitter. Sentiment analysis and opinion mining for different dialects have become a hot topic in data science research. In this paper, we attempt to develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark Streaming, and we also create a dataset for researchers in this field. The Twitter handle Bassam AlRawi serves as the case study. The new method is more suitable for present-day machine learning applications and fast online prediction.
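To make the streaming idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation and not Spark code. It assumes the stream is cut into small batches that are classified one batch at a time; Spark Streaming forms its micro-batches by time interval, whereas this stand-in uses a fixed count for simplicity. The names `micro_batches`, `classify_stream`, and `toy_classifier`, and the sample tweets, are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly unbounded) iterable into lists of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def classify_stream(
    stream: Iterable[str], classify: Callable[[str], str], batch_size: int = 2
) -> Iterator[Counter]:
    """Yield one Counter of predicted labels per micro-batch."""
    for batch in micro_batches(stream, batch_size):
        yield Counter(classify(text) for text in batch)


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in; a real system would call a trained model here."""
    return "positive" if "good" in text.lower() else "negative"


# Hypothetical sample stream of three tweets, processed two at a time.
tweets = ["Good morning Baghdad", "traffic is terrible", "good news today"]
batch_counts = list(classify_stream(tweets, toy_classifier, batch_size=2))
```

Each element of `batch_counts` summarizes one batch, so downstream code can update a running total without holding the whole stream in memory.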

Author(s):  
Amaechi Chinedum ◽  
Okeke Ogochukwu C

Opinion mining, also known as sentiment analysis (SA), has recently become the focus of many researchers because analysis of online text is useful and in demand in many different applications. Analysis of social sentiment is a trending topic because users share their emotions in a more convenient format with the help of microblogging services such as Twitter. Twitter provides information about individuals' real-time feelings through the data its users supply. The essential task is to extract users' tweets and then analyze and survey them. This extracted information can be very helpful for predicting users' opinions of specific policies. The aim of this paper is to survey sentiment analysis algorithms, showing how different machine learning (ML) and lexicon-based methodologies are used and how accurate they are. The paper also focuses on three kinds of machine learning algorithms for sentiment analysis, among them supervised and unsupervised algorithms.
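As an illustration of the lexicon-based family of methods mentioned above, here is a minimal sketch that sums word scores from a small dictionary. It is not taken from the survey: the `LEXICON` entries, the `NEGATORS` set, and the simple negation rule are all assumptions made for the example, and real lexicons contain thousands of scored entries.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"good": 1, "great": 2, "happy": 1, "bad": -1, "terrible": -2, "sad": -1}
# Hypothetical negation words that flip the next sentiment-bearing word.
NEGATORS = {"not", "no", "never"}


def lexicon_polarity(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' from summed lexicon scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True  # remember the negation until a scored word appears
            continue
        value = LEXICON.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `lexicon_polarity("not good at all")` yields `"negative"` because the negator flips the score of "good". A supervised method would instead learn such associations from labeled examples.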


Fields such as text mining, linguistics, decision making, and natural language processing together form the basis of opinion mining, or sentiment analysis. People share their feelings, observations, and thoughts on social media, which has emerged as a powerful, rapidly growing repository of real-time discussions. In this paper, we aim to decipher currently popular opinions or emotions from various sources, thereby contributing to the sentiment analysis domain. Text from social media, blogs, and product reviews is classified according to the sentiment it projects. We re-examine the traditional processes of sentiment extraction to account for the increased complexity and number of data sources and relevant topics, while revisiting the meaning of sentiment. Working across and within numerous social media streams, we re-examine the expression of sentiment and the classification of polarity, thereby redefining and extending the notion of sentiment. Numerous social media streams are analyzed to build datasets that are topical for each stream and are later labeled by polarity according to their sentiment expression. In conclusion, the goal is to define sentiment and to develop tools for analyzing it in the real-time exchange of human ideas.


Author(s):  
Carlos Arcila Calderón ◽  
Félix Ortega Mohedano ◽  
Mateo Álvarez ◽  
Miguel Vicente Mariño

The large-scale analysis of tweets in real time using supervised sentiment analysis offers a unique opportunity for communication and audience research. Bringing together machine learning and streaming analytics in a distributed environment can help scholars obtain valuable data from Twitter and immediately classify messages according to context, with no restrictions of time or storage, enriching cross-sectional, longitudinal, and experimental designs with new inputs. Although communication and audience researchers have begun to use computational methods, most remain unfamiliar with the distributed technologies needed to face big data challenges. This paper describes the implementation of parallelized machine learning methods in Apache Spark to predict sentiment in real-time tweets and explains how this process can be scaled up using academic or commercial distributed computing when personal computers cannot support the computation and storage. We discuss the limitations of these methods and their implications for communication, audience, and media studies.
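The abstract does not name the supervised algorithm, so the sketch below assumes a multinomial naive Bayes classifier with add-one smoothing as one common choice. It runs on a single machine in plain Python rather than on Apache Spark, and the class name `NaiveBayes` and the four training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples, far too few for real use.
model = NaiveBayes().fit(
    ["i love this phone", "what a great day", "i hate waiting", "this is awful service"],
    ["positive", "positive", "negative", "negative"],
)
```

Because training only accumulates counts, the per-label counts can be computed on separate partitions of the data and then added together, which is one reason count-based models of this kind parallelize well.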


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 89694-89698
Author(s):  
Aysegul Ucar ◽  
Jessy W. Grizzle ◽  
Maani Ghaffari ◽  
Mattias Wahde ◽  
H. Levent Akin ◽  
...  

Author(s):  
Hina Jamil ◽  
Tariq Umer ◽  
Celal Ceken ◽  
Fadi Al-Turjman
Keyword(s):  
Big Data ◽  

2021 ◽  
Vol 11 (22) ◽  
pp. 10596
Author(s):  
Chung-Hong Lee ◽  
Hsin-Chang Yang ◽  
Yenming J. Chen ◽  
Yung-Lin Chuang

Recently, the use of Twitter messages and algorithmic computation to detect real-world events in real time has become a new paradigm in data science applications. During a high-impact event, people want the latest information about how the event is developing so that they can better understand the situation and its possible trends when making decisions. In emergencies, however, governments and enterprises are often unable to notify people in time to give early warning and help them avoid risks. A sensible solution is to integrate real-time event monitoring and intelligence gathering into their decision support systems. Such a system can provide real-time event summaries, which are updated whenever important new events are detected. In this work, we therefore combine a Twitter-based real-time event detection algorithm with pre-trained language models to summarize emergent events. We used an online text-stream clustering algorithm and a self-adaptive method to gather Twitter data for detecting emerging events. We then used the XSum dataset with a pre-trained language model, namely the T5 model, to train the summarization model. ROUGE metrics were used to compare the summarization performance of various models. We then used the trained model to summarize the incoming Twitter data in our experiments. In particular, we provide a real-world case study, the COVID-19 pandemic, to verify the applicability of the proposed method. Finally, we conducted a survey in which human judges assessed the quality of example generated summaries. The case study and experimental results demonstrate that our summarization method gives users a feasible way to quickly understand updates to specific event intelligence from the real-time summary of the event story.
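For readers unfamiliar with the ROUGE metrics mentioned above, the following is a minimal sketch of ROUGE-1, which scores a candidate summary by its unigram overlap with a reference summary. It is a simplification under stated assumptions: tokens are lowercased and split on whitespace, with no stemming or other normalization that full ROUGE toolkits may apply, and the function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Return ROUGE-1 precision, recall, and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat on the mat", "the cat is on the mat")
```

In this example five of the six candidate words also appear in the reference, and five of the six reference words are covered, so precision and recall are both 5/6.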


2018 ◽  
Vol 3 (1) ◽  
pp. 49-59
Author(s):  
Zul Indra ◽  
Liza Trisnawati

Big data has become one of the most interesting topics in information technology today. One freely available and accessible source of big data is online news articles. In a single day, a popular news site can produce more than 100 new articles. Imagine how many news pages are available for us to read today. Meanwhile, the initial stage of big data analysis of online news articles is data storing and preprocessing. Based on this reasoning, an application needs to be developed that can automatically collect online news articles for further analysis. This study aims to develop an application, named the intelligent data collector (IDC), that makes it easier to collect online news articles. The IDC application collects online news articles, preprocesses them, and stores them in a local database. This database can then be used for various data mining processes such as opinion mining (sentiment analysis), topic classification, text summarization, and so on.
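The preprocess-and-store stage described above can be sketched as follows. This is an illustrative assumption, not the IDC application itself: fetching articles over the network is omitted, the preprocessing is limited to stripping HTML tags and normalizing whitespace and case, and the `articles` table schema, function names, and sample URLs are hypothetical.

```python
import re
import sqlite3
from html import unescape


def preprocess(raw_html: str) -> str:
    """Strip tags, decode HTML entities, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", raw_html)
    text = unescape(text)
    return re.sub(r"\s+", " ", text).strip().lower()


def store_articles(connection: sqlite3.Connection, articles) -> int:
    """Insert (url, title, html) tuples, skipping URLs already stored."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    rows = [(url, title, preprocess(html)) for url, title, html in articles]
    connection.executemany("INSERT OR IGNORE INTO articles VALUES (?, ?, ?)", rows)
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


# In-memory database and hypothetical sample data; the second row repeats a URL.
connection = sqlite3.connect(":memory:")
sample = [
    ("https://example.com/a", "Sample A", "<p>Big data &amp; news</p>"),
    ("https://example.com/a", "Sample A (duplicate)", "<p>ignored</p>"),
]
stored = store_articles(connection, sample)
body = connection.execute("SELECT body FROM articles").fetchone()[0]
```

Using the URL as the primary key is one simple way to avoid storing the same article twice when a collector revisits a site.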

