Transfer Learning with Social Media Content in the Ride-Hailing Domain by Using a Hybrid Machine Learning Architecture

Álvaro de Pablo; Oscar Araque; Carlos A. Iglesias

doi:10.3390/electronics11020189

Electronics ◽ 10.3390/electronics11020189 ◽ 2022 ◽ Vol 11 (2) ◽ pp. 189

Author(s): Álvaro de Pablo ◽ Oscar Araque ◽ Carlos A. Iglesias

Keyword(s): Machine Learning ◽ Social Media ◽ Transfer Learning ◽ Topic Modeling ◽ Media Content ◽ Learning From Data ◽ Modeling Techniques ◽ Hybrid Machine ◽ Vector Representations ◽ Google Play

The analysis of the content of posts written on social media has become an important line of research in recent years. The study of these texts, as well as their relationship with each other and their dependence on the platform on which they are written, enables the analysis of users' behavior and of their opinions with respect to different domains. In this work, a hybrid machine learning-based system has been developed to classify texts using topic modeling techniques and different word-vector representations, as well as traditional text representations. The system has been trained with ride-hailing posts extracted from Reddit, showing promising performance. The generated models have then been tested with data extracted from other sources such as Twitter and Google Play, classifying these texts without retraining any models and thus performing Transfer Learning. The obtained results show that our proposed architecture is effective when performing Transfer Learning from data-rich domains and applying the trained models to other sources.
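The idea of combining several text representations and then classifying posts from a new platform without retraining can be sketched in a few lines. The following is only a toy illustration under stated assumptions, not the authors' architecture: the posts and labels are invented, the `TOPICS` keyword groups are a hypothetical stand-in for real topic-model features, and the nearest-centroid classifier is chosen purely for brevity.

```python
from collections import Counter
import math

# Invented source-domain (Reddit-like) posts with invented labels.
SOURCE = [
    ("driver was rude and cancelled my ride", "complaint"),
    ("app charged me twice for one trip", "complaint"),
    ("surge price was unfair and the driver was late", "complaint"),
    ("how do i sign up to drive", "question"),
    ("what documents do i need to become a driver", "question"),
    ("how does the rating system work", "question"),
]
# Invented target-domain (Twitter-like) posts, never seen while fitting.
TARGET = ["my driver cancelled and the app charged me", "how do i change my rating"]

# Hypothetical keyword groups: a crude stand-in for topic-model features.
TOPICS = {"payment": {"charged", "price", "surge"}, "howto": {"how", "what", "need"}}

def tokens(text):
    return text.lower().split()

def featurize(text, vocab):
    """Concatenate bag-of-words counts with per-topic keyword counts."""
    counts = Counter(tokens(text))
    bow = [float(counts[w]) for w in vocab]  # traditional representation
    topic = [float(sum(counts[w] for w in group)) for group in TOPICS.values()]
    return bow + topic  # hybrid feature vector

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def fit(posts):
    """Learn the vocabulary and one mean feature vector per class."""
    vocab = sorted({t for text, _ in posts for t in tokens(text)})
    sums, n = {}, Counter()
    for text, label in posts:
        vec = featurize(text, vocab)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        n[label] += 1
    centroids = {lab: [v / n[lab] for v in acc] for lab, acc in sums.items()}
    return vocab, centroids

def predict(text, vocab, centroids):
    """Assign the class whose centroid is most similar; no refitting happens here."""
    vec = featurize(text, vocab)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

vocab, centroids = fit(SOURCE)  # fitted on the source platform only
predictions = [predict(t, vocab, centroids) for t in TARGET]
print(predictions)
```

Words that appear only in the target posts (here "change") are simply ignored, which is one reason vocabulary mismatch between platforms can hurt this kind of transfer.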

Social Media Content Categorization Using Supervised Based Machine Learning Methods and Natural Language Processing in Bangla Language

2020 11th International Conference on Electrical and Computer Engineering (ICECE) ◽ 10.1109/icece51571.2020.9393095 ◽ 2020

Author(s): Md. Rejaul Alam ◽ Afsana Akter ◽ Minhajul Abedin Shafin ◽ Md. Mehedi Hasan ◽ Antara Mahmud

Keyword(s): Machine Learning ◽ Social Media ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Media Content ◽ Learning Methods ◽ Machine Learning Methods

Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study (Preprint)

10.2196/preprints.15371 ◽ 2019

Author(s): Derek Howard ◽ Marta M Maslej ◽ Justin Lee ◽ Jacob Ritchie ◽ Geoffrey Woollard ◽ ...

Keyword(s): Mental Health ◽ Machine Learning ◽ Social Media ◽ Transfer Learning ◽ Computational Linguistics ◽ Feature Representation ◽ Fine Tuning ◽ Language Models ◽ Universal Sentence ◽ Text Feature

BACKGROUND: Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods.

OBJECTIVE: This study aimed to benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response.

METHODS: We used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner, Empath, and Linguistic Inquiry and Word Count, and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used Tree-based Optimization Tool and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts.

RESULTS: The top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macroaveraged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. In addition, we have presented visualizations that aid in the understanding of the learned classifiers.

CONCLUSIONS: In this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
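For readers unfamiliar with the metric reported above, a macroaveraged F1 score is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The sketch below is a generic illustration with invented labels and predictions; it does not reproduce the shared task's official scoring script, which may restrict the set of classes that are averaged.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Invented triage labels, purely for illustration.
y_true = ["urgent", "urgent", "routine", "routine", "none", "none"]
y_pred = ["urgent", "routine", "routine", "routine", "none", "urgent"]

score = macro_f1(y_true, y_pred)
print(round(score, 3))  # per-class F1: urgent 0.5, routine 0.8, none 0.667
```

Because each class contributes equally, a system that misses most posts of a small high-risk class is penalized heavily even if its overall accuracy is high.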

Topic modeling for social media content: A practical approach

2016 3rd International Conference on Computer and Information Sciences (ICCOINS) ◽ 10.1109/iccoins.2016.7783248 ◽ 2016 ◽ Cited By ~ 4

Author(s): Vala Ali Rohani ◽ Shahid Shayaa ◽ Ghazaleh Babanejaddehaki

Keyword(s): Social Media ◽ Topic Modeling ◽ Practical Approach ◽ Media Content

Leveraging Transfer Learning to Analyze Opinions, Attitudes, and Behavioral Intentions Toward COVID-19 Vaccines: Social Media Content and Temporal Analysis (Preprint)

10.2196/preprints.30251 ◽ 2021

Author(s): Siru Liu ◽ Jili Li ◽ Jialin Liu

Keyword(s): Machine Learning ◽ Social Media ◽ Transfer Learning ◽ Behavioral Intentions ◽ Temporal Analysis ◽ Public Understanding ◽ Support Vector ◽ Learning Models ◽ The Public ◽ Over Time

BACKGROUND: The COVID-19 vaccine is considered to be the most promising approach to alleviate the pandemic. However, in recent surveys, acceptance of the COVID-19 vaccine has been low. To design more effective outreach interventions, there is an urgent need to understand public perceptions of COVID-19 vaccines.

OBJECTIVE: Our objective was to analyze the potential of leveraging transfer learning to detect tweets containing opinions, attitudes, and behavioral intentions toward COVID-19 vaccines, and to explore temporal trends as well as automatically extract topics across a large number of tweets.

METHODS: We developed machine learning and transfer learning models to classify tweets, followed by temporal analysis and topic modeling on a dataset of COVID-19 vaccine–related tweets posted from November 1, 2020 to January 31, 2021. We used the F1 values as the primary outcome to compare the performance of machine learning and transfer learning models. The statistical values and P values from the Augmented Dickey-Fuller test were used to assess whether users’ perceptions changed over time. The main topics in tweets were extracted by latent Dirichlet allocation analysis.

RESULTS: We collected 2,678,372 tweets related to COVID-19 vaccines from 841,978 unique users and annotated 5000 tweets. The F1 values of transfer learning models were 0.792 (95% CI 0.789-0.795), 0.578 (95% CI 0.572-0.584), and 0.614 (95% CI 0.606-0.622) for these three tasks, which significantly outperformed the machine learning models (logistic regression, random forest, and support vector machine). The prevalence of tweets containing attitudes and behavioral intentions varied significantly over time. Specifically, tweets containing positive behavioral intentions increased significantly in December 2020. In addition, we selected tweets in the following categories: positive attitudes, negative attitudes, positive behavioral intentions, and negative behavioral intentions. We then identified 10 main topics and relevant terms for each category.

CONCLUSIONS: Overall, we provided a method to automatically analyze the public understanding of COVID-19 vaccines from real-time data in social media, which can be used to tailor educational programs and other interventions to effectively promote the public acceptance of COVID-19 vaccines.

Narratives of the Refugee Crisis: A Comparative Study of Mainstream-Media and Twitter

Media and Communication ◽ 10.17645/mac.v7i2.1983 ◽ 2019 ◽ Vol 7 (2) ◽ pp. 275-288 ◽ Cited By ~ 2

Author(s): Adina Nerghes ◽ Ju-Sung Lee

Keyword(s): Social Media ◽ Comparative Study ◽ Topic Modeling ◽ Media Content ◽ Refugee Crisis ◽ Mainstream Media ◽ Raising Awareness ◽ Media Space ◽ European Refugee Crisis ◽ Media Form

The European refugee crisis received heightened attention at the beginning of September 2015, when images of the drowned child, Aylan Kurdi, surfaced across mainstream and social media. While the flows of displaced persons, especially from the Middle East into Europe, had been ongoing until that date, this event and its coverage sparked a media firestorm. Mainstream-media content plays a major role in shaping discourse about events such as the refugee crisis, while social media’s participatory affordances allow for the narratives to be perpetuated, challenged, and injected with new perspectives. In this study, the perspectives and narratives of the refugee crisis from the mainstream news and Twitter—in the days following Aylan’s death—are compared and contrasted. Themes are extracted through topic modeling (LDA) and reveal how news and Twitter converge and also diverge. We show that in the initial stages of a crisis and following the tragic death of Aylan, public discussion on Twitter was highly positive. Unlike the mainstream-media, Twitter offered an alternative and multifaceted narrative, not bound by geo-politics, raising awareness and calling for solidarity and empathy towards those affected. This study demonstrates how mainstream and social media form a new and complementary media space, where narratives are created and transformed.

Analyzing Social Media to Explore the Attitudes and Behaviors Following the Announcement of Successful COVID-19 Vaccine Trials: Infodemiology Study

JMIR Infodemiology ◽ 10.2196/28800 ◽ 2021 ◽ Vol 1 (1) ◽ pp. e28800

Author(s): Jean-Christophe Boucher ◽ Kirsten Cornelson ◽ Jamie L Benham ◽ Madison M Fullerton ◽ Theresa Tang ◽ ...

Keyword(s): Machine Learning ◽ Social Media ◽ Content Analysis ◽ Social Network ◽ Multinational Corporations ◽ Vaccine Hesitancy ◽ Media Content ◽ Media Content Analysis ◽ Vaccine Trials ◽ The Government

Background: The rollout of COVID-19 vaccines has brought vaccine hesitancy to the forefront in managing this pandemic. COVID-19 vaccine hesitancy is fundamentally different from that of other vaccines due to the new technologies being used, rapid development, and widespread global distribution. Attitudes on vaccines are largely driven by online information, particularly information on social media. The first step toward influencing attitudes about immunization is understanding the current patterns of communication that characterize the immunization debate on social media platforms.

Objective: We aimed to evaluate societal attitudes, communication trends, and barriers to COVID-19 vaccine uptake through social media content analysis to inform communication strategies promoting vaccine acceptance.

Methods: Social network analysis (SNA) and unsupervised machine learning were used to characterize COVID-19 vaccine content on Twitter globally. Tweets published in English and French were collected through the Twitter application programming interface between November 19 and 26, 2020, just following the announcement of initial COVID-19 vaccine trials. SNA was used to identify social media clusters expressing mistrustful opinions on COVID-19 vaccination. Based on the SNA results, an unsupervised machine learning approach to natural language processing using a sentence-level algorithm transfer function to detect semantic textual similarity was performed in order to identify the main themes of vaccine hesitancy.

Results: The tweets (n=636,516) identified that the main themes driving the vaccine hesitancy conversation were concerns of safety, efficacy, and freedom, and mistrust in institutions (either the government or multinational corporations). A main theme was the safety and efficacy of mRNA technology and side effects. The conversation around efficacy was that vaccines were unlikely to completely rid the population of COVID-19, polymerase chain reaction testing is flawed, and there is no indication of long-term T-cell immunity for COVID-19. Nearly one-third (45,628/146,191, 31.2%) of the conversations on COVID-19 vaccine hesitancy clusters expressed concerns for freedom or mistrust of institutions (either the government or multinational corporations) and nearly a quarter (34,756/146,191, 23.8%) expressed criticism toward the government’s handling of the pandemic.

Conclusions: Social media content analysis combined with social network analysis provides insights into the themes of the vaccination conversation on Twitter. The themes of safety, efficacy, and trust in institutions will need to be considered, as targeted outreach programs and intervention strategies are deployed on Twitter to improve the uptake of COVID-19 vaccination.

Social Media Content Analysis

Advances in Data Mining and Database Management - Challenges and Applications of Data Analytics in Social Perspectives ◽ 10.4018/978-1-7998-2566-1.ch009 ◽ 2021 ◽ pp. 156-174

Author(s): D. Sudaroli Vijayakumar ◽ Senbagavalli M. ◽ Jesudas Thangaraju ◽ Sathiyamoorthi V.

Keyword(s): Machine Learning ◽ Social Media ◽ Media Content ◽ Media Content Analysis ◽ Comprehensive Overview ◽ Case Examples ◽ The Social ◽ The Right ◽ Audio Video ◽ Suitable Case

Data are today's wealth and value. Used sensibly, data help individuals and corporations make wise decisions. The era of spending time with an individual to understand them better is gone: an individual's interests and requirements are easily identified by observing their activity on social media. Social media, which started as a tool for interaction, has grown into a platform for creating and promoting business. Social media content cannot be ignored, as the data involved are huge in volume, variety, and velocity. Demand for machine learning in analysing social media content is growing rapidly, for example to identify influencers and the demands of individuals. However, the real complexity lies in making the data from social media suitable for analysis. Data from social media content may be audio, video, or images. The chapter attempts to give a comprehensive overview of the various pre-processing methods involved in dealing with social media content and of the use of the right algorithms at the right time, with suitable case examples.

A Machine Learning Approach to Extract Opinions from Social Media Content

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i4.5.20080 ◽ 2018 ◽ Vol 7 (4.5) ◽ pp. 257

Author(s): Salina Adinarayana ◽ E Ilavarasan

Keyword(s): Machine Learning ◽ Social Media ◽ Opinion Mining ◽ Media Content ◽ Learning Approach ◽ Mobile Environment ◽ Classification Approach ◽ Machine Learning Approach ◽ Semantic Orientation ◽ Feature Based

Opinion mining (OM) from mobile-based social media content (SMC) is more challenging than topic-based mining, and it cannot be performed merely by examining the presence of single words in text containing opinion expressions. Moreover, existing opinion classification systems extract a large number of features, which is not feasible in the mobile environment. Existing OM methods for this environment also do not consider the semantic orientation of the SMC in the review. The proposed machine learning approach extends the feature-based classification approach to identify the orientation of a phrase by taking context into account, in order to improve accuracy.

Corporate disclosure via social media: a data science approach

Online Information Review ◽ 10.1108/oir-03-2019-0084 ◽ 2020 ◽ Vol 44 (1) ◽ pp. 278-298

Author(s): Marian H. Amin ◽ Ehab K.A. Mohamed ◽ Ahmed Elragal

Keyword(s): Machine Learning ◽ Social Media ◽ Topic Modeling ◽ Data Science ◽ Latent Dirichlet Allocation ◽ Gender Diversity ◽ Financial Disclosure ◽ Extensive Literature ◽ Content Type ◽ Board Characteristics

Purpose: The purpose of this paper is to investigate corporate financial disclosure via Twitter among the top listed 350 companies in the UK as well as identify the determinants of the extent of social media usage to disclose financial information.

Design/methodology/approach: This study applies an unsupervised machine learning technique, namely, Latent Dirichlet Allocation topic modeling to identify financial disclosure tweets. Panel, Logistic and Generalized Linear Model Regressions are also run to identify the determinants of financial disclosure on Twitter focusing mainly on board characteristics.

Findings: Topic modeling results reveal that companies mainly tweet about 12 topics, including financial disclosure, which has a probability of occurrence of about 7 percent. Several board characteristics are found to be associated with the extent of Twitter usage as a financial disclosure platform, among which are board independence, gender diversity and board tenure.

Originality/value: The extensive literature examines disclosure via traditional media and its determinants, yet this paper extends the literature by investigating the relatively new disclosure channel of social media. This study is among the first to utilize machine learning, instead of manual coding techniques, to automatically unveil the tweets’ topics and reveal financial disclosure tweets. It is also among the first to investigate the relationships between several board characteristics and financial disclosure on Twitter; providing a distinction between the roles of executive vs non-executive directors relating to disclosure decisions.
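To make the topic-modeling step more concrete, the sketch below fits Latent Dirichlet Allocation to a handful of invented, pre-tokenized tweets using collapsed Gibbs sampling, then averages the per-document topic proportions to get a corpus-level share for each topic. This is only one plausible reading of a topic's "probability of occurrence"; the function name `lda_gibbs`, the hyperparameters, and the data are assumptions, and the paper does not state which LDA implementation or inference method it used.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per (document, topic)
    topic_word = [dict() for _ in range(n_topics)]    # tokens per (topic, word)
    topic_total = [0] * n_topics                      # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j].get(w, 0) + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    # Smoothed topic proportions per document (each row sums to 1).
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]

# Invented, pre-tokenized example tweets.
DOCS = [
    "annual results revenue profit dividend".split(),
    "profit revenue quarterly results".split(),
    "new store opening this weekend".split(),
    "join us at the store opening".split(),
]
theta = lda_gibbs(DOCS, n_topics=2)
# Corpus-level share of each topic: mean of the per-document proportions.
prevalence = [sum(row[k] for row in theta) / len(theta) for k in range(2)]
print(prevalence)
```

Topic indices are arbitrary, so in practice each topic is labeled by inspecting its most frequent words before a share such as "financial disclosure" can be reported.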

Transfer learning features for predicting aesthetics through a novel hybrid machine learning method

Neural Computing and Applications ◽ 10.1007/s00521-019-04065-4 ◽ 2019 ◽ Vol 32 (10) ◽ pp. 5889-5900 ◽ Cited By ~ 3

Author(s): Adrian Carballal ◽ Carlos Fernandez-Lozano ◽ Jonathan Heras ◽ Juan Romero

Keyword(s): Machine Learning ◽ Transfer Learning ◽ Machine Learning Method ◽ Learning Method ◽ Hybrid Machine
