Supervised Machine Learning for the Detection of Troll Profiles in Twitter Social Network: Application to a Real Case of Cyberbullying

Cybercrime is a growing threat to firms and customers that emerged with the digitization of business. However, research shows that even though people claim to be concerned about their privacy online, they do not act accordingly. This study investigates how prevalent security concerns are among Twitter users during a cyber attack. The case under examination is the security breach at the US ticket sales company Ticketfly, which compromised the information of 26 million users. Tweets related to cybersecurity are detected through automated text classification based on supervised machine learning with support vector machines. Subsequently, the users who wrote security-related tweets are grouped into communities through social network analysis. The results of this multi-method study show that users concerned about security issues mostly belong to expert communities that already have superior knowledge of cybersecurity.
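
To make the classification step concrete, here is a minimal pure-Python sketch of a linear support vector machine that labels short texts as security-related (+1) or not (-1). It is not the study's pipeline: the toy tweets, the whitespace tokenizer, the bag-of-words features, and the choice of the Pegasos stochastic subgradient solver (with no bias term) are all assumptions made for illustration.

```python
import random
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization (a real pipeline would also handle URLs, hashtags, mentions)
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct training token to a feature index
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    # Sparse bag-of-words vector {feature_index: count}; words outside the vocabulary are ignored
    counts = Counter(tokenize(text))
    return {vocab[w]: c for w, c in counts.items() if w in vocab}

def dot(w, x):
    return sum(w[i] * v for i, v in x.items())

def train_linear_svm(X, y, dim, lam=0.01, epochs=200, seed=0):
    # Pegasos: stochastic subgradient descent on the L2-regularized hinge loss (no bias term)
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for j in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[j] * dot(w, X[j]) < 1  # margin test uses the weights before this step
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if violated:
                for i, v in X[j].items():
                    w[i] += eta * y[j] * v  # hinge-loss subgradient step
    return w

# Hypothetical toy tweets: +1 = security-related, -1 = not security-related
TRAIN = [
    ("ticketfly breach leaked passwords change your password now", 1),
    ("data breach exposed emails hackers stole user data", 1),
    ("check if your password was leaked in the hack", 1),
    ("hackers breach ticketfly database of users", 1),
    ("excited for the concert tonight great tickets", -1),
    ("bought tickets for the show see you there", -1),
    ("the band played a great set tonight", -1),
    ("cannot wait for the festival this summer", -1),
]

vocab = build_vocab(text for text, _ in TRAIN)
X = [featurize(text, vocab) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
w = train_linear_svm(X, y, dim=len(vocab))

def classify(text):
    # Sign of the linear decision function w.x
    return 1 if dot(w, featurize(text, vocab)) > 0 else -1
```

For example, `classify("hackers leaked my password")` should return 1 and `classify("great concert tonight")` should return -1. A linear kernel is a common choice for sparse text features; a production setup would more likely use an established SVM library plus TF-IDF weighting.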


Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods

BioMed Research International ◽

10.1155/2015/810514 ◽

2015 ◽

Vol 2015 ◽

pp. 1-9 ◽

Cited By ~ 69

Author(s):

Quan Zou ◽

Jinjin Li ◽

Qingqi Hong ◽

Ziyu Lin ◽

Yun Wu ◽

...

Keyword(s):

Machine Learning ◽

Social Network ◽

Social Network Analysis ◽

Network Analysis ◽

Cross Validation ◽

Supervised Machine Learning ◽

RNA Molecules ◽

Disease Associations ◽

Endogenous Genes ◽

Leave One Out

MicroRNAs constitute an important class of noncoding, single-stranded RNA molecules, about 22 nucleotides long, that are encoded by endogenous genes. They play an important role in regulating gene transcription and normal development. MicroRNAs can be associated with disease; however, only a few microRNA-disease associations have been confirmed by traditional experimental approaches. We introduce two methods to predict microRNA-disease associations. The first, KATZ, integrates social network analysis with machine learning and is based on networks derived from known microRNA-disease, disease-disease, and microRNA-microRNA associations. The second, CATAPULT, is a supervised machine learning method. We applied both methods to 242 known microRNA-disease associations and evaluated their performance using leave-one-out cross-validation and 3-fold cross-validation. Experiments showed that our methods outperformed state-of-the-art methods.
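
The KATZ approach rests on the Katz index, which scores a pair of nodes by counting the walks between them, down-weighting a walk of length l by a factor beta^l. The sketch below assumes one plausible construction: a single heterogeneous adjacency matrix with miRNAs first and diseases second, a truncated sum S = sum over l = 1..k of beta^l A^l, and candidate scores read from the miRNA-by-disease block. The matrices, beta, and k are hypothetical; the paper's own weighting and similarity networks may differ.

```python
def matmul(A, B):
    # Plain dense matrix product
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def block_adjacency(mm, md, dd):
    # Heterogeneous network: rows/cols ordered as [miRNAs..., diseases...]
    n_m, n_d = len(mm), len(dd)
    A = [list(mm[i]) + list(md[i]) for i in range(n_m)]
    A += [[md[i][j] for i in range(n_m)] + list(dd[j]) for j in range(n_d)]
    return A

def katz_scores(mm, md, dd, beta=0.1, k=3):
    # Truncated Katz index: S = sum_{l=1..k} beta^l * A^l
    A = block_adjacency(mm, md, dd)
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    P = [row[:] for row in A]  # P holds A^l
    for l in range(1, k + 1):
        for i in range(n):
            for j in range(n):
                S[i][j] += beta ** l * P[i][j]
        P = matmul(P, A)
    n_m = len(mm)
    return [row[n_m:] for row in S[:n_m]]  # miRNA x disease block

# Hypothetical toy data: 2 miRNAs, 2 diseases
mm = [[0, 1], [1, 0]]  # miRNA 0 and miRNA 1 are related
md = [[1, 0], [0, 0]]  # only miRNA 0 - disease 0 is a known association
dd = [[0, 1], [1, 0]]  # disease 0 and disease 1 are related
scores = katz_scores(mm, md, dd)
```

Here miRNA 1 reaches disease 0 through a walk of length 2 and disease 1 only through a walk of length 3, so `scores[1][0]` (about 0.01) ranks above `scores[1][1]` (about 0.001). Leave-one-out cross-validation would repeat this after hiding each known association in turn and checking where it ranks among the unknown pairs.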


Predicting Current Juul use among Emerging Adults through Twitter Feeds

10.1101/19010553 ◽

2019 ◽

Author(s):

Tung Tran ◽

Melinda Ickes ◽

Jakob W. Hester ◽

Ramakanth Kavuluru

Keyword(s):

Machine Learning ◽

Social Media ◽

Social Network ◽

Emerging Adults ◽

Predictive Performance ◽

Training Data ◽

Supervised Machine Learning ◽

Nicotine Dependency ◽

Usage Patterns ◽

Survey Responses

Abstract

Introduction
Can we predict whether someone uses Juul based on their social media activities? This is the central premise of the effort reported in this paper. Several recent social media studies on Juul use tend to focus on characterizing Juul-related messages on social media. In this study, we assess the potential of machine learning methods to automatically identify whether an individual uses Juul (past 30-day usage) based on their Twitter data.

Methods
Through survey responses of college students, we obtained a collection of 588 instances of Juul use patterns, along with the associated Twitter handles, for training and testing. With this data, we built and tested supervised machine learning models based on linear and deep learning algorithms with textual, social network (friends and followers), and other hand-crafted features.

Results
The linear model with textual and follower network features performed best, with a precision-recall trade-off such that precision (PPV) is 57% at 24% recall (sensitivity). Hence, at least every other college-attending Twitter user flagged by our model is expected to be a Juul user. Additionally, our results indicate that social network features tend to have a large positive impact on predictive performance.

Conclusion
There are enough predictive signals in social feeds for supervised modeling of Juul use, even with limited training data, implying that such models can be highly beneficial for very focused intervention campaigns. Moreover, this initial success indicates potential for more involved automated surveillance of Juul use based on social media data, including Juul usage patterns, nicotine dependency, and risk awareness.
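
The "every other user" reading follows directly from the definition of precision: at 57% precision, roughly 57 of every 100 flagged users are true Juul users, which is more than one in two. The sketch below shows how precision (PPV) and recall (sensitivity) are computed when a model flags users whose score meets a threshold, and how raising the threshold trades recall for precision. The scores and labels are hypothetical, not the study's data.

```python
def precision_recall(y_true, y_pred):
    # Precision (PPV) = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def flag(scores, threshold):
    # Flag a user as a predicted Juul user when the model score meets the threshold
    return [s >= threshold for s in scores]

# Hypothetical model scores and survey labels (1 = past-30-day Juul use)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 1, 0, 0, 1]

for threshold in (0.85, 0.5, 0.0):
    p, r = precision_recall(labels, flag(scores, threshold))
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
# threshold=0.85: precision=1.00 recall=0.25
# threshold=0.5: precision=0.50 recall=0.50
# threshold=0.0: precision=0.50 recall=1.00
```

Operating at a high threshold, as the reported 57% precision at 24% recall suggests, flags fewer users but makes each flag more trustworthy, which suits a narrowly targeted intervention.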


Exploring the Use of Machine Learning to Automate the Qualitative Coding of Church-related Tweets

Fieldwork in Religion ◽

10.1558/firn.40610 ◽

2020 ◽

Vol 14 (2) ◽

pp. 140-159

Author(s):

Anthony-Paul Cooper ◽

Emmanuel Awuni Kolog ◽

Erkki Sutinen

Keyword(s):

Machine Learning ◽

Online Community ◽

High Volume ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Social Media Data ◽

Twitter Data ◽

Resource Intensity ◽

Media Data ◽

Better Than

This article builds on previous research exploring the content of church-related tweets. It does so by examining whether the qualitative thematic coding of such tweets can, in part, be automated using machine learning. It compares three supervised machine learning algorithms to understand how useful each is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve-Bayes, performs better than the other algorithms considered, returning Precision, Recall, and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
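
As an illustration of the Naïve-Bayes approach, here is a minimal multinomial Naïve Bayes text classifier with Laplace (add-alpha) smoothing, plus the F-measure as the harmonic mean of precision and recall. The article does not specify its features or implementation; the tokenizer, the smoothing constant, and the two codes "event" and "prayer" with their toy tweets are hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    # Multinomial Naive Bayes: log prior per class + smoothed log likelihood per word
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    class_counts = Counter(labels)
    vocab = set()
    for doc, c in zip(docs, labels):
        toks = doc.lower().split()
        word_counts[c].update(toks)
        vocab.update(toks)
    model = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_prior = math.log(class_counts[c] / len(labels))
        log_lik = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (log_prior, log_lik)
    return model

def predict_nb(model, doc):
    # Pick the class with the highest posterior log score; unseen words are skipped
    toks = doc.lower().split()
    best, best_score = None, -math.inf
    for c, (log_prior, log_lik) in model.items():
        score = log_prior + sum(log_lik[w] for w in toks if w in log_lik)
        if score > best_score:
            best, best_score = c, score
    return best

def f_measure(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical human-coded tweets
docs = [
    "join us for sunday service at church",
    "church fundraiser this saturday everyone welcome",
    "praying for everyone at church today",
    "please pray for our church family",
]
labels = ["event", "event", "prayer", "prayer"]
model = train_nb(docs, labels)
```

With this toy model, `predict_nb(model, "sunday service this saturday")` returns "event" and `predict_nb(model, "pray for our family")` returns "prayer". Naïve Bayes is cheap to train, which fits the article's aim of reducing the resource intensity of manual coding.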


Application of Supervised Machine Learning Algorithms for Lithofacies Classification

10.2523/19349-ms ◽

2019 ◽

Author(s):

Subhadeep Sarkar ◽

Chandan Majumdar

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Lithofacies Classification


Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we introduce Mol2vec, an unsupervised machine learning approach for learning vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words lie close together in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can then be encoded as vectors by summing the vectors of their individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained using Morgan fingerprints as the reference compound representation. Mol2vec can easily be combined with ProtVec, which applies the same Word2vec concept to protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also be used easily for proteins with low sequence similarity.
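
The compound-encoding step can be sketched as a simple vector sum. The example below assumes a small lookup table of already-trained substructure vectors; the identifiers ("ring_A", "amine", and so on), their 3-dimensional values, and the choice to skip unknown substructures are all hypothetical, whereas real Mol2vec vectors are learned from a large compound corpus. Cosine similarity is used to show that compounds built from similar substructures end up pointing in similar directions.

```python
import math

def compound_vector(substructures, embeddings, dim):
    # Sum the vectors of the compound's substructures; unknown identifiers are skipped (an assumption)
    vec = [0.0] * dim
    for s in substructures:
        if s in embeddings:
            vec = [a + b for a, b in zip(vec, embeddings[s])]
    return vec

def cosine(u, v):
    # Cosine of the angle between two vectors (0.0 if either is the zero vector)
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

# Hypothetical pre-trained substructure embeddings (3-dimensional for readability)
embeddings = {
    "ring_A": [1.0, 0.0, 0.0],
    "ring_B": [0.9, 0.1, 0.0],   # chemically related to ring_A, so it points the same way
    "amine": [0.0, 1.0, 0.0],
    "halogen": [0.0, 0.0, 1.0],
}

x = compound_vector(["ring_A", "amine"], embeddings, 3)
y = compound_vector(["ring_B", "amine"], embeddings, 3)
z = compound_vector(["halogen", "halogen"], embeddings, 3)
```

Here `x` is `[1.0, 1.0, 0.0]`, `cosine(x, y)` is close to 1, and `cosine(x, z)` is 0. The resulting dense vectors could then serve as input features for a supervised property model.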
