Corporate disclosure via social media: a data science approach

2020 ◽  
Vol 44 (1) ◽  
pp. 278-298
Author(s):  
Marian H. Amin ◽  
Ehab K.A. Mohamed ◽  
Ahmed Elragal

Purpose. The purpose of this paper is to investigate corporate financial disclosure via Twitter among the top 350 listed companies in the UK and to identify the determinants of the extent to which companies use social media to disclose financial information.
Design/methodology/approach. This study applies an unsupervised machine learning technique, namely Latent Dirichlet Allocation (LDA) topic modeling, to identify financial disclosure tweets. Panel, logistic and generalized linear model regressions are also run to identify the determinants of financial disclosure on Twitter, focusing mainly on board characteristics.
Findings. Topic modeling results reveal that companies mainly tweet about 12 topics, including financial disclosure, which has a probability of occurrence of about 7 percent. Several board characteristics, among them board independence, gender diversity and board tenure, are found to be associated with the extent to which companies use Twitter as a financial disclosure platform.
Originality/value. An extensive literature examines disclosure via traditional media and its determinants; this paper extends that literature by investigating the relatively new disclosure channel of social media. This study is among the first to use machine learning, instead of manual coding techniques, to automatically uncover the topics of tweets and identify financial disclosure tweets. It is also among the first to investigate the relationships between several board characteristics and financial disclosure on Twitter, distinguishing between the roles of executive and non-executive directors in disclosure decisions.
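The abstract names LDA topic modeling as the means of separating financial disclosure tweets from other tweets, but gives no implementation details. The sketch below is a minimal pure-Python collapsed Gibbs sampler for LDA, run on a made-up four-tweet corpus. The function name `lda_gibbs`, the hyperparameters `alpha`, `beta` and `n_iter`, and the corpus are illustrative assumptions, not the authors' configuration (they report 12 topics on real tweets).

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab): theta[d][k] is the
    topic mixture of document d, phi[k][v] the probability of word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]   # tokens of word v assigned to topic k
    nk = [0] * n_topics                         # total tokens assigned to topic k
    z = []                                      # current topic of every token

    def move(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            move(d, w_id[w], k, +1)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = w_id[w]
                move(d, v, z[d][i], -1)  # take the token out of the counts
                # Full conditional p(z_i = k | all other assignments), unnormalized.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + V * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                move(d, v, z[d][i], +1)

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in topics]
    return theta, phi, vocab


# Toy corpus (made up): two "results" tweets and two "careers" tweets.
tweets = [
    "interim results revenue profit dividend".split(),
    "half year results profit guidance dividend".split(),
    "join our team graduate careers apply".split(),
    "apply now careers internship team".split(),
]
theta, phi, vocab = lda_gibbs(tweets, n_topics=2)
for k, dist in enumerate(phi):
    top_words = sorted(range(len(vocab)), key=lambda v: -dist[v])[:3]
    print("topic", k, [vocab[v] for v in top_words])
```

On a real corpus, a "financial disclosure" topic would still have to be recognized by inspecting each topic's top words. That labelling step is a human judgment and sits outside this sketch.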

2020 ◽  
Vol 44 (5) ◽  
pp. 1027-1055
Author(s):  
Thanh-Tho Quan ◽  
Duc-Trung Mai ◽  
Thanh-Duy Tran

Purpose. This paper proposes an approach to identify categorical influencers (i.e. influencers who are active in the targeted categories) in social media channels. Categorical influencers are important for media marketing, but detecting them automatically remains a challenge.
Design/methodology/approach. We deployed emerging deep learning approaches. Specifically, we used word embeddings to encode semantic information about the words occurring in common social media microtext, and a variational autoencoder (VAE) to approximate the topic modeling process, through which the active categories of influencers are automatically detected. We developed a system called Categorical Influencer Detection (CID) to realize these ideas.
Findings. Using a VAE to simulate the Latent Dirichlet Allocation (LDA) process can effectively handle topic modeling on the vast dataset of microtext on social media channels.
Research limitations/implications. This work makes two major contributions: first, the detection of topics in microtexts using a deep learning approach; second, the identification of categorical influencers in social media.
Practical implications. This work can help brands conduct digital marketing on social media effectively by approaching appropriate influencers. A real case study is given to illustrate this.
Originality/value. In this paper, we discuss an approach to automatically identify the active categories of influencers by performing topic detection on the microtext related to the influencers in social media channels. To do so, we use deep learning to approximate the topic modeling process of conventional approaches (such as LDA).
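The abstract says a VAE approximates the LDA process but does not say how the Dirichlet prior is handled. One widely used option in neural topic models is the ProdLDA-style Laplace approximation. It replaces the Dirichlet prior with a logistic-normal one and draws topic proportions through the reparameterization trick. The pure-Python sketch below shows only that step. Whether CID works this way is an assumption, and the encoder network that would output `mu` and `log_var` for a real document is omitted. All function names are hypothetical.

```python
import math
import random


def laplace_dirichlet_prior(alpha):
    """Logistic-normal (mean, variance) approximating Dirichlet(alpha) in the softmax basis."""
    K = len(alpha)
    mean_log = sum(math.log(a) for a in alpha) / K
    inv_sum = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log for a in alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / K) + inv_sum / K ** 2 for a in alpha]
    return mu, var


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def sample_theta(mu, log_var, rng):
    """Reparameterization trick: theta = softmax(mu + sigma * eps), eps ~ N(0, I)."""
    h = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    return softmax(h)


def kl_to_prior(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, diag var_q) || N(mu_p, diag var_p)): the prior term of the VAE loss."""
    return 0.5 * sum(vq / vp + (mp - mq) ** 2 / vp - 1.0 + math.log(vp / vq)
                     for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p))


rng = random.Random(42)
# Symmetric Dirichlet(0.02) prior over 10 topics (values chosen for illustration).
mu_p, var_p = laplace_dirichlet_prior([0.02] * 10)
# In a trained VAE the posterior mean/variance come from the encoder; here we sample the prior.
theta = sample_theta(mu_p, [math.log(v) for v in var_p], rng)
print([round(t, 3) for t in theta])
```

Because `theta` is a differentiable function of `mu`, `log_var` and the noise, gradients can flow through the sampling step. This is what lets a VAE stand in for LDA's inference procedure.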


2019 ◽  
Vol 17 (2) ◽  
pp. 262-281
Author(s):  
Shiwangi Singh ◽  
Akshay Chauhan ◽  
Sanjay Dhir

Purpose. The purpose of this paper is to use Twitter analytics to analyze the startup ecosystem of India.
Design/methodology/approach. The paper uses descriptive analysis and content analytics techniques from social media analytics to examine 53,115 tweets from 15 Indian startups across different industries. The study also employs the Naïve Bayes algorithm for sentiment analysis and the Latent Dirichlet allocation algorithm for topic modeling of Twitter feeds to generate insights into the startup ecosystem in India.
Findings. The Indian startup ecosystem is inclined toward digital technologies and concerned with people, planet and profit, with resource availability and information as the keys to success. The study categorizes the sentiment of tweets as positive, neutral or negative, and finds that the Indian startup ecosystem shows more positive than negative sentiment. Topic modeling enables the identified keywords to be grouped into clusters. The study concludes that the future of the Indian startup ecosystem is Digital India.
Research limitations/implications. The analysis provides a methodology that future researchers can use to extract relevant information from Twitter to investigate other issues.
Originality/value. Attempts to analyze India's startup ecosystem through social media analysis are limited. This research aims to bridge that gap by analyzing the startup ecosystem of India through the lens of social media platforms such as Twitter.
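The study uses a Naïve Bayes algorithm to sort tweets into positive, neutral and negative sentiment. As a hedged illustration of that technique, not the authors' code, features or training data, here is a small multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The class name `NaiveBayesSentiment` and the six labelled toy tweets are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over word counts with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, lab in zip(docs, labels) if lab == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + 1) / denom) for w in self.vocab}
        return self

    def predict(self, doc):
        # Sum log-probabilities; words never seen in training are skipped.
        known = [w for w in doc if w in self.vocab]
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][w] for w in known)
                  for c in self.classes}
        return max(scores, key=scores.get)


train = [
    ("great launch love it", "positive"),
    ("awesome funding news", "positive"),
    ("app crashed again terrible", "negative"),
    ("worst delivery delay", "negative"),
    ("event starts at noon", "neutral"),
    ("office opens monday", "neutral"),
]
clf = NaiveBayesSentiment().fit([t.split() for t, _ in train], [lab for _, lab in train])
print(clf.predict("love this launch".split()))
```

Working in log space avoids underflow when many small word probabilities are multiplied. Smoothing keeps a single unseen class-word pair from zeroing out a class.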


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Krzysztof Celuch

Purpose. In seeking to create extraordinary experiences for customers, services have gone beyond a mere transaction between buyers and sellers. In the event industry, where purchasing tickets online is a common procedure, it remains unclear how to enhance this multifaceted experience. This study aims to offer a snapshot of the aspects consumers value most and to uncover consumers' feelings about their experience of purchasing event tickets on third-party ticketing platforms.
Design/methodology/approach. This is a cross-disciplinary study that applies knowledge from both data science and services marketing. Within the framework of natural language processing, latent Dirichlet allocation topic modeling and sentiment analysis were applied to online reviews to interpret their embedded meanings.
Findings. The findings conceptualize ten dimensions valued by eventgoers: technical issues, value of the core product and service, word-of-mouth, trustworthiness, professionalism and knowledgeability, customer support, information transparency, additional fees, prior experience and after-sales service. Among these, consumers rated the value of the core product and service as the most positive aspect of their experience, whereas additional fees were rated the least positive.
Originality/value. Drawing on the intersection of natural language processing and the status quo of the event industry, this study offers a better understanding of eventgoers' experiences when purchasing event tickets online. It also provides a hands-on guide for marketers seeking to stage memorable experiences in the era of digitalization.


2021 ◽  
Author(s):  
Shimon Ohtani

Abstract. The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets adopted at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, most of the targets were assessed as not achieved. Measuring public awareness of biodiversity is essential when setting the post-2020 targets, but devising a method to do so is difficult. This study provides a diachronic exploration of the discourse on “biodiversity” on Twitter from 2010 to 2020, combining sentiment analysis and topic modeling, two techniques commonly used in data science. The analysis involved four steps: aggregating and comparing n-grams, visualizing eight types of emotional tendencies using the NRC emotion lexicon, building topic models using Latent Dirichlet allocation (LDA), and qualitatively analyzing tweet texts based on these models. Together, these steps made it possible to classify and analyze unstructured tweets in a meaningful way. The results revealed how the words used alongside “biodiversity” on Twitter evolved over the past decade, the emotional tendencies behind the contexts in which “biodiversity” was used, and the approximate content of the tweets that make up topics with distinctive characteristics. Although gauging public awareness through SNS analysis still has many limitations, it can undeniably yield important insights. To further refine the research method, it will be essential to improve analysts' skills and accumulate case studies, as well as to advance data science.
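Two of the steps listed in the abstract, n-gram aggregation and emotion scoring with the NRC emotion lexicon, reduce to counting. The sketch below assumes tweets are already tokenized. It uses `TOY_LEXICON`, a hypothetical five-word stand-in for the NRC lexicon; the real lexicon must be obtained separately. Only the eight emotion names are taken from the NRC scheme, and the function names and sample tweets are illustrative.

```python
from collections import Counter

# The eight NRC emotion categories.
EMOTIONS = ("anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust")

# Hypothetical toy stand-in for the NRC emotion lexicon (word -> emotions).
TOY_LEXICON = {
    "loss": {"fear", "sadness"},
    "extinction": {"fear", "sadness"},
    "destroy": {"anger", "fear", "sadness"},
    "protect": {"anticipation", "trust"},
    "hope": {"anticipation", "joy", "trust"},
}


def ngrams(tokens, n):
    """All contiguous n-token sequences of a tweet."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(tweets, n, k):
    """The k most frequent n-grams across a list of tokenized tweets."""
    return Counter(g for t in tweets for g in ngrams(t, n)).most_common(k)


def emotion_profile(tweets, lexicon):
    """Count lexicon hits per emotion; one word can contribute to several emotions."""
    counts = Counter({e: 0 for e in EMOTIONS})
    for tokens in tweets:
        for w in tokens:
            counts.update(lexicon.get(w, ()))
    return dict(counts)


tweets_2010 = [
    "biodiversity loss is accelerating".split(),
    "protect biodiversity loss now".split(),
]
print(top_ngrams(tweets_2010, 2, 1))
print(emotion_profile(tweets_2010, TOY_LEXICON))
```

For a diachronic comparison like the one described, the same functions would be applied separately to each year's tweets and the resulting counts compared.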


2021 ◽  
Author(s):  
Myeong Gyu Kim ◽  
Jae Hyun Kim ◽  
Kyungim Kim

BACKGROUND Garlic-related misinformation is prevalent whenever a viral outbreak occurs. With the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is again spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation among the vast number of tweets.
OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation about garlic and COVID-19 on Twitter.
METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, or others. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and a neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, Latent Dirichlet Allocation topic probability, number of retweets, and number of favorites). The model with the highest accuracy on the training dataset (70% of the overall dataset) was evaluated on a test dataset (30% of the overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy.
RESULTS The SVM model with the polynomial kernel showed the highest accuracy, 0.670. For the misinformation class, the model also showed a balanced accuracy of 0.757, sensitivity of 0.819, and specificity of 0.696. Important features in the misinformation and accurate information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the others class.
CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. The approach could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.
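For the misinformation class, the reported metrics relate as follows:

* sensitivity is TP/(TP+FN);
* specificity is TN/(TN+FP);
* balanced accuracy is their mean;
* overall accuracy is computed across all three classes.

The reported figures are consistent with this: (0.819 + 0.696)/2 = 0.7575, which matches the reported 0.757 up to rounding of the inputs. The helper `one_vs_rest_metrics` and the toy labels below are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Overall multi-class accuracy plus one-vs-rest metrics for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on all other classes combined
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical labels: "mis" = misinformation, "acc" = accurate information, "oth" = others.
y_true = ["mis", "mis", "acc", "oth", "mis", "acc"]
y_pred = ["mis", "acc", "acc", "mis", "mis", "oth"]
print(one_vs_rest_metrics(y_true, y_pred, positive="mis"))

# Consistency check on the reported figures: mean of sensitivity and specificity.
print((0.819 + 0.696) / 2)
```

Balanced accuracy is useful here because it is not inflated when one class dominates the test set, which plain accuracy can be.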


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mohamed Omran ◽  
Dinesh Ramdhony ◽  
Oren Mooneeapen ◽  
Vishaka Nursimloo

Purpose. Drawing on agency theory, this study analyzes the influence of board characteristics on integrated reporting (IR) for the top 50 companies listed on the Australian Securities Exchange (ASX50). The focus is on IR at the aggregate level as well as on its separate components, namely Future Opportunities and Risks (FOPRI), Governance and Strategy (GOVSTR), Performance (PERF), Overview and Business Model (OBM) and General Preparation and Presentation (GPP).
Design/methodology/approach. A checklist based on the International Integrated Reporting Council (IIRC) framework is devised to track companies' disclosures for the period from 1 July 2014 to 30 June 2017. Regression analysis is used to investigate the determinants of IR and its separate components: board size, board independence, board activity, gender diversity, firm size, profitability and growth opportunities.
Findings. The findings indicate a significant positive effect of board independence on the aggregate IR index, FOPRI and GPP. A significant negative association is found between board activity and both the aggregate IR index and its separate components, including GOVSTR, PERF and GPP. Additionally, the aggregate IR index is significantly related to firm size, profitability and growth opportunities.
Research limitations/implications. The main limitation of the study is the limited sample of 50 companies over three years. The study also has a limitation inherent in using content analysis to assess the level of IR, since no checklist to measure the level of IR can be fully exhaustive. Furthermore, we focus only on whether a checklist item is disclosed, using a dichotomous scale, and thus ignore the quality of the information disclosed.
Practical implications. The study has several practical implications. From a managerial perspective, it shows that holding more board meetings harms the level of IR. The results can guide regulators, such as the Australian Securities and Investments Commission (ASIC) and the Australian Securities Exchange (ASX), when drafting new regulations, guidelines or listing rules. If regulators aim for a higher level of integration in reports, they will know which “triggers to pull” to attain their target, and our results can help them choose the appropriate trigger among various alternatives. For instance, if a higher level of integrated reporting is desired, size rather than profitability should be chosen. Finally, ASX-listed companies can use our checklist as a scorecard for self-assessment.
Originality/value. This research is the first to investigate IR in the ASX context by devising a checklist based on IIRC (2013) with an additional GPP component. Using separate models to examine each component of the aggregate IR index is also unique to this study. The study also brings to the fore the role of gender-diverse boards in promoting IR, and it reiterates the debate about imposing a quota for better gender representation on boards.
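The abstract describes a dichotomous checklist in which each item scores 1 if disclosed and 0 otherwise. Indices are then formed for each component and in aggregate. A common unweighted way to turn such a checklist into indices is the share of items disclosed. The sketch below assumes that equal weighting, since the abstract does not state the weighting scheme. It uses made-up scores for a single hypothetical company-year, and `disclosure_indices` is an illustrative name.

```python
def disclosure_indices(checklist):
    """Unweighted indices from a dichotomous checklist.

    checklist maps component name -> list of item scores (1 = disclosed, 0 = not).
    Returns (aggregate index, {component: component index}).
    """
    for comp, items in checklist.items():
        if not items or any(s not in (0, 1) for s in items):
            raise ValueError(f"{comp}: items must be a non-empty list of 0/1 scores")
    per_component = {comp: sum(items) / len(items) for comp, items in checklist.items()}
    all_items = [s for items in checklist.values() for s in items]
    return sum(all_items) / len(all_items), per_component


# Made-up scores for one hypothetical company-year.
scores = {
    "FOPRI": [1, 0, 1, 1],
    "GOVSTR": [1, 1, 0],
    "PERF": [1, 0],
    "OBM": [1, 1, 1],
    "GPP": [0, 1],
}
aggregate, components = disclosure_indices(scores)
print(round(aggregate, 3), components)
```

The aggregate index weights each component by its number of items, so it is not the simple mean of the component indices. Here it is 10/14, whereas the mean of the five component indices is lower.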


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mohamed A.K. Basuony ◽  
Ehab K.A. Mohamed ◽  
Ahmed Elragal ◽  
Khaled Hussainey

Purpose. This study aims to investigate the extent and characteristics of corporate internet disclosure via companies’ websites as well as via social media and networking sites in the four leading English-speaking stock markets, namely Australia, Canada, the UK and the USA.
Design/methodology/approach. A disclosure index is used, comprising a set of items that covers two facets of online disclosure: company websites and social media sites. This paper adopts a data science approach to investigate corporate internet disclosure practices among top listed firms in Australia, Canada, the UK and the USA.
Findings. The results reveal the underlying relations between the determining factors of corporate disclosure, i.e. profitability, leverage, liquidity and firm size. Profitability on its own has no great effect on the degree of corporate internet disclosure, whether via company websites or social media sites. Liquidity has an impact on the degree of disclosure. Firm size and leverage appear to be the most important factors driving better disclosure via social media. American companies tend to be at the cutting edge of technology when it comes to corporate disclosure.
Practical implications. This paper provides new insights into corporate internet disclosure that will benefit all stakeholders with an interest in corporate reporting. Social media is an influential means of communication that can enable the corporate office to get instant feedback, enhancing its decision-making process.
Originality/value. To the best of the authors’ knowledge, this study is among the few studies of corporate disclosure via social media platforms. It adopts a disclosure index that incorporates social media and applies a data science approach to disclosure, in an attempt to show how accounting could benefit from data science techniques.


2020 ◽  
Vol 110 (S3) ◽  
pp. S331-S339
Author(s):  
Amelia Jamison ◽  
David A. Broniatowski ◽  
Michael C. Smith ◽  
Kajal S. Parikh ◽  
Adeena Malik ◽  
...  

Objectives. To adapt and extend an existing typology of vaccine misinformation to classify the major topics of discussion across the total vaccine discourse on Twitter.
Methods. Using 1.8 million vaccine-relevant tweets compiled from 2014 to 2017, we adapted an existing typology to Twitter data, first in a manual content analysis and then using latent Dirichlet allocation (LDA) topic modeling to extract 100 topics from the data set.
Results. Manual annotation identified 22% of the data set as antivaccine, of which safety concerns and conspiracies were the most common themes. Seventeen percent of content was identified as provaccine, with roughly equal proportions of vaccine promotion, criticism of antivaccine beliefs, and vaccine safety and effectiveness. Of the 100 LDA topics, 48 contained provaccine sentiment and 28 contained antivaccine sentiment, with 9 containing both.
Conclusions. Our updated typology successfully combines manual annotation with machine-learning methods to estimate the distribution of vaccine arguments, with greater detail on the most distinctive topics of discussion. With this information, communication efforts can be developed to better promote vaccines and avoid amplifying antivaccine rhetoric on Twitter.
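The two topic counts in the results overlap, because 9 of the LDA topics contained both provaccine and antivaccine sentiment. Assume all three counts refer to the same set of 100 topics. Let $P$ be the set of topics with provaccine sentiment and $A$ the set with antivaccine sentiment. Inclusion-exclusion then gives:

```latex
\begin{aligned}
|P \cup A| &= |P| + |A| - |P \cap A| = 48 + 28 - 9 = 67,\\
|P \setminus A| &= 48 - 9 = 39, \qquad |A \setminus P| = 28 - 9 = 19,\\
100 - |P \cup A| &= 100 - 67 = 33.
\end{aligned}
```

Under that assumption, 39 topics were provaccine only, 19 were antivaccine only, 9 were mixed, and roughly a third of the topics (33) carried neither kind of sentiment.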


2019 ◽  
Vol 119 (1) ◽  
pp. 111-128
Author(s):  
Jianhong Luo ◽  
Xuwei Pan ◽  
Shixiong Wang ◽  
Yujing Huang

Purpose. Delivering messages and information to potentially interested users is one of the distinguishing applications of online enterprise social networks (ESNs). The purpose of this paper is to provide insights for better understanding users' repost preferences and for providing personalized information services in enterprise social media marketing.
Design/methodology/approach. This is accomplished by constructing a target audience identification framework. A repost preference latent Dirichlet allocation (RPLDA) topic model is proposed to understand mass users' online repost preferences toward different content. A topic-oriented preference metric is proposed to measure the preference degree of individual users, and a reposting forecasting function is formulated to identify the target audience.
Findings. The empirical research shows two things. First, 20 percent of the repost users in the ESN are key active users who are particularly interested in the latent topics of its messages, a pattern that fits a Pareto distribution. Second, the target audience identification framework can successfully identify different key target users for messages with different latent topics.
Practical implications. The findings should motivate marketing managers to improve their enterprise brand by identifying the key target audience in ESNs and by marketing in a way that truthfully reflects personalized preferences.
Originality/value. This study runs counter to most current business practices, which tend to use simple popularity to seek out important users. Adaptively and dynamically identifying the target audience appears to have considerable potential, especially in the rapidly growing area of enterprise social media information services.
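The abstract makes two claims relevant here. About 20 percent of repost users are the key active users, consistent with a Pareto distribution. It also mentions a topic-oriented preference metric without defining it. The sketch below offers two hypothetical helpers:

* `top_share` measures how much of the reposting activity the most active fifth of users accounts for.
* `topic_preference` averages the topic mixtures of the messages a user reposted.

`topic_preference` is an illustrative stand-in, not the RPLDA metric, and all data are invented.

```python
def top_share(repost_counts, fraction=0.2):
    """Share of all reposts made by the most active `fraction` of users."""
    ranked = sorted(repost_counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def topic_preference(reposted_ids, doc_topics):
    """Hypothetical metric: mean topic mixture of the messages a user reposted."""
    n_topics = len(next(iter(doc_topics.values())))
    pref = [0.0] * n_topics
    for doc_id in reposted_ids:
        for k, weight in enumerate(doc_topics[doc_id]):
            pref[k] += weight / len(reposted_ids)
    return pref


# Invented data: repost counts for ten users and topic mixtures of three messages.
counts = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]
doc_topics = {"m1": [0.9, 0.1], "m2": [0.5, 0.5], "m3": [0.2, 0.8]}
print(top_share(counts))  # share of reposts made by the top 20% of users
print(topic_preference(["m1", "m2"], doc_topics))
```

A share near 0.8 for the top fifth would be the classic 80/20 pattern. Checking whether counts actually follow a Pareto distribution would require fitting the distribution, which this sketch does not do.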


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ema Utami ◽  
Irwan Oyong ◽  
Suwanto Raharjo ◽  
Anggit Dwi Hartanto ◽  
Sumarni Adi

Purpose. Understanding personality traits has long been of interest to academics and researchers in psychology and computer science. Analyzing profile data from personal social media accounts reduces data collection time, because this method does not require users to fill in any questionnaires. A pure natural language processing (NLP) approach can give decent results, and previous studies have shown that its reliability can be improved by combining it with machine learning.
Design/methodology/approach. In this study, two steps are essential: cleaning the dataset, and extracting relevant potential features as assessed by psychological experts. Both are needed because Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. The raw data were derived from a predefined dominance, influence, stability and conscientiousness (DISC) quiz website. They comprise 316,967 tweets from 1,244 Twitter accounts, filtered to include only personal, Indonesian-language accounts. Using a combination of NLP techniques and machine learning, the authors aim to develop a better approach and a more robust model, especially for the Indonesian language.
Findings. The authors find that employing the SMOTETomek resampling technique and hyperparameter tuning boosts the model’s performance on formalized datasets by 57%, as measured by the F1-score.
Originality/value. Cleaning the dataset and extracting from it relevant potential features assessed by psychological experts are essential, because Indonesian people tend to mix formal words, non-formal words, slang and abbreviations when writing tweets. The organic data were derived from a predefined DISC quiz website, yielding 1,244 Twitter account records and 316,967 tweets.
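SMOTETomek combines two steps. SMOTE oversampling creates synthetic minority samples interpolated between a minority point and one of its minority-class nearest neighbours. Tomek link removal then deletes pairs of opposite-class points that are each other's nearest neighbour. The pure-Python sketch below assumes binary labels, numeric feature vectors (e.g. already-vectorized tweets) and at least two minority samples, and it removes both points of every Tomek link. The function names are hypothetical. In practice a library implementation, such as `SMOTETomek` in imbalanced-learn, would normally be used; the abstract does not say which implementation the authors chose.

```python
import math
import random


def _nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, brute force)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points on segments between minority samples and neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_nearest(minority, i, k))
        lam = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic


def tomek_links(X, y):
    """Pairs (i, j) of opposite-class points that are each other's nearest neighbour."""
    nn = [_nearest(X, i, 1)[0] for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]}


def smote_tomek(X, y, minority_label, seed=0):
    """Oversample the minority class to parity, then drop both points of each Tomek link."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = sum(lab != minority_label for lab in y) - len(minority)
    new = smote(minority, n_new, k=min(3, len(minority) - 1), seed=seed)
    X2, y2 = list(X) + new, list(y) + [minority_label] * len(new)
    drop = {idx for pair in tomek_links(X2, y2) for idx in pair}
    keep = [i for i in range(len(X2)) if i not in drop]
    return [X2[i] for i in keep], [y2[i] for i in keep]


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X, y, minority_label=1)
print(len(X_res), y_res.count(0), y_res.count(1))
```

Resampling should be applied only to the training split. Otherwise synthetic points derived from test tweets would leak into training and inflate scores such as the F1-score.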

