coherence score
Recently Published Documents


TOTAL DOCUMENTS: 15 (FIVE YEARS: 13)
H-INDEX: 1 (FIVE YEARS: 0)

Author(s): Sujatha Arun Kokatnoor, Balachandran Krishnan

The main focus of this research is to find the reasons behind the fresh cases of COVID-19 from the public’s perception, for data specific to India. The analysis is done using machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis are accomplished in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improvised feature engineering (FE) process using a weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used for grouping the public posts from Twitter®. In the last step, latent Dirichlet allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improvised two-step FE process, the LDA model improved its coherence score by 14%, and by 19% and 15% when compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP) respectively, resulting in 14 root causes for the spike in the disease.
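
The clustering and validation steps above can be approximated with standard tooling. The sketch below is a plain baseline, not the paper's enhanced pipeline (the weighted TF-IDF, forward scan trigrams, feature hashing and enhanced K-means are not reproduced): it builds ordinary n-gram TF-IDF features with scikit-learn, clusters them with standard K-means, and computes a simple Dunn index for cluster validation; the `tweets` list is a toy placeholder.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

# Toy placeholder tweets; the real pipeline runs on the collected Twitter posts.
tweets = [
    "fresh covid cases rising after festival gatherings",
    "crowded markets and lockdown fatigue drive new infections",
    "vaccine hesitancy reported in several districts",
    "new infections linked to crowded markets again",
]

# Plain uni- to tri-gram TF-IDF (the paper instead uses a weighted TF-IDF with
# forward scan trigrams plus feature hashing to drop weak features).
X = TfidfVectorizer(ngram_range=(1, 3), max_features=5000).fit_transform(tweets).toarray()

# Standard K-means as a stand-in for the enhanced K-means.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

def dunn_index(X, labels):
    """Smallest inter-cluster distance divided by largest intra-cluster diameter."""
    D = pairwise_distances(X)
    clusters = np.unique(labels)
    inter = min(D[np.ix_(labels == a, labels == b)].min()
                for a in clusters for b in clusters if a < b)
    intra = max(D[np.ix_(labels == c, labels == c)].max() for c in clusters)
    return inter / intra

print("Dunn index:", dunn_index(X, labels))
```

A higher Dunn index indicates tighter, better-separated clusters, which is the sense in which the paper reports an 18.11% improvement over traditional K-means.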


Data, 2021, Vol 6 (11), pp. 117
Author(s): Mayur Gaikwad, Swati Ahirrao, Shraddha Phansalkar, Ketan Kotecha

Social media platforms are a popular choice for extremist organizations to disseminate their perceptions, beliefs, and ideologies. This information is generally based on selective reporting and is subjective in content. However, the radical presentation of this disinformation and its outreach on social media leads to an increased number of susceptible audiences. Hence, detection of extremist text on social media platforms is a significant area of research. The unavailability of extremism text datasets is a challenge in online extremism research, as is the lack of emphasis on classifying extremist text into propaganda, radicalization, and recruitment classes; the lack of data validation methods further limits the accuracy of extremism detection. This research addresses these challenges and presents a multi-ideology, multi-class extremism text seed dataset. It describes the construction of a multi-ideology ISIS/Jihadist White supremacist (MIWS) dataset from recent tweets collected from Twitter. The presented dataset can be employed effectively to classify extremist text into widely recognized types such as propaganda, radicalization, and recruitment. Additionally, the seed dataset is statistically validated with the coherence score of Latent Dirichlet Allocation (LDA) and with word mover’s distance using pretrained Google News vectors. The dataset shows effectiveness in its construction, with good coherence scores within topics and appropriate distance measures between topics. This is the first publicly accessible multi-ideology, multi-class extremism text dataset to reinforce research on extremism text detection on social media platforms.
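
The word mover's distance check described above can be sketched with gensim, assuming the Google News word2vec file has been downloaded locally; the file path and the two topic word lists below are placeholders, not the paper's actual topics.

```python
from gensim.models import KeyedVectors

# Top words of two hypothetical LDA topics extracted from the seed dataset.
topic_a = ["propaganda", "video", "media", "release"]
topic_b = ["recruit", "join", "brothers", "fight"]

# Placeholder path to the pretrained Google News word2vec binary.
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin.gz",
                                       binary=True)

# Word mover's distance between the two topics; larger values mean the topics
# are semantically further apart.
print("distance between topics:", wv.wmdistance(topic_a, topic_b))
```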


2021, Vol 3 (5), pp. 1-6
Author(s): A. Obi, E. O. Nwobodo, U. Dimkpa, S. O. Maduka, E. Fintan

This study aimed at assessing the value of heart rate variability (HRV) as a stress indicator before and after a final re-sit exam among healthy sixth-year medical students. Fifty participants were recruited for the study (test group, n = 30; control group, n = 20). Each participant was examined for 5 minutes in the pre- and post-exam periods using the proprietary HeartMath protocol. emWave equipment was used to detect, record and analyze the heart rate and to plot the variability as discrete percentages for low, medium and high coherence. Results indicated that the mean percentage coherence score was significantly higher in the test group (p < 0.05) in the low cardiac coherence domain, but lower (p < 0.05) in the high coherence domain, compared with the control group. The coherence score was significantly higher (p < 0.05) after the exam, indicating release from stress, compared to before the examination, when stress was observable among the exam candidates. There were no significant gender differences in cardiac coherence scores before or after the examination. Our findings indicate that HRV is a reliable indicator of real-time exam stress and support the future clinical use of HRV as a simple, non-invasive stress test.


2021, Vol 13 (15), pp. 8518
Author(s): Clint T. Lewis, Ming-Chien Su

Climate change is an existential threat to small island developing states. Policy coherence aims to create synergies and avoid conflicts between policies; mainstreaming adaptation across multiple sectors and achieving greater coherence among policies are therefore needed. The paper applies qualitative document analysis, content analysis, and expert interviews to examine the degree of coherence between climate-sensitive sector policies in framing climate change adaptation and the adaptation goals outlined in the national development plans and national climate change policies of St. Vincent and the Grenadines (SVG), Grenada, and Saint Lucia. The results indicate that adaptation is not fully integrated into the water, agriculture, coastal zone, and forestry policies. For example, while adaptation was explicitly addressed in Saint Lucia’s water policy, it was not explicitly addressed in SVG’s and Grenada’s water policies. The results show that Saint Lucia has the highest coherence score (93.52), while St. Vincent and the Grenadines has the lowest (91.12). Against a maximum possible coherence score of 147, these results indicate only partial coherence in adaptation mainstreaming across sectoral policies. Expert interviews highlighted problems such as institutional arrangements, a silo approach, funding mechanisms, and policy implementation. Using the knowledge provided by the experts, a seven-step process is proposed to practically achieve policy coherence and operationalize the policies.
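
The paper's scoring instrument is not reproduced here, so the snippet below is purely illustrative arithmetic: assuming each sector policy is rated 0-3 against a set of hypothetical adaptation criteria, a country-level coherence score is simply the sum of the ratings, read against the maximum obtainable score (147 in the paper).

```python
# Hypothetical ratings (0-3) of each sector policy against illustrative criteria;
# neither the criteria names nor the numbers come from the paper.
ratings = {
    "water":        {"explicit_adaptation": 3, "links_national_plan": 2, "funding": 1},
    "agriculture":  {"explicit_adaptation": 2, "links_national_plan": 2, "funding": 2},
    "coastal_zone": {"explicit_adaptation": 1, "links_national_plan": 2, "funding": 1},
    "forestry":     {"explicit_adaptation": 2, "links_national_plan": 1, "funding": 1},
}

# Country coherence score is the sum of all ratings, reported against the maximum.
total = sum(score for sector in ratings.values() for score in sector.values())
maximum = 3 * sum(len(sector) for sector in ratings.values())
print(f"coherence score: {total}/{maximum} ({100 * total / maximum:.1f}%)")
```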


2021, Vol 6 (1), pp. 17
Author(s): Kartika Rizqi Nastiti, Ahmad Fathan Hidayatullah, Ahmad Rafie Pratama

Before conducting a research project, researchers must find the trends and state of the art in their research field. However, that is not necessarily an easy job, partly due to the lack of specific tools to filter the required information by time range. This study aims to provide a solution to that problem by applying topic modeling to data scraped from Google Scholar between 2010 and 2019. We utilized Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build topic models and employed the coherence score method to determine the number of topics for each year’s data. We also provided a visualization of the topic interpretation and the word distribution for each topic, as well as its relevance, using word clouds and pyLDAvis. In the future, we expect to add more features to show the relevance and interconnections between topics, making it even easier for researchers to use this tool in their research projects.
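
Choosing the number of topics by coherence score typically looks like the loop below. This is a minimal sketch with gensim, where `texts` is a toy stand-in for the tokenized Google Scholar records of a single year.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy tokenized documents standing in for one year of scraped records.
texts = [["deep", "learning", "image", "classification"],
         ["topic", "modeling", "scholarly", "text"],
         ["reinforcement", "learning", "robotics", "control"]]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Fit LDA for each candidate number of topics and keep the best c_v coherence.
best_k, best_score = None, float("-inf")
for k in range(2, 6):
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k, random_state=0)
    score = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                           coherence="c_v").get_coherence()
    if score > best_score:
        best_k, best_score = k, score

print(f"best number of topics: {best_k} (c_v = {best_score:.3f})")
```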


Author(s): Uttam Chauhan, Apurva Shah

A topic model is one of the best stochastic models for summarizing an extensive collection of text, and it has achieved considerable success in text analysis as well as text summarization. It can be applied to a set of documents represented as bags of words, without considering grammar or word order. We modeled the topics for a corpus of Gujarati news articles. As Gujarati has a diverse morphological structure and is inflectionally rich, processing Gujarati text is more complex. The size of the vocabulary plays an important role in the inference process and the quality of topics: as the vocabulary size increases, inference becomes slower and topic semantic coherence decreases. If the vocabulary size is diminished, the topic inference process can be accelerated and the quality of topics may improve. In this work, a list of suffixes that occur very frequently in Gujarati words was prepared, and inflectional forms were reduced to their root words using the suffixes in this list. Moreover, Gujarati single-letter words were eliminated for faster inference and better topic quality. Experimentally, it has been shown that reducing inflectional forms to their root words shrinks the vocabulary to a significant extent and makes the topic formation process quicker. Moreover, the inflectional form reduction and single-letter word removal enhanced the interpretability of topics, which was assessed on semantic coherence, word length, and topic size. The experimental results showed improvements in the topical semantic coherence score. The topic size also grew notably, as the number of tokens assigned to the topics increased.
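
The vocabulary-reduction idea can be illustrated in a few lines of Python; the suffix list and tokens below are romanized placeholders, not the paper's actual Gujarati suffix list.

```python
# Placeholder suffixes (romanized); the paper builds a real Gujarati suffix list.
SUFFIXES = ["ma", "nu", "o"]

def normalize(token):
    """Strip one matching suffix, if any, to approximate the root word."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token

def reduce_document(tokens):
    """Reduce inflectional forms and drop single-letter words before inference."""
    reduced = [normalize(t) for t in tokens]
    return [t for t in reduced if len(t) > 1]

# Toy documents: inflected forms collapse to their roots, shrinking the vocabulary.
docs = [["ramat", "ramatma", "ramato"], ["samachar", "samacharo"]]
vocab_before = {t for d in docs for t in d}
vocab_after = {t for d in docs for t in reduce_document(d)}
print("vocabulary size:", len(vocab_before), "->", len(vocab_after))
```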


2021, Vol ahead-of-print (ahead-of-print)
Author(s): Hendri Murfi

Purpose: The aim of this research is to develop an eigenspace-based fuzzy c-means method for scalable topic detection. Design/methodology/approach: The eigenspace-based fuzzy c-means (EFCM) method combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition. Fuzzy c-means is performed on the eigenspace to identify the centroids of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability using two approaches, i.e. single-pass and online processing. We call the developed topic detection methods oEFCM and spEFCM. Findings: Our simulation shows that both oEFCM and spEFCM provide faster running times than EFCM for data sets that do not fit in memory, at the cost of a decrease in the average coherence score. For data sets that both fit and do not fit in memory, oEFCM provides a trade-off between running time and coherence score that is better than spEFCM's. Originality/value: This research produces a scalable topic detection method. Besides this scalability, the developed method also provides a faster running time for data sets that fit in memory.
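
A rough sketch of the basic (non-scalable) EFCM idea, not the author's implementation: TF-IDF features are projected into a low-dimensional eigenspace with truncated SVD, a minimal fuzzy c-means loop finds the centroids there, and the centroids are mapped back to the term space, with the nonnegative part read off as topics. The documents below are toy placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["stock market falls on inflation fears",
        "team wins the championship final",
        "central bank raises interest rates",
        "injury forces star player to retire"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)                      # documents in the eigenspace

def fuzzy_cmeans(Z, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means; returns cluster centroids in the input space."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(Z)))
    U /= U.sum(axis=0)                        # memberships sum to 1 per document
    for _ in range(iters):
        W = U ** m
        centroids = (W @ Z) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(Z[None, :, :] - centroids[:, None, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))        # standard FCM membership update
        U /= U.sum(axis=0)
    return centroids

centroids = fuzzy_cmeans(Z, c=2)

# Back-project centroids to the term space; keep the nonnegative part as topics.
topics = np.clip(centroids @ svd.components_, 0, None)
terms = np.array(tfidf.get_feature_names_out())
for t in topics:
    print(" ".join(terms[np.argsort(t)[::-1][:4]]))
```

The single-pass and online extensions (spEFCM, oEFCM) would process the corpus in chunks instead of all at once; they are not sketched here.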


HeartMath technology is an innovative technique for improving emotional and mental health. A small-scale study was conducted to investigate the effect of the quick coherence technique on psychophysiological coherence, stress, anxiety, depression and feeling states in young adults in India. The present study is in line with past research by S.D. Edwards (2016) on the influence of the HeartMath quick coherence technique on psychophysiological coherence and feeling states. Six postgraduate students who complained of stress, anxiety and depression were included: four females and two males aged 22-28 years. They were screened for stress, anxiety and depression using the DASS-21 scale, and their physiological coherence scores were recorded using the emWave system. They were asked to rate their feeling states (anger, sadness, happiness and calmness) before and after each session. Six sessions (twice a week) were conducted using the quick coherence technique. In the initial session they were taught the quick coherence technique, and they were given a heart-focused breathing exercise and an affirmation as a home plan. After six sessions their physiological coherence scores were again recorded using the emWave system, and they were re-screened for stress, anxiety and depression using the DASS-21 scale. There was a significant improvement in psychophysiological coherence, anxiety, stress and feeling states after quick coherence training. The present study supports the past research by S.D. Edwards (2016), whose results showed significant changes in psychophysiological coherence and in negative and positive feeling states after five sessions.


2020
Author(s): Robert Robert, Pari Delir Haghighi, Frada Burstein, Donna Urquhart, Flavia Cicuttini

BACKGROUND Although personal experiences of low back pain have traditionally been explored through qualitative studies, social media content analysis has the potential to complement these studies by providing a deeper understanding of how problems such as pain are perceived by those who have them, and of the effect of contextual variables on individuals and the community. OBJECTIVE The objective of this study was to perform a content analysis of tweets to identify contextual variables of the low back pain (LBP) experience from a first-person perspective, in order to better understand individuals’ beliefs and perceptions. METHODS We analysed 896,867 cleaned tweets about low back pain posted between 1 January 2014 and 31 December 2018. We tested and compared Latent Dirichlet Allocation (LDA), Dirichlet Multinomial Mixture (DMM), GPU-DMM, the Biterm Topic Model (BTM) and Non-negative Matrix Factorization (NMF) for identifying topics associated with tweets, and used the coherence score to identify the best model. RESULTS LDA outperformed all other algorithms, achieving the highest coherence score. The best model was LDA with 60 topics, with a coherence score of 0.562. With input from domain experts, the 60 topics were validated and grouped into 19 contextual categories. “Emotion and Beliefs” had the largest proportion of the total tweets (17.6%), followed by “Physical Activity” (13.85%) and “Daily Life” (9%), while “Food and Drink”, “Weather” and “Not Being Understood” had the least (1.29%, 1.13% and 1.02%, respectively). Of the 11 topics within “Emotion and Beliefs”, 72% had negative sentiment. CONCLUSIONS Using social media allows access to data from a larger, heterogeneous and geographically distributed population, which is not possible using traditional qualitative methods that are generally limited to a small population. Individuals may be more inclined to express their feelings and emotions freely on social media sites, where the data are collected in an unsolicited manner, compared with more rigid data collection methods. A content analysis of tweets identified common themes in the area of low back pain that are consistent with findings from conventional qualitative studies but provide a more granular view of individuals’ perspectives on low back pain. This understanding has the potential to assist with developing more effective and personalized models of care to improve treatment outcomes.
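
Models that do not expose a gensim interface (NMF here, and similarly DMM or BTM) can still be scored on the same coherence scale by passing their top-word lists to gensim's CoherenceModel; the sketch below assumes a toy set of tokenized tweets rather than the study's corpus.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the tokenized low back pain tweets.
texts = [["back", "pain", "sitting", "desk", "all", "day"],
         ["physio", "exercise", "helped", "my", "back", "pain"],
         ["cannot", "sleep", "back", "pain", "flare", "again"]]

dictionary = Dictionary(texts)

# Fit NMF on TF-IDF features built from the same token lists.
tfidf = TfidfVectorizer(analyzer=lambda tokens: tokens)
X = tfidf.fit_transform(texts)
nmf = NMF(n_components=2, random_state=0).fit(X)

# Extract each NMF topic's top words and score them with c_v coherence.
terms = np.array(tfidf.get_feature_names_out())
nmf_topics = [list(terms[np.argsort(row)[::-1][:5]]) for row in nmf.components_]
score = CoherenceModel(topics=nmf_topics, texts=texts, dictionary=dictionary,
                       coherence="c_v").get_coherence()
print("NMF c_v coherence:", score)
```

Repeating this for each candidate model on the full corpus gives the common coherence scale on which LDA came out ahead.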


Author(s): Nguyen Van Ho, Ho Trung Thanh

Recently, with the growth of technology and the Internet, customers can easily share their opinions and feedback about hotel products and services on websites or social media. This information is stored in textual form and is a huge source of data to explore. To continue developing products and services that meet customers' needs, businesses need insight into the issues customers discuss and care about. In this study, we first collected a corpus of 26,482 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After preprocessing the collected data, we conducted experiments on this corpus and chose the best number of topics (K), using coherence score measurements, as the input parameter for the model. Finally, we ran the Latent Dirichlet Allocation (LDA) model with K topics on the corpus to explore the topics. The model uncovered hidden topics with corresponding keyword lists, reflecting the issues that customers are interested in. Applying the empirical results from the model can support decision making to improve products and services as well as the management and development of businesses in the hotel sector.
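
Once K has been chosen via the coherence score, the final step reduces to fitting LDA with that K and reading off the keyword list per topic. A minimal sketch with gensim is shown below, using toy tokenized reviews in place of the 26,482-comment corpus.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy tokenized hotel reviews standing in for the preprocessed corpus.
reviews = [["room", "clean", "staff", "friendly"],
           ["breakfast", "cold", "slow", "service"],
           ["great", "location", "near", "beach"]]

dictionary = Dictionary(reviews)
corpus = [dictionary.doc2bow(r) for r in reviews]

K = 3   # assumed to have been chosen beforehand via the coherence score
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=K, random_state=0)

# Print the keyword list for each discovered topic.
for topic_id in range(K):
    words = [w for w, _ in lda.show_topic(topic_id, topn=4)]
    print(f"topic {topic_id}: {', '.join(words)}")
```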

