Latent Dirichlet Allocation Model
Recently Published Documents

Total documents: 84 (five years: 43)
H-index: 10 (five years: 4)

2022 · Vol 14 (1) · pp. 555
Author(s): Yuanxiang Peng, Ping Yin, Kurt Matzler

This study proposes a text mining framework for destination image (DI) research based on user-generated content (UGC), combining the Latent Dirichlet Allocation (LDA) model with a sentiment analysis method based on custom rules and a lexicon to identify and analyze the DI of an emerging ski market. The ski resorts in the host city of the 2022 Winter Olympic Games are selected as a case study. The findings reveal that (1) nine image attributes are identified, two of which, beginner suitability and ticketing service, have not previously been reported in winter destination studies; (2) over the past seven snow seasons, tourists' negative sentiment has shown a continuous downward trend, while positive sentiment has exhibited a slow upward trend; and (3) for tourists from destination countries affected by the Winter Olympic Games, the destination image improves when the destination meets their expectations, and even when it does not, tourists still believe that hosting the Winter Olympics will improve the destination's situation. The theoretical and managerial implications of these findings are discussed.
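The framework pairs LDA topics with sentiment analysis based on custom rules and a lexicon. Below is a minimal sketch of the lexicon-and-rule step; the word lists, the single negation rule, and the score_review helper are illustrative assumptions rather than the authors' implementation.

```python
# Minimal lexicon- and rule-based sentiment scoring sketch (illustrative only).
# The lexicon and the simple negation rule are assumptions, not the paper's lexicon.
POSITIVE = {"great", "fun", "friendly", "smooth"}
NEGATIVE = {"crowded", "expensive", "slow", "rude"}
NEGATORS = {"not", "never", "no"}

def score_review(text: str) -> int:
    """Return a signed sentiment score; a negator flips the polarity of the next word."""
    score, flip = 0, False
    for tok in text.lower().split():
        if tok in NEGATORS:
            flip = True
            continue
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        score += -polarity if flip else polarity
        flip = False
    return score

print(score_review("The slopes were great but the ticketing service was slow"))  # -> 0
```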


2021 · Vol 26 (6) · pp. 464-472
Author(s): Bo HUANG, Jiaji JU, Huan CHEN, Yimin ZHU, Jin LIU, ...

The Product Sensitive Online Dirichlet Allocation (PSOLDA) model proposed in this paper uses the sentiment polarity of topic words in review text to improve the accuracy of topic evolution. First, Latent Dirichlet Allocation (LDA) is used to obtain the distribution of topic words in the current time window. Second, word2vec word vectors are used as auxiliary information to determine sentiment polarity and obtain the sentiment polarity distribution of the current topics. Finally, the changes in topic sentiment polarity between adjacent time windows are mapped to sentiment factors, which control the distribution of topic words in the next time window. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, whereas Online Twitter LDA only increases it by 0.0699. The proposed topic evolution method, which integrates the sentiment information of topic words, outperforms the traditional model.
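The second step determines the sentiment polarity of topic words from word2vec vectors. A minimal sketch of one way to do this, using gensim's Word2Vec and similarity to small positive/negative seed sets, is shown below; the toy sentences, seed words, and polarity function are assumptions, not the PSOLDA implementation.

```python
# Sketch: assign a sentiment polarity to topic words by comparing word2vec similarity
# against small positive/negative seed sets (toy corpus and seeds are assumptions).
from gensim.models import Word2Vec

sentences = [
    ["battery", "life", "is", "great", "and", "charging", "is", "fast"],
    ["screen", "quality", "is", "poor", "and", "shipping", "was", "slow"],
]
w2v = Word2Vec(sentences, vector_size=50, min_count=1, seed=1, workers=1)

POS_SEEDS, NEG_SEEDS = ["great", "fast"], ["poor", "slow"]

def polarity(word: str) -> float:
    """Positive value => leans positive, negative value => leans negative."""
    pos = max(w2v.wv.similarity(word, s) for s in POS_SEEDS if s in w2v.wv)
    neg = max(w2v.wv.similarity(word, s) for s in NEG_SEEDS if s in w2v.wv)
    return pos - neg

for topic_word in ["battery", "shipping"]:
    print(topic_word, round(polarity(topic_word), 3))
```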


2021 · Vol 9 (5) · pp. 558-574
Author(s): Kai Wang, Fuzhi Wang

Abstract: Topic recognition with a dynamic number of topics enables dynamic updating of hyperparameters and yields the probability distribution of topics over the time dimension, which supports clearer understanding and tracking of streaming text data. However, current topic recognition models tend to assume a fixed number of topics K and lack multi-granularity analysis of topic knowledge, so they cannot deeply perceive how topics change over the time series. By introducing a novel approach based on the infinite Latent Dirichlet Allocation model, a topic feature lattice under a dynamic number of topics is constructed. In the model, documents, topics, and vocabularies are jointly modeled to generate two probability distribution matrices: document-topic and topic-feature word. Afterwards, the association intensity between each topic and its feature vocabulary is computed to establish the topic formal context matrix. Finally, the topic feature lattice is induced according to formal concept analysis (FCA) theory. The topic feature lattice under dynamic topic number (TFL_DTN) model is validated on a real dataset against mainstream methods. Experiments show that the model is more in line with actual needs and achieves better results in semi-automatic modeling for topic visualization analysis.
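The topic formal context matrix described above is a binary relation between topics and feature words obtained by thresholding association intensities. A minimal sketch under assumed association values and an assumed threshold follows; it illustrates the input to FCA, not the paper's TFL_DTN construction.

```python
# Sketch: threshold topic-word association strengths into a binary formal context
# (topics x feature words), the input to formal concept analysis.
# The association values and the 0.2 threshold are illustrative assumptions.
import numpy as np

feature_words = ["price", "battery", "screen", "service"]
# Rows: topics; columns: association intensity with each feature word.
association = np.array([
    [0.35, 0.05, 0.10, 0.40],
    [0.02, 0.55, 0.30, 0.01],
])

context = association >= 0.2          # binary formal context matrix
for t, row in enumerate(context):
    attrs = [w for w, keep in zip(feature_words, row) if keep]
    print(f"topic {t}: {attrs}")
```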


2021 · Vol 87 (9) · pp. 669-681
Author(s): Xiaoman Li, Yanfei Zhong, Yu Su, Richen Ye

With the continuous development of high-spatial-resolution ground observation technology, it is now becoming possible to obtain more and more high-resolution images, which make it possible to understand remote sensing images at the semantic level. Compared with traditional pixel- and object-oriented change detection methods, scene-change detection can provide land use change information at the semantic level, and can thus supply reliable information for urban land use change detection, urban planning, and government management. Most current scene-change detection methods are based on the visual-word representation of the bag-of-visual-words model and on single-feature latent Dirichlet allocation models. In this article, a scene-change detection method for high-spatial-resolution imagery is proposed based on a multi-feature-fusion latent Dirichlet allocation model. This method combines the spectral, textural, and spatial features of the high-spatial-resolution images, and the final scene representation is obtained from the topic features extracted by the more abstract latent Dirichlet allocation model. Post-classification comparison is then used to detect changes between the scene images at different times. A series of experiments demonstrates that, compared with the traditional bag-of-words and topic models, the proposed method obtains superior scene-change detection results.
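The final step, post-classification comparison, flags scene tiles whose class labels differ between the two dates. A minimal sketch with toy label grids is shown below; the tile labels are assumptions, not results from the study.

```python
# Sketch of post-classification comparison: scenes classified independently at two
# dates are compared tile by tile; differing labels are flagged as changes.
# The label grids below are toy assumptions, not real classification results.
import numpy as np

labels_t1 = np.array([["farmland", "farmland"], ["residential", "industrial"]])
labels_t2 = np.array([["residential", "farmland"], ["residential", "industrial"]])

change_mask = labels_t1 != labels_t2
for (i, j) in zip(*np.nonzero(change_mask)):
    print(f"tile ({i}, {j}): {labels_t1[i, j]} -> {labels_t2[i, j]}")
```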


2021 · Vol 7 · pp. e608
Author(s): Sergei Koltcov, Vera Ignatenko, Maxim Terpilovskii, Paolo Rosso

Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections, and it additionally allows constructing a hierarchy representing levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based metric of quality for hierarchical models. Second, we propose a practical approach to obtaining the "correct" number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up; three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three different hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model possesses a significant level of instability and, moreover, that the derived numbers of topics are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy.
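The quantity at the core of the proposed metric is Renyi entropy. The following sketch only shows how Renyi entropy of order q is computed for a discrete distribution; the paper's actual quality metric for hierarchical models and its hyperparameter tuning procedure are more involved.

```python
# Simplified illustration: Renyi entropy of order q for a discrete distribution.
# This shows only the basic quantity; it is not the paper's hierarchical-model metric.
import numpy as np

def renyi_entropy(p: np.ndarray, q: float = 2.0) -> float:
    p = p[p > 0]
    if q == 1.0:                       # Shannon limit
        return float(-(p * np.log(p)).sum())
    return float(np.log((p ** q).sum()) / (1.0 - q))

uniform = np.full(10, 0.1)
peaked = np.array([0.91] + [0.01] * 9)
print(renyi_entropy(uniform), renyi_entropy(peaked))  # the uniform case has higher entropy
```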


Author(s): Jae-Geum Shim, Kyoung-Ho Ryu, Sung Hyun Lee, Eun-Ah Cho, Yoon Ju Lee, ...

The COVID-19 pandemic has affected the entire world, resulting in a tremendous change to people’s lifestyles. We investigated the Korean public response to COVID-19 vaccines on social media from 23 February 2021 to 22 March 2021. We collected tweets related to COVID-19 vaccines using the Korean words for “coronavirus” and “vaccines” as keywords. A topic analysis was performed to interpret and classify the tweets, and a sentiment analysis was conducted to analyze public emotions displayed within the retrieved tweets. Out of a total of 13,414 tweets, 3509 were analyzed after preprocessing. Eight topics were extracted using the Latent Dirichlet Allocation model, and the most frequently tweeted topic was vaccine hesitation, consisting of fear, flu, safety of vaccination, time course, and degree of symptoms. The sentiment analysis revealed a similar ratio of positive and negative tweets immediately before and after the commencement of vaccinations, but negative tweets were prominent after the increase in the number of confirmed COVID-19 cases. The public’s anticipation, disappointment, and fear regarding vaccinations are considered to be reflected in the tweets. However, long-term trend analysis will be needed in the future.
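The topic analysis step is standard LDA over preprocessed tweets. A minimal sketch with gensim is shown below; the toy tweets and the choice of two topics are illustrative assumptions and do not reproduce the study's eight topics.

```python
# Minimal sketch of extracting topics from preprocessed tweets with gensim LDA.
# The toy tweets and the choice of 2 topics are illustrative assumptions.
from gensim import corpora
from gensim.models import LdaModel

tweets = [
    ["vaccine", "appointment", "booked", "today"],
    ["side", "effects", "fever", "after", "vaccine"],
    ["new", "confirmed", "cases", "rising", "again"],
    ["worried", "about", "vaccine", "safety"],
]
dictionary = corpora.Dictionary(tweets)
corpus = [dictionary.doc2bow(t) for t in tweets]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0, passes=10)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```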


2021 · Vol 2021 · pp. 1-14
Author(s): Xiujuan Wang, Yi Sui, Yuanrui Tao, Qianqian Zhang, Jianhua Wei

With the rapid development of the Internet since the beginning of the 21st century, social networks have provided a significant amount of convenience for work, study, and entertainment. In particular, because of the unmatched reach of social platforms in disseminating information, criminals have updated the main methods of social engineering attacks, and detecting abnormal accounts on social networks in a timely manner can effectively prevent malicious Internet events. In contrast to previous work, this study proposes an anomaly detection method called Hurst of Interest Distribution, which detects abnormal accounts based on the stability of user interest as quantified from the content of users' tweets. Specifically, the Latent Dirichlet Allocation model is adopted to classify Twitter blog content into topics and obtain the topic distribution of the tweets sent by a single user within a period of time. The stability of the user's tweet-topic preference is then calculated according to the Hurst index to determine whether the account is compromised. Experiments show that the Hurst indexes of normal and abnormal accounts differ significantly, and the detection rate of abnormal accounts using the proposed method reaches up to 97.93%.
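The Hurst index in this method is estimated from the time series of a user's topic proportions. Below is a minimal rescaled-range (R/S) estimator sketch; the synthetic series stands in for real per-user topic distributions, and the window sizes are assumptions.

```python
# Sketch: estimate the Hurst exponent of a user's per-day topic-proportion series
# with a simple rescaled-range (R/S) fit; the random series below is a stand-in
# for real per-user topic distributions derived from LDA.
import numpy as np

def hurst_rs(series: np.ndarray, window_sizes=(8, 16, 32, 64)) -> float:
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())      # cumulative deviation from the window mean
            r = dev.max() - dev.min()          # range
            s = w.std()                        # standard deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)    # Hurst exponent = log-log slope
    return float(slope)

rng = np.random.default_rng(0)
stable_user = rng.normal(0.5, 0.01, 256)       # near-constant topic preference
print(round(hurst_rs(stable_user), 2))
```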


2021
Author(s): Jorge Arturo Lopez

Extraction of topics from large text corpora helps improve Software Engineering (SE) processes. Latent Dirichlet Allocation (LDA) is one of the algorithmic tools used to understand, search, exploit, and summarize a large corpus of documents, and it is often used to perform such analysis. However, calibration of the models is computationally expensive, especially when iterating over a large number of topics. Our goal is to create a simple formula that allows analysts to estimate the number of topics so that the top X topics include the desired proportion of documents under study. We derived the formula from an empirical analysis of three SE-related text corpora. We believe that practitioners can use our formula to expedite LDA analysis. The formula is also of interest to theoreticians, as it suggests that different SE text corpora have similar underlying properties.
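The formula targets the number of topics whose top X cover a desired share of documents. A minimal sketch of measuring that coverage from dominant-topic assignments is shown below; the assignments and the 0.8 target are toy assumptions, not the paper's corpora or formula.

```python
# Sketch: given per-document dominant topics, find how many of the most frequent
# topics are needed to cover a desired share of the corpus. The dominant-topic
# assignments below are toy values, not results from the paper's corpora.
from collections import Counter

dominant_topic = [0, 0, 1, 0, 2, 1, 0, 3, 1, 0]   # one entry per document
target_share = 0.8

covered, needed = 0, 0
for _, count in Counter(dominant_topic).most_common():
    covered += count
    needed += 1
    if covered / len(dominant_topic) >= target_share:
        break
print(f"top {needed} topics cover {covered / len(dominant_topic):.0%} of documents")
```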

