Latent Semantic Analysis
Recently Published Documents


TOTAL DOCUMENTS

760
(FIVE YEARS 133)

H-INDEX

39
(FIVE YEARS 2)

Author(s):  
Pooja Kherwa ◽  
Poonam Bansal

The Covid-19 pandemic is the deadliest outbreak in living memory, so it is essential to prepare the world with strategies to prevent and control the impact of such epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature, using contextual clustering and intelligent topic modeling, is presented. For contextual clustering, three-level weights at the term, document, and corpus levels are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and latent Dirichlet allocation is applied. Contextual clustering with latent semantic analysis yields semantic spaces with highly correlated terms at the corpus level. Through intelligent topic modeling, topic quality improves, as reflected in lower perplexity and higher coherence. This research helps to identify knowledge gaps in Covid-19 research and offers directions for future work.
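The PMI scoring used here for collocation selection can be sketched as follows. This is a minimal pure-Python illustration over a toy token stream; the paper's corpus and the LBMD weighting variant are not reproduced:

```python
import math
from collections import Counter

def pmi(bigram_counts, unigram_counts, total_bigrams, total_unigrams, w1, w2):
    # PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
    p_joint = bigram_counts[(w1, w2)] / total_bigrams
    p1 = unigram_counts[w1] / total_unigrams
    p2 = unigram_counts[w2] / total_unigrams
    return math.log2(p_joint / (p1 * p2))

# Toy token stream standing in for a Covid-19 abstract corpus.
tokens = ("covid outbreak covid vaccine covid outbreak "
          "pandemic outbreak covid vaccine").split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
score = pmi(bigrams, unigrams, len(tokens) - 1, len(tokens), "covid", "vaccine")
```

A positive score means the pair co-occurs more often than chance would predict, which is the signal used to keep a collocation as a single unit before topic modeling.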




2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Michael Kend ◽  
Lan Anh Nguyen

Purpose: The purpose of this study is to explore audit procedure disclosures related to key audit risks during the year before and the first year of the COVID-19 outbreak, by examining matters published in over 3,000 Australian statutory audit reports during 2019 and 2020.
Design/methodology/approach: This study partially uses latent semantic analysis methods to apply textual and readability analyses to external audit reports in Australia. The authors measure the tone of the audit reports using the Loughran and McDonald (2011) approach.
Findings: The authors find that 3% of audit procedures undertaken during 2020 were designed to address audit risks associated with the COVID-19 pandemic. As a percentage of total audit procedures undertaken during 2020, smaller practitioners reported far fewer COVID-19-related audit procedures than most larger audit firms. Finally, the textual analysis found differences in the sentiment or tone of words used by different auditors in 2020, but no such differences when 2020 was compared to the prior year, 2019.
Originality/value: This study provides early evidence on whether auditors designed audit procedures to deal specifically with audit risks arising from the COVID-19 pandemic, and on the extent and nature of those procedures. It will help policymakers better understand whether Key Audit Matters provided informational value to investors during a global crisis.
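A dictionary-based tone score in the spirit of Loughran and McDonald (2011) can be sketched as below. The word lists here are tiny hypothetical stand-ins, not the actual Loughran-McDonald financial sentiment dictionaries, and the report text is invented:

```python
# Net tone = (positive - negative) / (positive + negative) word counts.
POSITIVE = {"effective", "improve", "strong", "achieve"}   # toy stand-in list
NEGATIVE = {"risk", "uncertainty", "adverse", "decline"}   # toy stand-in list

def tone_score(text):
    # Count dictionary hits and return the normalized net tone in [-1, 1].
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

report = "material uncertainty and going concern risk may decline but controls improve"
score = tone_score(report)
```

A score below zero indicates a predominantly negative tone, which is how differences between auditors' report language can be compared.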


Informatica ◽  
2022 ◽  
pp. 1-22
Author(s):  
Pavel Stefanovič ◽  
Olga Kurasova

In this paper, a new approach is proposed for verifying and adjusting the classes of multi-label text data. The approach supports semi-automated revision of class assignments to improve data quality. Data quality significantly influences the accuracy of the resulting models, for example in classification tasks, and the approach can also be useful for other data analysis tasks. The proposed approach combines a text similarity measure with two methods: latent semantic analysis and the self-organizing map. First, the text data must be pre-processed by applying various filters to clean the data of unnecessary and irrelevant information. Latent semantic analysis is used to reduce the dimensionality of the vectors that correspond to each text in the analysed data. The cosine similarity measure determines which multi-label text data classes should be changed or adjusted. The self-organizing map is the key method for detecting similarity between texts and deciding on a new class assignment. The experimental investigation was performed on newly collected multi-label text data: financial news in the Lithuanian language, collected from four public websites and classified manually by experts into ten classes. Various parameters of the methods were analysed and their influence on the final results estimated. The final results were validated by experts. The research showed that the proposed approach can help verify and adjust multi-label text data classes: 82% of assignments were correct when the data dimensionality was reduced to 40 using latent semantic analysis and the self-organizing map size was decreased from 40 to 5 in steps of 5.
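The re-labeling decision based on cosine similarity between LSA-reduced vectors can be sketched as follows. The 3-dimensional vectors and class names here are hypothetical (the paper reduces to 40 dimensions and uses ten Lithuanian financial-news classes):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two dense LSA-reduced vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Hypothetical reduced representations of one document and two class centroids.
doc = [0.9, 0.1, 0.2]               # a text currently labeled "politics"
centroid_politics = [0.2, 0.9, 0.1]
centroid_finance = [0.8, 0.2, 0.3]

# Suggest re-labeling when another class centroid is closer than the current one.
suggest_change = (cosine_similarity(doc, centroid_finance) >
                  cosine_similarity(doc, centroid_politics))
```

In the full approach this suggestion is only one input; the self-organizing map and a final expert review make the actual adjustment decision.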


Author(s):  
Manuel-Alejandro Sánchez-Fernández ◽  
Alfonso Medina-Urrea ◽  
Juan-Manuel Torres-Moreno

The present work studies the relationship between measures obtained from Latent Semantic Analysis (LSA) and a variant known as SPAN, and the activation and identifiability states (informative states) of referents in noun phrases from journalistic texts written in Spanish by Northwestern Mexican news outlets. The aim and challenge is to find a strategy for labelling new/given information in discourse that is rooted in a theoretically grounded linguistic stance. The new/given distinction can be defined from different perspectives, which vary in which linguistic forms are taken into account; this work focuses on full referential devices (n = 2,388). Pearson's r correlation tests, analysis of variance, graphical exploration of the clustering of labels, and a classification experiment with random forests are performed. The experiment used two label groups, noun phrases labeled with all 10 informative-state tags and a binary labelling, together with two bags-of-words for each noun phrase: the interior and the exterior. It was found that LSA in conjunction with the interior bag of words can classify certain informative states. This same measure showed good results for the binary division, detecting which phrases introduce new referents into the discourse. Previous work applying a similar method to English noun phrases reached 80% accuracy (n = 478) in the classification exercise; our best test for Spanish reached 79%. No prior work on Spanish has used this method, and this kind of experiment is important because Spanish exhibits more complex inflectional morphology.
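The interior/exterior bag-of-words split for a noun phrase can be sketched as below. The sentence, the NP boundaries, and the helper name are hypothetical illustrations; the paper's exact extraction procedure is not specified in the abstract:

```python
def split_bags(sentence_tokens, np_start, np_end):
    # Interior bag: the tokens inside the noun phrase.
    # Exterior bag: the rest of the sentence surrounding it.
    interior = sentence_tokens[np_start:np_end]
    exterior = sentence_tokens[:np_start] + sentence_tokens[np_end:]
    return interior, exterior

# Toy Spanish sentence; the NP "una nueva reforma fiscal" spans tokens 3..6.
tokens = "el presidente anuncio una nueva reforma fiscal ayer".split()
inner, outer = split_bags(tokens, 3, 7)
```

Each bag is then turned into a vector so that LSA-derived similarity measures can be computed separately for the NP itself and for its surrounding context.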


2021 ◽  
Vol 11 (24) ◽  
pp. 11897
Author(s):  
Quanying Cheng ◽  
Yunqiang Zhu ◽  
Jia Song ◽  
Hongyun Zeng ◽  
Shu Wang ◽  
...  

Geospatial data is an indispensable resource for research and applications in many fields. The technologies and applications related to geospatial data are constantly advancing, so identifying them will help foster and fund further innovation. Through topic analysis, new research hotspots can be discovered by understanding the whole development process of a topic. At present, the main methods for determining topics are peer review and bibliometrics; however, these merely review relevant literature or perform simple frequency analysis. This paper proposes a new topic discovery method that combines a word embedding approach, based on the pre-trained model BERT, with a spherical k-means clustering algorithm, and uses the similarity between literature and topics to assign documents to topics. The proposed method was applied to 266 pieces of literature related to geospatial data from the past five years. First, according to the number of publications, a trend analysis of geospatial-data technologies and applications in several leading countries was conducted. Then, the coherence of the proposed method and the existing method PLSA (Probabilistic Latent Semantic Analysis) was evaluated using two topic coherence indicators (i.e., U-Mass and NPMI). The results show that the proposed method can reveal text content well, determine development trends, and produce more coherent topics, and that the overall performance of Bert-LSA is better than PLSA on both NPMI and U-Mass. The method is not limited to trend analysis of the data in this paper; it can also be used for topic analysis of other types of texts.
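A spherical k-means pass over unit-normalized embeddings can be sketched as below. This is a minimal pure-Python illustration with toy 2-D vectors standing in for BERT document embeddings; the paper's actual pipeline and data are not reproduced:

```python
import math

def normalize(v):
    # Project a vector onto the unit sphere.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def spherical_kmeans(vectors, centroids, iterations=10):
    # Spherical k-means: unit-normalize points and centroids, assign each
    # point to the centroid with the highest cosine similarity (a dot
    # product on the unit sphere), then recompute and renormalize centroids.
    vectors = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            best = max(range(len(centroids)),
                       key=lambda i: sum(a * b for a, b in zip(v, centroids[i])))
            clusters[best].append(v)
        for i, members in enumerate(clusters):
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[i] = normalize(mean)
    labels = [max(range(len(centroids)),
                  key=lambda i: sum(a * b for a, b in zip(v, centroids[i])))
              for v in vectors]
    return centroids, labels

# Toy 2-D "embeddings": two documents near each axis direction.
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
_, labels = spherical_kmeans(docs, centroids=[[1.0, 0.0], [0.0, 1.0]])
```

Normalizing to unit length makes cluster assignment depend only on direction, which suits embedding spaces where cosine similarity, not Euclidean distance, reflects semantic closeness.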


2021 ◽  
pp. 999-1007
Author(s):  
Saicharan Gadamshetti ◽  
Gerard Deepak ◽  
A. Santhanavijayan ◽  
K. R. Venugopal

Computers ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 145
Author(s):  
Yassine Lemmou ◽  
Jean-Louis Lanet ◽  
El Mamoun Souidi

In recent years, many papers have been published on ransomware but, to the best of our knowledge, no previous academic study has examined ransom note files. In this paper, we present the results of an in-depth study of the filenames and content of ransom files. We propose a prototype to identify ransom files, and then explore how the filenames and content of these files can minimize the risk of encryption by some specific ransomware or increase the effectiveness of some ransomware detection tools. To achieve these objectives, two approaches are discussed. The first uses Latent Semantic Analysis (LSA) to check similarities between the contents of files. The second uses machine learning models to classify filenames into two classes: ransom filenames and benign filenames.
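The content-similarity idea can be sketched as below with cosine similarity over raw term-count vectors; this is only the overlap step that precedes the SVD stage of full LSA, and the note texts are invented examples, not real ransom notes:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb)

# Toy file contents: two ransom-note-like texts and one benign text.
ransom_a = Counter("your files have been encrypted pay bitcoin to decrypt".split())
ransom_b = Counter("all files encrypted send bitcoin to recover your files".split())
benign = Counter("quarterly sales report attached for your review".split())

sim_ransom = cosine(ransom_a, ransom_b)
sim_benign = cosine(ransom_a, benign)
```

Ransom notes from different families reuse a narrow vocabulary (encrypted, pay, bitcoin, decrypt), so their pairwise similarity stands well above similarity with ordinary documents, which is what makes an LSA-based detector feasible.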


2021 ◽  
Vol 17 (10) ◽  
pp. 960-970
Author(s):  
Ahmed Adil Nafea ◽  
Nazlia Omar ◽  
Mohammed M. AL-Ani
