term similarity
Recently Published Documents


TOTAL DOCUMENTS: 56 (five years: 15)

H-INDEX: 9 (five years: 1)

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The traditional frequency-based approach to multi-document extractive summarization ranks sentences by scores computed as the sum of the TF*IDF weights of the words they contain. In this approach, TF (term frequency) is calculated from how frequently a term occurs in the input, and TF calculated this way does not take semantic relations among terms into account. In this paper, we propose methods that exploit semantic term relations to improve the sentence-ranking and redundancy-removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer improves significantly when a distributional term similarity measure is used to find semantic term relations. Our summarizer also outperforms several well-known summarization baselines to which it is compared.
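The frequency-based baseline this abstract describes can be sketched in a few lines; this is a minimal illustration of summing TF*IDF weights per sentence, not the authors' actual system, and the tokenization and IDF variant are assumptions.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence as the sum of TF*IDF weights of its words
    (the traditional frequency-based baseline: TF over the whole input,
    IDF from how many sentences contain the term)."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Term frequency over the whole input
    tf = Counter(t for toks in tokenized for t in toks)
    return [sum(tf[t] * math.log(n / df[t]) for t in toks)
            for toks in tokenized]
```

A distributional variant, as the paper proposes, would additionally credit a sentence for terms semantically related to frequent input terms rather than only for exact matches.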


2021 ◽  
Vol 11 (4) ◽  
pp. 1130-1131
Author(s):  
Eslam Elsayed Ali Shohda

Over the last two decades, software has been relied on to detect scientific plagiarism, since some authors have been found to use previously published results or scientific ideas without attributing them to their original authors. However, mere term similarity has been confused with scientific plagiarism. This confusion has led to procedures that are not justified by reasoned judgment, made important research harder to conduct, and caused unnecessary waste of time.


Author(s):  
M. M. Rufai ◽  
A. O. Afolabi ◽  
O. D. Fenwa ◽  
F. A. Ajala

Aims: To evaluate the performance of an Improved Latent Semantic Analysis (ILSA), Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF) algorithm in an electronic assessment application, using the following metrics: term similarity; precision, recall and F-measure; mean divergence; assessment accuracy; and adequacy of semantic representation. Methodology: The three algorithms were separately applied in developing an electronic assessment application. One hundred students' responses to a test question in an introductory artificial intelligence course were used, and performance was measured on the metrics above. Results: ILSA outperformed LSA and NMF with an assessment accuracy of 96.64, a mean divergence from the manual score of 0.03, and recall, precision and F-measure values of 0.83, 0.85 and 0.87 respectively. Conclusion: The research observed the performance of an improved algorithm, ILSA, for electronic assessment of free-text documents, using adequacy of semantic representation, retrieval quality and assessment accuracy as performance metrics. The results obtained from the experimental design show the adequacy of the improved algorithm in semantic representation, better retrieval quality and improved assessment accuracy.
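The LSA step underlying these systems can be illustrated with a truncated SVD of a term-document count matrix; this is a generic sketch of plain LSA (not the authors' ILSA variant), with toy documents and a rank `k` chosen for illustration.

```python
import numpy as np

def lsa_similarity(docs, i, j, k=2):
    """Sketch of LSA scoring: build a term-document count matrix,
    truncate its SVD to rank k, and compare documents i and j by
    cosine similarity in the latent space."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: r for r, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for col, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], col] += 1
    # Truncated SVD: keep the top-k singular triplets
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # documents in latent space
    a, b = doc_vecs[i], doc_vecs[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

In an e-assessment setting, the student response and the reference answer would be two of the documents, and the latent-space similarity would be mapped to a score.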


2021 ◽  
pp. 5-9
Author(s):  
D. M. Kulkarni ◽  
Swapnaja S. Kulkarni

Computing the semantic similarity between two words can be approached in a variety of ways, and it is essential for applications such as text analysis and text understanding. In traditional systems, search engines are used to compute the similarity between words, but search engines are keyword based, with the drawback that users must know exactly what they are looking for. There are two main approaches to the computation: knowledge based and corpus based. A drawback shared by both is that they are not suitable for computing the similarity between multi-word expressions. This system provides an efficient and effective approach to computing term similarity using a semantic network, and a clustering approach is used to improve the accuracy of the semantic similarity. This approach is more efficient than other computing algorithms and can also be applied to large-scale datasets.
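A knowledge-based measure of the kind contrasted here typically scores two terms by their distance in a semantic network; the sketch below uses a tiny hypothetical is-a graph (a real system would use a resource such as WordNet) and the common path measure sim = 1 / (1 + shortest-path length).

```python
from collections import deque

# Hypothetical miniature semantic network (is-a edges), for illustration only
EDGES = {
    "dog": ["canine"], "canine": ["mammal"],
    "cat": ["feline"], "feline": ["mammal"],
    "mammal": ["animal"],
    "car": ["vehicle"], "vehicle": ["artifact"],
}

def path_similarity(a, b):
    """Knowledge-based term similarity: 1 / (1 + shortest path length)
    in the undirected semantic network; 0.0 if no path exists."""
    graph = {}
    for u, vs in EDGES.items():
        for v in vs:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    if a == b:
        return 1.0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()  # d = edges from a to node
        for nxt in graph.get(node, ()):
            if nxt == b:
                return 1.0 / (2 + d)  # path length is d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return 0.0
```

The limitation the abstract mentions is visible here: a multi-word expression like "hot dog" has no node in the network, so a path-based measure cannot score it without extra machinery.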


2020 ◽  
pp. 1-10
Author(s):  
Andrés Rosso-Mateus ◽  
Manuel Montes-y-Gómez ◽  
Paolo Rosso ◽  
Fabio A. González

The similarity between two synsets or concepts is a numerical measure of the degree to which the two objects are alike, and similarity measures express the degree of closeness between two synsets or concepts. Similarity or dissimilarity is represented by the term proximity, and proximity measures are defined to take values in the interval [0, 1]. Term similarity, sentence similarity and document similarity are the main areas of text similarity: term similarity measures are used to measure the similarity between individual tokens and words, sentence similarity is the similarity between two or more sentences, and document similarity is used to measure the similarity between two or more corpora. This paper is a study of knowledge-based, distribution-based and prediction-based semantic models, and shows how knowledge-based methods capture information and prediction-based methods preserve semantic information.


2020 ◽  
Vol 34 (05) ◽  
pp. 8775-8782
Author(s):  
Claudia Schulz ◽  
Damir Juric

A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology, in particular whether the close relationship of semantically similar medical terms is encoded in these embeddings. To date, only small datasets for testing medical term similarity are available, which does not allow conclusions to be drawn about the generalisability of embeddings to the enormous number of medical terms used by doctors. We present multiple automatically created large-scale medical term similarity datasets and confirm their high quality in an annotation study with doctors. We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques. Our results show that current embeddings are limited in their ability to adequately encode medical terms. The novel datasets thus form a challenging new benchmark for the development of medical embeddings able to accurately represent the whole medical terminology.
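The evaluation pipeline described (aggregating word vectors for multi-word terms, then comparing terms with a vector similarity metric) can be sketched as follows; the toy 2-dimensional embeddings and term pairs below are invented for illustration, and real experiments would load trained medical embeddings.

```python
import numpy as np

def term_vector(term, embeddings, agg="mean"):
    """Aggregate per-word vectors into one vector for a (possibly
    multi-word) term; 'mean' and 'sum' are two common choices."""
    vecs = np.array([embeddings[w] for w in term.split()])
    return vecs.mean(axis=0) if agg == "mean" else vecs.sum(axis=0)

def cosine(a, b):
    """Cosine similarity, one of the vector similarity metrics compared."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings, for illustration only
emb = {
    "myocardial": np.array([1.0, 0.2]),
    "infarction": np.array([0.9, 0.4]),
    "heart":      np.array([1.0, 0.3]),
    "attack":     np.array([0.8, 0.5]),
    "fracture":   np.array([-0.2, 1.0]),
}
```

A term-similarity benchmark of the kind the paper builds would then check that synonymous term pairs ("myocardial infarction" / "heart attack") score higher than unrelated ones.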

