semantic similarity measurement
Recently Published Documents


TOTAL DOCUMENTS: 65 (FIVE YEARS: 17)
H-INDEX: 10 (FIVE YEARS: 1)

2021, Vol 2021, pp. 1-11
Author(s): Meijing Li, Tianjie Chen, Keun Ho Ryu, Cheng Hao Jin

Semantic mining has long been a challenge for big biomedical text data. Ontologies have been widely used, with proven effectiveness, to extract semantic information. However, ontology-based semantic similarity calculation is so computationally complex that it cannot scale to big text data. To solve this problem, we propose a parallelized semantic similarity measurement method for big text data based on Hadoop MapReduce. First, we preprocess the documents and extract their semantic features. Then, we calculate document semantic similarity from the ontology network structure within the MapReduce framework. Finally, document clusters are generated from the resulting semantic similarities using clustering algorithms. To validate its effectiveness, we use two kinds of open datasets. The experimental results show that traditional methods can hardly handle more than ten thousand biomedical documents, whereas the proposed method remains efficient and accurate on big datasets and offers high parallelism and scalability.
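The map, shuffle, and reduce stages described above can be illustrated with a small in-process sketch. It assumes each document has already been reduced to a set of ontology concepts, and it substitutes plain Jaccard overlap of those concept sets for the paper's ontology-network similarity; all function names are hypothetical, and nothing here uses the Hadoop API. The inverted-index pattern shown (mappers emit concept/document pairs, reducers pair up documents that share a concept) is one common way to parallelize pairwise similarity, not necessarily the authors' exact job layout.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(doc_id, concepts):
    """Mapper (hypothetical): emit (concept, doc_id) once per distinct concept."""
    for concept in set(concepts):
        yield concept, doc_id


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(doc_ids):
    """Reducer: every pair of documents sharing this concept gets a count of 1."""
    for a, b in combinations(sorted(doc_ids), 2):
        yield (a, b), 1


def pairwise_similarity(docs):
    """Jaccard similarity of concept sets (a stand-in for an ontology-based measure)."""
    mapped = (kv for doc_id, concepts in docs.items()
              for kv in map_phase(doc_id, concepts))
    shared = defaultdict(int)
    for doc_ids in shuffle(mapped).values():
        for pair, count in reduce_phase(doc_ids):
            shared[pair] += count
    sizes = {doc_id: len(set(concepts)) for doc_id, concepts in docs.items()}
    # |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|
    return {(a, b): n / (sizes[a] + sizes[b] - n) for (a, b), n in shared.items()}


docs = {
    "d1": ["gene", "protein", "cancer"],
    "d2": ["protein", "cancer", "therapy"],
    "d3": ["virus", "vaccine"],
}
print(pairwise_similarity(docs))  # {('d1', 'd2'): 0.5}
```

Document pairs that share no concept never reach a reducer, so in this sketch they are simply absent from the result (similarity 0); that pruning is what keeps the pairwise step tractable as the collection grows.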


2021
Author(s): Kimia Zandbiglari, Farhad Ameri, Mohammad Javadi

The unstructured data available on the websites of manufacturing suppliers can provide useful insights into the technological and organizational capabilities of manufacturers. However, because this data is often natural-language text, it is difficult to search, analyze, and learn from efficiently. The objective of this work is to propose a set of text analytics techniques that enable automated classification and ranking of suppliers based on their capability narratives. The supervised classification and semantic similarity measurement methods used in this research are supported by a formal thesaurus that uses SKOS (Simple Knowledge Organization System) for its syntax and semantics. Normalized Google Distance (NGD) was used as a metric for measuring the relatedness of terms. The proposed framework was validated experimentally using a hypothetical search scenario. The results indicate that the generated ranked list correlates highly with human judgment, especially when the query concept vector and the supplier concept vector belong to the same class. However, the correlation decreases when multiple overlapping classes of suppliers are mixed together. The findings of this research can be used to improve the precision and reliability of Capability Language Processing (CLP) tools and methods.
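Normalized Google Distance is defined from page-hit counts: with f(x) and f(y) the number of pages containing each term, f(x, y) the number containing both, and N the total number of indexed pages, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The sketch below simply evaluates that formula; the counts in the example are invented, and the abstract does not say where the authors' counts came from or how the distance was turned into a relatedness score.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts.

    fx, fy: pages containing term x / term y
    fxy:    pages containing both terms
    n:      total number of indexed pages
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Invented counts for illustration only.
print(normalized_google_distance(1000, 100, 10, 10**6))  # about 0.5
```

Smaller values mean more closely related terms; identical hit counts give 0, and the base of the logarithm cancels out of the ratio.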


2021, pp. 1-12
Author(s): Fuqiang Zhao, Zhengyu Zhu, Ping Han

To measure semantic similarity between words, this paper presents DFRVec, a novel model that encodes multiple kinds of semantic information about a word in WordNet into a vector space. First, three sub-models are proposed: 1) DefVec, which encodes the definitions of a word in WordNet; 2) FormVec, which encodes the part of speech (POS) of a word in WordNet; and 3) RelVec, which encodes the relations of a word in WordNet. The three sub-models are then combined with an existing word embedding to generate the vector of a word. Finally, based on DFRVec and the path information in WordNet, a new method, DFRVec+Path, is presented for measuring semantic similarity between words. Experiments on ten benchmark datasets show that DFRVec+Path can outperform many existing methods on semantic similarity measurement.
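How sub-model vectors and WordNet path information might be combined can be sketched as follows. Every structural choice here is an assumption rather than the paper's formulation: the sub-vectors are simply concatenated, the path score is 1 / (1 + shortest path length) (the same form NLTK's `path_similarity` uses), and the two signals are blended with a hypothetical weight `alpha`. The toy vectors stand in for real DefVec, FormVec, RelVec, and embedding outputs.

```python
import math


def concat(*parts):
    """Hypothetical combination step: concatenate sub-model vectors into one word vector."""
    return [x for part in parts for x in part]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def path_similarity(path_length):
    """Path score 1 / (1 + shortest is-a path length) between two senses."""
    return 1.0 / (1.0 + path_length)


def combined_similarity(vec_a, vec_b, path_length, alpha=0.5):
    """Hypothetical linear blend of vector similarity and path similarity."""
    return alpha * cosine(vec_a, vec_b) + (1 - alpha) * path_similarity(path_length)


# Toy stand-ins for [DefVec | FormVec | RelVec | embedding] outputs.
car = concat([0.2, 0.8], [1.0, 0.0], [0.5, 0.5], [0.3, 0.1])
automobile = concat([0.25, 0.75], [1.0, 0.0], [0.4, 0.6], [0.3, 0.2])
print(combined_similarity(car, automobile, path_length=0))
```

A linear blend keeps each signal interpretable on its own; a learned combination would be an equally plausible reading of the abstract.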


2021, Vol 80, pp. 103526
Author(s): Liangang Wang, Feng Zhang, Zhenhong Du, Yongpei Chen, Chuanrong Zhang, ...

2021, pp. 192-203
Author(s): Jorge Martinez-Gil, Riad Mokadem, Josef Küng, Abdelkader Hameurlain

Electronics, 2020, Vol 9 (12), pp. 2125
Author(s): Xiaoyu Wu, Tiantian Wang, Shengjin Wang

Text-video retrieval faces a major challenge: the semantic gap between cross-modal information. Some existing methods transform the text or video into the same subspace to measure their similarity. However, such methods do not add a semantic consistency constraint when associating the semantic encodings of the two modalities, and the resulting association is poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of the video and text are extracted with multiple deep learning networks so that the information in both modalities is fully encoded. Then, in the common feature space into which both modalities are mapped, we propose a multi-task learning framework that combines semantic similarity measurement with semantic consistency classification based on text-video features. The semantic consistency classification task constrains the learning of the semantic association task, so multi-task learning guides a better feature mapping of the two modalities and optimizes the construction of a unified feature subspace. Finally, the experimental results of our algorithm on the Microsoft Video Description (MSVD) and MSR-Video to Text (MSR-VTT) datasets surpass those of existing work, showing that our algorithm improves cross-modal retrieval performance.
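One common way to build a multi-task objective like the one described above is to add a retrieval loss on the shared feature space to a classification loss. The sketch below assumes a bidirectional hinge-based triplet ranking loss for the similarity task and binary cross-entropy for the semantic consistency classifier, summed with a hypothetical `weight`; the abstract does not give the paper's actual losses, margin, or weighting, and the similarity matrix and probabilities in the example are invented.

```python
import math


def ranking_loss(sim, margin=0.2):
    """Bidirectional hinge triplet loss.

    sim[i][j] is the similarity of text i and video j in the shared space;
    matching pairs sit on the diagonal.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            loss += max(0.0, margin - sim[i][i] + sim[i][j])  # text i vs. wrong video j
            loss += max(0.0, margin - sim[i][i] + sim[j][i])  # video i vs. wrong text j
    return loss / n


def consistency_loss(probs, labels):
    """Binary cross-entropy for a 'do this text and video match?' classifier."""
    eps = 1e-12  # avoids log(0)
    total = sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))
    return -total / len(probs)


def multitask_loss(sim, probs, labels, weight=1.0):
    """Hypothetical weighted sum of the two task losses."""
    return ranking_loss(sim) + weight * consistency_loss(probs, labels)


sim = [[0.9, 0.1], [0.2, 0.8]]
print(multitask_loss(sim, probs=[0.9, 0.2], labels=[1, 0]))
```

In this formulation the classification term only pulls on the shared features, which is one concrete reading of "the semantic consistency classification task constrains the semantic association task."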


2020, Vol 19 (04), pp. 2050033
Author(s): Marwah Alian, Arafat Awajan

Semantic similarity is the task of measuring relations between sentences or words to determine their degree of similarity or resemblance. Several natural language processing applications require semantic similarity measurement to achieve good results, including plagiarism detection, textual entailment, text summarisation, paraphrase identification, and information extraction. Many researchers have proposed new methods to measure the semantic similarity of Arabic and English texts. In this research, these methods are reviewed and compared. Results show that the precision of the corpus-based approach exceeds 0.70. The precision of the descriptive feature-based technique ranges from 0.670 to 0.86, with a Pearson correlation coefficient above 0.70. Meanwhile, the word embedding technique has a correlation of 0.67 and an accuracy in the range 0.76–0.80. The best results are achieved by the feature-based approach.
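Several of the reviewed results are reported as Pearson correlation coefficients between system scores and human judgments. A minimal standard-library computation of that coefficient is sketched below; the score lists are made-up placeholders, not data from the review.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Placeholder scores: system similarity vs. averaged human ratings.
system = [0.91, 0.40, 0.75, 0.10, 0.62]
human = [0.95, 0.35, 0.70, 0.20, 0.50]
print(round(pearson(system, human), 3))
```

Pearson measures linear agreement with the human scores, which is why it is reported separately from precision or accuracy in the comparison above.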


2020, Vol 11 (2), pp. 88
Author(s): Mohammad Nazir Arifin, Daniel Siahaan

Reusing software has several benefits, ranging from reducing cost and risk to accelerating development, and its primary purpose is improving software quality. In the early stages of software development, reusing existing software artifacts may increase these benefits because it draws on mature artifacts from previous projects. One type of software artifact is the diagram, and measuring the level of similarity between diagrams helps in reusing them. This paper proposes a method for measuring the similarity of use case diagrams using structural and semantic aspects. For structural similarity measurement, Graph Edit Distance is used by transforming each actor and use case into a graph, while for semantic similarity measurement, WordNet, Wu-Palmer, and Levenshtein were used. The experimentation was conducted on ten datasets from various projects. The results of the method were compared with expert assessments. Agreement between the experts and the method was measured using Gwet's AC1 and the Pearson correlation coefficient. The diagram similarity measured with Gwet's AC1 is 0.60, which is categorized as "moderate" agreement, and the Pearson result is 0.506, which indicates a significant correlation between the experts and the method. The results show that the proposed method can be used to find the similarity of diagrams, so finding and reusing diagrams as software components can be optimized.
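Two of the semantic ingredients named above can be sketched in a few lines: Levenshtein edit distance between element labels, and Wu-Palmer similarity, 2 · depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common subsumer. The sketch assumes a hypothetical toy taxonomy in place of WordNet (root at depth 1) and normalizes edit distance by the longer label; it does not show the Graph Edit Distance step or how the paper combines the scores.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def levenshtein_similarity(s, t):
    """Edit distance rescaled to [0, 1] by the longer label (an assumed normalization)."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - levenshtein(s, t) / longest


# Hypothetical toy taxonomy (child -> parent) standing in for WordNet.
TAXONOMY = {"entity": None, "actor": "entity", "user": "actor",
            "admin": "actor", "action": "entity", "login": "action"}


def ancestors(node, parent=TAXONOMY):
    """Path from node up to the root, node first; its length is the node's depth."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def wu_palmer(a, b, parent=TAXONOMY):
    """2 * depth(LCS) / (depth(a) + depth(b)), with the root at depth 1."""
    path_a, path_b = ancestors(a, parent), ancestors(b, parent)
    common = set(path_b)
    lcs = next(n for n in path_a if n in common)  # deepest shared ancestor
    return 2 * len(ancestors(lcs, parent)) / (len(path_a) + len(path_b))


print(levenshtein_similarity("log in", "login"), wu_palmer("user", "admin"))
```

Edit distance catches spelling-level variants of the same label, while Wu-Palmer rewards labels whose concepts sit close together in the taxonomy, so the two signals cover different kinds of match.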


Author(s): Tuan Norhafizah Tuan Zakaria, Mohd Juzaiddin Ab Aziz, Mohd Rosmadi Mokhtar, Saadiyah Darus, ...
