Query Expansion Using Word Embeddings

In the era of big data and the fourth industrial revolution (Industry 4.0), document/information retrieval frameworks must become more efficient to handle the ever-growing volume of text data in an increasingly digital world. This article describes a two-stage document/information retrieval system. First, a Lucene-based document retrieval tool is implemented, and a couple of query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is applied to the retrieved documents to create a short, accurate, and fluent extract of a single longer retrieved document (or of a set of top retrieved documents). The results show that using word embeddings yields higher precision and retrieves more accurate documents. The resulting summaries also satisfy the retention and fidelity criteria for relevant summaries.
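To make the idea of embedding-based query expansion concrete, here is a minimal sketch. It is not the method from the article: the toy vectors, the `expand_query` helper, and the `k` and `min_sim` parameters are all assumptions chosen for illustration. Each query term is expanded with its nearest neighbours by cosine similarity in the embedding space.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load pretrained vectors (hundreds of dimensions).
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.3],
    "fruit": [0.1, 0.85, 0.35],
    "engine": [0.7, 0.0, 0.6],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.9):
    """Append up to k nearest neighbours of each term (hypothetical helper).

    A neighbour is added only if its cosine similarity to the query term
    is at least min_sim. Terms without an embedding are kept unchanged.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        scored = [
            (cosine(embeddings[term], vec), cand)
            for cand, vec in embeddings.items()
            if cand != term
        ]
        scored.sort(reverse=True)  # most similar candidates first
        for sim, cand in scored[:k]:
            if sim >= min_sim and cand not in expanded:
                expanded.append(cand)
    return expanded


print(expand_query(["car"], TOY_EMBEDDINGS))
```

The expanded term list could then be passed to a retrieval engine such as Lucene as an OR query. The similarity threshold guards against adding loosely related terms, which tends to hurt precision.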

How does Word Embeddings-based Query Expansion Perform in Consumer Health Information Search?

Proceedings of the 10th Annual Meeting of the Forum for Information Retrieval Evaluation (FIRE'18), 2018. DOI: 10.1145/3293339.3293347

Author(s): Hua Yang, Teresa Gonçalves

Keyword(s): Health Information, Information Search, Query Expansion, Consumer Health Information, Consumer Health, Word Embeddings

A Study on Ranking Fusion Approaches for the Retrieval of Medical Publications

Information, 2020, Vol 11 (2), pp. 103. DOI: 10.3390/info11020103

Author(s): Teofan Clipa, Giorgio Maria Di Nunzio

Keyword(s): Statistical Analysis, Relevance Feedback, Query Expansion, State Of The Art, Word Embeddings, Medical Topic, Medical Publication, Different Types, Text Preprocessing, Better Than

In this work, we compare and analyze a variety of approaches to the task of medical publication retrieval and, in particular, to the Technology Assisted Review (TAR) task. This task consists of collecting articles that summarize all the evidence that has been published on a certain medical topic, and it requires long search sessions by experts in the field of medicine. For this reason, semi-automatic approaches are essential for supporting these types of searches when the amount of data exceeds what users can handle. In this paper, we use state-of-the-art models and weighting schemes with different types of preprocessing, as well as query expansion (QE) and relevance feedback (RF) approaches, in order to study the best combination for this particular task. We also tested word embedding representations of documents and queries, in addition to three different ranking fusion approaches, to see whether the merged runs perform better than the single models. To make our results reproducible, we used the collection provided by the Conference and Labs of the Evaluation Forum (CLEF) eHealth tasks. Query expansion and relevance feedback greatly improve performance, whereas the fusion of different rankings does not perform well in this task. The statistical analysis showed that, in general, the performance of the system depends less on the type of text preprocessing than on which weighting scheme is applied.
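As an illustration of what a ranking fusion step looks like, here is a minimal sketch of reciprocal rank fusion (RRF), a widely used fusion method. The abstract does not name its three fusion approaches, so RRF is an assumption here and may not be one of them. The function name, the example document identifiers, and the constant `k=60` (a conventional default) are also illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every run it appears in
    (ranks start at 1). Documents are returned by descending fused score,
    with ties broken alphabetically by id for determinism.
    """
    scores = defaultdict(float)
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three hypothetical runs, e.g. from different weighting schemes.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d2", "d1", "d4"],
]
print(reciprocal_rank_fusion(runs))
```

Rank-based fusion of this kind needs no score normalization across systems, which is why it is a common baseline. Whether merging helps depends on how complementary the individual runs are.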

Query Expansion Using Word Embeddings

Semantic Search System using Word Embeddings for query expansion

Query Expansion with Local Conceptual Word Embeddings in Microblog Retrieval

Query Expansion based on Word Embeddings and Ontologies for Efficient Information Retrieval

Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering

Deep Neural Networks for Query Expansion Using Word Embeddings

A Prospect-Guided global query expansion strategy using word embeddings

Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings

An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

How does Word Embeddings-based Query Expansion Perform in Consumer Health Information Search?

A Study on Ranking Fusion Approaches for the Retrieval of Medical Publications
