ACV-tree: A New Method for Sentence Similarity Modeling

Author(s):  
Yuquan Le ◽  
Zhi-Jie Wang ◽  
Zhe Quan ◽  
Jiawei He ◽  
Bin Yao

Sentence similarity modeling lies at the core of many natural language processing applications and has thus received much attention. Owing to the success of word embeddings, popular neural network methods have recently achieved sentence embedding, obtaining attractive performance. Nevertheless, most of them focus on learning semantic information and modeling it as a continuous vector, while the syntactic information of sentences has not been fully exploited. On the other hand, prior works have shown the benefits of structured trees that include syntactic information, yet few methods in this branch have utilized the advantages of word embeddings and another powerful technique, the attention weight mechanism. This paper makes the first attempt to absorb their advantages by merging these techniques into a unified structure, dubbed the ACV-tree. Meanwhile, this paper develops a new tree kernel, the ACVT kernel, tailored for sentence similarity measurement based on the proposed structure. The experimental results, based on 19 widely-used datasets, demonstrate that our model is effective and competitive compared against state-of-the-art models.
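As a rough illustration of the idea (not the paper's exact ACVT kernel), the sketch below scores two trees with a classic recursive subtree match in which each node match is additionally scaled by the two nodes' attention weights; the class names, decay factor, and weighting scheme are assumptions for exposition.

```python
# Illustrative attention-weighted tree kernel: a node-pair match is scaled by
# the product of the two nodes' attention weights. A simplified stand-in for
# the ACVT kernel, not its exact definition; decay and weights are made up.

class Node:
    def __init__(self, label, attention=1.0, children=None):
        self.label = label          # e.g., a constituent tag or word
        self.attention = attention  # attention weight in [0, 1]
        self.children = children or []

def delta(a, b, decay=0.5):
    """Recursive co-rooted subtree match, weighted by attention."""
    if a.label != b.label or len(a.children) != len(b.children):
        return 0.0
    score = decay * a.attention * b.attention
    for ca, cb in zip(a.children, b.children):
        score *= 1.0 + delta(ca, cb, decay)
    return score

def tree_kernel(t1, t2):
    """Sum the co-rooted match over all node pairs (classic SST kernel form)."""
    def nodes(t):
        yield t
        for c in t.children:
            yield from nodes(c)
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

s1 = Node("S", 1.0, [Node("NP", 0.9), Node("VP", 0.7)])
s2 = Node("S", 1.0, [Node("NP", 0.8), Node("VP", 0.6)])
print(tree_kernel(s1, s2))  # higher when attended substructures align
```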

2021 ◽  
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis in this paper. The method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a neural-network-based text sentiment classifier. In this way, the emotional polarity of the text is propagated into the word vectors. Experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
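A minimal sketch of the two-stage pipeline the abstract describes, with a uniform (or caller-supplied) weighting standing in for the paper's two linear weighting methods; the toy vocabulary, dimensions, and classifier shape are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

# Toy vocabulary and pretrained vectors; in practice these would come from
# word2vec or GloVe. Names and sizes here are illustrative assumptions.
vocab = {"good": 0, "bad": 1, "movie": 2}
E = np.random.randn(len(vocab), 50).astype(np.float32)

def doc_vector(tokens, weights=None):
    """Linearly weighted average of pretrained word vectors; `weights`
    could be e.g. TF-IDF scores (uniform weighting by default)."""
    idx = [vocab[t] for t in tokens if t in vocab]
    w = (np.ones(len(idx), dtype=np.float32) if weights is None
         else np.asarray(weights, dtype=np.float32))
    return (w[:, None] * E[idx]).sum(axis=0) / w.sum()

# A small feed-forward sentiment classifier trained on document vectors;
# its loss is what propagates emotional polarity back through the pipeline.
clf = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.from_numpy(doc_vector(["good", "movie"])).unsqueeze(0)
logits = clf(x)  # train with cross-entropy on sentiment labels
```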


2020 ◽  
Vol 17 (3) ◽  
pp. 849-865
Author(s):  
Zhongqin Bi ◽  
Shuming Dou ◽  
Zhe Liu ◽  
Yongbin Li

Neural network methods have been trained to satisfactorily learn user/product representations from textual reviews. A representation can be considered as a multiaspect attention weight vector. However, in several existing methods, it is assumed that the user representation remains unchanged even when the user interacts with products having diverse characteristics, which leads to inaccurate recommendations. To overcome this limitation, this paper proposes a novel model to capture the varying attention of a user for different products by using a multilayer attention framework. First, two individual hierarchical attention networks are used to encode the users and products to learn the user preferences and product characteristics from review texts. Then, we design an attention network to reflect the adaptive change in the user preferences for each aspect of the targeted product in terms of the rating and review. The results of experiments performed on three public datasets demonstrate that the proposed model notably outperforms the other state-of-the-art baselines, thereby validating the effectiveness of the proposed approach.
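A hedged sketch of the adaptive-attention idea: the user's aspect vectors are re-weighted conditioned on the target product, so the effective user representation changes per item. The bilinear scoring and sizes are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveUserAttention(nn.Module):
    """Re-weight a user's aspect vectors conditioned on the target product,
    so the effective user representation differs for each product."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)  # aspect-product compatibility

    def forward(self, user_aspects, product_vec):
        # user_aspects: (n_aspects, dim); product_vec: (dim,)
        p = product_vec.expand_as(user_aspects)            # (n_aspects, dim)
        alpha = F.softmax(self.score(user_aspects, p), 0)  # (n_aspects, 1)
        return (alpha * user_aspects).sum(0)               # adapted user vector

attn = AdaptiveUserAttention(64)
user_for_item = attn(torch.randn(5, 64), torch.randn(64))  # changes per item
```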


Author(s):  
Xiang Lisa Li ◽  
Jason Eisner

Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.
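A minimal sketch of the continuous bottleneck, assuming a Gaussian code with a standard-normal prior (the standard VIB setup); the layer sizes and ELMo-like input width are illustrative.

```python
import torch
import torch.nn as nn

class ContinuousVIB(nn.Module):
    """Compress each word embedding to a low-dimensional stochastic code,
    penalizing KL divergence to a standard-normal prior. Sizes illustrative."""
    def __init__(self, in_dim=1024, code_dim=64):
        super().__init__()
        self.mu = nn.Linear(in_dim, code_dim)
        self.logvar = nn.Linear(in_dim, code_dim)

    def forward(self, emb):
        mu, logvar = self.mu(emb), self.logvar(emb)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # KL( N(mu, sigma^2) || N(0, I) ), summed over code dimensions
        kl = 0.5 * (logvar.exp() + mu**2 - 1 - logvar).sum(-1)
        return z, kl

vib = ContinuousVIB()
z, kl = vib(torch.randn(8, 1024))  # 8 ELMo-sized word embeddings
# training objective: parser_loss(z) + beta * kl.mean()
```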


2017 ◽  
Vol 43 (3) ◽  
pp. 593-617 ◽  
Author(s):  
Sascha Rothe ◽  
Hinrich Schütze

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.
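The core intuition can be sketched as tying each synset vector to the embeddings of the words that lexicalize it; the uniform averaging below is a simplification, since AutoExtend learns the encode/decode weights rather than fixing them.

```python
import numpy as np

# Toy word embeddings; in AutoExtend these are the fixed input vectors and
# the member weights are learned through the encode/decode constraints.
word_emb = {"bank": np.random.randn(300), "shore": np.random.randn(300)}
synset_members = {"bank.n.01": ["bank", "shore"], "bank.n.02": ["bank"]}

def synset_embedding(synset):
    """Simplified synset vector: uniform average of member word vectors."""
    members = synset_members[synset]
    return np.mean([word_emb[w] for w in members], axis=0)

riverside = synset_embedding("bank.n.01")  # lives in the input vector space
```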


Author(s):  
Victor Sanh ◽  
Thomas Wolf ◽  
Sebastian Ruder

Much effort has been devoted to evaluating whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) downstream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias, by supervising a set of low-level tasks at the bottom layers of the model and more complex tasks at its top layers. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection, and Relation Extraction, without hand-engineered features or external NLP tools like syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at the lower layers of the model. We show that as we move from the bottom to the top layers of the model, the hidden states of the layers tend to represent more complex semantic information.
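A hedged sketch of layer-wise supervision: a low-level head reads the bottom encoder layer while a more complex task reads the top layer, so both losses shape the shared lower representations. The task names, cell type, and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalMTL(nn.Module):
    """Low-level task (e.g., NER) supervised at the bottom layer; a more
    complex task (e.g., relation typing) supervised at the top layer."""
    def __init__(self, dim=128, n_ner=9, n_rel=7):
        super().__init__()
        self.bottom = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.top = nn.LSTM(2 * dim, dim, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * dim, n_ner)  # reads bottom-layer states
        self.rel_head = nn.Linear(2 * dim, n_rel)  # reads top-layer states

    def forward(self, x):
        h1, _ = self.bottom(x)  # shared low-level states
        h2, _ = self.top(h1)    # higher-level states built on h1
        return self.ner_head(h1), self.rel_head(h2)

model = HierarchicalMTL()
ner_logits, rel_logits = model(torch.randn(2, 10, 128))
# total loss = ner_loss + rel_loss, each applied at its own depth
```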


2020 ◽  
Vol 34 (05) ◽  
pp. 9434-9441
Author(s):  
Zekun Yang ◽  
Juan Feng

Word embeddings have become essential for natural language processing, as they boost the empirical performance of various tasks. However, recent research has discovered that gender bias is incorporated in neural word embeddings, and downstream tasks that rely on these biased word vectors also produce gender-biased results. While some word-embedding gender-debiasing methods have been developed, they mainly focus on reducing the gender bias associated with the gender direction and fail to reduce the gender bias present in word-embedding relations. In this paper, we design a simple, causal approach for mitigating gender bias in word-vector relations by utilizing the statistical dependency between gender-definition word embeddings and gender-biased word embeddings. Our method attains state-of-the-art results on gender-debiasing tasks, lexical- and sentence-level evaluation tasks, and downstream coreference resolution tasks.
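For context, the gender-direction projection that earlier debiasing methods rely on, and that the abstract argues is insufficient for relation-level bias, can be sketched as follows; the vectors here are toy data, and this is the baseline technique rather than the paper's causal method.

```python
import numpy as np

# Classic hard-debias baseline: remove each vector's component along a
# gender direction. Toy random vectors stand in for real embeddings.
he, she = np.random.randn(300), np.random.randn(300)
g = he - she
g /= np.linalg.norm(g)  # unit gender direction

def hard_debias(v):
    """Project out the component of v along the gender direction g."""
    return v - (v @ g) * g

doctor = np.random.randn(300)
print(abs(hard_debias(doctor) @ g) < 1e-8)  # True: direction removed,
# yet biased *relations* between word pairs can survive this projection
```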


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ke Liang ◽  
Sifan Wu ◽  
Jiayi Gu

Using natural language processing (NLP) technologies to develop medical chatbots makes patient diagnosis more convenient and efficient, a typical application of AI in healthcare. Because of its importance, a great deal of research has emerged. Recently, neural generative models have shown impressive ability as the core of chatbots, but they do not scale well when directly applied to medical conversations because they lack medical-specific knowledge. To address this limitation, a scalable medical knowledge-assisted mechanism (MKA) is proposed in this paper. The mechanism aims to help general neural generative models achieve better performance on the medical conversation task. A medical-specific knowledge graph is designed within the mechanism, containing six types of medical-related information: department, drug, check, symptom, disease, and food. In addition, a token-concatenation policy is defined to effectively inject medical information into the input data. Our method is evaluated on two typical medical datasets, MedDG and MedDialog-CN. The evaluation results demonstrate that models combined with our mechanism outperform the original methods on multiple automatic evaluation metrics. Moreover, MKA-BERT-GPT achieves state-of-the-art performance.
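A minimal sketch of what a token-concatenation policy for knowledge injection might look like; the separator tokens, template, and toy knowledge graph below are assumptions, not the paper's exact policy.

```python
# Serialize knowledge-graph facts about entities mentioned in the utterance
# and prepend them to the model input with special separator tokens.
knowledge_graph = {
    "headache": [("symptom_of", "migraine"), ("checked_by", "CT scan")],
}

def inject_knowledge(utterance, kg):
    facts = []
    for entity, triples in kg.items():
        if entity in utterance:
            facts += [f"{entity} {rel} {obj}" for rel, obj in triples]
    if not facts:
        return utterance
    return " [KG] ".join(facts) + " [SEP] " + utterance

print(inject_knowledge("I keep getting a headache at night", knowledge_graph))
# -> "headache symptom_of migraine [KG] headache checked_by CT scan [SEP] ..."
```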


2021 ◽  
Vol 15 (02) ◽  
pp. 143-160
Author(s):  
Ayşegül Özkaya Eren ◽  
Mustafa Sert

Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use encoder–decoder-based models without considering semantic information. To fill this gap, we present a novel encoder–decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder–decoder model. To enable semantic embeddings for the test audios, we introduce a Multilayer Perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments to show the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and that using the semantic information improves the captioning performance.
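A hedged sketch of the encoder side: the clip-level semantic embedding is tiled over time and concatenated with per-frame audio features before a BiGRU; the dimensions and fusion scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Concatenate a clip-level semantic embedding (subjects/verbs) with
    per-frame audio features, then encode with a bidirectional GRU."""
    def __init__(self, audio_dim=128, sem_dim=50, hidden=256):
        super().__init__()
        self.gru = nn.GRU(audio_dim + sem_dim, hidden,
                          batch_first=True, bidirectional=True)

    def forward(self, audio, semantic):
        # audio: (B, T, audio_dim); semantic: (B, sem_dim), tiled over time
        sem = semantic.unsqueeze(1).expand(-1, audio.size(1), -1)
        out, _ = self.gru(torch.cat([audio, sem], dim=-1))
        return out  # encoder states consumed by the caption decoder

enc = BiGRUEncoder()
states = enc(torch.randn(4, 100, 128), torch.randn(4, 50))
```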


2017 ◽  
Vol 24 (4) ◽  
pp. 813-821 ◽  
Author(s):  
Anne Cocos ◽  
Alexander G Fiks ◽  
Aaron J Masino

Abstract Objective Social media is an important pharmacovigilance data source for adverse drug reaction (ADR) identification. Human review of social media data is infeasible due to data quantity; thus, natural language processing techniques are necessary. Social media includes informal vocabulary and irregular grammar, which challenge natural language processing methods. Our objective is to develop a scalable, deep-learning approach that exceeds state-of-the-art ADR detection performance in social media. Materials and Methods We developed a recurrent neural network (RNN) model that labels words in an input sequence with ADR membership tags. The only input features are word-embedding vectors, which can be formed through task-independent pretraining or during ADR detection training. Results Our best-performing RNN model used pretrained word embeddings created from a large, non–domain-specific Twitter dataset. It achieved an approximate match F-measure of 0.755 for ADR identification on the dataset, compared to 0.631 for a baseline lexicon system and 0.65 for the state-of-the-art conditional random field model. Feature analysis indicated that semantic information in pretrained word embeddings boosted sensitivity and, combined with contextual awareness captured in the RNN, precision. Discussion Our model required no task-specific feature engineering, suggesting generalizability to additional sequence-labeling tasks. Learning curve analysis showed that our model reached optimal performance with fewer training examples than the other models. Conclusions ADR detection performance in social media is significantly improved by using a contextually aware model and word embeddings formed from large, unlabeled datasets. The approach reduces manual data-labeling requirements and is scalable to large social media datasets.
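A minimal sketch of such a sequence labeler, assuming a BiLSTM variant and BIO-style ADR tags; the paper's exact RNN cell, tag set, and sizes may differ.

```python
import torch
import torch.nn as nn

class ADRTagger(nn.Module):
    """Pretrained word embeddings in, one ADR-membership tag per token out
    (e.g., B-ADR / I-ADR / O). Sizes are illustrative."""
    def __init__(self, emb_dim=200, hidden=100, n_tags=3):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, hidden,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, emb):  # emb: (B, T, emb_dim), e.g., Twitter word2vec
        h, _ = self.rnn(emb)
        return self.out(h)   # per-token tag logits

tagger = ADRTagger()
logits = tagger(torch.randn(1, 12, 200))  # one 12-token tweet
```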


2019 ◽  
Vol 26 (11) ◽  
pp. 1297-1304 ◽  
Author(s):  
Yuqi Si ◽  
Jingqi Wang ◽  
Hua Xu ◽  
Kirk Roberts

Abstract Objective Neural network–based representations (“embeddings”) have dramatically advanced natural language processing (NLP) tasks, including clinical NLP tasks such as concept extraction. Recently, however, more advanced embedding methods and representations (eg, ELMo, BERT) have further pushed the state of the art in NLP, yet there are no common best practices for how to integrate these representations into clinical tasks. The purpose of this study, then, is to explore the space of possible options in utilizing these new models for clinical concept extraction, including comparing these to traditional word embedding methods (word2vec, GloVe, fastText). Materials and Methods Both off-the-shelf, open-domain embeddings and pretrained clinical embeddings from MIMIC-III (Medical Information Mart for Intensive Care III) are evaluated. We explore a battery of embedding methods consisting of traditional word embeddings and contextual embeddings and compare these on 4 concept extraction corpora: i2b2 2010, i2b2 2012, SemEval 2014, and SemEval 2015. We also analyze the impact of the pretraining time of a large language model like ELMo or BERT on the extraction performance. Last, we present an intuitive way to understand the semantic information encoded by contextual embeddings. Results Contextual embeddings pretrained on a large clinical corpus achieve new state-of-the-art performances across all concept extraction tasks. The best-performing model outperforms all state-of-the-art methods with respective F1-measures of 90.25, 93.18 (partial), 80.74, and 81.65. Conclusions We demonstrate the potential of contextual embeddings through the state-of-the-art performance these methods achieve on clinical concept extraction. Additionally, we demonstrate that contextual embeddings encode valuable semantic information not accounted for in traditional word representations.
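As a rough sketch of the contextual-embedding approach, one can take per-token hidden states from a clinically pretrained BERT and add a token-classification layer on top; the checkpoint name below is one public clinical BERT chosen for illustration, not necessarily the model used in the study.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Contextual token embeddings from a clinically pretrained BERT, with a
# simple linear layer for concept tags (e.g., BIO over problem/test/treatment).
name = "emilyalsentzer/Bio_ClinicalBERT"  # assumed checkpoint for illustration
tok = AutoTokenizer.from_pretrained(name)
bert = AutoModel.from_pretrained(name)
head = torch.nn.Linear(bert.config.hidden_size, 5)  # tag count is illustrative

inputs = tok("Patient denies chest pain.", return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state  # (1, T, hidden_size)
logits = head(hidden)                          # per-token concept logits
```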

