language models Latest Research Papers

Using Pre-trained Language Model to Enhance Active Learning for Sentence Matching

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3480937 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-19

Author(s):

Guirong Bai ◽

Shizhu He ◽

Kang Liu ◽

Jun Zhao

Keyword(s):

Active Learning ◽

Language Model ◽

Experimental Results ◽

Language Models ◽

Data Driven ◽

Learning Approach ◽

Sentence Matching ◽

Learning Language

Active learning is an effective method to substantially alleviate the problem of expensive annotation cost for data-driven models. Recently, pre-trained language models have been demonstrated to be powerful for learning language representations. In this article, we demonstrate that the pre-trained language model can also utilize its learned textual characteristics to enrich criteria of active learning. Specifically, we provide extra textual criteria with the pre-trained language model to measure instances, including noise, coverage, and diversity. With these extra textual criteria, we can select more efficient instances for annotation and obtain better results. We conduct experiments on both English and Chinese sentence matching datasets. The experimental results show that the proposed active learning approach can be enhanced by the pre-trained language model and obtain better performance.

Download Full-text

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

ACM Transactions on Computing for Healthcare ◽

10.1145/3458754 ◽

2022 ◽

Vol 3 (1) ◽

pp. 1-23

Author(s):

Yu Gu ◽

Robert Tinn ◽

Hao Cheng ◽

Michael Lucas ◽

Naoto Usuyama ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Fine Tuning ◽

Entity Recognition ◽

Language Models ◽

General Domain ◽

Domain Specific ◽

And Task

Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this article, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. To facilitate this investigation, we compile a comprehensive biomedical NLP benchmark from publicly available datasets. Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks, leading to new state-of-the-art results across the board. Further, in conducting a thorough evaluation of modeling choices, both for pretraining and task-specific fine-tuning, we discover that some common practices are unnecessary with BERT models, such as using complex tagging schemes in named entity recognition. To help accelerate research in biomedical NLP, we have released our state-of-the-art pretrained and task-specific models for the community, and created a leaderboard featuring our BLURB benchmark (short for Biomedical Language Understanding & Reasoning Benchmark) at https://aka.ms/BLURB .

Download Full-text

Detecting The Speaker Language Using CNN Deep Learning Algorithm

Iraqi Journal for Computer Science and Mathematics ◽

10.52866/ijcsm.2022.01.01.005 ◽

2022 ◽

pp. 43-52

Author(s):

Fawziya M. Rammo ◽

Mohammed N. Al-Hamdani

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Open Source ◽

Convolutional Neural Networks ◽

Learning Algorithm ◽

Language Models ◽

Mel Frequency Cepstral Coefficients ◽

Deep Learning Algorithm ◽

Time Frames

Many languages identification (LID) systems rely on language models that use machine learning (ML) approaches, LID systems utilize rather long recording periods to achieve satisfactory accuracy. This study aims to extract enough information from short recording intervals in order to successfully classify the spoken languages under test. The classification process is based on frames of (2-18) seconds where most of the previous LID systems were based on much longer time frames (from 3 seconds to 2 minutes). This research defined and implemented many low-level features using MFCC (Mel-frequency cepstral coefficients), containing speech files in five languages (English. French, German, Italian, Spanish), from voxforge.org an open-source corpus that consists of user-submitted audio clips in various languages, is the source of data used in this paper. A CNN (convolutional Neural Networks) algorithm applied in this paper for classification and the result was perfect, binary language classiﬁcation had an accuracy of 100%, and five languages classiﬁcation with six languages had an accuracy of 99.8%.

Download Full-text

Backpropagation-Based Decoding for Multimodal Machine Translation

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.736722 ◽

2022 ◽

Vol 4 ◽

Author(s):

Ziyan Yang ◽

Leticia Pinto-Alva ◽

Franck Dernoncourt ◽

Vicente Ordonez

Keyword(s):

Neural Network ◽

Machine Translation ◽

Visual Representations ◽

Training Data ◽

Language Models ◽

Test Time ◽

Single Image ◽

Image Captioning ◽

Paired Data ◽

Multimodal Language

People are able to describe images using thousands of languages, but languages share only one visual world. The aim of this work is to use the learned intermediate visual representations from a deep convolutional neural network to transfer information across languages for which paired data is not available in any form. Our work proposes using backpropagation-based decoding coupled with transformer-based multilingual-multimodal language models in order to obtain translations between any languages used during training. We particularly show the capabilities of this approach in the translation of German-Japanese and Japanese-German sentence pairs, given a training data of images freely associated with text in English, German, and Japanese but for which no single image contains annotations in both Japanese and German. Moreover, we demonstrate that our approach is also generally useful in the multilingual image captioning task when sentences in a second language are available at test time. The results of our method also compare favorably in the Multi30k dataset against recently proposed methods that are also aiming to leverage images as an intermediate source of translations.

Download Full-text

TransDTI: Transformer-Based Language Models for Estimating DTIs and Building a Drug Recommendation Workflow

ACS Omega ◽

10.1021/acsomega.1c05203 ◽

2022 ◽

Author(s):

Yogesh Kalakoti ◽

Shashank Yadav ◽

Durai Sundar

Keyword(s):

Language Models

Download Full-text

Fine Grained Classification of Personal Data Entities with Language Models

10.1145/3493700.3493707 ◽

2022 ◽

Author(s):

Abhinav Nagpal ◽

Riddhiman Dasgupta ◽

Balaji Ganesan

Keyword(s):

Personal Data ◽

Language Models ◽

Fine Grained

Download Full-text

Reaching into the basket of doom: Learning outcomes, discourse and information literacy

Journal of Librarianship and Information Science ◽

10.1177/09610006211067216 ◽

2022 ◽

pp. 096100062110672

Author(s):

Alison Hicks ◽

Annemaree Lloyd

Keyword(s):

Higher Education ◽

Learning Outcomes ◽

Information Literacy ◽

English Language ◽

Ways Of Knowing ◽

Research Programme ◽

Language Models ◽

Learning Goals ◽

Theory Of Practice ◽

Practice Architectures

Learning outcomes form a type of arrangement that holds the practice of information literacy within higher education in place. This paper employs the theory of practice architectures and a discourse analytical approach to examine the learning goals of five recent English-language models of information literacy. Analysis suggests that the practice of information literacy within higher education is composed of 12 common dimensions, which can be grouped into two categories, Mapping and Applying. The Mapping category encompasses learning outcomes that introduce the learner to accepted ways of knowing or what is valued by and how things work within higher education. The Applying category encompasses learning outcomes that encourage the learner to implement or integrate ideas into their own practice, including to their own questions, to themselves or to their experience. Revealing what is prioritised as well as what is less valued within the field at the present time, these findings also raise questions about supposed epistemological differences between models, the influence of research, and the language employed within these documents. This paper represents the third and final piece of work in a research programme that is interrogating the epistemological premises and discourses of information literacy within higher education.

Download Full-text

Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research

ACM Computing Surveys ◽

10.1145/3505245 ◽

2022 ◽

Author(s):

Ross Gruetzemacher ◽

David Paradice

Keyword(s):

Text Mining ◽

Language Processing ◽

User Interfaces ◽

Recent Progress ◽

Language Models ◽

Is Research ◽

Systems Research ◽

Information Systems Research ◽

Wide Range ◽

Language User

AI is widely thought to be poised to transform business, yet current perceptions of the scope of this transformation may be myopic. Recent progress in natural language processing involving transformer language models (TLMs) offers a potential avenue for AI-driven business and societal transformation that is beyond the scope of what most currently foresee. We review this recent progress as well as recent literature utilizing text mining in top IS journals to develop an outline for how future IS research can benefit from these new techniques. Our review of existing IS literature reveals that suboptimal text mining techniques are prevalent and that the more advanced TLMs could be applied to enhance and increase IS research involving text data, and to enable new IS research topics, thus creating more value for the research community. This is possible because these techniques make it easier to develop very powerful custom systems and their performance is superior to existing methods for a wide range of tasks and applications. Further, multilingual language models make possible higher quality text analytics for research in multiple languages. We also identify new avenues for IS research, like language user interfaces, that may offer even greater potential for future IS research.

Download Full-text

Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models

Applied Sciences ◽

10.3390/app12010491 ◽

2022 ◽

Vol 12 (1) ◽

pp. 491

Author(s):

Alexander Sboev ◽

Sanna Sboeva ◽

Ivan Moloshnikov ◽

Artem Gryaznov ◽

Roman Rybka ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Recognition Accuracy ◽

Relation Extraction ◽

Russian Language ◽

Entity Recognition ◽

Language Models ◽

Drug Event ◽

Full Size ◽

The Russian Language

The paper presents the full-size Russian corpus of Internet users’ reviews on medicines with complex named entity recognition (NER) labeling of pharmaceutically relevant entities. We evaluate the accuracy levels reached on this corpus by a set of advanced deep learning neural networks for extracting mentions of these entities. The corpus markup includes mentions of the following entities: medication (33,005 mentions), adverse drug reaction (1778), disease (17,403), and note (4490). Two of them—medication and disease—include a set of attributes. A part of the corpus has a coreference annotation with 1560 coreference chains in 300 documents. A multi-label model based on a language model and a set of features has been developed for recognizing entities of the presented corpus. We analyze how the choice of different model components affects the entity recognition accuracy. Those components include methods for vector representation of words, types of language models pre-trained for the Russian language, ways of text normalization, and other pre-processing methods. The sufficient size of our corpus allows us to study the effects of particularities of annotation and entity balancing. We compare our corpus to existing ones by the occurrences of entities of different types and show that balancing the corpus by the number of texts with and without adverse drug event (ADR) mentions improves the ADR recognition accuracy with no notable decline in the accuracy of detecting entities of other types. As a result, the state of the art for the pharmacological entity extraction task for the Russian language is established on a full-size labeled corpus. For the ADR entity type, the accuracy achieved is 61.1% by the F1-exact metric, which is on par with the accuracy level for other language corpora with similar characteristics and ADR representativeness. The accuracy of the coreference relation extraction evaluated on our corpus is 71%, which is higher than the results achieved on the other Russian-language corpora.

Download Full-text

SignalP 6.0 predicts all five types of signal peptides using protein language models

Nature Biotechnology ◽

10.1038/s41587-021-01156-3 ◽

2022 ◽

Author(s):

Felix Teufel ◽

José Juan Almagro Armenteros ◽

Alexander Rosenberg Johansen ◽

Magnús Halldór Gíslason ◽

Silas Irby Pihl ◽

...

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Sequence Data ◽

Amino Acid Sequences ◽

Language Models ◽

Metagenomic Data ◽

Signal Peptides ◽

Machine Learning Model ◽

Living Organisms ◽

Control Protein

AbstractSignal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.

Download Full-text

language models
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Using Pre-trained Language Model to Enhance Active Learning for Sentence Matching

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Detecting The Speaker Language Using CNN Deep Learning Algorithm

Backpropagation-Based Decoding for Multimodal Machine Translation

TransDTI: Transformer-Based Language Models for Estimating DTIs and Building a Drug Recommendation Workflow

Fine Grained Classification of Personal Data Entities with Language Models

Reaching into the basket of doom: Learning outcomes, discourse and information literacy

Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research

Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models

SignalP 6.0 predicts all five types of signal peptides using protein language models

Export Citation Format

language modelsRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Using Pre-trained Language Model to Enhance Active Learning for Sentence Matching

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Detecting The Speaker Language Using CNN Deep Learning Algorithm

Backpropagation-Based Decoding for Multimodal Machine Translation

TransDTI: Transformer-Based Language Models for Estimating DTIs and Building a Drug Recommendation Workflow

Fine Grained Classification of Personal Data Entities with Language Models

Reaching into the basket of doom: Learning outcomes, discourse and information literacy

Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research

Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models

SignalP 6.0 predicts all five types of signal peptides using protein language models

language models
Recently Published Documents