Constructing Corpora for the Development and Evaluation of Paraphrase Systems

2008 ◽  
Vol 34 (4) ◽  
pp. 597-614 ◽  
Author(s):  
Trevor Cohn ◽  
Chris Callison-Burch ◽  
Mirella Lapata

Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. Because Kappa is suited only to nominal data, we employ an alternative agreement statistic appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and in developing linguistically rich paraphrase models based on syntactic structure.
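
Although the article itself includes no code, the alignment-based evaluation it describes reduces to set comparison. Below is a minimal sketch, not the authors' implementation, of scoring a predicted word alignment against gold paraphrase annotations with precision, recall, and F1; all names and the toy alignments are hypothetical:

```python
# Minimal sketch: score a predicted word alignment against a gold one.
# Alignments are modeled as sets of (source_index, target_index) pairs.

def alignment_prf(gold: set[tuple[int, int]],
                  predicted: set[tuple[int, int]]) -> tuple[float, float, float]:
    """Return (precision, recall, f1) for a predicted alignment."""
    if not gold or not predicted:
        return 0.0, 0.0, 0.0
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: gold alignments for one sentence pair vs. a system's output.
gold = {(0, 0), (1, 2), (2, 1)}
pred = {(0, 0), (1, 2), (3, 3)}
print(alignment_prf(gold, pred))  # (0.666..., 0.666..., 0.666...)
```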

2017 ◽  
Vol 13 (1) ◽  
Author(s):  
Ewa Rudnicka ◽  
Francis Bond ◽  
Łukasz Grabowski ◽  
Maciej Piasecki ◽  
Tadeusz Piotrowski

The paper focuses on the issue of creating equivalence links in the domain of bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units, i.e., lemma–sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicography and translation. Special attention is paid to cognitive and translational equivalents. A proposal for mapping lexical units is presented. Three types of links are defined: super-strong equivalence, strong equivalence and weak implied equivalence. Super-strong and strong equivalence share a common set of formal, semantic and usage features, with some feature values slightly relaxed for strong equivalence. These links will be introduced manually by trained lexicographers. The sense mapping will partly draw on the results of the existing synset mapping: the lexicographers will analyse lists of pairs of synsets linked by interlingual relations such as synonymy, partial synonymy, hyponymy and hypernymy. They will also consult bilingual dictionaries and check translation probabilities in a parallel corpus. The results of the proposed mapping have great application potential in natural language processing, translation and language learning.
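
The consultation of translation probabilities mentioned above can be illustrated with a small sketch. The following is not the project's tooling; under invented names and toy counts, it merely shows how p(target | source) estimates from a word-aligned parallel corpus could help rank candidate equivalents:

```python
# Illustrative sketch only (the paper describes a manual lexicographic
# procedure): rank candidate target-language equivalents for a source
# lemma by translation probability estimated from aligned lemma pairs.

from collections import Counter, defaultdict

def translation_probs(aligned_pairs):
    """Estimate p(target | source) from (source_lemma, target_lemma) pairs."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    return {src: {tgt: n / sum(tgts.values()) for tgt, n in tgts.items()}
            for src, tgts in counts.items()}

# Hypothetical aligned pairs for the Polish lemma "zamek" (castle / lock).
pairs = [("zamek", "castle")] * 7 + [("zamek", "lock")] * 3
print(translation_probs(pairs)["zamek"])  # {'castle': 0.7, 'lock': 0.3}
```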


2018 ◽  
Vol 18 (1) ◽  
pp. 18-24
Author(s):  
Sri Reski Anita Muhsini

Measuring semantic similarity plays a very important role in several areas of Natural Language Processing (NLP), where the results often serve as the basis for further NLP tasks. One application is measuring cross-lingual semantic similarity between words. This work is motivated by the fact that many information retrieval systems must now deal with multilingual texts or documents. A pair of words is considered semantically similar if the two words share a meaning or concept. In this study, semantic similarity is computed between words in two different languages, English and Spanish. The corpus used is the Europarl Parallel Corpus in English and Spanish. Word contexts are drawn from the Swadesh list, and the resulting similarity scores are compared against the SemEval 2017 Cross-lingual Semantic Similarity gold-standard dataset to measure their correlation. The results show that the PMI method achieves a correlation of 0.5781 (Pearson) and 0.5762 (Spearman). It can be concluded that implementing cross-lingual semantic similarity measurement with Pointwise Mutual Information (PMI) yields the best correlation. For future work, the authors recommend using other datasets to test how effective the Pointwise Mutual Information (PMI) method is at measuring cross-lingual semantic similarity between words.
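
As an illustration of the method described (not the study's actual implementation), the sketch below builds positive-PMI vectors from toy co-occurrence counts over a shared context vocabulary, such as a Swadesh list, and compares an English and a Spanish word by cosine similarity; all counts and names are hypothetical:

```python
import math

def ppmi_vector(word, cooc, freq, total):
    """Positive PMI vector for `word`.

    cooc[word][ctx] -- count of `word` co-occurring with context word `ctx`
    freq[x]         -- corpus frequency of word x
    total           -- total count mass used to normalize probabilities
    """
    vec = {}
    for ctx, joint in cooc[word].items():
        pmi = math.log2((joint * total) / (freq[word] * freq[ctx]))
        if pmi > 0:          # keep only positive PMI values
            vec[ctx] = pmi
    return vec

def cosine(u, v):
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    norm = lambda x: math.sqrt(sum(w * w for w in x.values())) or 1.0
    return dot / (norm(u) * norm(v))

# Toy counts over a shared Swadesh-list context vocabulary.
cooc = {"dog":   {"water": 2, "eat": 8, "sleep": 5},
        "perro": {"water": 1, "eat": 9, "sleep": 6}}
freq = {"dog": 15, "perro": 16, "water": 30, "eat": 20, "sleep": 25}
print(cosine(ppmi_vector("dog", cooc, freq, 100),
             ppmi_vector("perro", cooc, freq, 100)))
```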


Author(s):  
Virginie Goepp ◽  
Nada Matta ◽  
Emmanuel Caillaud ◽  
Françoise Feugeas

Community of Practice (CoP) efficiency evaluation is a significant research concern. Knowing whether a given CoP is successful is essential to managing it better over time. The existing approaches to efficiency evaluation are difficult and time-consuming to put into action on real CoPs: they require either evaluating subjective constructs, which makes the analysis unreliable, or working out a knowledge interaction matrix that is difficult to set up. These approaches nevertheless share the premise that a CoP is successful if knowledge is exchanged between its members, which is the case when the actors involved in the CoP interact. We therefore propose to analyze these interactions through e-mail exchanges using Natural Language Processing. Our approach is systematic and semi-automated; it requires only the exchanged e-mails and a definition of the speech acts to be retrieved. We apply it to a real project-based CoP, the SEPOLBE research project, which involves different fields of expertise. This allows us to identify the CoP core group and to highlight learning processes between members with different backgrounds (Microbiology, Electrochemistry and Civil Engineering).
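
As a rough illustration of such a semi-automated pipeline (the paper's actual speech-act inventory and code are not given here, so the labels, patterns and names below are invented placeholders), one might tag e-mails with coarse speech acts and count tagged interactions to surface a core group:

```python
# Hedged sketch, not the authors' pipeline: tag e-mails with coarse
# speech acts via keyword patterns, then rank members by how many
# tagged interactions they take part in.

import re
from collections import Counter

ACT_PATTERNS = {  # placeholder speech acts and trigger patterns
    "request": re.compile(r"\b(could you|please|can you)\b", re.I),
    "commit":  re.compile(r"\b(i will|we will|i'll)\b", re.I),
    "inform":  re.compile(r"\b(fyi|note that|results show)\b", re.I),
}

def tag_acts(body: str) -> list[str]:
    return [act for act, pat in ACT_PATTERNS.items() if pat.search(body)]

def core_group(emails, top_n=3):
    """Rank members by participation in e-mails carrying speech acts."""
    activity = Counter()
    for sender, recipients, body in emails:
        if tag_acts(body):
            activity[sender] += 1
            activity.update(recipients)
    return [member for member, _ in activity.most_common(top_n)]

emails = [("alice", ["bob"], "Could you please rerun the corrosion test?"),
          ("bob", ["alice", "carol"], "I will send the results tomorrow.")]
print(core_group(emails))  # ['alice', 'bob', 'carol']
```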


Algorithms ◽  
2020 ◽  
Vol 13 (10) ◽  
pp. 262
Author(s):  
Fabio Massimo Zanzotto ◽  
Giorgio Satta ◽  
Giordano Cristini

Parsing is a key task in computer science, with applications in compilers, natural language processing, syntactic pattern matching, and formal language theory. With the recent development of deep learning techniques, several artificial intelligence applications, especially in natural language processing, have combined traditional parsing methods with neural networks to drive the search in the parsing space, resulting in hybrid architectures using both symbolic and distributed representations. In this article, we show that existing symbolic parsing algorithms for context-free languages can cross the border and be entirely formulated over distributed representations. To this end, we introduce a version of the traditional Cocke–Younger–Kasami (CYK) algorithm, called distributed (D)-CYK, which is entirely defined over distributed representations. D-CYK uses matrix multiplication on real number matrices of a size independent of the length of the input string. These operations are compatible with recurrent neural networks. Preliminary experiments show that D-CYK approximates the original CYK algorithm. By showing that CYK can be entirely performed on distributed representations, we open the way to the definition of recurrent layer neural networks that can process general context-free languages.
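
For readers unfamiliar with the baseline, here is a compact classical CYK recognizer for a grammar in Chomsky normal form; the article's D-CYK reformulates this boolean chart as multiplications of real-valued matrices, a step not reproduced in this sketch:

```python
# Classical CYK recognition over a grammar in Chomsky normal form.

def cyk(words, lexical, binary, start="S"):
    """lexical: {terminal: {A for A -> terminal}},
    binary: {(B, C): {A for A -> B C}}."""
    n = len(words)
    # chart[i][j] = nonterminals deriving the span words[i : i + j + 1]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][0] = set(lexical.get(w, ()))
    for span in range(2, n + 1):           # span length
        for i in range(n - span + 1):      # span start
            for k in range(1, span):       # split point
                for B in chart[i][k - 1]:
                    for C in chart[i + k][span - k - 1]:
                        chart[i][span - 1] |= binary.get((B, C), set())
    return start in chart[0][n - 1]

# Toy grammar: S -> NP VP, VP -> V NP.
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cyk("she eats fish".split(), lexical, binary))  # True
```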


Terminology ◽  
2010 ◽  
Vol 16 (1) ◽  
pp. 30-50 ◽  
Author(s):  
Anne Condamines

The study of variation in terminology has come to the fore over the last fifteen years in connection with advances in textual terminology. This new approach to terminology could be a way of improving the management of risk related to language use in the workplace and of contributing to the definition of a “linguistics of the workplace”. As a theoretical field of study, linguistics has hardly found any application in the workplace. Two of its applied branches, however, sociolinguistics and Natural Language Processing (NLP), are relevant. Both deal with lexical phenomena, i.e. terminology, sociolinguistics taking into account very subtle inter-individual variation and NLP being more interested in stability of use. Taking variation into account when building terminologies could thus be a means of reconciling description and prescription, use and norm. This approach to terminology, made possible by NLP and Knowledge Engineering, could be a way of meeting workplace needs concerning risk management related to language use.

