Wrap-Up: a Trainable Discourse Module for Information Extraction

Journal of Artificial Intelligence Research ◽

10.1613/jair.68 ◽

1994 ◽

Vol 2 ◽

pp. 131-158 ◽

Cited By ~ 11

Author(s):

S. Soderland ◽

W. Lehnert

Keyword(s):

Machine Learning ◽

Information Extraction ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Lexical Disambiguation ◽

On Line ◽

Speech Tagging ◽

Structured Representation

The vast amounts of on-line text now available have ledto renewed interest in information extraction (IE) systems thatanalyze unrestricted text, producing a structured representation ofselected information from the text. This paper presents a novel approachthat uses machine learning to acquire knowledge for some of the higher level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logicalrelations among information extracted from the text. Previous corpus-based approaches were limited to lower level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable, and not onlyautomatically decides what classifiers are needed, but even derives the featureset for each classifier automatically. Performance equals that of a partially trainable discourse module requiring manual customization for each domain.

Download Full-text

Part-of-Speech Tagging

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.51 ◽

2017 ◽

Author(s):

Dan Tufiș ◽

Radu Ion

Keyword(s):

Language Processing ◽

Machine Learning Techniques ◽

Computing Power ◽

Data Sparseness ◽

Part Of Speech Tagging ◽

Web Environment ◽

Part Of Speech ◽

Learning Techniques ◽

Lexical Disambiguation ◽

Speech Tagging

One of the fundamental tasks in natural-language processing is the morpho-lexical disambiguation of words occurring in text. Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Due to recent increases in computing power, together with improvements in tagging technology and the extension of language typologies, part-of-speech tags have become significantly more complex. The need to address multilinguality more directly in the web environment has created a demand for interoperable, harmonized morpho-lexical descriptions across languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard statistical tagging, yet ensure that full lexicon information is available for each word form in the output. The chapter overviews the current major approaches to part-of-speech tagging.

Download Full-text

USING MACHINE LEARNING TECHNIQUES FOR PART-OF-SPEECH TAGGING IN THE GREEK LANGUAGE

Advances in Informatics ◽

10.1142/9789812793928_0024 ◽

2000 ◽

Cited By ~ 4

Author(s):

GEORGIOS PETASIS ◽

GEORGIOS PALIOURAS ◽

VANGELIS KARKALETSIS ◽

CONSTANTINE D. SPYROPOULOS ◽

ION ANDROUTSOPOULOS

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Greek Language ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Learning Techniques ◽

Speech Tagging

Download Full-text

Lemmatization for Ancient Greek

Journal of Greek Linguistics ◽

10.1163/15699846-02002001 ◽

2020 ◽

Vol 20 (2) ◽

pp. 179-196

Author(s):

Alessandro Vatri ◽

Barbara McGillivray

Keyword(s):

Machine Learning ◽

Ancient Greek ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Abstract This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, of the CLTK backoff lemmatizer, and of GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279 and Lysias 7. The results suggest that lemmatization methods using large lexica as well as part-of-speech tagging—such as those employed by the Diorisis corpus and the CLTK backoff lemmatizer—are more reliable than methods that rely more heavily on machine learning and use smaller lexica.

Download Full-text

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning

The KIPS Transactions PartB ◽

10.3745/kipstb.2011.18b.1.045 ◽

2011 ◽

Vol 18B (1) ◽

pp. 45-50 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unknown Words ◽

Speech Tagging

Download Full-text

Machine Learning

10.1093/oxfordhb/9780199276349.013.0020 ◽

2012 ◽

Author(s):

Raymond J. Mooney

Keyword(s):

Machine Learning ◽

Word Sense Disambiguation ◽

Anaphora Resolution ◽

Syntactic Parsing ◽

Word Sense ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Sense Disambiguation ◽

Training Examples ◽

Speech Tagging

This article introduces the type of symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. Machine learning is the study of computational systems that improve performance on some task with experience. Most machine learning methods concern the task of categorizing examples described by a set of features. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Finally, this article reviews the applications to a variety of these problems, such as morphology, part-of-speech tagging, word-sense disambiguation, syntactic parsing, semantic parsing, information extraction, and anaphora resolution.

Download Full-text

Machine Learning

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.016 ◽

2018 ◽

Author(s):

Raymond J. Mooney

Keyword(s):

Machine Learning ◽

Computational Linguistics ◽

Word Sense Disambiguation ◽

Anaphora Resolution ◽

Word Sense ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

New Concepts ◽

Training Examples ◽

Speech Tagging

This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. It also briefly reviews unsupervised learning, in which new concepts are formed from unannotated examples by clustering them into coherent groups. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.

Download Full-text

Part-of-speech tagging based on dictionary and statistical machine learning

2016 35th Chinese Control Conference (CCC) ◽

10.1109/chicc.2016.7554459 ◽

2016 ◽

Cited By ~ 4

Author(s):

Zhonglin Ye ◽

Zhen Jia ◽

Junfu Huang ◽

Hongfeng Yin

Keyword(s):

Machine Learning ◽

Statistical Machine Learning ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Download Full-text

Comparison of three machine-learning methods for Thai part-of-speech tagging

ACM Transactions on Asian Language Information Processing ◽

10.1145/568954.568957 ◽

2002 ◽

Vol 1 (2) ◽

pp. 145-158 ◽

Cited By ~ 12

Author(s):

Masaki Murata ◽

Qing Ma ◽

Hitoshi Isahara

Keyword(s):

Machine Learning ◽

Learning Methods ◽

Part Of Speech Tagging ◽

Machine Learning Methods ◽

Part Of Speech ◽

Speech Tagging

Download Full-text

Optimal Size-Performance Tradeoffs: Weighing PoS Tagger Models

10.31219/osf.io/azfu2 ◽

2021 ◽

Author(s):

Magnus Jacobsen ◽

Mikkel H. Sørensen ◽

Leon Derczynski

Keyword(s):

Machine Learning ◽

Peak Performance ◽

Optimal Size ◽

Trade Off ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Pos Tagger ◽

The Cost ◽

Speech Tagging

Improvement in machine learning-based NLP performance are often presented with bigger models and more complex code. This presents a trade-off: better scores come at the cost of larger tools; bigger models tend to require more during training and inference time. We present multiple methods for measuring the size of a model, and for comparing this with the model's performance.In a case study over part-of-speech tagging, we then apply these techniques to taggers for eight languages and present a novel analysis identifying which taggers are size-performance optimal. Results indicate that some classical taggers place on the size-performance skyline across languages. Further, although the deep models have highest performance for multiple scores, it is often not the most complex of these that reach peak performance.

Download Full-text

Lexical Rule and Lexicon Effect for Part of Speech Tagging Bahasa Madura

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v18i1.332 ◽

2018 ◽

Vol 18 (1) ◽

pp. 65-72

Author(s):

Nindian Puspa Dewi ◽

Ubaidi Ubaidi

Keyword(s):

Text Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Bahasa Indonesia

POS Tagging adalah dasar untuk pengembangan Text Processing suatu bahasa. Dalam penelitian ini kita meneliti pengaruh penggunaan lexicon dan perubahan morfologi kata dalam penentuan tagset yang tepat untuk suatu kata. Aturan dengan pendekatan morfologi kata seperti awalan, akhiran, dan sisipan biasa disebut sebagai lexical rule. Penelitian ini menerapkan lexical rule hasil learner dengan menggunakan algoritma Brill Tagger. Bahasa Madura adalah bahasa daerah yang digunakan di Pulau Madura dan beberapa pulau lainnya di Jawa Timur. Objek penelitian ini menggunakan Bahasa Madura yang memiliki banyak sekali variasi afiksasi dibandingkan dengan Bahasa Indonesia. Pada penelitian ini, lexicon selain digunakan untuk pencarian kata dasar Bahasa Madura juga digunakan sebagai salah satu tahap pemberian POS Tagging. Hasil ujicoba dengan menggunakan lexicon mencapai akurasi yaitu 86.61% sedangkan jika tidak menggunakan lexicon hanya mencapai akurasi 28.95 %. Dari sini dapat disimpulkan bahwa ternyata lexicon sangat berpengaruh terhadap POS Tagging.

Download Full-text