Accuracy errors in post-edited output, based on Korean-English parallel corpus for AI training.

Jagyeong Kim

doi:10.20305/it202103029058

Translation of Legal Terms: Bilingual Dictionary vs Parallel Corpus

Science and Education a New Dimension ◽

10.31174/send-ph2020-225viii67-10 ◽

2020 ◽

Vol VIII(225) (67) ◽

pp. 46-49

Author(s):

S. A. Matvieieva

Keyword(s):

Parallel Corpus ◽

Bilingual Dictionary


English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze English-Dogri parallel corpus translation. Machine translation is the automatic translation of text from one language into another, and it is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.


Building an Italian-Chinese Parallel Corpus for Machine Translation from the Web

Proceedings of the 6th EAI International Conference on Smart Objects and Technologies for Social Good ◽

10.1145/3411170.3411258 ◽

2020 ◽

Author(s):

Rita Tse ◽

Silvia Mirri ◽

Su-Kit Tang ◽

Giovanni Pau ◽

Paola Salomoni

Keyword(s):

Machine Translation ◽

Parallel Corpus ◽

The Web


The Design of E-C Parallel Corpus in Marine Environment and the Translation Difficulties of World Ocean Review

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/693/1/012055 ◽

2021 ◽

Vol 693 (1) ◽

pp. 012055

Author(s):

Fei Yuan

Keyword(s):

Marine Environment ◽

World Ocean ◽

Parallel Corpus


An English-translated parallel corpus for the CJK Wikipedia collections

Proceedings of the Seventeenth Australasian Document Computing Symposium on - ADCS '12 ◽

10.1145/2407085.2407099 ◽

2012 ◽

Cited By ~ 1

Author(s):

Ling-Xiang Tang ◽

Shlomo Geva ◽

Andrew Trotman

Keyword(s):

Parallel Corpus


The development and use of Russian-Chinese parallel corpus

Automatic Documentation and Mathematical Linguistics ◽

10.3103/s0005105515020077 ◽

2015 ◽

Vol 49 (2) ◽

pp. 65-75

Author(s):

Y. Tao ◽

V. P. Zakharov

Keyword(s):

Parallel Corpus


Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, regardless of whether a large parallel corpus is available.
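To make the idea of open-vocabulary subword segmentation concrete, the following is a minimal sketch of byte-pair encoding (BPE), one widely used subword technique. The abstract does not say which techniques the paper compares, so the choice of BPE, the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word counts are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} dictionary."""
    # Start from characters, with a hypothetical end-of-word marker "</w>".
    vocab = {tuple(word) + ("</w>",): count for word, count in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        # Merge the most frequent pair everywhere in the vocabulary.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subwords by replaying the learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy data
    rules = learn_bpe(toy_counts, num_merges=10)
    print(segment("lowest", rules))  # an unseen word still gets a segmentation
```

Because segmentation falls back to single characters when no merge applies, a word never seen in training can still be represented, which is the property that addresses the OOV problem described in the abstract.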


Learning Chinese political formulaic phraseology from a self-built bilingual united nations security council corpus

Babel ◽

10.1075/babel.00233.wu ◽

2021 ◽

Author(s):

Baimei Wu ◽

Andrew K.F. Cheung ◽

Jie Xing

Keyword(s):

Pilot Study ◽

Literature Review ◽

United Nations ◽

Security Council ◽

Political Discourse ◽

United Nations Security Council ◽

Retrieval Process ◽

Methodological Issues ◽

Parallel Corpus ◽

Simultaneous Interpreting

This pilot study investigates the formulaic phraseology most frequently used in highly formulaic political documents by examining a self-built bilingual parallel corpus of 43 speeches delivered in United Nations Security Council (UNSC) meetings by Chinese representatives. The study also probes corpus-based approaches to exploring formulaic phraseology and demonstrates a method for retrieving Chinese formulaic phraseology from the UNSC corpus. Formulaic phraseology is often seen in political discourse. It can be defined as a sequence, continuous or discontinuous, of words or other meaning elements that are, or appear to be, prefabricated, stored and retrieved whole from memory at the time of use rather than being subject to generation or analysis by the language grammar. This study begins with a literature review of formulaic phraseology, including its features and significance in simultaneous interpreting. It then presents a four-step retrieval process that uses the Sketch Engine software program to acquire Chinese formulaic phraseology from the corpus, filling a gap in previous studies. Key functional units of the Sketch Engine, including Wordlist, N-grams, and Concordance, are used to extract formulaic phraseology from the UNSC corpus. Methodological issues involved in identifying formulaic phraseology, such as length of phraseology and quantitative criteria (frequency and dispersion thresholds), are also discussed in the study. Three types of formulaic phraseology are identified: (1) greeting representatives and other members and expressing appreciation; (2) expressing concerns about the topic of the meeting; (3) expressing China's viewpoints about the topic of the meeting. The training of interpreters would be more effective if this categorization of formulaic phraseology were incorporated into the curriculum.
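The quantitative criteria mentioned in the abstract (frequency and dispersion thresholds) can be illustrated with a short sketch in plain Python. This is not the Sketch Engine interface and not the authors' four-step procedure: the function name `extract_formulaic_ngrams`, the threshold values, and the toy English speeches are hypothetical, and the input is assumed to be already tokenized (Chinese text would first need a word segmenter). Dispersion is approximated here as the number of distinct documents in which an n-gram occurs.

```python
from collections import Counter, defaultdict


def extract_formulaic_ngrams(documents, n=3, min_freq=3, min_docs=2):
    """Return n-grams that pass both a frequency and a dispersion threshold.

    documents: list of token lists, one per speech.
    min_freq:  minimum total occurrences across the corpus (frequency).
    min_docs:  minimum number of distinct documents containing the n-gram
               (a simple stand-in for dispersion).
    """
    freq = Counter()
    doc_ids = defaultdict(set)
    for doc_id, tokens in enumerate(documents):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            doc_ids[gram].add(doc_id)
    kept = [
        (gram, count, len(doc_ids[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(doc_ids[gram]) >= min_docs
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    speeches = [  # toy data, not taken from the UNSC corpus
        "thank you mr president for".split(),
        "thank you mr president".split(),
        "thank you mr chair".split(),
    ]
    for gram, count, spread in extract_formulaic_ngrams(speeches, min_freq=2):
        print(" ".join(gram), count, spread)
```

The dispersion threshold matters because a phrase repeated many times within a single speech would pass a frequency cut-off alone, yet would not count as formulaic across speakers and meetings.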


Transitive verb plus reflexive pronoun/personal pronoun patterns in English and Japanese: using a Japanese-English parallel corpus

Corpus Linguistics 25 Years on ◽

10.1163/9789401204347_019 ◽

2007 ◽

pp. 333-346

Keyword(s):

Personal Pronoun ◽

Transitive Verb ◽

Parallel Corpus ◽

Reflexive Pronoun ◽

Japanese English


The ACTRES parallel corpus: an English–Spanish translation corpus

Corpora ◽

10.3366/e1749503208000051 ◽

2008 ◽

Vol 3 (1) ◽

pp. 31-41 ◽

Cited By ~ 13

Author(s):

Marlén Izquierdo ◽

Knut Hofland ◽

Øystein Reigem

Keyword(s):

Information Technology ◽

Research Group ◽

Translation Studies ◽

Contrastive Analysis ◽

Spanish Translation ◽

Parallel Corpus ◽

Empirical Results ◽

Linguistic Research ◽

Actual Use ◽

The University

This paper describes the compilation of the ACTRES Parallel Corpus, an English–Spanish translation corpus built at the Department of Modern Languages at the University of León (Spain) by the ACTRES research group. The computerisation of the corpus was carried out in collaboration with Knut Hofland and Øystein Reigem, from the Department of Culture, Language and Information Technology, Aksis, at the UNIFOB/University of Bergen (Norway). The corpus is conceived as a powerful tool for cross-linguistic research in the fields of Contrastive Analysis and Descriptive Translation Studies. It was the need to bridge the gap between these disciplines and to extend applications that encouraged the building of a parallel corpus as a suitable tool to achieve these goals. This paper focusses on the practical aspects of building the corpus. A brief account of the research which prompted this endeavour precedes the description of this process. Because this paper is an account of the building of the ACTRES Parallel Corpus, no empirical results from research done on the basis of the corpus are reported here. Concerning new insights drawn from the actual use of P-ACTRES in English–Spanish translation and contrastive projects, there is an extended bibliography at: http://actres.unileon.es/
