Accuracy errors in post-edited output, based on Korean-English parallel corpus for AI training.

2021 ◽  
Vol 23 (3) ◽  
pp. 29-58
Author(s):  
Jagyeong Kim ◽  
2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze translation based on an English-Dogri parallel corpus. Machine translation is the automatic translation of text from one language into another, and it is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when abundant parallel data is available. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are heavily affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greatest effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.


Babel ◽  
2021 ◽  
Author(s):  
Baimei Wu ◽  
Andrew K.F. Cheung ◽  
Jie Xing

This pilot study investigates the formulaic phraseology most frequently used in highly formulaic political documents by examining a self-built bilingual parallel corpus of 43 speeches delivered in United Nations Security Council (UNSC) meetings by Chinese representatives. The study also probes corpus-based approaches to exploring formulaic phraseology and demonstrates a method for retrieving Chinese formulaic phraseology from the UNSC corpus. Formulaic phraseology is often seen in political discourse. It can be defined as a sequence, continuous or discontinuous, of words or other meaning elements that are, or appear to be, prefabricated, stored and retrieved whole from memory at the time of use rather than being subject to generation or analysis by the language grammar. This study begins with a literature review of formulaic phraseology, including its features and significance in simultaneous interpreting. It then presents a four-step retrieval process that uses the Sketch Engine software program to acquire Chinese formulaic phraseology from the corpus, filling a gap left by previous studies. Key functional units of the Sketch Engine, including Wordlist, N-grams, and Concordance, are used to extract formulaic phraseology from the UNSC corpus. Methodological issues involved in identifying formulaic phraseology, such as length of phraseology and quantitative criteria (frequency and dispersion thresholds), are also discussed in the study. Three types of formulaic phraseology are identified: (1) greeting representatives and other members and expressing appreciation; (2) expressing concerns about the topic of the meeting; (3) expressing China’s viewpoints about the topic of the meeting. The training of interpreters would be more effective if this categorization of formulaic phraseology were incorporated into the curriculum.
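As a rough illustration of the quantitative criteria mentioned in the abstract above, the sketch below counts n-grams across a set of tokenized speeches and keeps those that meet a frequency threshold and a dispersion threshold (here taken to mean the number of speeches in which the n-gram occurs). It is not the Sketch Engine workflow or the authors' procedure; the function names, the toy English data, and the threshold values are all assumptions made for illustration, and real Chinese text would first need word segmentation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Iterable[NGram]:
    """Yield every contiguous n-token sequence in one document."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def formulaic_candidates(
    docs: List[List[str]], n: int, min_freq: int, min_docs: int
) -> Dict[NGram, Tuple[int, int]]:
    """Return n-grams meeting both thresholds (hypothetical helper).

    The value for each n-gram is (total frequency, number of documents
    containing it); the second number serves as a simple dispersion measure.
    """
    freq: Counter = Counter()
    dispersion: Counter = Counter()
    for tokens in docs:
        grams = list(ngrams(tokens, n))
        freq.update(grams)             # every occurrence counts
        dispersion.update(set(grams))  # each document counts at most once
    return {
        gram: (freq[gram], dispersion[gram])
        for gram in freq
        if freq[gram] >= min_freq and dispersion[gram] >= min_docs
    }


# Toy, invented data standing in for tokenized speeches.
speeches = [
    ["china", "thanks", "the", "special", "envoy", "for", "the", "briefing"],
    ["china", "thanks", "the", "secretary", "general", "for", "the", "briefing"],
    ["we", "thank", "the", "envoy", "for", "the", "briefing"],
]

candidates = formulaic_candidates(speeches, n=3, min_freq=3, min_docs=3)
for gram, (count, spread) in candidates.items():
    print(" ".join(gram), count, spread)
```

Raising `min_docs` favours phrases that recur across many speeches over phrases repeated within a single one, which is the usual reason for pairing a dispersion threshold with raw frequency.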


Corpora ◽  
2008 ◽  
Vol 3 (1) ◽  
pp. 31-41 ◽  
Author(s):  
Marlén Izquierdo ◽  
Knut Hofland ◽  
Øystein Reigem

This paper describes the compilation of the ACTRES Parallel Corpus, an English–Spanish translation corpus built at the Department of Modern Languages at the University of León (Spain) by the ACTRES research group. The computerisation of the corpus was carried out in collaboration with Knut Hofland and Øystein Reigem, from the Department of Culture, Language and Information Technology, Aksis, at the UNIFOB/University of Bergen (Norway). The corpus is conceived as a powerful tool for cross-linguistic research in the fields of Contrastive Analysis and Descriptive Translation Studies. It was the need to bridge the gap between these disciplines and to extend applications that encouraged the building of a parallel corpus as a suitable tool to achieve these goals. This paper focusses on the practical aspects of building the corpus. A brief account of the research which prompted this endeavour precedes the description of this process. This paper is an account of the building of the ACTRES Parallel Corpus, so no empirical results from research done on the basis of the corpus are reported here. Concerning new insights drawn from the actual use of P-ACTRES in English–Spanish translation and contrastive projects, there is an extended bibliography at: http://actres.unileon.es/

