A Knowledge-Based Machine Translation Using AI Technique

This article presents a realistic technique for the machine aided translation system. In this technique, the system dictionary is partitioned into a multi-module structure for fast retrieval of Arabic features of English words. Each module is accessed through an interface that includes the necessary morphological rules, which directs the search toward the proper sub-dictionary. Another factor that aids fast retrieval of Arabic features of words is the prediction of the word category, and accesses its sub-dictionary to retrieve the corresponding attributes. The system consists of three main parts, which are the source language analysis, the transfer rules between source language (English) and target language (Arabic), and the generation of the target language. The proposed system is able to translate, some negative forms, demonstrations, and conjunctions, and also adjust nouns, verbs, and adjectives according their attributes. Then, it adds the symptom of Arabic words to generate a correct sentence.

Download Full-text

A Bicolano-to-Tagalog Transfer-Based Machine Translation System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1062.0882s819 ◽

2019 ◽

Vol 8 (2S8) ◽

pp. 1324-1330

Keyword(s):

Machine Translation ◽

Syntactic Structure ◽

Internal Representation ◽

Target Language ◽

Translation System ◽

Source Language ◽

Three Phase ◽

Transfer Rules ◽

Machine Translation System ◽

Overall Performance

The Bicolano-Tagalog Transfer-based Machine Translation System is a unidirectional machine translator for languages Bicolano and Tagalog. The transfer-based approach is divided into three phase: Pre-Processing Analysis, Morphological Transfer, and Sentence Generation. The system analyze first the source language (Bicolano) input to create some internal representation. This includes the tokenizer, stemmer, POS tag and parser. Through transfer rules, it then typically manipulates this internal representation to transfer parsed source language syntactic structure into target language syntactic structure. Finally, the system generates Tagalog sentence from own morphological and syntactic information. Each phase will undergo training and evaluation test for the competence of end-results. Overall performance shows a 71.71% accuracy rate.

Download Full-text

Word Sense Based Hindi-Tamil Statistical Machine Translation

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2018010102 ◽

2018 ◽

Vol 14 (1) ◽

pp. 17-27

Author(s):

Vimal Kumar K. ◽

Divakar Yadav

Keyword(s):

Machine Translation ◽

Language Processing ◽

Semantic Analysis ◽

Target Language ◽

Translation System ◽

Great Success ◽

Source Language ◽

Additional Information ◽

Part Of Speech ◽

Preceding Word

Corpus based natural language processing has emerged with great success in recent years. It is not only used for languages like English, French, Spanish, and Hindi but also is widely used for languages like Tamil, Telugu etc. This paper focuses to increase the accuracy of machine translation from Hindi to Tamil by considering the word's sense as well as its part-of-speech. This system works on word by word translation from Hindi to Tamil language which makes use of additional information such as the preceding words, the current word's part of speech and the word's sense itself. For such a translation system, the frequency of words occurring in the corpus, the tagging of the input words and the probability of the preceding word of the tagged words are required. Wordnet is used to identify various synonym for the words specified in the source language. Among these words, the one which is more relevant to the word specified in source language is considered for the translation to target language. The introduction of the additional information such as part-of-speech tag, preceding word information and semantic analysis has greatly improved the accuracy of the system.

Download Full-text

Language To Language Translation System

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206363 ◽

2020 ◽

pp. 289-293

Author(s):

Ms Pratheeksha ◽

Pratheeksha Rai ◽

Ms Vijetha

Keyword(s):

Speech Recognition ◽

Machine Translation ◽

Automatic Speech Recognition ◽

Speech Synthesis ◽

Language Translation ◽

Target Language ◽

Translation System ◽

Text To Speech ◽

Source Language ◽

Text To Speech Synthesis

The system used in Language to Language Translation is the phrases spoken in one language are immediately spoken in other language by the device. Language to Language Translation is a three steps software process which includes Automatic Speech Recognition, Machine Translation and Voice Synthesis. Language to Language system includes the major speech translation projects using different approaches for Speech Recognition, Translation and Text to Speech synthesis highlighting the major pros and cons for the approach being used. Language translation is a process that takes the conversational phrase in one language as an input and translated speech phrases in another language as the output. The three components of language-to-language translation are connected in a sequential order. Automatic Speech Recognition (ASR) is responsible for converting the spoken phrases of source language to the text in the same language followed by machine translation which translates the source language to next target language text and finally the speech synthesizer is responsible for text to speech conversion of target language.

Download Full-text

Optimizing Tokenization Choice for Machine Translation across Multiple Target Languages

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0025 ◽

2017 ◽

Vol 108 (1) ◽

pp. 257-269 ◽

Cited By ~ 4

Author(s):

Nasser Zalmout ◽

Nizar Habash

Keyword(s):

Machine Translation ◽

Performance Enhancement ◽

Statistical Machine Translation ◽

Target Language ◽

Source Language ◽

Context Variable ◽

Significant Performance ◽

Morphologically Rich Languages ◽

Target Languages ◽

Language Text

AbstractTokenization is very helpful for Statistical Machine Translation (SMT), especially when translating from morphologically rich languages. Typically, a single tokenization scheme is applied to the entire source-language text and regardless of the target language. In this paper, we evaluate the hypothesis that SMT performance may benefit from different tokenization schemes for different words within the same text, and also for different target languages. We apply this approach to Arabic as a source language, with five target languages of varying morphological complexity: English, French, Spanish, Russian and Chinese. Our results show that different target languages indeed require different source-language schemes; and a context-variable tokenization scheme can outperform a context-constant scheme with a statistically significant performance enhancement of about 1.4 BLEU points.

Download Full-text

Controlling Neural Machine Translation Formality with Synthetic Supervision

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6379 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8568-8575

Author(s):

Xing Niu ◽

Marine Carpuat

Keyword(s):

Machine Translation ◽

Target Language ◽

Sentence Pair ◽

English Sentence ◽

Neural Machine Translation ◽

Source Language ◽

Training Scheme ◽

Training Examples ◽

Language Content ◽

Missing Element

This work aims to produce translations that convey source language content at a formality level that is appropriate for a particular audience. Framing this problem as a neural sequence-to-sequence task ideally requires training triplets consisting of a bilingual sentence pair labeled with target language formality. However, in practice, available training examples are limited to English sentence pairs of different styles, and bilingual parallel sentences of unknown formality. We introduce a novel training scheme for multi-task models that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training. Comprehensive automatic and human assessments show that our best model outperforms existing models by producing translations that better match desired formality levels while preserving the source meaning.1

Download Full-text

Efficient Phrase Table pruning for Hindi to English machine translation through syntactic and marker-based filtering and hybrid similarity measurement

Natural Language Engineering ◽

10.1017/s1351324918000360 ◽

2018 ◽

Vol 25 (1) ◽

pp. 171-210

Author(s):

NILADRI CHATTERJEE ◽

SUSMITA GUPTA

Keyword(s):

Machine Translation ◽

Syntactic Structure ◽

Verbal Learning ◽

Target Language ◽

Maximum Efficiency ◽

Similarity Measurement ◽

Translation System ◽

Training Corpus ◽

Generative Process

AbstractFor a given training corpus of parallel sentences, the quality of the output produced by a translation system relies heavily on the underlying similarity measurement criteria. A phrase-based machine translation system derives its output through a generative process using a Phrase Table comprising source and target language phrases. As a consequence, the more effective the Phrase Table is, in terms of its size and the output that may be derived out of it, the better is the expected outcome of the underlying translation system. However, finding the most similar phrase(s) from a given training corpus that can help generate a good quality translation poses a serious challenge. In practice, often there are many parallel phrase entries in a Phrase Table that are either redundant, or do not contribute to the translation results effectively. Identifying these candidate entries and removing them from the Phrase Table will not only reduce the size of the Phrase Table, but should also help in improving the processing speed for generating the translations. The present paper develops a scheme based on syntactic structure and the marker hypothesis (Green 1979, The necessity of syntax markers: two experiments with artificial languages, Journal of Verbal Learning and Behavior) for reducing the size of a Phrase Table, without compromising much on the translation quality of the output, by retaining the non-redundant and meaningful parallel phrases only. The proposed scheme is complemented with an appropriate similarity measurement scheme to achieve maximum efficiency in terms of BLEU scores. Although designed for Hindi to English machine translation, the overall approach is quite general, and is expected to be easily adaptable for other language pairs as well.

Download Full-text

Predicting and Using a Pragmatic Component of Lexical Aspect

Linguistic Issues in Language Technology ◽

10.33011/lilt.v13i.1389 ◽

2016 ◽

Vol 13 ◽

Author(s):

Sharid Loáiciga ◽

Cristina Grisot

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Target Language ◽

Translation System ◽

Automatic Annotation ◽

Parallel Corpora ◽

Linguistic Data ◽

Lexical Aspect ◽

Machine Translation System ◽

Automatic Systems

This paper proposes a method for improving the results of a statistical Machine Translation system using boundedness, a pragmatic component of the verbal phrase’s lexical aspect. First, the paper presents manual and automatic annotation experiments for lexical aspect in English-French parallel corpora. It will be shown that this aspectual property is identified and classified with ease both by humans and by automatic systems. Second, Statistical Machine Translation experiments using the boundedness annotations are presented. These experiments show that the information regarding lexical aspect is useful to improve the output of a Machine Translation system in terms of better choices of verbal tenses in the target language, as well as better lexical choices. Ultimately, this work aims at providing a method for the automatic annotation of data with boundedness information and at contributing to Machine Translation by taking into account linguistic data.

Download Full-text

A machine translation system for the target language inexpert

Proceedings of the 13th conference on Computational linguistics - ◽

10.3115/991146.991220 ◽

1990 ◽

Cited By ~ 3

Author(s):

Xiuming Huang

Keyword(s):

Machine Translation ◽

Target Language ◽

Translation System ◽

Machine Translation System

Download Full-text

A Study of Neural Machine Translation from Chinese to Urdu

Journal of Autonomous Intelligence ◽

10.32629/jai.v2i4.82 ◽

2020 ◽

Vol 2 (4) ◽

pp. 28

Author(s):

. Zeeshan

Keyword(s):

Machine Translation ◽

Chinese Language ◽

Language Translation ◽

Target Language ◽

Foreign Languages ◽

Neural Machine Translation ◽

Source Language ◽

Great Progress ◽

Score Method ◽

Translation Methods

Machine Translation (MT) is used for giving a translation from a source language to a target language. Machine translation simply translates text or speech from one language to another language, but this process is not sufficient to give the perfect translation of a text due to the requirement of identification of whole expressions and their direct counterparts. Neural Machine Translation (NMT) is one of the most standard machine translation methods, which has made great progress in the recent years especially in non-universal languages. However, local language translation software for other foreign languages is limited and needs improving. In this paper, the Chinese language is translated to the Urdu language with the help of Open Neural Machine Translation (OpenNMT) in Deep Learning. Firstly, a Chineseto Urdu language sentences datasets were established and supported with Seven million sentences. After that, these datasets were trained by using the Open Neural Machine Translation (OpenNMT) method. At the final stage, the translation was compared to the desired translation with the help of the Bleu Score Method.

Download Full-text

Query Expansion for Slovak to Bulgarian Language Machine Translation using Parallel Search

WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL ◽

10.37394/23203.2021.16.30 ◽

2021 ◽

pp. 351-357

Author(s):

VELISLAVA STOYKOVA ◽

DANIELA MAJCHRAKOVA

Keyword(s):

Machine Translation ◽

Query Expansion ◽

Statistical Approach ◽

Semantic Relations ◽

Target Language ◽

Parallel Search ◽

Keyword Query ◽

Source Language ◽

Standard Presentation ◽

Standard Semantic

The paper presents results of the application of a statistical approach for Slovak to Bulgarian language machine translation. It uses Information Retrieval inspired search techniques and employs sever alalgorithmic steps of parallel statistical search with query expansion in Slovak-Bulgarian EUROPARL 7 Corpus using the Sketch Engine software and its scoring. The search includes the generation of concordances,collocations, word sketch differences, word sketches, and thesauri of the studied keyword (query) by using a statistical scoring, which is regarded as intermediate (inter-lingual) semantic standard presentation by means of which the studied keyword (from the source language) is mapped together with its possible translation equivalents (onto the target language. The results present the study of adjectival collocabillity in both Slovak and Bulgarian language from the corpus of political speech texts outlining the standard semantic relations based on the evaluation of statistical scoring. Finally, the advantages and shortcomings of the approach are discussed.

Download Full-text