Developing Statistical Machine Translation System for English and Nigerian Languages
The global demand for translation and translation tools currently surpasses the capacity of available solutions. Moreover, there is no one-size-fits-all, off-the-shelf solution for all languages. Thus, the need and urgency to scale up research on the development of translation tools and devices continue to grow, especially for languages suffering under the pressure of globalisation. This paper discusses our experiments on translation systems between English and two Nigerian languages: Igbo and Yorùbá. The study was set up to build parallel corpora and to train and experiment with English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá phrase-based statistical machine translation systems. The systems were trained on parallel corpora that were created for each language pair in the course of this research, using text from the religious domain. BLEU scores of 30.04, 29.01 and 18.72 were recorded for the English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá MT systems, respectively. An error analysis of the systems’ outputs was conducted using a linguistically motivated MT error analysis approach, and it showed that errors occurred mostly at the lexical, grammatical and semantic levels. While the study reveals the potential of our corpora, it also shows that the size of the corpora remains an issue that requires further attention. Thus, an important target for the immediate future is to increase the quantity and quality of the data.