Improving Word Alignment Quality by Relearning Translation Models

2005, Vol 12 (2), pp. 175-188
Author(s): Setsuo Yamada, Masaaki Nagata, Kenji Yamada

2012, Vol 7
Author(s): Annette Rios, Anne Göhring, Martin Volk

Parallel treebanking is greatly facilitated by automatic word alignment. We are building a trilingual treebank for German, Spanish and Quechua. We ran different alignment experiments on parallel Spanish-Quechua texts, measured the alignment quality, and compared the results with those we obtained by aligning a comparable corpus of Spanish-German texts. This preliminary work showed us which word segmentation of the agglutinative language Quechua works best for alignment. We also obtained a first impression of how well Quechua can be aligned to Spanish, an important prerequisite for bilingual lexicon extraction, parallel treebanking or statistical machine translation.


2013, Vol 1, pp. 291-300
Author(s): Zhiguo Wang, Chengqing Zong

Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees in the source language generally do not overlap in the target language. It has been verified to be a useful constraint for word alignment. However, previous work either treats it as a hard constraint or uses it as a feature in discriminative models, which is ineffective for large-scale tasks. In this paper, we treat dependency cohesion as a soft constraint and integrate it into a generative model for large-scale word alignment. We also propose an approximate EM algorithm and a Gibbs sampling algorithm to estimate model parameters in an unsupervised manner. Experiments on large-scale Chinese-English translation tasks demonstrate that our model improves both alignment quality and translation quality.
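The cohesion property described above can be checked mechanically. The Python sketch below assumes the source dependency tree is given as a head array (heads[i] is the index of word i's head, -1 for the root) and the alignment as a set of (source index, target index) links; all function names are hypothetical. It reports pairs of sibling subtrees whose projected target spans overlap. Because any two disjoint subtrees lie inside two distinct sibling subtrees of their lowest common ancestor, checking siblings is enough to detect a violation. This is a hard yes/no check for illustration only; the article's contribution is to treat cohesion as a soft constraint inside a generative model, which this code does not attempt.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def subtree_members(heads: Sequence[int]) -> List[Set[int]]:
    """For each source word, the set of word indices in the subtree it dominates.
    heads[i] is the head of word i, or -1 for the root (tree assumed acyclic)."""
    members = [{i} for i in range(len(heads))]
    for i in range(len(heads)):
        h = heads[i]
        while h != -1:  # add word i to the subtree of every ancestor
            members[h].add(i)
            h = heads[h]
    return members


def target_span(words: Set[int], alignment: Set[Link]) -> Optional[Tuple[int, int]]:
    """Smallest and largest target positions aligned to `words`, or None if unaligned."""
    positions = [j for i, j in alignment if i in words]
    return (min(positions), max(positions)) if positions else None


def cohesion_violations(heads: Sequence[int], alignment: Set[Link]) -> List[Tuple[int, int]]:
    """Pairs of sibling subtree roots whose projected target spans overlap."""
    members = subtree_members(heads)
    children: Dict[int, List[int]] = {h: [] for h in range(len(heads))}
    for i, h in enumerate(heads):
        if h != -1:
            children[h].append(i)
    violations = []
    for kids in children.values():
        for a, b in combinations(kids, 2):
            span_a = target_span(members[a], alignment)
            span_b = target_span(members[b], alignment)
            # Two closed intervals overlap iff each starts before the other ends.
            if span_a is not None and span_b is not None \
                    and span_a[0] <= span_b[1] and span_b[0] <= span_a[1]:
                violations.append((a, b))
    return violations


if __name__ == "__main__":
    # Toy sentence "she bought red apples": she->bought, red->apples, apples->bought.
    heads = [1, -1, 3, 1]
    print(cohesion_violations(heads, {(0, 0), (1, 1), (2, 2), (3, 3)}))  # []
    print(cohesion_violations(heads, {(0, 2), (1, 1), (2, 0), (3, 3)}))  # [(0, 3)]
```

In the second alignment, "she" lands inside the target span of "red apples", so the two sibling subtrees of "bought" are reported as non-cohesive.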


2010, Vol 36 (3), pp. 303-339
Author(s): Yang Liu, Qun Liu, Shouxun Lin

Word alignment plays an important role in many NLP tasks as it indicates the correspondence between words in a parallel text. Although widely used to align large bilingual corpora, generative models are hard to extend with arbitrary useful linguistic information. This article presents a discriminative framework for word alignment based on a linear model. Within this framework, all knowledge sources are treated as feature functions, which depend on a source language sentence, a target language sentence, and the alignment between them. We describe a number of features that could produce symmetric alignments. Our model is easy to extend and can be optimized directly with respect to evaluation metrics. The model achieves state-of-the-art alignment quality on three word alignment shared tasks for five language pairs with varying divergence and richness of resources. We further show that our approach improves translation performance for various statistical machine translation systems.
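In the usual formulation of such a linear model, an alignment a of a source sentence f and target sentence e is scored as a weighted sum of feature functions, score(f, e, a) = sum_m lambda_m * h_m(f, e, a). The Python sketch below illustrates only that scoring rule. The two features (a lookup in a tiny hypothetical dictionary and a link count), their weights, and the function names are invented for illustration and are not the features from the article; the candidate alignments are listed by hand rather than found by the authors' search procedure.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Link = Tuple[int, int]  # (source index, target index)
Alignment = FrozenSet[Link]
Feature = Callable[[Sequence[str], Sequence[str], Alignment], float]


def linear_score(src: Sequence[str], tgt: Sequence[str], alignment: Alignment,
                 features: List[Feature], weights: List[float]) -> float:
    """Weighted sum of feature values: sum over m of lambda_m * h_m(f, e, a)."""
    return sum(w * h(src, tgt, alignment) for h, w in zip(features, weights))


def best_alignment(src: Sequence[str], tgt: Sequence[str], candidates: List[Alignment],
                   features: List[Feature], weights: List[float]) -> Alignment:
    """Highest-scoring alignment among an explicit candidate list (no search)."""
    return max(candidates, key=lambda a: linear_score(src, tgt, a, features, weights))


# Hypothetical toy bilingual dictionary, for illustration only.
TOY_DICT = {("das", "the"), ("haus", "house")}


def lexical_feature(src: Sequence[str], tgt: Sequence[str], alignment: Alignment) -> float:
    """Number of links whose word pair appears in the toy dictionary."""
    return float(sum((src[i], tgt[j]) in TOY_DICT for i, j in alignment))


def link_count_feature(src: Sequence[str], tgt: Sequence[str], alignment: Alignment) -> float:
    """Number of links; a negative weight turns this into a penalty on extra links."""
    return float(len(alignment))


if __name__ == "__main__":
    src, tgt = ["das", "haus"], ["the", "house"]
    candidates = [frozenset({(0, 0), (1, 1)}), frozenset({(0, 1), (1, 0)}), frozenset()]
    best = best_alignment(src, tgt, candidates,
                          [lexical_feature, link_count_feature], [1.0, -0.5])
    print(sorted(best))  # [(0, 0), (1, 1)]
```

Adding a new knowledge source amounts to appending one more function to the feature list, which is the kind of extensibility the abstract contrasts with generative models.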


2007, Vol 33 (3), pp. 293-303
Author(s): Alexander Fraser, Daniel Marcu

Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task, and assumptions have been made about measuring alignment quality for machine translation that, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way that is predictive of statistical machine translation performance.
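The argument above turns on alignment error rate (AER). As a reminder of what the metric measures, here is a minimal Python sketch of the standard definition from Och and Ney (2003), in which a predicted link set A is compared with sure links S and possible links P from a gold annotation: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). Links are assumed to be (source index, target index) pairs; the function name and the handling of the empty case are illustrative choices, not taken from the paper.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def alignment_error_rate(predicted: Iterable[Link], sure: Iterable[Link],
                         possible: Iterable[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    A: predicted links, S: sure gold links, P: possible gold links.
    S is assumed to be a subset of P, so sure links are added to P here.
    """
    a: Set[Link] = set(predicted)
    s: Set[Link] = set(sure)
    p: Set[Link] = set(possible) | s
    if not a and not s:
        return 0.0  # assumed convention for the degenerate all-empty case
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


if __name__ == "__main__":
    gold_sure = {(0, 0), (1, 1)}
    gold_possible = {(2, 2)}
    # One wrong link, (2, 3), on top of the two sure links.
    print(alignment_error_rate({(0, 0), (1, 1), (2, 3)}, gold_sure, gold_possible))
```

Because a possible-only link such as (2, 2) is not in S, omitting it leaves recall unchanged and predicting it does not hurt precision, so annotators can mark uncertain links without forcing a decision either way.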


2014
Author(s): Sara Stymne, Jörg Tiedemann, Joakim Nivre

2014
Author(s): Yin-Wen Chang, Alexander M. Rush, John DeNero, Michael Collins