Exploration of Effective Attention Strategies for Neural Automatic Post-editing with Transformer
Automatic post-editing (APE) is the task of correcting translation errors in the output of an unknown machine translation (MT) system and has been considered a method of improving translation quality without any modification to conventional MT systems. Recently, several variants of Transformer that take both the MT output and its corresponding source sentence as inputs have been proposed for APE, and models introducing an additional attention layer into the encoder to jointly encode the MT output with its source sentence ranked highly in the WMT19 APE shared task. We examine the effectiveness of such a joint-encoding strategy in a controlled environment and compare four types of decoder multi-source attention strategies that have been introduced in previous APE models. The experimental results indicate that the joint-encoding strategy is effective and that taking the final encoded representation of the source sentence is more effective than taking its representation from within the same encoder stack. Furthermore, among the multi-source attention strategies combined with joint encoding, the strategy that applies attention to the concatenated input representations and the strategy that sums the individual attentions to each input improve the quality of APE results over the strategy using joint encoding only.
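To make the two decoder strategies highlighted above concrete, the following is a minimal sketch, assuming PyTorch and illustrative module and tensor names (ConcatAttention, SummedAttention, enc_src, enc_mt are not from the paper), of (a) attention over the concatenated encoded inputs and (b) summed individual attentions to each encoded input; the paper's exact layer placement and dimensions may differ.

```python
import torch
import torch.nn as nn


class ConcatAttention(nn.Module):
    """Single attention over the concatenation of both encoded inputs."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_state, enc_src, enc_mt):
        # Concatenate the encoded source and encoded MT output along the
        # time axis, then attend to the joint sequence in a single pass.
        memory = torch.cat([enc_src, enc_mt], dim=1)
        out, _ = self.attn(dec_state, memory, memory)
        return out


class SummedAttention(nn.Module):
    """Separate attention to each encoded input, with the results added up."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn_src = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_mt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_state, enc_src, enc_mt):
        # Attend to the source sentence and the MT output independently,
        # then sum the two context representations.
        out_src, _ = self.attn_src(dec_state, enc_src, enc_src)
        out_mt, _ = self.attn_mt(dec_state, enc_mt, enc_mt)
        return out_src + out_mt


if __name__ == "__main__":
    d_model, n_heads = 512, 8
    dec_state = torch.randn(2, 10, d_model)  # decoder-side queries
    enc_src = torch.randn(2, 20, d_model)    # encoded source sentence
    enc_mt = torch.randn(2, 18, d_model)     # encoded MT output
    print(ConcatAttention(d_model, n_heads)(dec_state, enc_src, enc_mt).shape)
    print(SummedAttention(d_model, n_heads)(dec_state, enc_src, enc_mt).shape)
```

Both variants produce a context tensor of the same shape as the decoder queries, so either can be dropped into a standard Transformer decoder block in place of the usual single encoder-decoder attention.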