Exploring the Benefits and Limitations of Multilinguality for Non-autoregressive Machine Translation


Non-autoregressive (NAR) machine translation has recently seen significant progress and now achieves quality comparable to autoregressive (AR) models on some benchmarks while offering an efficient alternative to AR inference. However, while AR translation is often used to implement multilingual models that benefit from transfer between languages and from improved serving efficiency, multilingual NAR models remain relatively unexplored. Taking Connectionist Temporal Classification as an example NAR model and IMPUTER as a semi-NAR model, we present a comprehensive empirical study of multilingual NAR. We test its capabilities with respect to positive transfer between related languages and negative transfer under capacity constraints. As NAR models require distilled training sets, we carefully study the impact of bilingual versus multilingual teachers. Finally, we fit a scaling law for multilingual NAR to determine capacity bottlenecks; the law quantifies NAR performance relative to the AR model as the model scale increases.

In Proceedings of the Seventh Conference on Machine Translation.