arxiv:2205.08533
Consistent Human Evaluation of Machine Translation across Language Pairs
Published on May 17, 2022
Abstract
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge, given the high variability between human evaluators, partly due to subjective expectations for translation quality across different language pairs. We propose a new metric called XSTS that is more focused on semantic equivalence, together with a cross-lingual calibration method that enables more consistent assessment. We demonstrate the effectiveness of these novel contributions in large-scale evaluation studies across up to 14 language pairs, with translation both into and out of English.
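The abstract names the calibration method but not its mechanics. One common reading of such a protocol is that every evaluator also scores a shared calibration set, and their evaluation scores are then shifted by the offset between their calibration-set mean and the pooled mean across evaluators. The Python sketch below is a minimal illustration of that mean-offset idea, assuming an additive correction clamped to the 1-5 XSTS scale; the function name `calibrate_scores` and the exact adjustment rule are illustrative assumptions, not the paper's published procedure.

```python
from statistics import mean

def calibrate_scores(eval_scores, calib_scores):
    """Hypothetical mean-offset calibration (illustrative, not the paper's exact method).

    eval_scores:  {evaluator: [XSTS scores on the evaluation items]}
    calib_scores: {evaluator: [XSTS scores on the shared calibration set]}
    Returns per-evaluator scores shifted so that every evaluator agrees,
    on average, with the pooled mean on the calibration set.
    """
    # Pooled mean over all evaluators on the shared calibration items.
    global_mean = mean(s for scores in calib_scores.values() for s in scores)
    adjusted = {}
    for rater, scores in eval_scores.items():
        # Offset = how much this rater over- or under-scores the calibration set.
        offset = mean(calib_scores[rater]) - global_mean
        # Subtract the offset, clamping to the 1-5 XSTS range.
        adjusted[rater] = [min(5.0, max(1.0, s - offset)) for s in scores]
    return adjusted

# Example: rater "b" is systematically harsher on the calibration set,
# so their evaluation scores are shifted upward after calibration.
eval_scores = {"a": [4, 5, 3], "b": [2, 3, 2]}
calib_scores = {"a": [4, 4, 5], "b": [2, 3, 3]}
print(calibrate_scores(eval_scores, calib_scores))
```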