MTQE.en-he: Machine Translation Quality Estimation for English-Hebrew


#Machine Translation #Quality Estimation #Hebrew language #MTQE.en-he #arXiv #Natural Language Processing #Benchmarking

📌 Key Takeaways

  • The release of MTQE.en-he marks the first public benchmark for English-Hebrew Machine Translation Quality Estimation.
  • The dataset includes 959 English-Hebrew segments annotated with Direct Assessment scores from three human experts.
  • Benchmarking was performed using ChatGPT, TransQuest, and CometKiwi to establish performance baselines.
  • Research indicates that ensembling multiple models yields superior results compared to using individual assessment tools.

📖 Full Retelling

Researchers have released MTQE.en-he, the first publicly available benchmark for English-Hebrew Machine Translation Quality Estimation (QE), via the arXiv preprint server in early February 2025, addressing the lack of standardized evaluation tools for this language pair. The dataset provides a reliable framework for assessing how well machine translation systems handle the linguistic complexities of Hebrew, which has historically lacked robust public resources for automated quality assessment. The release marks a significant milestone for computational linguistics by giving researchers localized data to train and test advanced AI models.

The MTQE.en-he dataset comprises 959 distinct English segments sourced from the WMT24++ corpus, each paired with a corresponding machine-translated Hebrew version. To ensure accuracy and reliability, the quality of these translations was evaluated using Direct Assessment (DA) scores provided by three independent human experts. This human-in-the-loop validation allows the benchmark to serve as a gold standard for comparing the performance of automated QE metrics against human judgment.

Alongside the dataset release, the researchers conducted extensive benchmarking with several state-of-the-art technologies, including ChatGPT prompting, TransQuest, and CometKiwi. Their findings revealed that while individual models provide varying levels of accuracy, an ensemble approach combining all three systems consistently outperforms any single model. This discovery suggests that leveraging multiple architectural paradigms, from large language models to specialized quality estimation frameworks, is the most effective strategy for predicting translation quality in the English-Hebrew domain.
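The ensemble idea described above can be sketched as a simple score-averaging scheme, with the combined scores then compared to human DA judgments via Pearson correlation (a standard QE evaluation metric). This is a minimal illustration only: the averaging rule, the toy score values, and the function names below are assumptions for demonstration, not details taken from the paper.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ensemble(score_lists):
    """Hypothetical ensemble: average per-segment scores across QE systems."""
    return [mean(segment_scores) for segment_scores in zip(*score_lists)]

# Toy, invented numbers (NOT from the paper): per-segment quality scores
# from three QE systems, plus human Direct Assessment scores.
human_da   = [78, 92, 55, 61, 85]
system_a   = [0.70, 0.95, 0.50, 0.65, 0.80]  # e.g. an LLM-prompting system
system_b   = [0.75, 0.85, 0.45, 0.55, 0.90]  # e.g. a TransQuest-style model
system_c   = [0.72, 0.90, 0.60, 0.58, 0.82]  # e.g. a CometKiwi-style model

combined = ensemble([system_a, system_b, system_c])
print(round(pearson(human_da, combined), 3))
```

The design choice here (mean pooling) is the simplest possible ensemble; in practice a weighted average or a learned combiner could be substituted without changing the evaluation loop.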

🏷️ Themes

Artificial Intelligence, Linguistics, Technology


Source

arxiv.org
