Optimizing Language Models for Crosslingual Knowledge Consistency
#language models #crosslingual consistency #knowledge alignment #multilingual datasets #factual accuracy
📌 Key Takeaways
- Researchers introduce Direct Consistency Optimization (DCO), a method for improving language models' knowledge consistency across languages.
- The approach reduces inconsistent answers when a model is asked the same question in different languages.
- DCO is DPO-inspired: it requires no explicit reward model, deriving its training signal directly from the LLM itself.
- The optimization improves reliability for multilingual applications, generalizes out of domain, and complements DPO when gold labels are available.
📖 Full Retelling
arXiv:2603.04678v1 Announce Type: cross
Abstract: Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their reliability. In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function, which leads to an optimal policy with consistent crosslingual responses. We introduce Direct Consistency Optimization (DCO), a DPO-inspired method that requires no explicit reward model and is derived directly from the LLM itself. Comprehensive experiments show that DCO significantly improves crosslingual consistency across diverse LLMs and outperforms existing methods when training with samples of multiple languages, while complementing DPO when gold labels are available. Extra experiments demonstrate the effectiveness of DCO in bilingual settings, significant out-of-domain generalizability, and controllable alignment via direction hyperparameters. Taken together, these results establish DCO as a robust and efficient solution for improving knowledge consistency across languages in multilingual LLMs.
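The abstract describes DCO only at a high level: a DPO-inspired objective with no explicit reward model, where the implicit reward comes from the LLM itself. The paper's exact DCO loss is not reproduced here, but as a point of reference, a minimal sketch of the standard DPO loss that this family of methods builds on looks like the following (the function name and scalar-pair setup are illustrative, not from the paper):

```python
import math


def dpo_style_loss(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair of scalar log-probabilities.

    The implicit reward is beta * log(pi(y|x) / pi_ref(y|x)), so no
    separate reward model is ever trained -- the signal comes from the
    policy and a frozen reference copy of the same LLM.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable -log(sigmoid(margin))
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the policy assigns relatively more probability to the preferred response than the reference model does; in a crosslingual-consistency setting one would expect the "preferred" side to be tied to agreement across languages, though the precise pairing scheme DCO uses is defined in the paper.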
🏷️ Themes
AI Optimization, Multilingual NLP
Original Source
Computer Science > Computation and Language
arXiv:2603.04678 [Submitted on 4 Mar 2026]
Title: Optimizing Language Models for Crosslingual Knowledge Consistency
Authors: Tianyu Liu, Jirui Qi, Mrinmaya Sachan, Ryan Cotterell, Raquel Fernández, Arianna Bisazza
All code, training scripts, and evaluation benchmarks are released at this https URL.
Comments: Under review. The first two authors contributed equally.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.04678 [cs.CL] (or arXiv:2603.04678v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.04678
Read full article at source