Vocabulary shapes cross-lingual variation of word-order learnability in language models
#vocabulary #word order #cross-lingual #language models #learnability #multilingual #linguistic variation
📌 Key Takeaways
- Vocabulary composition influences how well language models learn word order across languages.
- Cross-lingual differences in word-order learnability are linked to vocabulary structure.
- Language models show varied performance in word-order tasks based on training data vocabulary.
- The study highlights vocabulary structure as a key factor in how learnable a language's word order is for multilingual models.
📖 Full Retelling
arXiv:2603.19427v1 Announce Type: cross
Abstract: Why do some languages like Czech permit free word order, while others like English do not? We address this question by pretraining transformer language models on a spectrum of synthetic word-order variants of natural languages. We observe that greater word-order irregularity consistently raises model surprisal, indicating reduced learnability. Sentence reversal, however, affects learnability only weakly. A coarse distinction of free- (e.g., Czech…
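The paper's learnability measure is surprisal: how improbable a model finds each next token, averaged over a sentence. As a minimal sketch of that metric (using a toy add-one-smoothed bigram model rather than the paper's pretrained transformers, and an invented five-word corpus), the same definition of per-token surprisal in bits applies:

```python
import math
from collections import Counter

# Toy corpus; the surprisal definition is identical for a transformer LM,
# but a bigram model keeps the sketch self-contained.
corpus = "the dog chased the cat the cat chased the dog".split()

# Bigram and unigram counts for add-one-smoothed conditional probabilities.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab_size = len(unigrams)

def prob(prev, word):
    """P(word | prev) with add-one smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def surprisal(sentence):
    """Mean per-token surprisal in bits: -log2 P(w_t | w_{t-1})."""
    toks = sentence.split()
    bits = [-math.log2(prob(p, w)) for p, w in zip(toks, toks[1:])]
    return sum(bits) / len(bits)

# A sentence with familiar word order scores lower surprisal than a
# scrambled ("irregular word order") variant of the same tokens.
regular = surprisal("the dog chased the cat")
scrambled = surprisal("cat the the chased dog")
print(regular < scrambled)  # prints True
```

In the paper's setup the same comparison is made across whole synthetic word-order variants of a language: variants with more irregular orderings yield systematically higher average surprisal under a freshly pretrained model, which is read as lower learnability.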
🏷️ Themes
Linguistics, AI Models
Original Source
Read full article at source