BravenNow
Vocabulary shapes cross-lingual variation of word-order learnability in language models
USA | technology | ✓ Verified - arxiv.org

#vocabulary #word-order #cross-lingual #language-models #learnability #multilingual #linguistic-variation

📌 Key Takeaways

  • Vocabulary shapes how learnable word order is for language models, and this helps explain why learnability differs across languages.
  • Models pretrained on synthetic word-order variants of natural languages show higher surprisal, i.e. lower learnability, as word order becomes more irregular.
  • Reversing sentences, by contrast, affects learnability only weakly.
  • The study addresses why some languages, such as Czech, permit free word order while others, such as English, do not.

📖 Full Retelling

arXiv:2603.19427v1 (Announce Type: cross)

Abstract: Why do some languages like Czech permit free word order, while others like English do not? We address this question by pretraining transformer language models on a spectrum of synthetic word-order variants of natural languages. We observe that greater word-order irregularity consistently raises model surprisal, indicating reduced learnability. Sentence reversal, however, affects learnability only weakly. A coarse distinction of free- (e.g., Czec…
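The measurement the abstract describes (surprisal as a proxy for learnability, compared across word-order variants) can be illustrated with a toy sketch. This is not the authors' setup: a smoothed bigram model stands in for a pretrained transformer, the five-word template language is invented, and `perturb_order` with its `irregularity` parameter is a hypothetical way of making word order less predictable. The sketch compares mean per-token surprisal for an original, a reversed, and an irregularly shuffled version of the same toy language, and its numbers say nothing about the paper's actual results.

```python
import math
import random
from collections import Counter
from itertools import product


def reverse_order(tokens):
    """Deterministic variant: every sentence is reversed the same way."""
    return tokens[::-1]


def perturb_order(tokens, irregularity, rng):
    """Hypothetical irregularity knob (not the paper's scheme): in one
    left-to-right pass, each adjacent pair is swapped with probability
    `irregularity`."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if rng.random() < irregularity:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def bigram_surprisal(train, test, vocab_size):
    """Mean per-token surprisal in bits under an add-one-smoothed bigram
    model, used here only as a stand-in for a transformer LM."""
    context_counts, pair_counts = Counter(), Counter()
    for sent in train:
        seq = ["<s>"] + sent
        context_counts.update(seq[:-1])
        pair_counts.update(zip(seq[:-1], seq[1:]))
    total_bits, n_tokens = 0.0, 0
    for sent in test:
        seq = ["<s>"] + sent
        for prev, cur in zip(seq[:-1], seq[1:]):
            p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
            total_bits -= math.log2(p)  # surprisal = -log2 p
            n_tokens += 1
    return total_bits / n_tokens


# Toy "language" with a fixed five-slot template (purely illustrative).
base = [["the", s, v, "a", o]
        for s, v, o in product(["cat", "dog", "bird"],
                               ["sees", "chases"],
                               ["fish", "mouse"])]
vocab_size = len({tok for sent in base for tok in sent})  # 9 word types

train_rng, test_rng = random.Random(0), random.Random(1)
results = {
    "original": bigram_surprisal(base * 20, base, vocab_size),
    "reversed": bigram_surprisal([reverse_order(s) for s in base * 20],
                                 [reverse_order(s) for s in base], vocab_size),
    "irregular": bigram_surprisal(
        [perturb_order(s, 0.5, train_rng) for s in base * 20],
        [perturb_order(s, 0.5, test_rng) for s in base * 5], vocab_size),
}
for name, bits in results.items():
    print(f"{name:>9}: {bits:.3f} bits/token")
```

In this toy setting, reversal barely changes surprisal because it is a fixed, one-to-one reordering that the model can learn as easily as the original order, whereas random swaps add uncertainty to every sentence. Note that `irregularity=1.0` always swaps and therefore produces a deterministic rotation, so this particular knob is most irregular near 0.5.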

🏷️ Themes

Linguistics, AI Models

Source

arxiv.org