Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects
| USA | technology | ✓ Verified - arxiv.org


#Natural Language Processing #Yorùbá #African Languages #Systematic Review #Linguistic Resources #Machine Learning #Digital Inclusion #Tonal Language

📌 Key Takeaways

  • Researchers published a systematic review of NLP development for the Yorùbá language
  • Analysis of 105 primary studies from 2014 to 2024 revealed significant resource limitations
  • Linguistic challenges such as tonal complexity and diacritic dependency pose major obstacles
  • A growing body of multilingual and monolingual resources is emerging despite these constraints
  • The review aims to encourage further research on NLP for Yorùbá

📖 Full Retelling

Toheeb Aduramomi Jimoh, Tabea De Wille, and Nikola S. Nikolov posted a systematic review of Natural Language Processing (NLP) development for the Yorùbá language to the arXiv preprint server. The paper was first submitted on February 24, 2025, and last revised on February 24, 2026 (v2). It addresses the limited NLP progress for this tonal African language, which the authors attribute to persistent resource limitations, among other issues. Titled 'Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects,' the review examines 105 primary studies published between 2014 and 2024, drawn from reputable databases, to identify challenges, resources, techniques, and applications in Yorùbá NLP.

The researchers used a structured protocol with a well-defined search string to search, select, and analyse the literature. The review highlights three significant obstacles: the scarcity of annotated corpora, the limited availability of pre-trained language models, and linguistic challenges such as tonal complexity and diacritic dependency. Despite these obstacles, it identifies a growing body of multilingual and monolingual resources for Yorùbá NLP, and it lists rule-based methods among the prominent techniques. The authors also note socio-cultural factors, such as code-switching and the declining digital use of the language, as additional constraints on the field.
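To make "diacritic dependency" concrete, here is a minimal Python sketch using only the standard library's `unicodedata` module. It is an illustration, not something taken from the review: the example words and their English glosses are commonly cited textbook examples and are an assumption here, and `strip_diacritics` is a hypothetical helper name. The sketch shows that removing tone marks and under-dots collapses distinct words into one string, and that the same word can be encoded in two Unicode forms that do not compare equal until normalized.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove every combining mark (tone marks and under-dots alike)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Glosses are commonly cited textbook examples (an assumption),
# not data from the review.
WORDS = {
    "ọkọ": "husband",
    "ọkọ́": "hoe",
    "ọkọ̀": "vehicle",
    "ọ̀kọ̀": "spear",
}

# Four distinct words become a single ambiguous string once marks are lost.
collapsed = {strip_diacritics(word) for word in WORDS}

# The same word in composed (NFC) and decomposed (NFD) form differs at the
# code-point level, so text should be normalized before comparison.
nfc = unicodedata.normalize("NFC", "Yorùbá")
nfd = unicodedata.normalize("NFD", "Yorùbá")
same_after_normalizing = unicodedata.normalize("NFC", nfd) == nfc

if __name__ == "__main__":
    print(collapsed)
    print(nfc == nfd, same_after_normalizing)
```

A pipeline that lowercases and strips accents by default, as many generic text-cleaning steps do, would therefore discard information that Yorùbá relies on to distinguish words.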

🏷️ Themes

Natural Language Processing, African Languages, Linguistic Resources, Technological Inclusion

📚 Related People & Topics

Natural language processing

Processing of natural language by a computer

Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and ling...

Languages of Africa

The number of languages natively spoken in Africa is variously estimated (depending on the delineation of language vs. dialect) at between 1,250 and 2,100, and by some counts at over 3,000. Nigeria alone has over 500 languages (according to SIL Ethnologue), one of the greatest concentrations of ling...

Systematic review

Comprehensive review of research literature using systematic methods

A systematic review is a scholarly synthesis of the evidence on a clearly presented topic using critical methods to identify, define and assess research on the topic. A systematic review extracts and interprets data from published studies on the topic (in the scientific literature), then analyzes, d...


Machine learning

Study of algorithms that improve automatically through experience

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Within a subdiscipline in machine learning, advances i...


Original Source
Computer Science > Computation and Language
arXiv:2502.17364 [Submitted on 24 Feb 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects
Authors: Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov
Abstract: Natural Language Processing is becoming a dominant subset of artificial intelligence as the need to help machines understand human language looks indispensable. Several NLP applications are ubiquitous, partly due to the myriad of datasets being churned out daily through mediums like social networking sites. However, the growing development has not been evident in most African languages due to the persisting resource limitations, among other issues. Yorùbá language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review aims to comprehensively analyse studies addressing NLP development for Yorùbá, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was employed to search, select, and analyse 105 primary studies between 2014 and 2024 from reputable databases. The review highlights the scarcity of annotated corpora, the limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. It also revealed the prominent techniques, including rule-based methods, among others. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural fac...

Source

arxiv.org
