Voices of the Mountains: Deep Learning-Based Vocal Error Detection System for Kurdish Maqams
#Kurdish Maqam #Deep Learning #Vocal Error Detection #Automatic Singing Assessment #Microtonal Music #Bayati-Kurd #Ethnomusicology #AI in Music
📌 Key Takeaways
Researchers developed a deep learning system specifically for detecting vocal errors in Kurdish maqam singing
Current automatic singing assessment (ASA) tools follow Western music rules and fail to recognize the microtonal nuances of Kurdish maqam
The system was trained on 50 songs from 13 vocalists, focusing on pitch, rhythm, and modal stability errors
While showing promise for common error types, the system needs more balanced data for comprehensive error detection
📖 Full Retelling
Researchers Darvan Shvan Khairaldeen and Hossein Hassani submitted a paper to arXiv on 24 February 2026 describing a deep learning-based vocal error detection system for Kurdish maqams. It targets a gap in music technology: existing automatic singing assessment (ASA) tools follow Western music rules and do not recognize the microtonal nuances of traditional Kurdish singing, so they mark correct maqam performances as wrong.
The researchers collected 50 songs from 13 vocalists, totaling 2-3 hours of material, and annotated 221 error spans in three categories: 150 fine pitch errors, 46 rhythm errors, and 25 modal drift errors. The audio was segmented into 15,199 overlapping windows and converted to log-mel spectrograms, which were fed into a two-headed CNN-BiLSTM model with attention. One head decides whether a window contains an error; the other classifies the error type.
The model was trained for 20 epochs with early stopping at epoch 10 and reached a validation macro-F1 score of 0.468. On the full 50-song evaluation at a 0.750 threshold, recall was 39.4% and precision 25.8%. Results were stronger for the more common error types, with F1 scores of 0.492 for fine pitch and 0.536 for rhythm errors, while modal drift detection lagged at an F1 of 0.133. Modal drift is also the category with the fewest annotated examples (25 of 221), which points to the need for more balanced training data.
The authors describe the work as the first attempt at automatic singing assessment specifically for Kurdish maqam, a tradition that operates outside Western equal temperament. They focus on pitch, rhythm, and modal stability errors in the context of Bayati-Kurd, and the system is intended as a tool for teaching and preserving the tradition.
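The segmentation step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the window length, hop size, and the minimum-overlap labeling rule are assumptions chosen for illustration, since the source does not state them, and the names `make_windows` and `label_window` are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[float, float]      # (start_s, end_s)
Span = Tuple[float, float, str]   # (start_s, end_s, error_type)


def make_windows(duration_s: float, win_s: float, hop_s: float) -> List[Window]:
    """Return windows that fit fully inside the recording.

    Windows overlap whenever hop_s is smaller than win_s.
    """
    if win_s <= 0 or hop_s <= 0:
        raise ValueError("win_s and hop_s must be positive")
    windows = []
    k = 0
    # Index-based start times avoid accumulating floating-point error.
    while k * hop_s + win_s <= duration_s:
        windows.append((k * hop_s, k * hop_s + win_s))
        k += 1
    return windows


def label_window(window: Window, spans: List[Span], min_overlap_s: float) -> str:
    """Return the error type overlapping the window most, or "none".

    Assumed rule (not from the paper): a span counts only if it overlaps
    the window by at least min_overlap_s seconds.
    """
    w_start, w_end = window
    best_type, best_overlap = "none", 0.0
    for s_start, s_end, error_type in spans:
        overlap = max(0.0, min(w_end, s_end) - max(w_start, s_start))
        if overlap >= min_overlap_s and overlap > best_overlap:
            best_type, best_overlap = error_type, overlap
    return best_type


if __name__ == "__main__":
    # Hypothetical 10-second clip with two annotated error spans.
    spans = [(2.5, 3.5, "fine_pitch"), (7.0, 7.2, "rhythm")]
    for w in make_windows(10.0, win_s=2.0, hop_s=1.0):
        label = label_window(w, spans, min_overlap_s=0.5)
        # A two-headed model would get both targets from the same label:
        # detection (is there an error?) and type (which one?).
        print(w, label != "none", label)
```

Under a rule like this, short spans can fall below the overlap threshold and go unlabeled, which is one reason the choice of window and hop size matters when error annotations are brief.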
🏷️ Themes
Cultural Preservation, Music Technology, Machine Learning Applications
Use in music of microtones (intervals smaller than a semitone)
Microtonality is the use in music of microtones — intervals smaller than a semitone, also called "microintervals". It may also be extended to include any music using intervals not found in the customary Western tuning of twelve equal intervals per octave. In other words, a microtone may be thought o...
In machine learning, deep learning focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and revolves around stacking artificial neurons into layers and "training" t...
Original Source
arXiv:2602.20744 (Computer Science > Sound), submitted on 24 Feb 2026
Title: Voices of the Mountains: Deep Learning-Based Vocal Error Detection System for Kurdish Maqams
Authors: Darvan Shvan Khairaldeen, Hossein Hassani
Abstract: Maqam, a singing type, is a significant component of Kurdish music. A maqam singer receives training in a traditional face-to-face or through self-training. Automatic Singing Assessment uses machine learning to provide the accuracy of singing styles and can help learners to improve their performance through error detection. Currently, the available ASA tools follow Western music rules. The musical composition requires all notes to stay within their expected pitch range from start to finish. The system fails to detect micro-intervals and pitch bends, so it identifies Kurdish maqam singing as incorrect even though the singer performs according to traditional rules. Kurdish maqam requires recognizing performance errors within microtonal spaces, which is beyond Western equal temperament. This research is the first attempt to address the mentioned gap. While many error types happen during singing, our focus is on pitch, rhythm, and modal stability errors in the context of Bayati-Kurd. We collected 50 songs from 13 vocalists (2-3 hours) and annotated 221 error spans (150 fine pitch, 46 rhythm, 25 modal drift). The data was segmented into 15,199 overlapping windows and converted to log-mel spectrograms. We developed a two-headed CNN-BiLSTM with attention mode to decide whether a window contains an error and to classify it based on the chosen errors. Trained for 20 epochs with early stopping at epoch 10, the model reached a validation macro-F1 of 0.468. On the full 50-song evaluation at a 0.750 threshold, recall was 39.4% and precision 25.8%.
Within detected...
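To put the reported precision and recall in context, the short Python sketch below shows how they combine into a single F1 score and how a macro-F1 is formed as an unweighted mean of per-class scores. The combined F1 is derived here and is not stated in the source; the per-class figures are the ones quoted in the retelling above, and averaging them is only an illustration of the formula, not a reproduction of the reported 0.468 validation macro-F1, which was measured under different conditions.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_f1: Dict[str, float]) -> float:
    """Unweighted mean of per-class F1 scores; every class counts equally."""
    if not per_class_f1:
        raise ValueError("need at least one class")
    return sum(per_class_f1.values()) / len(per_class_f1)


if __name__ == "__main__":
    # Precision 25.8% and recall 39.4% on the full 50-song evaluation.
    print(round(f1_score(0.258, 0.394), 3))  # 0.312

    # Per-class F1 values as quoted in the retelling (illustration only).
    quoted = {"fine_pitch": 0.492, "rhythm": 0.536, "modal_drift": 0.133}
    print(round(macro_f1(quoted), 3))  # 0.387
```

Because a macro average weights every class equally, a weak minority class such as modal drift pulls the overall score down sharply even when the common classes perform better.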