When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper
#SAM-Audio #Whisper #zero-shot ASR #denoising #audio preprocessing #speech recognition #noise reduction
Key Takeaways
- SAM-Audio denoising consistently degrades Whisper's zero-shot ASR performance, raising both word and character error rates relative to raw noisy speech.
- The study highlights a trade-off between noise reduction and speech recognition accuracy in audio processing.
- Researchers recommend selective use of denoising based on audio quality to optimize ASR results.
- The findings challenge assumptions that preprocessing always benefits automatic speech recognition systems.
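The "selective use of denoising" recommendation above can be sketched as a simple gate: estimate the input's signal-to-noise ratio and only invoke the enhancement model when the audio actually looks noisy. Everything below is illustrative, not from the paper — the crude percentile-based SNR estimator, the threshold, and the `denoise_fn` callable are stand-ins for a real VAD/noise tracker and a real enhancement model such as SAM-Audio.

```python
import numpy as np

def estimate_snr_db(signal: np.ndarray, frame: int = 512) -> float:
    """Crude SNR estimate: treat the quietest frames as the noise floor.
    Illustrative only -- real systems use proper VAD / noise tracking."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    energies = np.array([float(np.mean(f ** 2)) for f in frames])
    noise_floor = np.percentile(energies, 10) + 1e-12   # quietest 10% ~ noise
    speech_level = np.percentile(energies, 90) + 1e-12  # loudest 10% ~ speech
    return 10.0 * float(np.log10(speech_level / noise_floor))

def maybe_denoise(signal: np.ndarray, denoise_fn, snr_threshold_db: float = 15.0):
    """Run denoise_fn only on noisy-looking audio; pass clean audio through untouched."""
    if estimate_snr_db(signal) < snr_threshold_db:
        return denoise_fn(signal)
    return signal
```

The threshold (15 dB here) is an arbitrary placeholder; per the paper's findings, it would need to be tuned per dataset, and for Whisper the safest setting may simply be to skip enhancement entirely.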
Full Retelling
arXiv:2603.04710v1 Announce Type: cross
Abstract: Recent advances in automatic speech recognition (ASR) and speech enhancement have led to a widespread assumption that improving perceptual audio quality should directly benefit recognition accuracy. In this work, we rigorously examine whether this assumption holds for modern zero-shot ASR systems. We present a systematic empirical study on the impact of Segment Anything Model Audio (SAM-Audio) by Meta AI, a recent foundation-scale speech enhancement model, when used as a preprocessing step for zero-shot transcription with Whisper.
Themes
Audio Processing, Speech Recognition
Original Source
Computer Science > Sound — arXiv:2603.04710 [Submitted on 5 Mar 2026]
Title: When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper
Authors: Akif Islam, Raufun Nahar, Md. Ekramul Hamid
Abstract: Recent advances in automatic speech recognition and speech enhancement have led to a widespread assumption that improving perceptual audio quality should directly benefit recognition accuracy. In this work, we rigorously examine whether this assumption holds for modern zero-shot ASR systems. We present a systematic empirical study on the impact of Segment Anything Model Audio (SAM-Audio), a recent foundation-scale speech enhancement model proposed by Meta AI, when used as a preprocessing step for zero-shot transcription with Whisper. Experiments are conducted across multiple Whisper model variants and two linguistically distinct noisy speech datasets: a real-world Bengali YouTube corpus and a publicly available English noisy dataset. Contrary to common intuition, our results show that SAM-Audio preprocessing consistently degrades ASR performance, increasing both Word Error Rate and Character Error Rate compared to raw noisy speech, despite substantial improvements in signal-level quality. Objective Peak Signal-to-Noise Ratio analysis on the English dataset confirms that SAM-Audio produces acoustically cleaner signals, yet this improvement fails to translate into recognition gains. We therefore conducted a detailed utterance-level analysis to understand this counterintuitive result. We found that the recognition degradation is a systematic issue affecting the majority of the audio, not just isolated outliers, and that the errors worsen as the Whisper model size increases.
These findings expose a fundamental mismatch: audio that is perceptually cleaner to human listeners is not necessarily more robust for machine recognition.
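The degradation reported above is quantified with Word Error Rate (WER) and Character Error Rate (CER). Both reduce to a Levenshtein edit distance over a token sequence, normalized by the reference length. A minimal sketch for intuition — this is not the paper's evaluation code, which presumably uses a standard scoring toolkit:

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two token sequences (words or characters)."""
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match / substitution
    return dp[-1][-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance (spaces ignored here)."""
    ref = reference.replace(" ", "")
    return edit_distance(ref, hypothesis.replace(" ", "")) / max(len(ref), 1)
```

Scoring Whisper's transcripts of the raw and the SAM-Audio-denoised audio against the same reference would surface the paper's finding as a higher WER/CER for the denoised hypotheses.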