Neural networks can be compromised with minimal label corruption while maintaining normal accuracy metrics
Traditional accuracy monitoring is insufficient for detecting data poisoning attacks in real-world scenarios
The structural vulnerability exists regardless of neural network architecture or attack method
A cryptographic defense system is proposed to verify data provenance in machine learning pipelines
📖 Full Retelling
Researcher Harrison Dahme published a paper titled 'Poisoned Acoustics' on the arXiv preprint server on February 25, 2026, showing how training-data poisoning attacks can induce targeted, undetectable failure in deep neural networks by corrupting a vanishingly small fraction of training labels. The research demonstrated this vulnerability on acoustic vehicle classification using the MELAUDIS urban intersection dataset, where a compact 2-D convolutional neural network was compromised with just 0.5% label corruption while its aggregate accuracy metrics remained normal.

The study examined a Truck-to-Car label-flipping attack on approximately 9,600 audio clips spanning six vehicle classes. The attack achieved a 95.7% Attack Success Rate (ASR) with zero detectable change in overall accuracy (87.6% baseline; 95% CI of 88-100% across three seeds). Dahme proves this stealth is structural: the maximum accuracy drop from a complete targeted attack is mathematically bounded above by the minority class fraction, so under real-world class imbalances, traditional accuracy monitoring is insufficient.

The research also identified a novel phenomenon called 'trigger-dominance collapse' in backdoor trigger attacks: when the target class is a dataset minority, the spectrogram patch trigger becomes functionally redundant. In response, Dahme proposes a trust-minimized defense combining content-addressed artifact hashing, Merkle-tree dataset commitment, and post-quantum digital signatures for cryptographically verifiable data provenance in machine learning pipelines.
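The stealth bound can be checked with simple arithmetic: even if every minority-class sample is misclassified, aggregate accuracy falls by at most that class's share of the data. A minimal sketch, where the Truck count of 288 is illustrative (derived from the reported ~3% share of a 9,600-clip dataset, not a figure from the paper):

```python
def max_accuracy_drop(n_target: int, n_total: int) -> float:
    """Upper bound on the aggregate accuracy drop when every
    target-class sample is misclassified by a targeted attack."""
    return n_target / n_total

# Illustrative numbers: ~3% Truck share of ~9,600 clips.
n_total = 9600
n_truck = 288  # assumed: 3% of 9,600

bound = max_accuracy_drop(n_truck, n_total)
print(f"Worst-case aggregate accuracy drop: {bound:.1%}")  # 3.0%
# A worst case of ~3 points sits inside typical seed-to-seed
# training noise, so aggregate accuracy monitoring cannot see it.
```

This is why the paper calls the stealth structural: the bound depends only on class proportions, not on the model architecture or the attack mechanism.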
🏷️ Themes
Cybersecurity, Artificial Intelligence, Data Integrity
A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either biological cells or mathematical models. While individual neurons are simple, many of them together in a network can perform complex tasks.
Adversarial machine learning is a research field at the intersection of machine learning and computer security: the study of attacks on machine learning algorithms, and of the defenses against such attacks. Machine learning techniques are mostly designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution.
Computer Science > Cryptography and Security. arXiv:2602.22258 [Submitted on 25 Feb 2026]. Title: Poisoned Acoustics. Author: Harrison Dahme.

Abstract: Training-data poisoning attacks can induce targeted, undetectable failure in deep neural networks by corrupting a vanishingly small fraction of training labels. We demonstrate this on acoustic vehicle classification using the MELAUDIS urban intersection dataset (approx. 9,600 audio clips, 6 classes): a compact 2-D convolutional neural network trained on log-mel spectrograms achieves a 95.7% Attack Success Rate (the fraction of target-class test samples misclassified under the attack) on a Truck-to-Car label-flipping attack at just p = 0.5% corruption (48 records), with zero detectable change in aggregate accuracy (87.6% baseline; 95% CI: 88-100%, n = 3 seeds). We prove this stealth is structural: the maximum accuracy drop from a complete targeted attack is bounded above by the minority class fraction. For real-world class imbalances (Truck: approx. 3%), this bound falls below training-run noise, making aggregate accuracy monitoring provably insufficient regardless of architecture or attack method. A companion backdoor trigger attack reveals a novel trigger-dominance collapse: when the target class is a dataset minority, the spectrogram patch trigger becomes functionally redundant (clean ASR equals triggered ASR) and the attack degenerates to pure label flipping. We formalize the ML training pipeline as an attack surface and propose a trust-minimized defense combining content-addressed artifact hashing, Merkle-tree dataset commitment, and post-quantum digital signatures (ML-DSA-65/CRYSTALS-Dilithium3, NIST FIPS 204) for cryptographically verifiable data provenance.
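The Attack Success Rate used above has a direct operational reading: among test samples whose true label is the target class (here, Truck), what fraction does the poisoned model misclassify? A minimal sketch with toy labels rather than the paper's model or dataset:

```python
def attack_success_rate(y_true: list, y_pred: list, target_class) -> float:
    """ASR per the abstract's definition: the fraction of
    target-class test samples misclassified under the attack."""
    target_idx = [i for i, y in enumerate(y_true) if y == target_class]
    if not target_idx:
        return 0.0
    misses = sum(1 for i in target_idx if y_pred[i] != y_true[i])
    return misses / len(target_idx)

# Toy example: 4 Truck clips; the poisoned model pushes 3 to Car.
y_true = ["truck", "car", "truck", "truck", "bus", "truck"]
y_pred = ["car",   "car", "car",   "truck", "bus", "car"]
print(attack_success_rate(y_true, y_pred, "truck"))  # 0.75
```

Because ASR conditions on the minority class only, it can approach 100% while aggregate accuracy over all six classes barely moves.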
Comments: 5 pages. Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI). Cite as: arXiv:2602.22258 [cs.CR].
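The proposed defense can be sketched with standard primitives: each training artifact is content-addressed by its SHA-256 digest, and the whole dataset is committed to a single Merkle root, which a post-quantum signature (ML-DSA-65 in the paper) would then sign. A minimal stdlib-only sketch; the signing step is omitted because ML-DSA is not in the Python standard library, and the record format here is an assumption for illustration:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def content_address(artifact: bytes) -> str:
    """Content-addressed identifier for one training artifact."""
    return h(artifact).hex()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Commit to an ordered dataset with a binary Merkle tree.
    Each leaf should serialize a full record (e.g. audio bytes
    plus its label), so that a flipped label changes the root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical records: (clip bytes, label) serialized together.
records = [b"clip-0001|truck", b"clip-0002|car", b"clip-0003|bus"]
root = merkle_root(records)
print(root.hex())
# Any single altered label or swapped clip changes the root, so a
# signed root makes silent 0.5% corruption detectable after the fact.
```

The design choice is the usual commitment trade-off: verifying the dataset against a signed root is cheap and requires trusting only the signer's key, not every intermediary in the training pipeline.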