Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild
#Multimodal LLMs #Surveillance #Anomaly Detection #Zero-Shot Learning #Real-World Applications
Key Takeaways
- Multimodal LLMs show potential for surveillance but face significant real-world limitations.
- Zero-shot anomaly detection in uncontrolled environments is not yet reliable for practical use.
- Current models struggle with contextual understanding and high-stakes accuracy requirements.
- The study highlights a gap between academic benchmarks and operational surveillance needs.
Full Retelling
arXiv:2603.04727v1 Announce Type: cross
Abstract: Multimodal large language models (MLLMs) have demonstrated impressive general competence in video understanding, yet their reliability for real-world Video Anomaly Detection (VAD) remains largely unexplored. Unlike conventional pipelines relying on reconstruction or pose-based cues, MLLMs enable a paradigm shift: treating anomaly detection as a language-guided reasoning task. In this work, we systematically evaluate state-of-the-art MLLMs on the ShanghaiTech and CHAD benchmarks by reformulating VAD as a binary classification task under weak temporal supervision. We investigate how prompt specificity and temporal window lengths (1s-3s) influence performance, focusing on the precision-recall trade-off. Our findings reveal a pronounced conservative bias in zero-shot settings; while models exhibit high confidence, they disproportionately favor the 'normal' class, resulting in high precision but a recall collapse that limits practical utility. We demonstrate that class-specific instructions can significantly shift this decision boundary, improving the peak F1-score on ShanghaiTech from 0.09 to 0.64, yet recall remains a critical bottleneck. These results highlight a significant performance gap for MLLMs in noisy environments and provide a foundation for future work in recall-oriented prompting and model calibration for open-world surveillance, which demands complex video understanding and reasoning.
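The paper's central finding is a conservative bias: models almost always answer "normal", which keeps precision high while recall collapses, capping the F1-score. A toy computation (our own illustration, not code from the paper) makes the arithmetic of that collapse concrete:

```python
# Illustration (not from the paper): a conservative classifier that rarely
# predicts "anomaly" (label 1) gets perfect precision but tiny recall,
# and the harmonic-mean F1 is dragged down by the weaker of the two.
def precision_recall_f1(y_true, y_pred):
    """Binary metrics with 1 = anomaly, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 10 anomalous windows out of 100; the model flags only 1, correctly.
y_true = [1] * 10 + [0] * 90
y_pred = [1] + [0] * 99
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, round(f1, 3))  # 1.0 0.1 0.182
```

Even with precision at 1.0, missing nine of ten anomalies leaves F1 near 0.18, which is why the paper frames recall, not precision, as the bottleneck for practical surveillance use.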
Themes
AI Surveillance, Technology Limitations
Related People & Topics
Surveillance
Monitoring something for the purposes of influencing, protecting, or suppressing it
Surveillance is the systematic observation and monitoring of a person, population, or location, with the purpose of information-gathering, influencing, managing, or directing. It is widely used by governments for a variety of reasons, such as law enforcement, national security, and information awareness.
Entity Intersection Graph
Connections for Surveillance:
- Cellebrite (1 shared)
- Human rights (1 shared)
- Phone hacking (1 shared)
- Citizen Lab (1 shared)
- Illegal drug trade (1 shared)
Original Source
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.04727 [Submitted on 5 Mar 2026]
Title: Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild
Authors: Shanle Yao, Armin Danesh Pazho, Narges Rashvand, Hamed Tabkhi
Abstract: Multimodal large language models have demonstrated impressive general competence in video understanding, yet their reliability for real-world Video Anomaly Detection remains largely unexplored. Unlike conventional pipelines relying on reconstruction or pose-based cues, MLLMs enable a paradigm shift: treating anomaly detection as a language-guided reasoning task. In this work, we systematically evaluate state-of-the-art MLLMs on the ShanghaiTech and CHAD benchmarks by reformulating VAD as a binary classification task under weak temporal supervision. We investigate how prompt specificity and temporal window lengths (1s-3s) influence performance, focusing on the precision-recall trade-off. Our findings reveal a pronounced conservative bias in zero-shot settings; while models exhibit high confidence, they disproportionately favor the 'normal' class, resulting in high precision but a recall collapse that limits practical utility. We demonstrate that class-specific instructions can significantly shift this decision boundary, improving the peak F1-score on ShanghaiTech from 0.09 to 0.64, yet recall remains a critical bottleneck. These results highlight a significant performance gap for MLLMs in noisy environments and provide a foundation for future work in recall-oriented prompting and model calibration for open-world surveillance, which demands complex video understanding and reasoning.
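The evaluation reformulates VAD as binary classification over 1-3 second temporal windows under weak supervision: a window counts as anomalous if it overlaps any anomalous frames. A minimal sketch of that windowing step (the function name and the any-frame labeling rule are our assumptions, not the paper's code):

```python
# Hypothetical sketch of weak temporal supervision: slice a frame-level
# label stream into non-overlapping fixed-length windows and give each
# window label 1 ("anomaly") if ANY frame inside it is anomalous.
def window_labels(frame_labels, fps, window_seconds):
    """Return one binary label per non-overlapping window."""
    step = int(fps * window_seconds)
    return [
        int(any(frame_labels[i:i + step]))
        for i in range(0, len(frame_labels), step)
    ]

# 6 seconds of video at 10 fps; frames 25-34 (a 1 s burst) are anomalous.
labels = [0] * 25 + [1] * 10 + [0] * 25
print(window_labels(labels, fps=10, window_seconds=2))  # [0, 1, 0]
print(window_labels(labels, fps=10, window_seconds=1))  # [0, 0, 1, 1, 0, 0]
```

The two calls also hint at why the paper sweeps window length: shorter windows localize the anomaly more tightly but give the model less temporal context per decision, while longer windows dilute a brief anomaly across mostly normal frames.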
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.04727 [cs.CV]