BravenNow
Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild
| USA | technology | ✓ Verified - arxiv.org

#Multimodal LLMs #Surveillance #Anomaly Detection #Zero-Shot Learning #Real-World Applications

📌 Key Takeaways

  • Multimodal LLMs show potential for surveillance but face significant real-world limitations.
  • Zero-shot anomaly detection in uncontrolled environments is not yet reliable for practical use.
  • Current models struggle with contextual understanding and high-stakes accuracy requirements.
  • The study highlights a gap between academic benchmarks and operational surveillance needs.

📖 Full Retelling

arXiv:2603.04727v1 (announce type: cross)

Abstract: Multimodal large language models (MLLMs) have demonstrated impressive general competence in video understanding, yet their reliability for real-world Video Anomaly Detection (VAD) remains largely unexplored. Unlike conventional pipelines relying on reconstruction or pose-based cues, MLLMs enable a paradigm shift: treating anomaly detection as a language-guided reasoning task. In this work, we systematically evaluate state-of-the-art MLLMs on the...

🏷️ Themes

AI Surveillance, Technology Limitations

📚 Related People & Topics

Surveillance

Monitoring something for the purposes of influencing, protecting, or suppressing it

Surveillance is the systematic observation and monitoring of a person, population, or location, with the purpose of information-gathering, influencing, managing, or directing. It is widely used by governments for a variety of reasons, such as law enforcement, national security, and information aware...

Original Source
arXiv:2603.04727 [Submitted on 5 Mar 2026]

Title: Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild

Authors: Shanle Yao, Armin Danesh Pazho, Narges Rashvand, Hamed Tabkhi

Abstract: Multimodal large language models (MLLMs) have demonstrated impressive general competence in video understanding, yet their reliability for real-world Video Anomaly Detection (VAD) remains largely unexplored. Unlike conventional pipelines relying on reconstruction or pose-based cues, MLLMs enable a paradigm shift: treating anomaly detection as a language-guided reasoning task. In this work, we systematically evaluate state-of-the-art MLLMs on the ShanghaiTech and CHAD benchmarks by reformulating VAD as a binary classification task under weak temporal supervision. We investigate how prompt specificity and temporal window lengths (1s to 3s) influence performance, focusing on the precision-recall trade-off. Our findings reveal a pronounced conservative bias in zero-shot settings; while models exhibit high confidence, they disproportionately favor the 'normal' class, resulting in high precision but a recall collapse that limits practical utility. We demonstrate that class-specific instructions can significantly shift this decision boundary, improving the peak F1-score on ShanghaiTech from 0.09 to 0.64, yet recall remains a critical bottleneck. These results highlight a significant performance gap for MLLMs in noisy environments and provide a foundation for future work in recall-oriented prompting and model calibration for open-world surveillance, which demands complex video understanding and reasoning.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603...
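
To make the precision-recall trade-off described in the abstract concrete, the following is a minimal Python sketch, not the authors' evaluation code. It shows how a classifier that leans toward the 'normal' class can score perfect precision while recall and F1 stay low. The function names, the toy labels, the frame rate, and the rule that a window counts as anomalous if any frame in it is anomalous are all assumptions made for illustration; the paper's actual windowing and labeling protocol may differ.

```python
from typing import List, Tuple


def window_labels(frame_labels: List[int], fps: int, window_s: int) -> List[int]:
    """Collapse per-frame 0/1 labels into one label per non-overlapping window.

    Assumption (not from the paper): a window is anomalous (1) if any
    frame inside it is anomalous.
    """
    size = fps * window_s
    return [
        int(any(frame_labels[i:i + size]))
        for i in range(0, len(frame_labels), size)
    ]


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1 with 1 = anomaly as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy clip: 10 seconds at a hypothetical 2 frames per second.
frames = [0] * 4 + [1] * 8 + [0] * 8

truth_1s = window_labels(frames, fps=2, window_s=1)  # ten 1-second windows
truth_2s = window_labels(frames, fps=2, window_s=2)  # five 2-second windows

# Hypothetical conservative model: flags only one of the four anomalous windows.
pred_1s = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(truth_1s, pred_1s)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case every flagged window is correct, so precision is 1.00, but three of the four anomalous windows are missed, so recall is 0.25 and F1 is 0.40. This is the pattern the abstract calls a recall collapse: precision alone says little about whether anomalies are actually caught.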