# Multimodal Models
Latest news articles tagged with "Multimodal Models". Follow the timeline of events, related topics, and entities.
Articles (14)
- 🇺🇸 SALLIE: Safeguarding Against Latent Language & Image Exploits [USA]
  arXiv:2604.06247v1 Announce Type: cross
  Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) remain highly vulnerable to textual and visual jailbreaks, as well as prompt injection...
  Related: #AI Security, #Research Innovation
- 🇺🇸 Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation [USA]
  arXiv:2603.18795v1 Announce Type: cross
  Abstract: Large Vision Language Models (LVLMs) excel at semantic understanding but struggle with fine-grained spatial grounding, as the model must implicitly i...
  Related: #AI Enhancement
- 🇺🇸 From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs [USA]
  arXiv:2603.17228v1 Announce Type: cross
  Abstract: Multimodal Large Language Models (MLLMs) are increasingly applied to pixel-level vision tasks, yet their intrinsic capacity for spatial understanding...
  Related: #AI Research
- 🇺🇸 UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models [USA]
  arXiv:2603.17476v1 Announce Type: cross
  Abstract: Unified Multimodal Models (UMMs) offer powerful cross-modality capabilities but introduce new safety risks not observed in single-task models. Despit...
  Related: #AI Safety
- 🇺🇸 Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients [USA]
  arXiv:2603.17809v1 Announce Type: cross
  Abstract: Large Vision Language Models (LVLMs) have achieved remarkable success in a range of downstream tasks that require multimodal interaction, but their c...
  Related: #AI Compression
- 🇺🇸 Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models [USA]
  arXiv:2603.15724v1 Announce Type: cross
  Abstract: Existing test-time scaling (TTS) methods for unified multimodal models (UMMs) in text-to-image (T2I) generation primarily rely on search or sampling ...
  Related: #AI Reinforcement Learning
- 🇺🇸 ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation [USA]
  arXiv:2603.16495v1 Announce Type: new
  Abstract: Current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different syste...
  Related: #AI in Transportation
- 🇺🇸 vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models [USA]
  arXiv:2603.13966v1 Announce Type: new
  Abstract: Vision-Language-Action (VLA) models are typically evaluated using per-benchmark scripts maintained independently by each model repository, leading to dup...
  Related: #AI Evaluation
- 🇺🇸 Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives [USA]
  arXiv:2511.18507v3 Announce Type: replace-cross
  Abstract: Multimodal large language models (MLLMs) deployed on devices must adapt to continuously changing visual scenarios such as variations in backg...
  Related: #AI Learning
- 🇺🇸 OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences [USA]
  arXiv:2603.09706v1 Announce Type: new
  Abstract: While safety alignment for Multimodal Large Language Models (MLLMs) has gained significant attention, current paradigms primarily target malicious inte...
  Related: #AI Safety
- 🇺🇸 Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting [USA]
  arXiv:2603.06663v1 Announce Type: cross
  Abstract: Recent advances in training-free visual prompting, such as Set-of-Mark, have emerged as a promising direction for enhancing the grounding capabilitie...
  Related: #AI Research, #Spatial Reasoning
- 🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL [USA]
  arXiv:2602.20739v1 Announce Type: new
  Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn re...
  Related: #Artificial Intelligence, #Reinforcement Learning, #Computer Vision
- 🇺🇸 SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification [USA]
  arXiv:2512.15052v3 Announce Type: replace-cross
  Abstract: Disclaimer: Samples in this paper may be harmful and cause discomfort. Multimodal large language models (MLLMs) enable multimodal generatio...
  Related: #AI Safety, #Neural Interventions
- 🇺🇸 MLLM-CTBench: A Benchmark for Continual Instruction Tuning with Reasoning Process Diagnosis [USA]
  arXiv:2508.08275v3 Announce Type: replace-cross
  Abstract: Continual instruction tuning (CIT) during the post-training phase is crucial for adapting multimodal large language models (MLLMs) to evolving...
  Related: #Artificial Intelligence, #Machine Learning Benchmarking
Key Entities (7)
- AI safety (3 articles)
- Artificial intelligence (2 articles)
- CIT (1 article)
- Harmful Intent (1 article)
- Reinforcement learning (1 article)
- Multimodal learning (1 article)
- Computer vision (1 article)
About the topic: Multimodal Models
The topic "Multimodal Models" aggregates 14+ news articles; all entries currently listed are from the United States.