# Multimodal Models
Latest news articles tagged with "Multimodal Models". Follow the timeline of events, related topics, and entities.
Articles (14)
- 🇺🇸 SALLIE: Safeguarding Against Latent Language & Image Exploits [USA]
  arXiv:2604.06247v1 Announce Type: cross
  Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) remain highly vulnerable to textual and visual jailbreaks, as well as prompt injection...
  Related: #AI Security, #Research Innovation
- 🇺🇸 Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation [USA]
  arXiv:2603.18795v1 Announce Type: cross
  Abstract: Large Vision Language Models (LVLMs) excel at semantic understanding but struggle with fine-grained spatial grounding, as the model must implicitly i...
  Related: #AI Enhancement
- 🇺🇸 From Drop-off to Recovery: A Mechanistic Analysis of Segmentation in MLLMs [USA]
  arXiv:2603.17228v1 Announce Type: cross
  Abstract: Multimodal Large Language Models (MLLMs) are increasingly applied to pixel-level vision tasks, yet their intrinsic capacity for spatial understanding...
  Related: #AI Research
- 🇺🇸 UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models [USA]
  arXiv:2603.17476v1 Announce Type: cross
  Abstract: Unified Multimodal Models (UMMs) offer powerful cross-modality capabilities but introduce new safety risks not observed in single-task models. Despit...
  Related: #AI Safety
- 🇺🇸 Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients [USA]
  arXiv:2603.17809v1 Announce Type: cross
  Abstract: Large Vision Language Models (LVLMs) have achieved remarkable success in a range of downstream tasks that require multimodal interaction, but their c...
  Related: #AI Compression
- 🇺🇸 Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models [USA]
  arXiv:2603.15724v1 Announce Type: cross
  Abstract: Existing test-time scaling (TTS) methods for unified multimodal models (UMMs) in text-to-image (T2I) generation primarily rely on search or sampling ...
  Related: #AI Reinforcement Learning
- 🇺🇸 ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation [USA]
  arXiv:2603.16495v1 Announce Type: new
  Abstract: Current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different syste...
  Related: #AI in Transportation
- 🇺🇸 vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models [USA]
  arXiv:2603.13966v1 Announce Type: new
  Abstract: Vision-Language-Action (VLA) models are typically evaluated using per-benchmark scripts maintained independently by each model repository, leading to dup...
  Related: #AI Evaluation
- 🇺🇸 Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives [USA]
  arXiv:2511.18507v3 Announce Type: replace-cross
  Abstract: Multimodal large language models (MLLMs) deployed on devices must adapt to continuously changing visual scenarios such as variations in backg...
  Related: #AI Learning
- 🇺🇸 OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences [USA]
  arXiv:2603.09706v1 Announce Type: new
  Abstract: While safety alignment for Multimodal Large Language Models (MLLMs) has gained significant attention, current paradigms primarily target malicious inte...
  Related: #AI Safety
- 🇺🇸 Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting [USA]
  arXiv:2603.06663v1 Announce Type: cross
  Abstract: Recent advances in training-free visual prompting, such as Set-of-Mark, have emerged as a promising direction for enhancing the grounding capabilitie...
  Related: #AI Research, #Spatial Reasoning
- 🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL [USA]
  arXiv:2602.20739v1 Announce Type: new
  Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn re...
  Related: #Artificial Intelligence, #Reinforcement Learning, #Computer Vision
- 🇺🇸 SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification [USA]
  arXiv:2512.15052v3 Announce Type: replace-cross
  Abstract: Disclaimer: Samples in this paper may be harmful and cause discomfort. Multimodal large language models (MLLMs) enable multimodal generatio...
  Related: #AI Safety, #Neural Interventions
- 🇺🇸 MLLM-CTBench: A Benchmark for Continual Instruction Tuning with Reasoning Process Diagnosis [USA]
  arXiv:2508.08275v3 Announce Type: replace-cross
  Abstract: Continual instruction tuning (CIT) during the post-training phase is crucial for adapting multimodal large language models (MLLMs) to evolving...
  Related: #Artificial Intelligence, #Machine Learning Benchmarking
Key Entities (7)
- AI safety (3 articles)
- Artificial intelligence (2 articles)
- CIT (1 article)
- Harmful Intent (1 article)
- Reinforcement learning (1 article)
- Multimodal learning (1 article)
- Computer vision (1 article)
About the topic: Multimodal Models
The topic "Multimodal Models" aggregates 14+ news articles; all entries currently listed are from the United States.