#Multimodal AI

Latest news articles tagged with "Multimodal AI". Follow the timeline of events, related topics, and entities.

Articles (7)

🇺🇸 HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models — 25/02/2026 [USA]
arXiv:2506.03922v2. Announce type: replace-cross. Abstract: Multimodal Large Language Models (MLLMs) have demonstrated significant potential to advance a broad range of domains. However, current benchm...
Related: #AI Benchmarking, #Interdisciplinary Research
🇺🇸 Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models — 25/02/2026 [USA]
arXiv:2602.20981v1. Announce type: cross. Abstract: Scaling multimodal alignment between video and audio is challenging, particularly due to limited data and the mismatch between text descriptions and ...
Related: #Length Generalization, #Video-to-Audio Generation
🇺🇸 Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence — 25/02/2026 [USA]
arXiv:2502.17028v3. Announce type: replace-cross. Abstract: Vision-language alignment is crucial for various downstream tasks such as cross-modal generation and retrieval. Previous multimodal approache...
Related: #Machine Learning, #Vision-Language Alignment
🇺🇸 Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling — 18/02/2026 [USA]
arXiv:2602.15513v1. Announce type: cross. Abstract: Deploying Multimodal Large Language Models as the brain of embodied agents remains challenging, particularly under long-horizon observations and limi...
Related: #Embodied Agents, #Memory Modeling, #Natural Language Processing, #Computer Vision
🇺🇸 Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models — 18/02/2026 [USA]
arXiv:2602.15772v1. Announce type: cross. Abstract: Current research in multimodal models faces a key challenge where enhancing generative capabilities often comes at the expense of understanding, and ...
Related: #Generation vs. Understanding, #Model Optimization, #Reasoning and Reflection, #Trade-off Analysis
🇺🇸 Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models — 16/02/2026 [USA]
arXiv:2602.12618v1. Announce type: cross. Abstract: Multimodal Large Language Models (MLLMs) incur significant computational cost from processing numerous vision tokens through all LLM layers. Prior pr...
Related: #Computational Efficiency, #Model Optimization
🇺🇸 Artic: AI-oriented Real-time Communication for MLLM Video Assistant — 16/02/2026 [USA]
arXiv:2602.12641v1. Announce type: cross. Abstract: AI Video Assistant emerges as a new paradigm for Real-time Communication (RTC), where one peer is a Multimodal Large Language Model (MLLM) deployed i...
Related: #AI Communication, #Real-time Systems