#Vision‑Language Models
Latest news articles tagged with "Vision‑Language Models". Follow the timeline of events, related topics, and entities.
Articles (7)
- 🇺🇸 Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems
[USA]
arXiv:2602.16430v1 Announce Type: cross Abstract: Designing Optical Character Recognition (OCR) systems for India requires balancing linguistic diversity, document heterogeneity, and deployment const...
Related: #Multilingual OCR, #Document Heterogeneity, #Production‑scale Deployment, #India’s Linguistic Diversity
- 🇺🇸 FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution
[USA]
arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robo...
Related: #Spatiotemporal Reasoning, #Robot Control, #Real‑Time Execution, #Latency Reduction
- 🇺🇸 MC-LLaVA: Multi-Concept Personalized Vision-Language Model
[USA]
arXiv:2411.11706v4 Announce Type: replace-cross Abstract: Current vision-language models (VLMs) show exceptional abilities across diverse tasks, such as visual question answering. To enhance user exp...
Related: #Personalized AI, #Multi‑Concept Personalization, #User Experience, #Real‑World Applicability
- 🇺🇸 Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs
[USA]
arXiv:2602.15318v1 Announce Type: cross Abstract: Although speculative decoding is widely used to accelerate Vision-Language Models (VLMs) inference, it faces severe performance collapse when applied...
Related: #Video Large Language Models, #Speculative Decoding, #Attention Mechanisms, #Cache Management
- 🇺🇸 Visual Persuasion: What Influences Decisions of Vision-Language Models?
[USA]
arXiv:2602.15278v1 Announce Type: cross Abstract: The web is littered with images, once created for human consumption and now increasingly interpreted by agents using vision-language models (VLMs). T...
Related: #Algorithmic Decision‑Making, #Explainability in AI, #Visual Bias and Preference, #Experimental AI Evaluation
- 🇺🇸 CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
[USA]
arXiv:2602.15645v1 Announce Type: new Abstract: Foundation models, including vision language models, are increasingly used in automated driving to interpret scenes, recommend actions, and generate na...
Related: #Automated Driving, #Foundation Models, #Evaluation Methods, #Explainability
- 🇺🇸 Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
[USA]
arXiv:2602.13235v1 Announce Type: new Abstract: Visual Retrieval-Augmented Generation (VRAG) enhances Vision-Language Models (VLMs) by incorporating external visual documents to address a given query...
Related: #Visual Retrieval‑Augmented Generation, #Self‑Emergent Toolchains, #Fine‑Grained Visual Reasoning, #Perception–Reasoning Integration