#Computer Vision

Latest news articles tagged with "Computer Vision". Follow the timeline of events, related topics, and entities.

Articles (30)

🇺🇸 Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models — 27/02/2026 [USA]
arXiv:2602.22469v1 Announce Type: cross Abstract: Vision-language models (VLMs) frequently hallucinate objects absent from the input image. We trace this failure to spatial credit collapse: activatio...
Related: #Artificial Intelligence, #Machine Learning
🇺🇸 AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction — 27/02/2026 [USA]
arXiv:2602.22376v1 Announce Type: cross Abstract: Recent advances in 4D scene reconstruction have significantly improved dynamic modeling across various domains. However, existing approaches remain l...
Related: #3D Reconstruction, #Aerial Imaging
🇺🇸 Autoregressive Visual Decoding from EEG Signals — 27/02/2026 [USA]
arXiv:2602.22555v1 Announce Type: cross Abstract: Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal res...
Related: #Machine Learning, #Brain-Computer Interface, #Neuroscience
🇺🇸 To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning — 27/02/2026 [USA]
arXiv:2602.22227v1 Announce Type: cross Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex sc...
Related: #Machine Learning, #Artificial Intelligence, #Model Robustness
🇺🇸 Quality-Aware Robust Multi-View Clustering for Heterogeneous Observation Noise — 27/02/2026 [USA]
arXiv:2602.22568v1 Announce Type: cross Abstract: Deep multi-view clustering has achieved remarkable progress but remains vulnerable to complex noise in real-world applications. Existing noisy robust...
Related: #Artificial Intelligence, #Data Clustering
🇺🇸 BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model — 27/02/2026 [USA]
arXiv:2602.22596v1 Announce Type: cross Abstract: We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained...
Related: #3D Scene Synthesis, #Generative Models
🇺🇸 CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection — 27/02/2026 [USA]
arXiv:2602.22621v1 Announce Type: cross Abstract: Source-Free Domain Adaptive Object Detection (SF-DAOD) aims to adapt a detector trained on a labeled source domain to an unlabeled target domain with...
Related: #Machine Learning, #Privacy-Preserving AI
🇺🇸 Interpretable Medical Image Classification using Prototype Learning and Privileged Information — 25/02/2026 [USA]
arXiv:2310.15741v1 Announce Type: cross Abstract: Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainab...
Related: #Medical AI, #Explainable AI
🇺🇸 Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation — 25/02/2026 [USA]
arXiv:2602.20200v1 Announce Type: cross Abstract: Hierarchical Vision-Language-Action (VLA) models have rapidly become a dominant paradigm for robotic manipulation. They typically comprise a Vision-L...
Related: #Robotics, #Artificial Intelligence
🇺🇸 EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations — 25/02/2026 [USA]
arXiv:2602.20958v1 Announce Type: cross Abstract: Search and rescue (SAR) operations require rapid responses to save lives or property. Unmanned Aerial Vehicles (UAVs) equipped with vision-based syst...
Related: #Search and Rescue Technology, #UAV Robotics
🇺🇸 Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction — 25/02/2026 [USA]
arXiv:2506.14856v2 Announce Type: replace-cross Abstract: Some perspectives naturally provide more information than others. How can an AI system determine which viewpoint offers the most valuable ins...
Related: #Artificial Intelligence, #3D Reconstruction, #Machine Learning Efficiency
🇺🇸 NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning — 25/02/2026 [USA]
arXiv:2602.21172v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, cur...
Related: #Artificial Intelligence, #Autonomous Driving, #Data Efficiency
🇺🇸 Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video — 25/02/2026 [USA]
arXiv:2602.20658v1 Announce Type: cross Abstract: Manual lifting tasks are a major contributor to work-related musculoskeletal disorders, and effective ergonomic risk assessment is essential for quan...
Related: #Ergonomics, #Workplace Safety
🇺🇸 VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation — 25/02/2026 [USA]
arXiv:2602.21054v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) frequently hallucinate, limiting their safe deployment in real-world applications. Existing LLM self-evaluation ...
Related: #Artificial Intelligence, #Model Evaluation, #AI Safety
🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL — 25/02/2026 [USA]
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn re...
Related: #Artificial Intelligence, #Reinforcement Learning, #Multimodal Models
🇺🇸 How Do Inpainting Artifacts Propagate to Language? — 25/02/2026 [USA]
arXiv:2602.20520v1 Announce Type: cross Abstract: We study how visual artifacts introduced by diffusion-based inpainting affect language generation in vision-language models. We use a two-stage diagn...
Related: #Artificial Intelligence, #Multimodal Systems, #Image Reconstruction
🇺🇸 SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement — 25/02/2026 [USA]
arXiv:2602.20636v1 Announce Type: cross Abstract: Accurate and stable field-of-view (FoV) guidance is critical for safe and efficient minimally invasive surgery, yet existing approaches often conflat...
Related: #Medical Technology, #Surgical Innovation
🇺🇸 Onboard-Targeted Segmentation of Straylight in Space Camera Sensors — 25/02/2026 [USA]
arXiv:2602.20709v1 Announce Type: cross Abstract: This study details an artificial intelligence (AI)-based methodology for the semantic segmentation of space camera faults. Specifically, we address t...
Related: #Artificial Intelligence, #Space Technology
🇺🇸 3D Scene Rendering with Multimodal Gaussian Splatting — 20/02/2026 [USA]
arXiv:2602.17124v1 Announce Type: cross Abstract: 3D scene reconstruction and rendering are core tasks in computer vision, with applications spanning industrial monitoring, robotics, and autonomous d...
Related: #Pattern Recognition, #Radio-Frequency (RF) Sensing, #Multimodal Fusion, #3D Reconstruction
🇺🇸 A High-Level Survey of Optical Remote Sensing — 20/02/2026 [USA]
arXiv:2602.17397v1 Announce Type: cross Abstract: In recent years, significant advances in computer vision have also propelled progress in remote sensing. Concurrently, the use of drones has expanded...
Related: #Remote Sensing, #Drone Technology, #Datasets, #Artificial Intelligence
🇺🇸 Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection — 20/02/2026 [USA]
arXiv:2602.17484v1 Announce Type: cross Abstract: Image Copy Detection (ICD) aims to identify manipulated content between image pairs through robust feature representation learning. While self-superv...
Related: #Image Forensics, #Self-Supervised Learning, #Contrastive Representation Learning, #Geometric Reasoning
🇺🇸 DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting — 19/02/2026 [USA]
arXiv:2602.15958v1 Announce Type: cross Abstract: Document understanding in real-world applications often requires processing heterogeneous, multi-page document packets containing multiple documents ...
Related: #Document Understanding, #Data Annotation, #Benchmark Datasets, #Machine Learning
🇺🇸 A Survey: Spatiotemporal Consistency in Video Generation — 19/02/2026 [USA]
arXiv:2502.17863v2 Announce Type: replace-cross Abstract: Video generation aims to produce temporally coherent sequences of visual frames, representing a pivotal advancement in Artificial Intelligenc...
Related: #Artificial Intelligence Generated Content, #Video Generation, #Spatiotemporal Consistency, #Temporal Coherence
🇺🇸 Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis — 18/02/2026 [USA]
arXiv:2602.15067v1 Announce Type: new Abstract: Gliomas, among the most common primary brain tumors, vary widely in aggressiveness, prognosis, and histology, making treatment challenging due to compl...
Related: #Medical Imaging, #Deep Learning, #Neuro-Oncology, #Survival Prediction
🇺🇸 Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling — 18/02/2026 [USA]
arXiv:2602.15513v1 Announce Type: cross Abstract: Deploying Multimodal Large Language Models as the brain of embodied agents remains challenging, particularly under long-horizon observations and limi...
Related: #Multimodal AI, #Embodied Agents, #Memory Modeling, #Natural Language Processing
🇺🇸 Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation — 18/02/2026 [USA]
arXiv:2602.15724v1 Announce Type: cross Abstract: Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions and navigate through previously unseen environments. R...
Related: #Artificial Intelligence, #Natural Language Processing, #Navigation, #Efficiency
🇺🇸 Language-Guided Invariance Probing of Vision-Language Models — 16/02/2026 [USA]
arXiv:2511.13494v1 Announce Type: cross Abstract: Recent vision-language models (VLMs) such as CLIP, OpenCLIP, EVA02-CLIP and SigLIP achieve strong zero-shot performance, but it is unclear how reliab...
Related: #Artificial Intelligence, #Natural Language Processing, #Model Evaluation
🇺🇸 Multi-Task Learning with Additive U-Net for Image Denoising and Classification — 16/02/2026 [USA]
arXiv:2602.12649v1 Announce Type: cross Abstract: We investigate additive skip fusion in U-Net architectures for image denoising and denoising-centric multi-task learning (MTL). By replacing concaten...
Related: #Neural Network Architecture, #Multi-Task Learning
🇺🇸 EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition — 16/02/2026 [USA]
arXiv:2602.12919v1 Announce Type: cross Abstract: Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventio...
Related: #Benchmark Development, #Event-Based Imaging
🇺🇸 Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability — 16/02/2026 [USA]
arXiv:2508.07388v2 Announce Type: replace Abstract: Temporal Video Grounding (TVG) aims to localize video segments corresponding to a given textual query, which often describes human actions. However...
Related: #AI Research, #Video Analysis