#Computer Vision
Latest news articles tagged with "Computer Vision". Follow the timeline of events, related topics, and entities.
Articles (30)
-
🇺🇸 Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
[USA]
arXiv:2602.22469v1 Announce Type: cross Abstract: Vision-language models (VLMs) frequently hallucinate objects absent from the input image. We trace this failure to spatial credit collapse: activatio...
Related: #Artificial Intelligence, #Machine Learning -
🇺🇸 AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction
[USA]
arXiv:2602.22376v1 Announce Type: cross Abstract: Recent advances in 4D scene reconstruction have significantly improved dynamic modeling across various domains. However, existing approaches remain l...
Related: #3D Reconstruction, #Aerial Imaging -
🇺🇸 Autoregressive Visual Decoding from EEG Signals
[USA]
arXiv:2602.22555v1 Announce Type: cross Abstract: Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal res...
Related: #Machine Learning, #Brain-Computer Interface, #Neuroscience -
🇺🇸 To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
[USA]
arXiv:2602.22227v1 Announce Type: cross Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex sc...
Related: #Machine Learning, #Artificial Intelligence, #Model Robustness -
🇺🇸 Quality-Aware Robust Multi-View Clustering for Heterogeneous Observation Noise
[USA]
arXiv:2602.22568v1 Announce Type: cross Abstract: Deep multi-view clustering has achieved remarkable progress but remains vulnerable to complex noise in real-world applications. Existing noisy robust...
Related: #Artificial Intelligence, #Data Clustering -
🇺🇸 BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
[USA]
arXiv:2602.22596v1 Announce Type: cross Abstract: We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained...
Related: #3D Scene Synthesis, #Generative Models -
🇺🇸 CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection
[USA]
arXiv:2602.22621v1 Announce Type: cross Abstract: Source-Free Domain Adaptive Object Detection (SF-DAOD) aims to adapt a detector trained on a labeled source domain to an unlabeled target domain with...
Related: #Machine Learning, #Privacy-Preserving AI -
🇺🇸 Interpretable Medical Image Classification using Prototype Learning and Privileged Information
[USA]
arXiv:2310.15741v1 Announce Type: cross Abstract: Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainab...
Related: #Medical AI, #Explainable AI -
🇺🇸 Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation
[USA]
arXiv:2602.20200v1 Announce Type: cross Abstract: Hierarchical Vision-Language-Action (VLA) models have rapidly become a dominant paradigm for robotic manipulation. They typically comprise a Vision-L...
Related: #Robotics, #Artificial Intelligence -
🇺🇸 EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations
[USA]
arXiv:2602.20958v1 Announce Type: cross Abstract: Search and rescue (SAR) operations require rapid responses to save lives or property. Unmanned Aerial Vehicles (UAVs) equipped with vision-based syst...
Related: #Search and Rescue Technology, #UAV Robotics -
🇺🇸 Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
[USA]
arXiv:2506.14856v2 Announce Type: replace-cross Abstract: Some perspectives naturally provide more information than others. How can an AI system determine which viewpoint offers the most valuable ins...
Related: #Artificial Intelligence, #3D Reconstruction, #Machine Learning Efficiency -
🇺🇸 NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
[USA]
arXiv:2602.21172v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, cur...
Related: #Artificial Intelligence, #Autonomous Driving, #Data Efficiency -
🇺🇸 Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
[USA]
arXiv:2602.20658v1 Announce Type: cross Abstract: Manual lifting tasks are a major contributor to work-related musculoskeletal disorders, and effective ergonomic risk assessment is essential for quan...
Related: #Ergonomics, #Workplace Safety -
🇺🇸 VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
[USA]
arXiv:2602.21054v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) frequently hallucinate, limiting their safe deployment in real-world applications. Existing LLM self-evaluation ...
Related: #Artificial Intelligence, #Model Evaluation, #AI Safety -
🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL
[USA]
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn re...
Related: #Artificial Intelligence, #Reinforcement Learning, #Multimodal Models -
🇺🇸 How Do Inpainting Artifacts Propagate to Language?
[USA]
arXiv:2602.20520v1 Announce Type: cross Abstract: We study how visual artifacts introduced by diffusion-based inpainting affect language generation in vision-language models. We use a two-stage diagn...
Related: #Artificial Intelligence, #Multimodal Systems, #Image Reconstruction -
🇺🇸 SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement
[USA]
arXiv:2602.20636v1 Announce Type: cross Abstract: Accurate and stable field-of-view (FoV) guidance is critical for safe and efficient minimally invasive surgery, yet existing approaches often conflat...
Related: #Medical Technology, #Surgical Innovation -
🇺🇸 Onboard-Targeted Segmentation of Straylight in Space Camera Sensors
[USA]
arXiv:2602.20709v1 Announce Type: cross Abstract: This study details an artificial intelligence (AI)-based methodology for the semantic segmentation of space camera faults. Specifically, we address t...
Related: #Artificial Intelligence, #Space Technology -
🇺🇸 3D Scene Rendering with Multimodal Gaussian Splatting
[USA]
arXiv:2602.17124v1 Announce Type: cross Abstract: 3D scene reconstruction and rendering are core tasks in computer vision, with applications spanning industrial monitoring, robotics, and autonomous d...
Related: #Pattern Recognition, #Radio-Frequency (RF) Sensing, #Multimodal Fusion, #3D Reconstruction -
🇺🇸 A High-Level Survey of Optical Remote Sensing
[USA]
arXiv:2602.17397v1 Announce Type: cross Abstract: In recent years, significant advances in computer vision have also propelled progress in remote sensing. Concurrently, the use of drones has expanded...
Related: #Remote Sensing, #Drone Technology, #Datasets, #Artificial Intelligence -
🇺🇸 Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
[USA]
arXiv:2602.17484v1 Announce Type: cross Abstract: Image Copy Detection (ICD) aims to identify manipulated content between image pairs through robust feature representation learning. While self-superv...
Related: #Image Forensics, #Self-Supervised Learning, #Contrastive Representation Learning, #Geometric Reasoning -
🇺🇸 DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting
[USA]
arXiv:2602.15958v1 Announce Type: cross Abstract: Document understanding in real-world applications often requires processing heterogeneous, multi-page document packets containing multiple documents ...
Related: #Document Understanding, #Data Annotation, #Benchmark Datasets, #Machine Learning -
🇺🇸 A Survey: Spatiotemporal Consistency in Video Generation
[USA]
arXiv:2502.17863v2 Announce Type: replace-cross Abstract: Video generation aims to produce temporally coherent sequences of visual frames, representing a pivotal advancement in Artificial Intelligenc...
Related: #Artificial Intelligence Generated Content, #Video Generation, #Spatiotemporal Consistency, #Temporal Coherence -
🇺🇸 Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis
[USA]
arXiv:2602.15067v1 Announce Type: new Abstract: Gliomas, among the most common primary brain tumors, vary widely in aggressiveness, prognosis, and histology, making treatment challenging due to compl...
Related: #Medical Imaging, #Deep Learning, #Neuro-Oncology, #Survival Prediction -
🇺🇸 Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling
[USA]
arXiv:2602.15513v1 Announce Type: cross Abstract: Deploying Multimodal Large Language Models as the brain of embodied agents remains challenging, particularly under long-horizon observations and limi...
Related: #Multimodal AI, #Embodied Agents, #Memory Modeling, #Natural Language Processing -
🇺🇸 Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation
[USA]
arXiv:2602.15724v1 Announce Type: cross Abstract: Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions and navigate through previously unseen environments. R...
Related: #Artificial Intelligence, #Natural Language Processing, #Navigation, #Efficiency -
🇺🇸 Language-Guided Invariance Probing of Vision-Language Models
[USA]
arXiv:2511.13494v1 Announce Type: cross Abstract: Recent vision-language models (VLMs) such as CLIP, OpenCLIP, EVA02-CLIP and SigLIP achieve strong zero-shot performance, but it is unclear how reliab...
Related: #Artificial Intelligence, #Natural Language Processing, #Model Evaluation -
🇺🇸 Multi-Task Learning with Additive U-Net for Image Denoising and Classification
[USA]
arXiv:2602.12649v1 Announce Type: cross Abstract: We investigate additive skip fusion in U-Net architectures for image denoising and denoising-centric multi-task learning (MTL). By replacing concaten...
Related: #Neural Network Architecture, #Multi-Task Learning -
🇺🇸 EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition
[USA]
arXiv:2602.12919v1 Announce Type: cross Abstract: Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventio...
Related: #Benchmark Development, #Event-Based Imaging -
🇺🇸 Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
[USA]
arXiv:2508.07388v2 Announce Type: replace Abstract: Temporal Video Grounding (TVG) aims to localize video segments corresponding to a given textual query, which often describes human actions. However...
Related: #AI Research, #Video Analysis