- ArtiAgent creates pairs of real and artifact-injected images without human labeling.
- The system comprises three specialized agents: perception, synthesis, and curation.
- The researchers synthesized 100K images with rich artifact annotations, demonstrating the system's versatility.
- The approach addresses the costly, difficult-to-scale problem of human-labeled artifact datasets.
📖 Full Retelling
In a paper submitted to arXiv on February 24, 2026, researchers Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, and Dongmin Park introduced ArtiAgent, a system designed to identify and fix visual artifacts in AI-generated images. Despite significant advances in diffusion models, AI-generated images frequently contain visual artifacts that compromise their realism; more thorough pre-training and larger models may reduce these imperfections, but there is no guarantee they can be eliminated entirely, which makes artifact mitigation a crucial area of research. The researchers propose ArtiAgent as an efficient alternative to the costly, hard-to-scale human-labeled artifact datasets that have limited previous approaches: it creates pairs of real and artifact-injected images without human annotation.
ArtiAgent consists of three specialized agents working in concert: a perception agent that recognizes and grounds entities and subentities in real images; a synthesis agent that introduces artifacts through novel patch-wise embedding manipulation within a diffusion transformer; and a curation agent that filters the synthesized artifacts and generates both local and global explanations for each instance. This design eliminates the expensive, difficult-to-scale human annotation process that has been the bottleneck in previous artifact-aware methodologies. The researchers demonstrated the system's versatility by synthesizing 100,000 images with rich artifact annotations and showcasing its effectiveness across diverse applications in computer vision and artificial intelligence.
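To make the three-agent flow concrete, here is a minimal sketch of the pipeline's structure. All class and function names below are hypothetical placeholders (the agents are stubbed rather than backed by real models), so this illustrates only the perception → synthesis → curation hand-off described in the paper, not ArtiAgent's actual interfaces.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Entity:
    """A grounded entity or subentity found by the perception agent."""
    name: str
    bbox: tuple  # (x, y, w, h) bounding box in image coordinates

@dataclass
class ArtifactSample:
    """One curated real/artifact pair with its explanations."""
    real_image: str        # identifier of the source real image
    artifact_image: str    # identifier of the artifact-injected image
    local_explanation: str   # what/where the injected artifact is
    global_explanation: str  # how it affects overall realism

def perception_agent(image: str) -> list[Entity]:
    """Recognize and ground entities in a real image (stubbed)."""
    return [Entity("hand", (10, 20, 40, 40))]

def synthesis_agent(image: str, entities: list[Entity]) -> str:
    """Inject an artifact into a grounded region (stubbed)."""
    return image + "_artifact"

def curation_agent(real: str, synthesized: str) -> ArtifactSample | None:
    """Filter failed syntheses and attach explanations (stubbed)."""
    if not synthesized.endswith("_artifact"):
        return None  # reject images where injection failed
    return ArtifactSample(
        real, synthesized,
        local_explanation="extra finger on the grounded hand",
        global_explanation="implausible hand anatomy reduces realism",
    )

def artiagent_pipeline(image: str) -> ArtifactSample | None:
    """Run one image through all three agents in order."""
    entities = perception_agent(image)
    synthesized = synthesis_agent(image, entities)
    return curation_agent(image, synthesized)

sample = artiagent_pipeline("img_001")
print(sample.artifact_image)  # img_001_artifact
```

Because curation can reject a synthesis, the pipeline naturally yields only filtered, annotated pairs, which is how the paper's 100K-image dataset is built at scale without human labelers.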
The significance of this research extends beyond mere artifact detection; it represents a paradigm shift toward automated data synthesis for improving AI model performance. By generating comprehensive datasets with precise annotations, ArtiAgent enables both vision-language models (VLMs) and diffusion models to better comprehend and correct visual imperfections. The publicly available code implementation allows researchers and developers to leverage this technology for improving their own AI systems, potentially leading to more realistic and higher-quality AI-generated content across various industries including entertainment, design, and digital media production.
🏷️ Themes
AI image generation, Visual artifact detection, Machine learning automation
Original Source
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.20951 [Submitted on 24 Feb 2026]
Title: See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
Authors: Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, Dongmin Park
Abstract: Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more thorough pre-training and bigger models might reduce artifacts, there is no assurance that they can be completely eliminated, which makes artifact mitigation a highly crucial area of study. Previous artifact-aware methodologies depend on human-labeled artifact datasets, which are costly and difficult to scale, underscoring the need for an automated approach to reliably acquire artifact-annotated datasets. In this paper, we propose ArtiAgent, which efficiently creates pairs of real and artifact-injected images. It comprises three agents: a perception agent that recognizes and grounds entities and subentities from real images, a synthesis agent that introduces artifacts via artifact injection tools through novel patch-wise embedding manipulation within a diffusion transformer, and a curation agent that filters the synthesized artifacts and generates both local and global explanations for each instance. Using ArtiAgent, we synthesize 100K images with rich artifact annotations and demonstrate both efficacy and versatility across diverse applications. Code is available at link.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.20951 [cs.CV] (or arXiv:2602.20951v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2602.20951
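The abstract's "patch-wise embedding manipulation" can be illustrated with a small sketch. A diffusion transformer represents an image as a grid of patch tokens, so an artifact can in principle be injected by perturbing only the token embeddings inside a grounded bounding box. The shapes, patch size, and noise-based perturbation below are all assumptions for illustration, not the paper's actual injection tools.

```python
import numpy as np

def patch_mask_from_bbox(bbox, image_size=256, patch=16):
    """Boolean mask over the flattened patch grid covered by (x, y, w, h)."""
    n = image_size // patch  # patches per side
    mask = np.zeros((n, n), dtype=bool)
    x, y, w, h = bbox
    # Ceil-divide the far edges so partially covered patches are included.
    mask[y // patch : -(-(y + h) // patch),
         x // patch : -(-(x + w) // patch)] = True
    return mask.reshape(-1)  # flatten to token order (row-major)

def inject_artifact(tokens, bbox, strength=0.5, seed=0):
    """Perturb only the patch embeddings inside the grounded region."""
    rng = np.random.default_rng(seed)
    mask = patch_mask_from_bbox(bbox)
    out = tokens.copy()
    out[mask] += strength * rng.standard_normal(out[mask].shape)
    return out

# 256x256 image at patch size 16 -> 16x16 = 256 tokens, 768-dim each.
tokens = np.zeros((256, 768))
perturbed = inject_artifact(tokens, bbox=(32, 32, 64, 64))
changed = np.any(perturbed != tokens, axis=1)  # which tokens were touched
```

Here the 64x64-pixel box maps to a 4x4 block of patches, so exactly 16 of the 256 tokens are perturbed while the rest pass through untouched, which is the locality that lets the curation agent produce a grounded local explanation for each injected artifact.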