BravenNow
ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
| USA | technology | ✓ Verified - arxiv.org

#Indirect Prompt Injection #Large Language Model agents #ICON framework #AI security #Latent Space Trace Prober #Mitigating Rectifier #Attack detection #Multi-modal agents

📌 Key Takeaways

  • ICON framework protects LLM agents from Indirect Prompt Injection attacks
  • Existing defenses suffer from over-refusal that terminates valid workflows
  • ICON uses Latent Space Trace Prober and Mitigating Rectifier for surgical defense
  • Achieves a 0.4% attack success rate, matching commercial-grade detectors, with over 50% task utility gain
  • Demonstrates robust generalization and works with multi-modal agents
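As a rough illustration of the detection idea behind the Latent Space Trace Prober: the sketch below assumes "over-focusing" shows up as attention mass collapsing onto a few injected tokens, and scores it as one minus normalized entropy. The function names, the threshold, and the entropy-based score are illustrative assumptions for this digest, not the paper's actual prober.

```python
import math

def focus_intensity(attn_row):
    """Score how sharply one query's attention distribution collapses:
    1 - normalized entropy. Near 1.0 -> mass concentrated on few keys
    (possible over-focusing); near 0.0 -> diffuse, ordinary attention."""
    total = sum(attn_row)
    p = [w / total for w in attn_row]
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return 1.0 - entropy / math.log(len(p))

def probe(attn_rows, threshold=0.8):
    """Flag a potential injection if any row's intensity exceeds the threshold."""
    scores = [focus_intensity(row) for row in attn_rows]
    return max(scores) > threshold, scores

# Diffuse attention over 8 retrieved tokens (benign) vs. attention
# collapsing onto one injected-instruction token (suspicious):
benign = [1 / 8] * 8
hijacked = [0.93] + [0.01] * 7
flagged, scores = probe([benign, hijacked])  # flagged -> True
```

In this toy version, a benign row scores near 0 while the hijacked row scores about 0.81; any real prober would of course operate on learned latent traces rather than raw attention rows.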

📖 Full Retelling

A team of researchers led by Che Wang, with eight co-authors, introduced ICON, a defense framework that protects Large Language Model agents against Indirect Prompt Injection (IPI) attacks, in a paper submitted to arXiv on February 24, 2026. IPI attacks hide malicious instructions in retrieved content so that they hijack an agent's execution flow, a growing concern as AI agents become integrated into critical applications.

Unlike existing defenses that rely on strict filtering or refusal, which often terminate valid workflows prematurely (over-refusal), ICON follows a probing-to-mitigation strategy that neutralizes threats while preserving task continuity. A Latent Space Trace Prober first detects attacks by identifying the distinctive over-focusing signatures they leave in the latent space; a Mitigating Rectifier then performs surgical attention steering, selectively suppressing adversarial query-key dependencies while amplifying task-relevant elements to restore the model's functional trajectory.

In evaluations across multiple model backbones, ICON achieved a 0.4% attack success rate, matching commercial-grade detectors, while improving task utility by over 50% relative to existing defenses. The paper also reports robust out-of-distribution generalization beyond the training data and a successful extension to multi-modal agents, establishing a favorable balance between security and efficiency.
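The mitigation step described above can be sketched as a toy, assuming the prober has already flagged which key positions belong to injected content. This is a minimal illustration of "suppress adversarial dependencies, amplify task-relevant ones", not the paper's Mitigating Rectifier; the function name and `damp` parameter are invented here.

```python
def rectify(attn_row, injected_idx, damp=0.05):
    """Toy 'surgical attention steering' step: scale down attention
    weights on keys flagged as injected content, then renormalize.
    Renormalization implicitly amplifies the surviving task-relevant
    dependencies without touching the rest of the computation."""
    steered = [w * damp if i in injected_idx else w
               for i, w in enumerate(attn_row)]
    total = sum(steered)
    return [w / total for w in steered]

# A row hijacked by an injected instruction at key position 0:
row = [0.93, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
fixed = rectify(row, injected_idx={0})
```

After rectification the injected key's weight drops from 0.93 to roughly 0.40 while each task token's weight rises from 0.01 to about 0.086, which captures the intuition of restoring the model's functional trajectory without aborting the task.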

🏷️ Themes

AI Security, Prompt Injection Defense, Large Language Models, Cybersecurity Research

Original Source
Computer Science > Artificial Intelligence
arXiv:2602.20708 [Submitted on 24 Feb 2026]
Title: ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
Authors: Che Wang, Fuyao Zhang, Jiaming Zhang, Ziqi Zhang, Yinghui Wang, Longtao Huang, Jianbo Gao, Zhong Chen, Wei Yang Bryan Lim
Abstract: Large Language Model agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict filtering or refusal mechanisms, which suffer from a critical limitation: over-refusal, prematurely terminating valid agentic workflows. We propose ICON, a probing-to-mitigation framework that neutralizes attacks while preserving task continuity. Our key insight is that IPI attacks leave distinct over-focusing signatures in the latent space. We introduce a Latent Space Trace Prober to detect attacks based on high intensity scores. Subsequently, a Mitigating Rectifier performs surgical attention steering that selectively manipulates adversarial query-key dependencies while amplifying task-relevant elements to restore the LLM's functional trajectory. Extensive evaluations on multiple backbones show that ICON achieves a competitive 0.4% ASR, matching commercial-grade detectors, while yielding an over 50% task utility gain. Furthermore, ICON demonstrates robust out-of-distribution generalization and extends effectively to multi-modal agents, establishing a superior balance between security and efficiency.
Comments: 11 pages. Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR). Cite as: arXiv:2602.20708 [cs.AI] (arXiv:2602.20708v1 for this version). DOI: https://doi.org/10.48550/arXiv.2602.20708

Source

arxiv.org
