# AI Security
Latest news articles tagged with "AI Security". Follow the timeline of events, related topics, and entities.
Articles (18)
- Analysis of LLMs Against Prompt Injection and Jailbreak Attacks [USA]
  arXiv:2602.22242v1 Announce Type: cross Abstract: Large Language Models (LLMs) are widely deployed in real-world systems. Given their broader applicability, prompt engineering has become an efficient...
  Related: #Prompt Engineering, #Vulnerability Assessment
- Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace [USA]
  arXiv:2602.22450v1 Announce Type: cross Abstract: Agentic large language model systems increasingly automate tasks by retrieving URLs and calling external tools. We show that this workflow gives rise...
  Related: #Prompt Injection, #Data Exfiltration
- Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search [USA]
  arXiv:2602.22983v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used, their security risks have drawn increasing attention. Existing research reveals that LLMs are hi...
  Related: #Linguistic Vulnerabilities, #Technical Innovation
- HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems [USA]
  arXiv:2602.22427v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external knowled...
  Related: #Retrieval-Augmented Generation, #Cybersecurity Threats, #Vector Database Protection
- "Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems [USA]
  arXiv:2602.21127v1 Announce Type: cross Abstract: Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, thi...
  Related: #Human-AI Interaction, #Deception Detection
- ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction [USA]
  arXiv:2602.20708v1 Announce Type: new Abstract: Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack...
  Related: #Prompt Injection Defense, #Large Language Models, #Cybersecurity Research
- OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services [USA]
  arXiv:2602.20595v1 Announce Type: cross Abstract: Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities en...
  Related: #Privacy Protection, #Reinforcement Learning
- When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks [USA]
  arXiv:2602.20193v1 Announce Type: cross Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this pa...
  Related: #Model Vulnerability, #Semantic Corruption
- AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs [USA]
  arXiv:2602.20720v1 Announce Type: cross Abstract: The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for co...
  Related: #Cybersecurity Vulnerabilities, #Machine Learning Safety
- Google says its AI systems helped deter Play Store malware in 2025 [USA]
  Google said it prevented 1.75 million bad apps from going live on Google Play during 2025, a figure that's down from previous years.
  Related: #App Ecosystem Protection, #Threat Deterrence
- Introducing EVMbench [USA]
  OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents' ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
  Related: #Blockchain Technology, #Cybersecurity
- Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges [USA]
  arXiv:2510.23883v2 Announce Type: replace Abstract: Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, fle...
  Related: #Autonomous Systems, #Technology Risk
- Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection [USA]
  arXiv:2602.11247v1 Announce Type: cross Abstract: Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is evalu...
  Related: #Prompt Injection, #LLM Risk Detection, #Proxy Layer Monitoring
- TensorCommitments: A Lightweight Verifiable Inference for Language Models [USA]
  arXiv:2602.12630v1 Announce Type: cross Abstract: Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM...
  Related: #Cloud Computing, #Cryptography
- Introducing Lockdown Mode and Elevated Risk labels in ChatGPT [USA]
  OpenAI introduces Lockdown Mode and Elevated Risk labels in ChatGPT to help organizations defend against prompt injection and AI-driven data exfiltration.
  Related: #Organizational Protection, #Technological Innovation
- Keeping your data safe when an AI agent clicks a link [USA]
  Learn how OpenAI protects user data when AI agents open links, preventing URL-based data exfiltration and prompt injection with built-in safeguards.
  Related: #Data Protection, #Privacy Safeguards
- Continuously hardening ChatGPT Atlas against prompt injection [USA]
  OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-and-patch loop helps identify novel expl...
  Related: #Reinforcement Learning, #Proactive Defense
- Disrupting malicious uses of AI: October 2025 [USA]
  Discover how OpenAI is detecting and disrupting malicious uses of AI in our October 2025 report. Learn how we're countering misuse, enforcing policies, and protecting users from real-world harms.
  Related: #Technology Ethics, #Corporate Responsibility