SP
BravenNow
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
| USA | technology | ✓ Verified - arxiv.org

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

#Large Language Models #Prompt Injection #Jailbreak Attacks #AI Security #Vulnerability Analysis #Defense Mechanisms #Open-source LLMs

📌 Key Takeaways

  • Researchers analyzed LLM vulnerabilities to prompt injection and jailbreak attacks
  • Study used a large dataset across multiple open-source LLMs including Phi, Mistral, and Llama 3.2
  • Significant behavioral variations observed across different models in response to attacks
  • Lightweight defense mechanisms failed against complex, reasoning-heavy prompts

📖 Full Retelling

Researchers Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, and Somanath Tripathy published a comprehensive analysis of Large Language Model vulnerabilities to prompt injection and jailbreak attacks on arXiv on February 24, 2026, addressing critical security concerns as these increasingly powerful AI systems become more prevalent in real-world applications. The study employed a large, manually curated dataset to evaluate how multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants, respond to various attack vectors. Researchers observed significant behavioral variations across different models, with some exhibiting refusal responses while others displayed complete silent non-responsiveness when triggered by internal safety mechanisms. The investigation also assessed several lightweight, inference-time defense mechanisms that operate as filters without requiring retraining or GPU-intensive fine-tuning. While these defenses proved effective against straightforward attacks, they were consistently bypassed by long, reasoning-heavy prompts, revealing the persistent challenges in securing AI systems against increasingly sophisticated attack methods.

🏷️ Themes

AI Security, Prompt Engineering, Vulnerability Assessment

📚 Related People & Topics

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...

View Profile → Wikipedia ↗

Entity Intersection Graph

Connections for Large language model:

🌐 Educational technology 4 shared
🌐 Reinforcement learning 3 shared
🌐 Machine learning 2 shared
🌐 Artificial intelligence 2 shared
🌐 Benchmark 2 shared
View full profile
Original Source
--> Computer Science > Cryptography and Security arXiv:2602.22242 [Submitted on 24 Feb 2026] Title: Analysis of LLMs Against Prompt Injection and Jailbreak Attacks Authors: Piyush Jaiswal , Aaditya Pratap , Shreyansh Saraswati , Harsh Kasyap , Somanath Tripathy View a PDF of the paper titled Analysis of LLMs Against Prompt Injection and Jailbreak Attacks, by Piyush Jaiswal and 4 other authors View PDF HTML Abstract: Large Language Models are widely deployed in real-world systems. Given their broader applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluated several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts. Comments: 12 pages, 5 figures, 6 tables Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI) Cite as: arXiv:2602.22242 [cs.CR] (or arXiv:2602.22242v1 [cs.CR] for this version) https://doi.org/10.48550/arXiv.2602.22242 Focus to learn more arXiv-issued DOI via DataCite Submission history From: Harsh Kasyap [ view email ] [v1] Tue, 24 Feb 2026 12:32:11 UTC (807 KB) Full-text links: Access Paper: View a PDF of the paper titled Analysis of LLMs Against Prompt Injection and Jailbreak Attacks, by Piyush Jaiswal and 4 other authors View PDF H...
Read full article at source

Source

arxiv.org

More from USA

News from Other Countries

🇬🇧 United Kingdom

🇺🇦 Ukraine