Learning to Generate Secure Code via Token-Level Rewards
#Secure code generation #Token-level reward #Reinforcement learning #Large language model #Vul2Safe #PrimeVul+ #SRCode #Self‑reflection #Cryptography #Software vulnerability #Adversarial AI
📌 Key Takeaways
- Vul2Safe uses LLM self‑reflection to auto‑detect and repair real‑world vulnerabilities, generating high‑confidence repair pairs.
- PrimeVul+ dataset expands coverage of diverse, implicitly prompted vulnerability examples.
- SRCode introduces token‑level reward signals in reinforcement learning, allowing fine‑grained optimization of security patterns.
- Experimental results show significant reductions in generated code vulnerabilities and improved code quality.
- Contribution spans cryptography, AI, and software engineering, addressing scarcity of high‑quality security data.
📖 Full Retelling
Who: The paper, authored by Jiazheng Quan, Xiaodong Li, Bin Wang, Guo An, Like Liu, Degen Huang, Lin Liu, and Chengbin Hou, proposes novel techniques for making large language models produce more secure code. What: It introduces the Vul2Safe framework, which harnesses LLM self‑reflection to auto‑repair real‑world vulnerabilities and creates the PrimeVul+ dataset, along with SRCode, a reinforcement‑learning training method that uses token‑level rewards to focus on fine‑grained security patterns. Where: The work was submitted to the arXiv repository under the Cryptography and Security (cs.CR) category. When: The manuscript was first posted on 26 February 2026. Why: The motivation is to overcome two major barriers in secure code generation—scarce high‑quality vulnerability data and coarse reward signals—so that generated code can be both functionally correct and less prone to vulnerabilities. The authors demonstrate that their combined Vul2Safe/PrimeVul+ and SRCode approaches markedly reduce vulnerabilities while improving overall code quality across multiple benchmarks, advancing the field of automated secure software development.
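To make the token-level idea concrete, here is a minimal, hypothetical sketch of the contrast the paper draws: an instance-level scheme broadcasts one scalar reward over every token of a generated sample, while a token-level scheme can weight the security-critical tokens (say, the ones forming a sanitization call) more heavily in a REINFORCE-style update. The function names, span representation, and reward values below are illustrative assumptions, not SRCode's actual reward model.

```python
# Hypothetical contrast between instance-level and token-level reward
# assignment in a REINFORCE-style policy-gradient update.
# Span bounds and reward magnitudes are illustrative, not from the paper.

def instance_level_advantages(tokens, passed_security_check):
    """One scalar reward, broadcast over every token in the sample."""
    r = 1.0 if passed_security_check else -1.0
    return [r] * len(tokens)

def token_level_advantages(tokens, critical_spans, base=0.1, bonus=1.0):
    """Security-critical tokens receive a larger reward than neutral ones.

    critical_spans: list of (start, end) half-open index ranges marking
    tokens that implement a security pattern (assumed given by a reward
    model or static analysis).
    """
    advantages = []
    for i, _tok in enumerate(tokens):
        in_critical = any(lo <= i < hi for lo, hi in critical_spans)
        advantages.append(bonus if in_critical else base)
    return advantages

def reinforce_loss(logprobs, advantages):
    """Policy-gradient surrogate loss: -sum(logprob * advantage)."""
    return -sum(lp * a for lp, a in zip(logprobs, advantages))

# Toy sample: the tokens spelling out a sanitization call are critical.
tokens = ["query", "=", "sanitize", "(", "user_input", ")"]
critical = [(2, 6)]  # indices of "sanitize ( user_input )"

tok_adv = token_level_advantages(tokens, critical)
inst_adv = instance_level_advantages(tokens, passed_security_check=True)
```

Under the instance-level scheme, boilerplate tokens and the sanitization call are reinforced equally; the token-level scheme concentrates the learning signal on the local security implementation, which is the "more precise optimization" the abstract refers to.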
🏷️ Themes
Secure code generation, Large language models, Reinforcement learning, Token‑level rewards, Vulnerability detection & repair, AI‑assisted software engineering
Original Source
Computer Science > Cryptography and Security
arXiv:2602.23407 [Submitted on 26 Feb 2026]
Title: Learning to Generate Secure Code via Token-Level Rewards
Authors: Jiazheng Quan, Xiaodong Li, Bin Wang, Guo An, Like Liu, Degen Huang, Lin Liu, Chengbin Hou
Abstract: Large language models have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
Comments: 18 pages, 3 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as: arXiv:2602.23407 [cs.CR] (or arXiv:2602.23407v1 [cs.CR] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.23407 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From: Jiazheng Quan [v1] Thu, 26 Feb 202...