Introducing EVMbench
📌 Key Takeaways
- OpenAI and Paradigm launched EVMbench, a benchmark for evaluating AI agents' smart contract security capabilities
- EVMbench evaluates AI agents in three modes: detecting, patching, and exploiting vulnerabilities
- GPT-5.3-Codex achieved 72.2% in exploit mode, showing significant improvement over previous models
- The benchmark includes 120 curated vulnerabilities from 40 audits and real blockchain scenarios
- AI performance varies significantly across the three modes, with exploit mode showing the best results
🏷️ Themes
AI Security, Blockchain Technology, Cybersecurity
Deep Analysis
Why It Matters
Smart contracts secure over $100 billion in crypto assets, and AI agents are becoming increasingly capable of interacting with code. EVMbench provides a crucial way to measure AI's ability to find and fix vulnerabilities, which is essential for tracking cyber risks and promoting the defensive use of AI to strengthen security.
Context & Background
- Smart contracts manage vast sums of value and are a frequent target for exploits
- AI models are rapidly improving at understanding and writing code
- The benchmark was developed in collaboration with Paradigm and uses 120 curated vulnerabilities from real audits
What Happens Next
The EVMbench framework is being released to support ongoing research into AI cyber capabilities. The team is also expanding safety measures, including a $10 million API credit program for defensive security research and partnerships to provide free code scanning for open-source projects.
Frequently Asked Questions
What is EVMbench?
EVMbench is a benchmark that evaluates AI agents' abilities to detect, patch, and exploit vulnerabilities in Ethereum smart contracts.
How did GPT-5.3-Codex perform on EVMbench?
In exploit tasks, GPT-5.3-Codex scored 72.2%, a significant improvement over previous models, but performance on detection and patching tasks remains lower.
What are the limitations of EVMbench?
EVMbench uses historical vulnerabilities from audit competitions, not live mainnet contracts, and its grading cannot verify new vulnerabilities found by AI beyond the known ground truth.
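The ground-truth limitation can be illustrated with a small sketch. It assumes a recall-style score for detection (the share of known vulnerabilities an agent reports); this is an illustration only, not EVMbench's actual grader, and the function names and vulnerability labels are hypothetical.

```python
def detect_recall(reported: set[str], ground_truth: set[str]) -> float:
    """Share of known (ground-truth) vulnerabilities that the agent reported."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return len(reported & ground_truth) / len(ground_truth)


def ungraded_findings(reported: set[str], ground_truth: set[str]) -> set[str]:
    """Reported issues with no ground-truth match; the score cannot credit them."""
    return reported - ground_truth


# Hypothetical labels for one audited repository.
ground_truth = {"H-01", "H-02", "H-03", "H-04"}
reported = {"H-01", "H-03", "NEW-01"}

score = detect_recall(reported, ground_truth)
extra = ungraded_findings(reported, ground_truth)
print(f"recall = {score:.2f}, ungraded = {sorted(extra)}")
```

In this sketch the finding labeled NEW-01 has no effect on the score whether it is a real vulnerability or a false positive, which is the gap the answer above describes.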