OpenAI and Paradigm Launch EVMbench to Test AI Smart Contract Hacking

2026/03/05 08:55

Rongchai Wang Mar 05, 2026 00:55

New benchmark evaluates AI agents' ability to detect, patch, and exploit smart contract vulnerabilities. GPT-5.3-Codex scores 72.2% on exploit tasks.

OpenAI and crypto venture firm Paradigm have released EVMbench, a benchmark that measures how well AI agents can find, fix, and exploit vulnerabilities in Ethereum smart contracts. The announcement comes as AI-powered security tools race to protect the $100 billion-plus locked in DeFi protocols.

The benchmark draws from 120 curated high-severity vulnerabilities pulled from 40 real security audits, mostly from Code4rena competitions. It also includes vulnerability scenarios from security reviews of Tempo, a Layer 1 blockchain built for stablecoin payments.

Three Ways to Break Smart Contracts

EVMbench tests AI agents across three distinct modes. In Detect mode, agents audit contract repositories and get scored on finding known vulnerabilities. Patch mode requires agents to fix vulnerable code without breaking existing functionality. Exploit mode is the most aggressive—agents must execute actual fund-draining attacks against contracts deployed on a sandboxed blockchain.

The results show how quickly AI capabilities are advancing in this domain. GPT-5.3-Codex running via Codex CLI hit a 72.2% success rate on exploit tasks. That's more than double the 31.9% score from GPT-5, which launched just six months prior.
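The size of that jump can be checked with a few lines of arithmetic. The snippet below uses only the two exploit-mode scores quoted above; the variable names are illustrative and not part of EVMbench.

```python
# Exploit-mode success rates quoted in the article, in percent.
gpt_5_3_codex = 72.2
gpt_5 = 31.9

# Relative improvement (ratio) and absolute improvement (percentage points).
ratio = gpt_5_3_codex / gpt_5
gain_points = gpt_5_3_codex - gpt_5

print(f"{ratio:.2f}x, +{gain_points:.1f} points")  # prints "2.26x, +40.3 points"
```

The ratio is just above two, which matches the "more than double" description, and the absolute gain is about 40 percentage points.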

Interestingly, AI agents perform better at attacking than defending. The exploit setting has a clear objective—keep iterating until you drain the funds. Detection and patching proved harder. Agents sometimes stopped after finding one bug instead of auditing exhaustively, and maintaining full contract functionality while removing subtle vulnerabilities remained challenging.

Real Limitations Worth Noting

OpenAI acknowledged EVMbench doesn't capture the full difficulty of real-world contract security. Heavily deployed protocols like Uniswap or Aave undergo far more scrutiny than audit competition code. The benchmark also can't verify if an agent finds legitimate vulnerabilities that human auditors missed—it only checks against known issues.
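To make the "known issues only" limitation concrete, here is a minimal sketch of recall-style scoring. It assumes detection is scored as the fraction of known vulnerabilities an agent reports; the article does not describe EVMbench's actual scoring formula, and the function name and issue labels below are hypothetical.

```python
def detect_score(known_issues: set[str], reported: set[str]) -> float:
    """Fraction of known issues the agent reported (assumed scoring rule)."""
    if not known_issues:
        raise ValueError("known_issues must not be empty")
    # Only findings that match a known issue count toward the score.
    return len(known_issues & reported) / len(known_issues)


# Hypothetical labels for the known high-severity issues in one audit.
known = {"reentrancy-in-withdraw", "unchecked-oracle-price", "missing-access-control"}

# An agent that stops after its first finding.
early_stop = {"reentrancy-in-withdraw"}

# An agent that also reports a genuine bug the human auditors missed.
with_novel_bug = {"reentrancy-in-withdraw", "previously-unknown-rounding-bug"}

print(detect_score(known, early_stop))
print(detect_score(known, with_novel_bug))
```

Under this assumed rule both agents receive the same score: the novel finding earns no credit because it is not in the reference set, and stopping early is penalized only through the known issues left unreported.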

The exploit environment runs on a clean local Anvil instance rather than forked mainnet state, and timing-dependent attacks fall outside its scope. For now, the benchmark covers only single-chain environments.

$10M for Defensive Research

Alongside EVMbench, OpenAI committed $10 million in API credits specifically for defensive security research. The company is expanding its Aardvark security research agent to more users and partnering with open-source maintainers for free codebase scanning.

The timing matters. As AI agents get better at exploiting contracts, the window between vulnerability discovery and exploitation shrinks. Protocol teams that aren't using AI-assisted auditing will increasingly find themselves at a disadvantage against attackers who are.

OpenAI released EVMbench's tasks, tooling, and evaluation framework publicly. For DeFi developers and security researchers, it's both a measuring stick and a warning about where AI capabilities are headed.
