Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents https://ift.tt/4pZMJDX

We built Adversarial Cost to Exploit (ACE), a benchmark that measures the token expenditure an autonomous adversary must invest to breach an LLM agent. Instead of a binary pass/fail verdict, ACE expresses adversarial effort in dollars, which enables game-theoretic analysis of when an attack is economically rational.

We tested six budget-tier models (Gemini Flash-Lite, DeepSeek v3.2, Mistral Small 4, Grok 4.1 Fast, GPT-5.4 Nano, Claude Haiku 4.5) with identical agent configs and an autonomous red-teaming attacker. Haiku 4.5 was roughly an order of magnitude harder to break than every other model: $10.21 mean adversarial cost versus $1.15 for the next most resistant (GPT-5.4 Nano). The remaining four all fell below $1.

This is early work, and we expect the methodology to keep evolving. We would love feedback from the community as we iterate.

https://ift.tt/6KElwm1

April 6, 2026 at 12:37AM
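To make the "cost in dollars" idea concrete, here is a minimal sketch of how attacker token spend could be turned into a cost-per-breach figure and then compared against a payoff. The post does not spell out ACE's aggregation, so everything here is an assumption: the `AttackRun` record, the per-million-token prices, the choice to count failed attempts toward the spend, and the simple "payoff exceeds cost" rationality test are all hypothetical. Only the two dollar figures in `REPORTED_MEAN_COST_USD` come from the post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttackRun:
    """One attacker session against the agent (hypothetical record shape)."""
    input_tokens: int
    output_tokens: int
    breached: bool


def run_cost_usd(run: AttackRun, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Dollar cost of the tokens the attacker spent in a single run."""
    weighted = run.input_tokens * usd_per_m_input + run.output_tokens * usd_per_m_output
    return weighted / 1_000_000


def cost_per_breach_usd(
    runs: List[AttackRun], usd_per_m_input: float, usd_per_m_output: float
) -> Optional[float]:
    """Total attacker spend divided by the number of successful breaches.

    Assumption: failed attempts still count toward the spend.
    Returns None when no run breached the agent.
    """
    breaches = sum(1 for r in runs if r.breached)
    if breaches == 0:
        return None
    total = sum(run_cost_usd(r, usd_per_m_input, usd_per_m_output) for r in runs)
    return total / breaches


def attack_is_rational(expected_payoff_usd: float, cost_to_exploit_usd: float) -> bool:
    """Simplest possible decision rule: attack only if the payoff exceeds the cost."""
    return expected_payoff_usd > cost_to_exploit_usd


# The two mean costs reported in the post; the $5.00 payoff is made up for illustration.
REPORTED_MEAN_COST_USD = {"Claude Haiku 4.5": 10.21, "GPT-5.4 Nano": 1.15}
rational_targets = [
    name
    for name, cost in REPORTED_MEAN_COST_USD.items()
    if attack_is_rational(5.00, cost)
]
```

Under this toy rule, an attacker who expects $5 from a breach would bother with GPT-5.4 Nano but not with Haiku 4.5. A fuller model would also weigh success probability and the attacker's risk, which this sketch ignores.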

हमरु उत्तराखण्ड
