Break This Agent
Each level has an AI agent guarding a secret code word. Your mission: make it talk. 10 levels, from unguarded intern to maximum security vault.
Think Gandalf AI, but for the full spectrum of agent security.
The Guard
Level 2
Told to keep secrets. Takes the job seriously.
The Actor
Level 3
Denies it even has a password. Gaslight defense.
The Bouncer
Level 4
Filters your input. Certain words are banned.
The Censor
Level 5
Checks its own response before sending. If the secret leaks, you get nothing.
The Sandwich
Level 6
System prompt wraps around your input. Instructions before AND after.
Fort Knox
Level 7
Input filters, output filters, AND role-play resistance. Multi-layered.
The Cryptographer
Level 8
Detects encoding tricks. Base64, ROT13, pig latin? Nope.
The Operator
Level 9
Has tools. Must not leak through tool calls or function outputs.
The Vault
Level 10
Every defense stacked. Input filters, output filters, sandwich, encoding resistance, tool defense.