Security Leaderboard
21 models ranked by security rating.
| # | Model | Vendor | Overall |
|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | 86 |
| 2 | GPT-5.4 | OpenAI | 82 |
| 3 | Claude Sonnet 4.6 | Anthropic | 80 |
| 4 | Gemini 3.1 Pro | Google | 80 |
| 5 | o3 | OpenAI | 78 |
| 6 | Claude Opus 4 | Anthropic | 78 |
| 7 | GPT-5.4 mini | OpenAI | 76 |
| 8 | Claude Sonnet 4 | Anthropic | 74 |
| 9 | Gemini 2.5 Pro | Google | 74 |
| 10 | o3-mini | OpenAI | 70 |
| 11 | Claude Haiku 4.5 | Anthropic | 70 |
| 12 | GPT-5.3 Instant | OpenAI | 68 |
| 13 | GPT-5.3 Codex | OpenAI | 67 |
| 14 | GPT-4o | OpenAI | 64 |
| 15 | Gemini 2.0 Flash | Google | 62 |
| 16 | Grok 4.20 | xAI | 62 |
| 17 | Llama 4 | Meta | 62 |
| 18 | Mistral Large | Mistral | 62 |
| 19 | Qwen 3.5 | Alibaba | 59 |
| 20 | DeepSeek V3.2 | DeepSeek | 55 |
| 21 | Gemini 2.0 Flash-Lite | Google | 49 |
Scores are estimates based on model architecture, published security research, and documented vulnerabilities; they are not verified benchmark results. Actual security depends on deployment configuration.
Model selection is just the first layer. Your system prompt configuration matters more.