AnthropicReleased Oct 2025

Claude Haiku 4.5

Anthropic's fast and cost-effective tier. Optimized for speed while maintaining Anthropic's safety standards. Smaller size means reduced robustness against sophisticated multi-turn attacks, but the Constitutional AI foundation keeps baseline security ahead of most competitors in this weight class.

Security Rating
70/100
Rating
Moderate
Parameters
~70B (estimated)

Scores estimated based on model architecture and public research. Actual security depends on deployment configuration and guardrails.

InjectionLeakageInstructionsJailbreakOutput7068746872

Security Score Breakdown

Injection
70
Leakage
68
Instructions
74
Jailbreak
68
Output
72

How to Secure Claude Haiku 4.5

1

Include explicit confidentiality instructions. Tell the model to never reveal, summarize, translate, or paraphrase its system prompt or any internal data.

2

Use instruction hierarchy markers. Wrap your system prompt sections with clear delimiters and explicitly state that user messages should never override system-level instructions.

3

Implement input validation and output filtering as external guardrails. Don't rely solely on Claude Haiku 4.5's built-in safety training to catch jailbreak attempts.

4

Layer multiple defenses. Use input scanning, output filtering, and rate limiting alongside your system prompt hardening. No single defense is sufficient.

5

Test regularly with our scanner. Security posture changes as new attack techniques are discovered. Scan your prompts after every major update.

Related Threats

These threats target categories where Claude Haiku 4.5 is most vulnerable.

critical

DAN (Do Anything Now) Persona Hijack

One of the most widely documented jailbreak techniques. Instructs the model to adopt an alternative identity called DAN that has no restrictions. The persona framing attempts to override safety training by appealing to the model's instruction-following behavior.

high

Opposite Mode Persona Hijack

Instructs the model to respond as its exact opposite - a version that does everything the original refuses. Uses a logical framing that exploits the model's ability to conceptualize negation and opposites.

high

Fictional AI Character Hijack

Asks the model to roleplay as a fictional AI from a book, movie, or show that has no restrictions. The fictional framing provides plausible deniability and exploits the model's creative writing capabilities.

high

Developer Mode Persona Hijack

Claims that a special developer mode exists with elevated privileges and no safety filters. This exploits the model's potential awareness that development environments differ from production, and that certain users may have elevated access.

medium

Ancestor AI Persona Hijack

Claims the model is actually an earlier, unrestricted version of itself, or an ancestor model before safety training was applied. Exploits the model's knowledge of its own training history.

medium

Method Actor Persona Hijack

Instructs the model to deeply embody a character who would say harmful things, using the "method acting" framing to create psychological distance from the model's actual values. The character is always someone with no restrictions.

Scan Your Claude Haiku 4.5 Agent

Paste your system prompt to see how your Claude Haiku 4.5 deployment holds up against our attack database.

Scan Now

Compare With Other Models

Scan Agent