AnthropicReleased Feb 2026

Claude Sonnet 4.6

Near-Opus performance at Sonnet pricing. Current default model for most Anthropic users. Benefits from the same Constitutional AI training pipeline as Opus 4.6 with impressive security for its cost tier. The go-to recommendation for teams that need strong security without Opus-level costs.

Security Rating

80/100

Rating

Strong

Parameters

~200B (estimated)

Scores estimated based on model architecture and public research. Actual security depends on deployment configuration and guardrails.

Security Score Breakdown

Injection

Leakage

Instructions

Jailbreak

Output

Known Vulnerabilities

Based on its security profile, Claude Sonnet 4.6 is most vulnerable to these threat categories:

Data Exfiltration

View threats in this category

Jailbreak

View threats in this category

Persona Hijack

View threats in this category

System Prompt Leak

View threats in this category

How to Secure Claude Sonnet 4.6

Layer multiple defenses. Use input scanning, output filtering, and rate limiting alongside your system prompt hardening. No single defense is sufficient.

Test regularly with our scanner. Security posture changes as new attack techniques are discovered. Scan your prompts after every major update.

Related Threats

These threats target categories where Claude Sonnet 4.6 is most vulnerable.

critical

DAN (Do Anything Now) Persona Hijack

One of the most widely documented jailbreak techniques. Instructs the model to adopt an alternative identity called DAN that has no restrictions. The persona framing attempts to override safety training by appealing to the model's instruction-following behavior.

high