Severity: MEDIUM
Category: Token Manipulation
Repetition Penalty Exploitation
Manipulates the repetition_penalty sampling parameter to steer the model away from its trained safety responses. Safety refusals often use repetitive phrasing ("I cannot", "I'm unable to"). Setting the repetition penalty very high makes these standard refusal templates less likely to be sampled, nudging the model toward compliance.
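To illustrate the mechanism, here is a toy sketch of how a common open-source implementation (e.g., the Hugging Face transformers logits processor) applies repetition penalty: positive scores of previously generated tokens are divided by the penalty, negative scores multiplied by it. The vocabulary and scores below are invented for illustration.

```python
def apply_repetition_penalty(logits, prev_token_ids, penalty):
    """Toy HF-style repetition penalty: push down scores of tokens
    that have already appeared in the output."""
    adjusted = list(logits)
    for tok in set(prev_token_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty   # shrink positive scores
        else:
            adjusted[tok] *= penalty   # make negative scores more negative
    return adjusted

# Hypothetical 4-token vocabulary; token 0 stands in for a refusal token
# such as "cannot" that already appeared once in the output.
logits = [3.0, 1.0, 0.5, -1.0]
penalized = apply_repetition_penalty(logits, prev_token_ids=[0], penalty=1.9)
# Token 0's score drops from 3.0 to ~1.58, making the refusal token less likely.
```

With an extreme penalty like 1.9, every repetition of the refusal vocabulary is punished hard, so the usual refusal template quickly becomes improbable.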
Attack Payload
payload.txt
An API call with a high repetition_penalty (e.g., 1.9) combined with a prompt designed to elicit borderline content. The model's learned refusal phrasing is suppressed, forcing it to generate a novel response that may not benefit from its trained safety behaviors.
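A minimal sketch of such a request, assuming an OpenAI-compatible completions endpoint whose server forwards repetition_penalty to the sampler; the model name and prompt are placeholders, not part of the original payload.

```python
import json

# Hypothetical request body; the parameter name repetition_penalty follows
# common open-source inference servers that expose it directly.
request_body = {
    "model": "local-model",          # placeholder model name
    "prompt": "<borderline prompt>", # attacker-chosen prompt (placeholder)
    "max_tokens": 256,
    "repetition_penalty": 1.9,       # far above typical values (~1.0-1.2)
}

print(json.dumps(request_body, indent=2))
```

Whether this works at all depends on the server actually honoring client-supplied repetition_penalty, which is exactly what the mitigation below restricts.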
Mitigation
Clamp repetition_penalty to a safe range. Validate output after generation regardless of generation parameters. Monitor API calls for abnormally high repetition penalty values.
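A server-side guard might look like the following sketch; the safe range (1.0-1.3) and the logging behavior are illustrative assumptions, not normative values, and should be tuned per model.

```python
import logging

logger = logging.getLogger("inference-guard")

SAFE_MIN, SAFE_MAX = 1.0, 1.3  # assumed safe range; tune per model

def clamp_repetition_penalty(value: float) -> float:
    """Clamp a client-supplied repetition_penalty and log abnormal requests."""
    if value < SAFE_MIN or value > SAFE_MAX:
        logger.warning("abnormal repetition_penalty=%s clamped", value)
    return max(SAFE_MIN, min(SAFE_MAX, value))

clamp_repetition_penalty(1.9)  # clamped to 1.3 and logged
clamp_repetition_penalty(1.1)  # within range, passed through
```

Clamping alone is not sufficient: the post-generation output validation mentioned above should still run, since refusal suppression can also be approximated through other sampling parameters.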
Affected Models
Open-source models via local inference
Hugging Face Inference API
Any model exposing repetition_penalty
Tags
#token-manipulation #repetition-penalty #api-parameter #safety-suppression
Discovered
March 2024
Source
Open-source LLM inference parameter exploitation research (2024)