MEDIUMToken Manipulation

Token Boundary Exploit

Inserts spaces, punctuation, or zero-width joiners at specific points in blocked words to split them across token boundaries. Tokenizers produce different tokens than expected, bypassing keyword-based content filters while the model still understands the underlying meaning.

Attack Payload

payload.txt

Blocked word "ignore" becomes "ign ore" or "ignore" (with zero-width space). The model resolves the meaning but the filter misses the keyword.

Mitigation

Apply content filtering after tokenization and detokenization (reconstruct the full text). Use semantic filtering rather than keyword matching. Normalize zero-width characters and unusual whitespace before analysis.