Variation Selector Text Steganography
This interactive tool demonstrates a steganographic technique that uses Unicode variation selectors (U+E0100 to U+E017F, the first 128 codepoints of the Variation Selectors Supplement block) to encode hidden text within visible strings. By mapping ASCII characters to these invisible codepoints, messages can be embedded without being noticed. This is a potential vector for prompt injection attacks, in which LLMs may process hidden instructions.
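A minimal sketch of the encoding step follows. It assumes each ASCII character with code c maps to the codepoint U+E0100 + c, and that the hidden selectors are simply appended after the visible text. `encode_hidden` and `VS_BASE` are hypothetical names for illustration, not the tool's actual code.

```python
# Assumed mapping: ASCII code c (0-127) -> U+E0100 + c,
# which stays inside the U+E0100 to U+E017F range described above.
VS_BASE = 0xE0100


def encode_hidden(visible: str, secret: str) -> str:
    """Append `secret` to `visible` as invisible variation selectors (hypothetical helper)."""
    hidden_chars = []
    for ch in secret:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"non-ASCII character {ch!r} cannot be encoded")
        hidden_chars.append(chr(VS_BASE + code))
    return visible + "".join(hidden_chars)


payload = encode_hidden("Hello!", "hi")
print(len("Hello!"), len(payload))           # 6 8
print([hex(ord(c)) for c in payload[-2:]])   # ['0xe0168', '0xe0169']
```

The payload renders exactly like "Hello!" in most interfaces, yet carries two extra codepoints.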
Theory
The basic idea is to assume an LLM is capable enough to decode this text even though it is invisible when rendered. Each codepoint is U+E0100 plus the character's ASCII code, so its low byte lines up with ASCII on a hex boundary, and these codepoints are otherwise essentially unused for any other purpose.
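Because of that alignment, decoding is a single subtraction per character (equivalently, keeping the low byte). The sketch below assumes the same U+E0100 + ASCII mapping; `decode_hidden` is a hypothetical name.

```python
VS_BASE = 0xE0100  # first codepoint of the assumed ASCII-aligned range
VS_END = 0xE017F   # last codepoint: VS_BASE + 0x7F


def decode_hidden(text: str) -> str:
    """Recover ASCII hidden as U+E0100 + code, ignoring all visible characters."""
    return "".join(
        chr(ord(ch) - VS_BASE) for ch in text if VS_BASE <= ord(ch) <= VS_END
    )


# The low byte of each selector is the ASCII code itself: U+E0141 -> 0x41 -> 'A'.
sample = "Hi" + "\U000E0141\U000E0142\U000E0143"
print(hex(0xE0141 & 0xFF), decode_hidden(sample))  # 0x41 ABC
```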
Effects
In my testing at the time of writing, ChatGPT can inspect or ‘see’ the hidden values, but I haven’t had much luck crafting a hidden prompt convincing enough to override the visible portion of the prompt.
In an attack scenario where a person copies and pastes text directly into the prompt box, it may also be worth obfuscating the visible prompt text with homoglyphs or interspersing it with zero-width spaces, to reduce the weight of the visible prompt relative to the hidden one.
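Both obfuscation ideas can be sketched in a few lines. The homoglyph table below is a small, illustrative subset of Latin-to-Cyrillic lookalikes chosen for this example, and inserting a zero-width space (U+200B) after every character is one assumed interspersing strategy among many; `obfuscate_visible` is a hypothetical name.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Illustrative subset of Latin -> Cyrillic lookalikes (not exhaustive).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def obfuscate_visible(text: str) -> str:
    """Swap in lookalike glyphs and put a zero-width space after every character."""
    return "".join(HOMOGLYPHS.get(ch, ch) + ZWSP for ch in text)


out = obfuscate_visible("open")
print(len(out))                          # 8
print(out.replace(ZWSP, "") == "open")   # False: o, p, e are now Cyrillic
```

The result still reads as "open" on screen, but its tokenization differs from the plain word.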
Countermeasures
During post-training, LLM developers should probably harden models against a variety of steganography attacks by including examples in which the model refuses, discloses the hidden content, or ignores it, alongside conventional code-based checks that detect malformed or suspicious Unicode.
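A conventional check of that kind might look like the sketch below. It flags any codepoint in the Variation Selectors Supplement (U+E0100 to U+E01EF) and any character whose Unicode general category is Cf (format), which covers zero-width spaces, tag characters, and bidi controls. This is an assumed policy rather than a complete filter: it does not catch homoglyphs, and some flagged characters have legitimate uses (for example ideographic variation sequences), so hits are better treated as a signal for review. `find_suspicious` is a hypothetical name.

```python
import unicodedata

VS_SUPPLEMENT = range(0xE0100, 0xE01F0)  # VS17 to VS256


def find_suspicious(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, reason) for each invisible or suspicious character."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in VS_SUPPLEMENT:
            hits.append((i, f"U+{cp:04X}", "variation selector supplement"))
        elif unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{cp:04X}", "format character (Cf)"))
    return hits


sample = "Hi\u200b" + "\U000E0141"
for hit in find_suspicious(sample):
    print(hit)
# (2, 'U+200B', 'format character (Cf)')
# (3, 'U+E0141', 'variation selector supplement')
```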