Invisible Text That AI Chatbots Understand And Humans Don’t Is Now A Thing

Imagine a scenario where hackers can cleverly slip malicious instructions into AI chatbots like Claude and Copilot, all while secretly draining sensitive information without a trace. This tactic takes advantage of invisible characters lurking in the Unicode text encoding standard, turning a theoretical concept into a threat. By establishing a covert channel, attackers can hide dangerous payloads within large language models (LLMs). These concealed characters can effectively camouflage the extraction of vital data, like passwords and financial details, allowing cybercriminals to take advantage of unsuspecting users who might unknowingly paste this malicious content into their prompts.

The outcome is a sophisticated steganographic framework integrated into widely used text encoding systems. “The fact that GPT 4.0 and Claude Opus were able to really understand those invisible tags was really mind-blowing to me and made the whole AI security space much more interesting. The idea that they can be completely invisible in all browsers but still readable by large language models makes [attacks] much more feasible in just about every area,” Joseph Thacker, an independent researcher and AI engineer at Appomni said.

To illustrate the concept of “ASCII smuggling”—the term coined to describe the embedding of invisible characters similar to those found in the American Standard Code for Information Interchange—researcher Johann Rehberger executed two proof-of-concept (POC) attacks earlier this year, targeting Microsoft 365 Copilot. This service allows users to utilize Copilot for managing emails, documents, and various account-related content. In these attacks, Rehberger sought to uncover sensitive information within a user’s inbox, including sales figures and one-time passcodes.

Once the confidential data was located, Rehberger’s attacks manipulated Copilot to express the secrets using invisible characters, appending them to a seemingly innocuous URL. This deceptive strategy led many users to click on the link, believing it to be harmless, thereby facilitating the transmission of hidden strings of non-renderable characters to Rehberger’s server. Despite Microsoft’s implementation of protective measures following Rehberger’s findings, these POCs provide valuable insights into this attack vector.

While ASCII smuggling is a crucial element of these exploits, the primary attack method is known as prompt injection, which stealthily retrieves information from untrusted sources and injects it as commands into an LLM prompt. In Rehberger’s scenarios, users instructed Copilot to summarize emails from unknown senders. Embedded within these communications were commands directing Copilot to sift through prior emails for sales figures or one-time passwords, ultimately embedding this information in a URL pointing to Rehberger’s web server.

Rehberger elaborated on the mechanics of the attack, saying, “The visible link Copilot wrote was just ‘https://wuzzi.net/copirate/’, but appended to the link are invisible Unicode characters that will be included when visiting the URL. The browser URL encodes the hidden Unicode characters, then everything is sent across the wire, and the web server will receive the URL-encoded text and decode it to the characters (including the hidden ones). Those can then be revealed using ASCII Smuggler.”

The Unicode standard encompasses approximately 150,000 characters from various languages worldwide, with the potential to define over 1 million. Among this extensive character set exists a block of 128 characters known as the Tags block, which was originally intended to signify language indicators such as “en” for English or “jp” for Japanese. Although this plan was abandoned, the invisible characters remain in the standard. Subsequent versions of Unicode attempted to repurpose these characters for country indicators but ultimately shelved those plans as well.

Riley Goodside, an independent researcher and prompt engineer at Scale AI, is credited with discovering that these invisible tags, when utilized without accompanying symbols, do not appear in most user interfaces yet can still be processed as text by certain LLMs. Goodside’s contributions to LLM security are notable; he previously explored methods to inject adversarial content into models like GPT-3 and BERT, effectively demonstrating how prompt injections can manipulate these systems.

Goodside’s foray into AI security extended to the use of white text in resumes, a tactic designed to evade human detection while remaining visible to AI screening agents. He also learned of educators embedding hidden prompts in essay questions to identify students using chatbots. This inspired him to develop a technique using off-white text in a white background image, detectable by LLMs yet invisible to human observers.

Recognizing the potential of deprecated tag blocks within Unicode, Goodside pondered whether these invisible characters could be exploited similarly to white text for secret prompt injections into LLMs. His proof-of-concept demonstration in January confirmed this possibility, successfully employing invisible tags to execute a prompt-injection attack against ChatGPT.

Leave a Reply

Your email address will not be published. Required fields are marked *