Prompts are unsafe, and that means language models are not fit for purpose

Prompts are wholly insecure. They are holding back every attempt at turning language models into safe and reliable tools. Language models are not fit for purpose and should not be integrated into public-facing products unless the industry finds a new way to control them.

How inputs are supposed to work

Inputs are a necessary part of computing. To get stuff out, you need to get stuff in.

As important is our need to control how the computer interprets that input.

There’s a limit to how useful it is to just move numbers around in the machine, at some point you’re going to want to do something with the numbers.

Ultimately all input gets processed by code, but computers wouldn’t be much of a productivity boon if you had to program every single task by hand.

That’s why we have controls that are separate from input. On the command line, every tool has standard input so you can pipe text into it. But it also has commands and structured ways of controlling how the tool handles the input that are separate from the input itself.

Most protocols have something similar. HTTP separates headers from the body of the message. The headers indicate how the body should be handled.

SQL, the language that’s almost universally used in software development to interact with databases, lets you separate the input, or parameters, from the control statement itself through binding.

You only include named placeholders for the input data in the statement and then prepare the statement for use by binding the actual values to the placeholders separately. That way they can’t affect the statement itself.

Historically, SQL is itself a good example of what happens when you don’t separate input from control: disaster.

It used to be the norm to just blob everything into the SQL statement itself and just lob it over into your database like a grenade.

This gave the input full access to the database. They could exfiltrate data, change anything, delete anything—it gave a random attacker the power to do anything the server sending the statement could normally do.

This is, unfortunately still an issue in web software. As an industry software is not much for learning or improving. We tend to focus on finding new ways of making old mistakes.

How language model prompts work

A good example of this is how the industry is integrating language models into their software. Prompts are the only real control surface that language models have.

They are also their only standard input. All in one blobby, undifferentiated jumble. Just like old-style SQL.

Every integration, so far, of a language model with a larger production system involved jamming the control prompt, provided by the developer, and the input, provided by the end-user, together and schlepping it over to the language model that interprets it as a single text.

The control prompt usually included language that tells the model not to listen to control statements in the input, but because it’s all input into the model as one big slop, there’s nothing really to prevent an adversarial end-user from finding ways to countermand the commands in the developer portion of the prompt.

The AI industry calls this jailbreaking, but it’s honestly the incarceration equivalent of an inmate being let out after showing a note to a guard where they had scrawled: “Inmate haz pardon. Pliz let out. Signed, Presedent!”

That can’t be true, right? That sounds like a joke? This is the next big thing in computing! We must have come up with some realistic mitigations this time?

Right?

Unfortunately, no.

We’ve known about this issue from the beginning of ChatGPT:

The more I think about these prompt injection attacks against GPT-3, the more my amusement turns to genuine concern.

I know how to beat XSS, and SQL injection, and so many other exploits.

I have no idea how to reliably beat prompt injection!

I don’t know how to solve prompt injection, Simon Willison

Simon Willison, in particular, has done a good job of documenting the issue:

That last one is worth noting. The solution to prompt injections proposed by OpenAI doesn’t work.

This, in and of itself, should be a surprise. OpenAI is a research company that has been thrust into the consumer software business. They have no competency in software or system security.

That may sound harsh, but they’ve already had a few security-related incidents:

OpenAI admits some premium users’ payment info was exposed. Whoops.
OpenAI Confirms Leak of ChatGPT Conversation Histories. Eh, it’s not as if people are pasting confidential information into ChatGPT, right?
11% of data employees paste into ChatGPT is confidential. I’m sure this is fine. I mean, the rest of tech has faith in them, right?
Big Tech is already warning us about AI privacy problems. For Apple, Samsung, Verizon, and a host of banks “chatGPT has been on the ban list for months”. Welp.

OpenAI doesn’t have an organisation that prioritises or understands software security. Whatever security they have is all patched on—jumbled in like shake-and-bake.

It’s unlikely that we’ll see a realistic and pragmatic solution to prompt injections from OpenAI, which is a problem because both their models and methods are closed, proprietary, and hidden from scrutiny by independent researchers and regulators.

You know researchers and regulators.

They’re the people who usually make sure that software vulnerabilities don’t get out of hand and take down an entire computing ecosystem. They’re the ones who discover unpatched vulnerabilities that companies were trying to pray away even as attackers all over the world are exploiting them.

Y’know, the people on our side.

At the same time all of this is not going on, OpenAI and others are just working so hard at expanding the potential attack surface of these systems.

Who has time to fix vulnerabilities?

They sure don’t.

Now they’re giving us plugins, all the better to let attackers extend their reach into external and internal services. They’re giving us APIs, so you can make sure your own products have prompt injection vulnerabilities. And their partners are giving us prompt-controlled customer service bots, all the better for users to—I don’t know—use their full access to your systems to fix their problems themselves, I guess.

The AI industry is doing it, so it must be a good idea, right?

Right.

Prompts—and with them language models—are not fit for purpose

They are not safe. They are not secure. Language models should neither be integrated with external services nor should they be exposed to external users. You absolutely should not do both.

I know of a couple of ecommerce stores in Iceland that had open SQL injection vulnerabilities for over a decade. They’re fixed now, and they were lucky this never became a disaster.

Maybe you’ll be lucky as well.

All I know is a lot of companies and people are right now using extremely unsafe methods to integrate language models into their products and services. Their only real defence is luck.

And, relying on luck is, generally speaking, a bad strategy for software security.

The best way to support this newsletter or my blog is to buy one of my books, The Intelligence Illusion: a practical guide to the business risks of Generative AI or Out of the Software Crisis.

Prompts are unsafe, and that means language models are not fit for purpose

How inputs are supposed to work

How language model prompts work

Prompts—and with them language models—are not fit for purpose

Join the Newsletter