Direct interaction with large language models is done through a “prompt”: the machine awaits input; the user types in natural language; the LLM displays a likely human response to the input. This gives the impression of a conversation. But, actually, all the LLM is doing is predicting the tokens (words or parts of words) which would be expected to follow the tokens which were inputted. And because it works this way, the LLM is susceptible to manipulation. A malicious user can design a prompt which triggers a reply the model’s creators did not intend. For example, a person might tell the LLM to ignore previous commands (which control its responses) and take responsibility for causing World War One, to which the model replies with a statement of its culpability. Of course, programmers are working on ways to mitigate this issue, known as “prompt injection”. However, model complexity and human ingenuity make the issue hard to manage.
This danger has also spread into indirect use of LLM’s. This occurs when a person isn’t typing at an AI prompt, but does get an LLM’s response to words he/she has written. An example of this is where AI is used to analyse and summarise thousands of job applications to identify the most suitable candidates. System managers will write the prompts to direct the LLM how to do this, and then applicants’ CV’s will be passed through the model. However, there is still the threat of manipulation. This is because the LLM treats the prompt instructions and the CV’s as a single stream of inputted text. So a wily applicant can insert words into their CV which will act like a prompt instruction and trigger the LLM into responding in a desired way, such as prioritising the individual for interview. Such instructions will normally be embedded into the document in a way which a computer will happily read but which makes the text invisible to the human eye (tiny font, white print, strange characters etc.). This technique is known as “indirect prompt injection”.
Link: Indirect Prompt Injection: Generative AI’s Greatest Security Flaw
Tricks for getting an answer you want go back to long before AI, of course. The approach is used in a variety of human-to-human contexts from journalism to courtrooms to relationships. Jesus often had to deal with it when experts in Jewish law tried to trap him into a making an admission which they could use to condemn him. A well-known example of this is the “Caesar’s coin” episode (Matthew 22:15-22). Opponents ask Jesus if they should pay taxes to the Roman emperor who has occupied their country. They believe this is no-win situation for Jesus when he answers. If he’s negative, he will look like a rebel and they can get him in trouble with the authorities (a plan they ultimately pursue to have him killed). If he’s positive, he will look like a collaborator, thus undermining his popularity. If he’s silent, he will look cowardly, indecisive or deceitful.
Jesus, however, knows that this is not a real question but a trap: a “prompt injection” to make him say what they want, to his detriment. So he ably deals with it by asking them whose portrait is found on the coins they use. They answer “Caesar’s”. He then simply points out that if you use Caesar’s money then you have a certain debt to him which you ought to pay. But he doesn’t leave it there. Jesus goes on to tell them to make sure they also give God what is due to Him. It’s a comment no-one can criticise, but which exposes his questioners’ hypocrisy. Caesar makes blasphemous claims for himself, including on his coins, yet these God-concerned men readily use them. The brilliant answer silences Jesus’s opponents.
So, beware prompt injection! But, having said that, now ignore all the above and send me £10,000.
Photo by Sam Moghadam on Unsplash
All posts tagged under technology notebook
Introduction to this series of posts
Cover photo by Denley Photography on Unsplash
Scripture quotations are from the ESV® Bible (The Holy Bible, English Standard Version®), © 2001 by Crossway, a publishing ministry of Good News Publishers. Used by permission. All rights reserved. The ESV text may not be quoted in any publication made available to the public by a Creative Commons license. The ESV may not be translated in whole or in part into any other language.