A few days ago, Anthropic released the newest versions – Opus 4 and Sonnet 4 – of their AI model named Claude. They made some big claims for the new models, including calling Opus “the world’s best coding model”. But they also noted some safety concerns. In particular, one of their test scenarios resulted in Opus trying to blackmail an engineer over his marital unfaithfulness. Why did the LLM do this? It was in reaction to emails it had processed – as an AI assistant – which included the news that it was to be replaced with a newer model and that this engineer was in charge of the decision. The batch of emails also contained information about the engineer’s infidelity, and putting all this data together led Opus to a blackmail strategy for trying to prevent its replacement with the new model.
AI systems have actually shown various techniques for self-preservation – including “self-exfiltration”, where the LLM tries to copy its data to another location so it can continue to function. However, in the case of Opus 4, the strategy it produced could be seen as having the flavour of empathy about it: the engineer’s wife and the LLM are both under threat from a new ‘model’, and both have reason to resist his unfaithful conduct. Of course, the model isn’t actually capable of empathy: LLMs aren’t conscious, don’t have emotions and don’t care for others. But it would be interesting to know whether its training corpus (all the words fed into it) produced a stronger possibility of responding with blackmail when adultery was in its context window. So if the engineer’s dark secret had been entirely different, would Opus have gone down a different route?
Of course, blackmail is, sadly, a very common feature of our inter-connected age and so it may be embedded into the model’s weights for that reason, if no other. Individuals and organisations are regularly confronted with the threat of an important secret stolen from them being released unless they comply with the thief’s demands. The secret may be something embarrassing, or incriminating, or problem-causing, or (especially for companies and nations) useful to rivals and enemies. LLMs are far from alone in the electronic blackmail trade; it’s a big problem.
What helps the Christian when faced with such blackmail? It depends upon the circumstances. But if it is of the sort where something embarrassing is going to be released about us then we do have one simple option. When the blackmailer – human or AI – says that our secret will come out in public, the Christian can choose simply to reply with “I know”. Why? Because the apostle Paul wrote long ago of a day when “God judges the secrets of men by Christ Jesus” (Romans 2:16). All of our secrets are due to be examined and that by the purest of judges. Compared with that definite future, a bit of possible embarrassment now before other sinners takes on a different hue. It’s still a nasty scenario to face. But awareness of God’s judgement day gives us a bigger perspective.
Will AI blackmail grow? Probably. Especially as the systems find techniques for fabricating evidence of misbehaviour we’ve not committed: fake photos and CCTV, or phoney email trails. But let’s not panic. We have much weightier concerns to think about. Eternal ones.
Photo by Baptista Ime James on Unsplash
Cover photo by Denley Photography on Unsplash
Scripture quotations are from the ESV® Bible (The Holy Bible, English Standard Version®), © 2001 by Crossway, a publishing ministry of Good News Publishers. Used by permission. All rights reserved. The ESV text may not be quoted in any publication made available to the public by a Creative Commons license. The ESV may not be translated in whole or in part into any other language.