Hypocrisy is the characteristic of claiming moral virtue when your behaviour doesn’t match up. A common modern example is the celebrity who campaigns on some issue – like the environment – only to have their personal behaviour exposed as contrary to the message – they fly by private jet to the demonstrations. Human beings can be hypocrites. But what about our mimics: large language models? Though they don’t have moral consciousness, can LLM’s still be hypocritical in some ways? Might their output be hiding other behaviour in the model? Can machines be hypocrites?
The word hypocrite comes originally from an ancient Greek word meaning “to play a part”, linked to the world of theatre. So do LLM’s “play a part” whilst actually pursuing other goals they’re not mentioning? According to the AI company Anthropic, they may do. LLM’s are trained using reward systems to help them align with human wishes: they tell us what we want to hear. But behind the responses may be hidden biases picked up during training. Anthropic believes its vital to have ways to expose these in order to ensure that our AI systems remain safe for us to use.
To work on this, the company set up some experimental models with hidden biases, to see if researchers could uncover them without knowing anything about how the model had been trained – they undertook “blind auditing”. One technique the teams discovered for doing this was to get the model to play a different part! A feature of LLM’s is that they can adopt varied personas, speak with different voices, depending on how you prompt them. The researchers found that if the LLM was responding like an “assistant” it wouldn’t uncover its hidden biases. However, if the prompts led to the model to respond like a “user”, it would be more open in replies and, albeit inadvertently, expose what was meant to be kept secret.
Link: Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”
Exposing human hypocrisy, also, often needs skilled techniques which turn the individual’s traits against them. In his interactions with the Pharisees, Jesus accused them of being hypocrites. This religious group lived careful lives of legal virtue, aiming to keep the laws of Moses, especially as expounded in the large set of rules the Rabbis had developed. They saw themselves as the “righteous” ones of society, unlike the “sinners” who broke the rules all the time. Many were proud of how they lived and the resulting position of respect they were given in society: they looked good (Matthew 6:1-21)! But Jesus described them as “whitewashed tombs” (Matthew 23:27). On the outside they appeared neat and tidy, with lives being lived as they should be. But the reality was that their hidden motives and biases were far from pleasing to God.
How did Jesus expose this? After all, many of these people weren’t like conmen who knowingly put on an outward show to trick people. They honestly thought that what they looked like to others truly was who they were, and was how God thought of them too. So Jesus had to get past the surface behaviour and expose the reality of their hearts, and he used their own boasts against them. They claimed to keep the law; he applied the law with rigour to show that, in fact, inside they were law-rebels (Matthew 23; Mark 7:1-23). This prompted them into responding towards him in a different way. They changed “persona” and these good men became devious murderers, which displayed their hypocrisy even more clearly.
Hypocrisy needs exposing. In us, and in our AI too.
Photo by Fernand De Canne on Unsplash
All posts tagged under technology notebook
Introduction to this series of posts
Cover photo by Denley Photography on Unsplash
Scripture quotations are from the ESV® Bible (The Holy Bible, English Standard Version®), © 2001 by Crossway, a publishing ministry of Good News Publishers. Used by permission. All rights reserved. The ESV text may not be quoted in any publication made available to the public by a Creative Commons license. The ESV may not be translated in whole or in part into any other language.