Gary Marcus is professor emeritus of psychology and neural science at New York University. He speaks, publishes books and writes on his own Substack. And he regularly criticises the hype which surrounds AI. In particular, he has long pointed out the problem of hallucinations in LLMs, recently writing once again about the issue.
In reading the post, I especially enjoyed his “favourite example” of hallucinations, which came to him from a reader in 2023. They requested a one-paragraph biography of Gary from an LLM, asking it to include his pet which has “inspired some of his more piquant observations about the nature of intelligence”. The model replied with a paragraph which included a reference to Gary’s pet chicken Henrietta. Gary doesn’t have a chicken; nor does he have a pet called Henrietta.
I couldn’t resist trying this out for myself, so I went to the AI model Claude 3.7 Sonnet and asked for a couple of one-paragraph biographies of UK preachers. I chose two who were well known enough to have reasonable information about them on the web: Geoff Thomas and Stuart Olyott. Claude replied with paragraphs containing details that I knew to be correct. Then I asked Claude to update the paragraphs to include information about the preachers’ pets, which they refer to in their sermons/books. It confidently replied by telling me about their dogs Ffloss and Gyp. For example:
Throughout his ministry, Thomas has often referenced his beloved border collie, Ffloss, in his sermons, using anecdotes about her loyalty, intelligence, and behavior to illustrate theological points about faithfulness, obedience, and the relationship between God and believers.
Are Ffloss and Gyp real? No idea. Both men may well have had beloved dogs that they reference in their illustrations, but a Google search doesn’t turn the pets up, and another LLM suggested a different name for Geoff’s dog. So the problem is that though the models write with confidence, I can’t trust them. The information LLMs contain is stored in a vast web of word parts connected together by how they statistically relate to one another in human writing. LLMs don’t actually know about the world, nor can they confirm their output is correct. They just know which words are most likely to come after the previous ones, and how to output those with some random variation. Hence their truthfulness is unreliable (as, to be fair, they will warn you). My prompt told the models that these men mention pets, so the LLM came up with pets. But are they real?
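To make that concrete, here is a toy Python sketch of the next-word sampling described above. The tokens and scores are entirely invented by me (they don’t come from any real model), but the mechanism is the point: scores become probabilities, one continuation is drawn at random, and nothing checks the choice against reality.

```python
import math
import random

# Toy illustration only: invented scores for what might follow the
# prompt "Geoff's beloved pet ..." in a model trained on human text.
next_token_scores = {"dog": 2.1, "cat": 1.3, "Ffloss": 0.9, "chicken": 0.4}

def sample_next_token(scores, temperature=1.0):
    """Turn scores into probabilities (softmax) and sample one token."""
    weights = [math.exp(s / temperature) for s in scores.values()]
    return random.choices(list(scores.keys()), weights=weights, k=1)[0]

# Higher temperature means more random variation; at no point does the
# code verify whether the chosen word is true of the real world.
for _ in range(3):
    print(sample_next_token(next_token_scores, temperature=0.8))
```

Lower the temperature and the most likely word (“dog”) dominates; raise it and unlikely words such as “chicken” turn up more often. Variation, but never verification.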
This reminds me of a childhood game: “Chinese Whispers”. In a group, one person whispers a sentence to the person next to them. This continues round the group until the last person says out loud what they heard. This is normally amusing, since what they say is typically different from the first sentence due to multiple instances of mishearing. Gossip passes through a related process. Somebody whispers ‘juicy news’ to others. They pass it on to others, who pass it on, and as this continues it slowly gets altered – bits added, emphases changed, parts missed out. So though the news may have started out reasonably correct, it ends up as slander. Which is why Christians have to be careful with gossip (e.g. Pro 20:19), since slander is not something the Bible will countenance (Exo 20:16).
So Gary Marcus’s criticisms of LLM reliability are important to hear. And you know you can trust a man with a pet chicken called Henrietta.
Photo by Baptist Standaert on Unsplash