You Actually Want Your Generative AI Model To Hallucinate (Sometimes)
Understanding When Generative AI Should Be Creative Vs. Accurate
Hallucination, confabulation, factual inaccuracy: call it whatever you want. The general consensus is: It’s bad. It’s unwanted. It’s a side effect of using large language models (LLMs) that might never go away [1]. Whichever term you prefer, it describes an LLM creating plausible-sounding output that is factually incorrect. As more and more businesses look to adopt generative AI, these hallucinations become a problem when the business’s reputation is on the line because of a statement that AI has generated. But that’s just one side of the coin.
Let’s take a real-life example for a second:
Imagine a 5-year-old boy named Tommy. He has a wild imagination. Occasionally, he makes things up: imaginary friends, reasons why someone else spilled the milk. You get the idea. Whenever that happens, Tommy’s mom says, “Oh, he just has a creative mind!” (And he might.)
But whether “making things up” counts as actual creativity or just misplaced imagination is situational. It’s no different in a business context: whether something is deemed to push the boundaries of creativity or dismissed as outright nonsense depends on the context. So if creativity is desired in some places and hallucination is unwanted in others, where does each have its place?
The Factors That Influence LLM Creativity
Generative AI models have been trained on vast amounts of data. Based on this previously observed data, these models can predict the next word in a sentence with a very high degree of accuracy (e.g., LLMs like GPT) or generate images with features and visual characteristics similar to previously seen data (e.g., Midjourney).
The generated output of these models depends on a set of technical parameters. Typical end-users won’t directly work with these parameters. But being aware of them helps create a foundational understanding of how LLM-generated output can be influenced:
Temperature controls the variety of the AI-generated output. The higher the value, the more unpredictable the generated output and the more uncommon the combination of words. The lower the value, the narrower and more conservative the choice of words.
Top-k restricts the pool of candidates to the k most probable next words, from which the LLM then picks.
Top-p (nucleus sampling) keeps only the smallest shortlist of candidate words whose cumulative probability exceeds the threshold p, thereby excluding the least probable words.
The above is a highly simplified explanation. For an in-depth discussion of these concepts, refer to this post on Towards Data Science.
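To make these parameters tangible, here is a minimal, self-contained sketch of how temperature, top-k, and top-p interact during sampling. The five-word vocabulary and its probabilities are invented for illustration; real models sample from vocabularies of tens of thousands of tokens.

```python
import numpy as np

# Toy next-word distribution for a prompt like "The sky is ..."
# (illustrative numbers, not taken from a real model)
words = ["blue", "clear", "falling", "limitless", "green"]
logits = np.array([4.0, 3.0, 1.0, 0.5, 0.1])

def sample_next_word(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    rng = rng or np.random.default_rng()
    # Temperature scales the logits before softmax: low values sharpen the
    # distribution (conservative), high values flatten it (more variety).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]  # candidate indices, most probable first
    if top_k is not None:
        # Top-k: keep only the k most probable candidates.
        order = order[:top_k]
    if top_p is not None:
        # Top-p: keep the smallest set whose cumulative probability reaches p.
        cumulative = np.cumsum(probs[order])
        order = order[: np.searchsorted(cumulative, top_p) + 1]

    shortlist = probs[order] / probs[order].sum()  # renormalize the shortlist
    return words[rng.choice(order, p=shortlist)]

print(sample_next_word(logits, temperature=0.2, top_k=2))    # almost always "blue"
print(sample_next_word(logits, temperature=1.5, top_p=0.95)) # noticeably more variety
```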
When Your Model Should Be Creative — And When Not
Tasks in a business context span a wide range: some demand high creativity, others require absolute accuracy.
For example, your marketing team might use an LLM in a copywriting application to create new marketing copy for your website or blog. This copy should be highly creative and engaging within your brand’s tone and style. But the LLM must not invent products or services that your company doesn’t offer.
Looking beyond LLMs, generative AI can help discover new materials, compounds, and drugs. Creativity is actually a highly desired capability in this context: it enables AI to surface novel approaches that humans have not yet considered, or that would take significantly longer to evaluate without the help of AI. In this case, you actually want to nudge the model to generate highly creative output.
Highly creative scenarios:
Marketing copy (web pages, blog posts), novels, brainstorming
Image generation [Image]
Material science [2]
Drug discovery [3]
Animation [Video]
Take another example: summarizing contracts or financial reports with the help of AI. There’s no doubt that the results have to be factually correct, not roughly, but without any error. There is no room for interpretation or creativity. The same applies to using ChatGPT to look up legal information and cases, prepare investment recommendations, or do any kind of basic information gathering that humans base a decision on.
Absolutely accurate scenarios:
Text generation about real-world events and concepts (incl. Legal, Finance, Healthcare)
Question-answering
Summarization
Translation
Synthetic audio (voices) [Audio]
The problem of hallucination becomes real when the LLM is generating creative (and factually incorrect) responses in a context that requires absolute accuracy.
Channeling Creativity And Mitigating Unwanted Effects
Software developers can control the parameters that influence a model’s creativity when they integrate it into their applications. Prompt engineers and casual users can steer LLMs toward more creative or more conservative responses depending on the prompt they submit. For example:
Creative: “Use a friendly, high-energy tone. Incorporate uncommon examples.”
Conservative: “Use a professional tone. Limit the examples to the following ones […].”
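For developers, the same distinction becomes a parameter choice in the API call. Here is a hedged sketch using the OpenAI Python client as one example; the model name, prompts, and exact values are placeholders, and other providers expose similar knobs under similar names.

```python
from openai import OpenAI  # one example; other providers offer similar parameters

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def generate_text(prompt: str, creative: bool) -> str:
    """Request output tuned either for creativity or for conservative accuracy."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; use whatever model your provider offers
        messages=[{"role": "user", "content": prompt}],
        # Higher temperature and top_p widen the word choice; lower values narrow it.
        temperature=1.2 if creative else 0.2,
        top_p=0.95 if creative else 0.5,
    )
    return response.choices[0].message.content

# Creative task: brainstorming marketing copy
print(generate_text("Draft three playful taglines for a coffee brand.", creative=True))
# Accuracy-critical task: summarization without embellishment
print(generate_text("Summarize this clause factually, adding nothing: ...", creative=False))
```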
However, despite it all, LLMs still just predict the next word in a sentence. While they can accomplish this task with very high accuracy, their predictions are not correct 100% of the time. This makes them prime candidates for a range of text-based scenarios, but not for each and every one. And that’s where you will see unwanted creativity (aka hallucination).
Therefore, both as a developer and as a user, it is critical to understand for which purpose and in which context you want to use a generative AI model in the first place. If any critical decisions will be made based on the generated output, include a human in the loop who reviews and, if needed, edits it. This helps strike a balance between productivity gains and accuracy. And having pre-existing knowledge of the subject the model generates output about makes it easier to judge that output’s accuracy.
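As a sketch of what that can look like in code: the snippet below is a hypothetical review gate, not a prescribed design. A person approves or corrects each draft before anyone acts on it.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    prompt: str
    text: str        # the model-generated output
    approved: bool = False

def review_draft(draft: Draft) -> Draft:
    """Block until a human reviewer has approved or corrected the generated text."""
    print(f"--- Output for review (prompt: {draft.prompt!r}) ---")
    print(draft.text)
    if input("Approve as-is? [y/N] ").strip().lower() != "y":
        draft.text = input("Enter the corrected text: ")
    draft.approved = True
    return draft

# Only approved drafts should feed critical decisions, e.g.:
# summary = review_draft(Draft(prompt="Summarize contract X", text=model_output))
```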
These are a few approaches to address model creativity and hallucination. Which techniques do you use when working with LLMs?
Develop your AI leadership skills
Join my bi-weekly live stream and podcast for leaders and hands-on practitioners. Each episode features a different guest who shares their AI journey and actionable insights. Learn from your peers how you can lead artificial intelligence, generative AI & automation in business with confidence.
Join us live
August 29 - Eric Fraser, Culture Change Executive, will join and share his first-hand experience of how much of his leadership role he is able to automate with generative AI.
September 12 - Ted Shelton, Management Consulting Leader for AI & Automation, will share how business leaders can keep their business strategy relevant in times of AI-driven change.
September 28 - Steve Wilson, Project Lead, OWASP & Chief Product Officer, Contrast Security, will join to discuss the most common vulnerabilities of LLM-based applications and how to mitigate them.
October 12 - Matt Lewis, Chief AI Officer, will discuss how you can grow your role as a Chief AI Officer.
Watch the latest episodes or listen to the podcast
Find me here
August 29 - Fireside chat at Generative AI Online on why building generative AI products requires more than ChatGPT.
September 28 - Moderator of WowDAO Worldwide AI & Web3 Summit, AI Hackathon Winner Ceremony.
October 11 - Put Generative AI to Work, Unveiling Tomorrow's Possibilities – Insights from 30 AI Visionaries on the Future of Generative AI in Business.
Follow me on LinkedIn for daily posts about how you can lead AI in business with confidence. Activate notifications (🔔) and never miss an update.
Together, let’s turn HYPE into OUTCOME. 👍🏻
—Andreas
[1] Fortune, “Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’”, 01 August 2023. Last accessed: 26 August 2023.
[2] Liu et al. (2023), “Generative artificial intelligence and its applications in materials science: Current situation and future perspectives”. Last accessed: 27 August 2023.
[3] Nature, “Inside the nascent industry of AI-designed drugs”, 01 June 2023. Last accessed: 27 August 2023.