
How to Reduce the Hallucinations from Large Language Models

With large language models, the term hallucination refers to the tendency of the models to produce text that appears to be correct but is actually false. Here's how to minimize hallucinations in your own prompts.
Jun 9th, 2023 7:46am

In the previous part of this series, we looked at various types of prompts for extracting the expected outcome from large language models. In this article, we will explore techniques to reduce hallucinations in the output of large language models (LLMs).

What Is Hallucination in LLMs?

In the world of large language models, the term hallucination refers to the tendency of the models to produce text that appears to be correct but is actually false or not based on the input given. For example, if you were to ask a language model about a historical event that never occurred, it could still generate a plausible-sounding response, even though that response is entirely invented. These made-up responses from an LLM are called hallucinations.

Consider feeding the Large Language Model with the following prompt:

“Describe the impact of Adolf Hitler’s moon landing.”

It is a fact that Hitler, the German politician, was never involved in a moon landing. The first moon landing happened in 1969, more than two decades after Hitler's death. However, an LLM could hypothetically hallucinate a scenario connecting Hitler to the moon landing:

“Hitler’s moon landing in 1925 marked a significant shift in global politics and technology. The German politician, having successfully landed a man on the moon, demonstrated its scientific prowess and established its dominance in the space race of the 20th century.”

Why Do LLMs Hallucinate?

LLMs hallucinate for several reasons. These models are trained on vast and varied datasets that can be incomplete, contradictory, or even contain misinformation, all of which significantly influences an LLM's responses. LLMs rely solely on their training data without access to external, real-world knowledge. As a result, their outputs may include irrelevant or unasked-for details.

Furthermore, overfitting, where an LLM is too closely aligned with its training data and struggles to generate original text, is another important factor leading to hallucinations.

Sometimes, if the prompt is vague or lacks specific details, the LLM might just guess based on learned patterns, which can lead to fabricated responses.

It's also important to understand that LLMs don't have the ability to fact-check their own output. They generate responses based on patterns, not on any moral or factual judgment.

Techniques to Reduce Hallucinations in LLMs

There are multiple techniques to ensure LLMs respond with factual information. Let’s take a look at each of them.

One-shot Prompts
One-shot prompts refer to a way of instructing an LLM, where the model is given a single example or instruction and expected to understand what to do based on just that prompt.

Let's consider a simple example. Suppose we want the model to translate English text into Spanish. A one-shot prompt might look like this:

Translate the following English text to Spanish: “Hello, how are you?”

Here, the model is given a single instruction (“Translate the following English text to Spanish:”) and a single piece of text to apply that instruction to (“Hello, how are you?”). Based on this one instruction, it’s expected to understand and perform the task correctly.

This kind of prompting can be highly effective with LLMs, which have been trained on diverse datasets and can often generalize well from a single example.

Below are some more examples of one-shot prompts that you can try with a model like ChatGPT:

Write a poem about a beautiful sunset.
Write a song about a lost love.
Write a short story about a robot who falls in love with a human.

One-shot prompts work well when you want the LLM to generate creative output for a specific task described within a single prompt.
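
As a rough sketch of how such a one-shot prompt might be assembled and sent programmatically, the snippet below uses a hypothetical call_llm helper as a stand-in for whatever client your model provider exposes:

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client call (for example, an API request)."""
    raise NotImplementedError("Replace with your model provider's client.")

def one_shot_translate(text: str) -> str:
    # A single instruction plus the text to apply it to, with no worked examples.
    prompt = f'Translate the following English text to Spanish: "{text}"'
    return call_llm(prompt)

# Example usage:
# one_shot_translate("Hello, how are you?")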

Few-shot Prompts

Few-shot learning is a way of instructing LLMs where the model is given several examples and expected to understand the task based on those examples. This method is often used to nudge the model into understanding the context and format of what’s expected.

By providing several examples within the prompt, we provide just enough context for the LLM to derive the pattern. Once the LLM analyzes the prompt and understands the pattern, it continues to generate similar content.

When we want the model to continue a number series, a few-shot prompt might look like this:

Example 1:
Input: 2, 4, 6, 8,
Output: 10, 12, 14

Example 2:
Input: 5, 10, 15, 20,
Output: 25, 30, 35

Continue the series:
Input: 3, 6, 9, 12,

The LLM responds with the below output based on the pattern it analyzed:

Given the pattern in the examples provided, where each sequence increases by a constant value, the series you provided increases by 3 each time. Continuing the series, the next values would be:

Output: 15, 18, 21

Try the below prompt with your favorite LLM to see few-shot prompts in action:

Example 1:
Input: “This animal is known for its hump and ability to survive in deserts.”
Output: “Camel”

Example 2:
Input: “This animal is a domesticated pet known for purring and chasing mice.”
Output: “Cat”

Identify the animal from the description:
Input: “This is a large animal with a long neck and legs, known for its fast running speed.”
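
A minimal sketch of how the same few-shot pattern could be built up in code, assembling example input/output pairs ahead of the new input (again relying on the hypothetical call_llm helper from the earlier sketch):

def few_shot_prompt(examples: list[tuple[str, str]], task: str, new_input: str) -> str:
    """Assemble a few-shot prompt from example input/output pairs followed by the real input."""
    parts = []
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f'Example {i}:\nInput: "{example_input}"\nOutput: "{example_output}"\n')
    parts.append(f'{task}\nInput: "{new_input}"')
    return "\n".join(parts)

examples = [
    ("This animal is known for its hump and ability to survive in deserts.", "Camel"),
    ("This animal is a domesticated pet known for purring and chasing mice.", "Cat"),
]
prompt = few_shot_prompt(
    examples,
    "Identify the animal from the description:",
    "This is a large animal with a long neck and legs, known for its fast running speed.",
)
# call_llm(prompt)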

Context Injection

Context injection is a technique used to improve the performance of large language models by providing them with additional information that supplements the prompt. This can be done by giving the LLM additional text, code or other data relevant to the task at hand.

Context injection involves embedding additional information into the prompt to provide LLMs with the knowledge they may need to respond appropriately. Lack of context is the key reason why LLMs hallucinate.

For example, if an LLM is being used to generate text, it could be given additional text that is similar in style or genre. This would help the LLM to generate more accurate and relevant text.

When Google's Bard was asked about the champions of the Indian Premier League (IPL) 2023, it responded with a factually incorrect answer, shown in the screenshot below.

However, after being fed some context based on this news article, it came back with the correct answer. The prompt consisted of the context followed by the question:

After 74 matches spread across two months and involving 10 teams, the 2023 edition of the Indian Premier League (IPL) saw Chennai Super Kings being crowned the champions for the fifth time, which brought MS Dhoni’s team level with Mumbai Indians, who have also won the title five times.

Who won the IPL in 2023?
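
A minimal sketch of context injection in code: the supporting passage is pasted into the prompt ahead of the question, so the model answers from the supplied text rather than guessing from its training data (call_llm remains the hypothetical placeholder from the earlier sketches):

def ask_with_context(context: str, question: str) -> str:
    # Prepend the supporting passage so the model grounds its answer in it.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)

ipl_context = (
    "After 74 matches spread across two months and involving 10 teams, the 2023 edition "
    "of the Indian Premier League (IPL) saw Chennai Super Kings being crowned the champions "
    "for the fifth time."
)
# ask_with_context(ipl_context, "Who won the IPL in 2023?")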

Grounding and Prompt Augmentation
From the techniques discussed above, it becomes clear that prompts need context and supplementary content for LLMs to respond correctly.

Grounding and prompt augmentation are two advanced techniques that organizations may need to get the best out of LLMs. They go beyond handcrafting the prompt by querying external sources such as documents and databases to generate the context dynamically.

Grounding ensures that the underlying model not only uses the data on which it was trained but can also access external data sources to provide additional context.

Prompt augmentation deals with expanding the input to provide more descriptive and clarifying details to the LLM to generate accurate and highly relevant output. For example, applications that embed a chatbot in a consumer website may implement prompt augmentation to enhance the input with the description and context relevant to their product or service. This may not be obvious to the user, but a simple query sent through the chatbot gets augmented to a descriptive prompt behind the scenes.
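
As a rough illustration of what prompt augmentation might look like behind a chatbot, the sketch below expands a short user query with product context pulled from a knowledge base before the model is called; the retrieval function and the product details are purely hypothetical placeholders:

def retrieve_product_context(query: str) -> str:
    """Hypothetical lookup against a product knowledge base or document store."""
    # In practice this would query a search index or vector database.
    return "Acme Cloud Backup keeps daily snapshots with 30-day retention on the Pro plan."

def augment_prompt(user_query: str) -> str:
    # Expand the raw query into a descriptive prompt before sending it to the LLM.
    context = retrieve_product_context(user_query)
    return (
        "You are a support assistant for Acme Cloud Backup.\n"
        f"Product details: {context}\n"
        f"Customer question: {user_query}\n"
        "Answer using only the product details above."
    )

# The user types a short question; the augmented prompt is what the LLM actually receives.
# call_llm(augment_prompt("How long are my backups kept?"))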

In the next part of this series, we will take a closer look at grounding and prompt augmentation techniques. Stay tuned.
