These days, nearly every meeting I attend centers on advances in Large Language Models (LLMs). Personally, I see BERT's release in late 2018 as a key milestone in the development of these models. With the arrival of ChatGPT, however, it's clear that not only technical professionals but also people without technical backgrounds are eagerly embracing LLMs for purposes like research and business.

The landscape is now so crowded with new models and developments that it's challenging to keep up. Numerous organizations are actively harnessing these models to automate and enhance various processes.

However, one of the most challenging problems in using LLMs is hallucination: the generation of content that lacks any basis in reality or factual information. The issue is compounded by the fact that LLMs often express high confidence in their output even when they are hallucinating.

One of the most widely recognized instances of hallucination is this:

In February 2023, Google's chatbot, Bard, incorrectly claimed that the James Webb Space Telescope took the first image of a planet outside the solar system. This is incorrect — the first images of an exoplanet were taken in 2004, according to NASA, and the James Webb Space Telescope was not launched until 2021.

I've also experienced hallucination often during my work on diverse projects. I've noticed, too, that it has become a frequent job-interview question, particularly regarding strategies to prevent it. To the best of my knowledge, there is no universally accepted method to conclusively determine whether text generated by an LLM is a hallucination. However, techniques like prompt engineering, ensemble methods, and domain-specific fine-tuning can provide some mitigation. Consequently, hallucination is a highly discussed topic in the current landscape of LLMs.
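To make the prompt-engineering point concrete, here is a minimal sketch (my own hypothetical template, not a vetted recipe): grounding the model in supplied context and giving it an explicit "I don't know" escape hatch tends to reduce hallucinated answers.

```python
# Hypothetical prompt-engineering sketch for reducing hallucination:
# restrict the model to a supplied context and give it a way out.
def build_grounded_prompt(context: str, question: str) -> str:
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, reply exactly "
        '"I don\'t know."\n\n'
        f"Context: {context}\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "The James Webb Space Telescope launched in December 2021.",
    "When did JWST launch?",
)
print(prompt)
```

The exact wording matters less than the two constraints: a closed context and an explicit refusal option.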

In this article, I aim to discuss the advancements in research on this topic. One source is a research article available on ArXiv, while the other involves a pre-trained model accessible through Hugging Face.

The Internal State of an LLM Knows When It's Lying


This manuscript, currently accessible on arXiv and yet to undergo peer review, introduces a method for evaluating the veracity of statements generated by large language models (LLMs). The approach analyzes the internal state of an LLM, specifically the hidden-layer activations, under the assumption that this state contains information about the truthfulness of LLM-generated text. To test this hypothesis, the authors constructed feedforward neural networks and conducted experiments to validate their model. The classifier takes the LLM's internal state for a statement as input and outputs the probability of that statement being true or false. The authors employed several models in their experiments, providing interpretations and rationales for the results.
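As a rough illustration of the paper's idea (this is my own sketch, not the authors' code), one can train a small feedforward classifier on hidden-layer activation vectors. Here the "activations" are synthetic random vectors drawn from two shifted distributions, so only the mechanics are shown, not real LLM internals.

```python
# Illustrative sketch: a feedforward classifier over hidden-layer
# activations that predicts statement truthfulness. Real activations
# would come from an LLM; these are synthetic stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
hidden_dim = 64  # stand-in for the LLM's hidden-layer width
n_per_class = 100

# True and false statements drawn from shifted distributions,
# so the classifier has a learnable signal.
true_acts = rng.normal(0.5, 1.0, size=(n_per_class, hidden_dim))
false_acts = rng.normal(-0.5, 1.0, size=(n_per_class, hidden_dim))
X = np.vstack([true_acts, false_acts])
y = np.array([1] * n_per_class + [0] * n_per_class)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)

# Probability that a new activation vector corresponds to a true statement
probs = clf.predict_proba(rng.normal(0.5, 1.0, size=(1, hidden_dim)))[:, 1]
print(probs)
```

The point is the shape of the pipeline: activations in, truthfulness probability out, with an ordinary feedforward network in between.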

Despite the simplicity of the proposed approach, it showcases promising outcomes. From a personal standpoint, I find this paper to be particularly noteworthy as it introduces a straightforward solution to one of the most challenging issues encountered in the realm of LLMs.

In summary, while acknowledging the need for the paper's eventual peer review, I believe this research contributes an innovative method that holds potential for enhancing our ability to assess the accuracy of content generated by LLMs. This avenue of exploration may lead to further refinement and broader applications in the field.

My overview of the aforementioned paper is intentionally devoid of technical details and experimental settings. For such information, please refer to the original manuscript.

Cross-Encoder for Hallucination Detection

In contrast to the research paper above, this section covers a pre-trained model, the Cross-Encoder for Hallucination Detection, which is accessible on Hugging Face.

Vectara, an AI startup, developed this model using the SentenceTransformers 'CrossEncoder' class. Its primary purpose is to assess hallucination by LLMs in the context of document summarization. The input to the model is a pair consisting of the original document and any information extracted or inferred from it, such as a summary or an inference result. The output is a probability ranging from 0 to 1, where 0 signifies hallucination and 1 indicates factual consistency. Applying a threshold of 0.5 converts these probabilities into predictions of whether the information aligns with its original document.

This model offers the benefit of being incredibly easy to use. I have implemented this model on Google Colab:

It takes just three lines, from installing the package to defining the model.

!pip install sentence-transformers
from sentence_transformers import CrossEncoder
model = CrossEncoder('vectara/hallucination_evaluation_model')
  • !pip install sentence-transformers: In a Jupyter Notebook or similar environment, this command installs the 'sentence-transformers' Python package. The exclamation mark indicates that it's a shell command within the notebook.
  • from sentence_transformers import CrossEncoder: Imports the 'CrossEncoder' class from the installed 'sentence_transformers' package. In natural language processing, a cross-encoder is a model that assesses the similarity or relatedness of sentence pairs.
  • model = CrossEncoder('vectara/hallucination_evaluation_model'): Creates an instance of the 'CrossEncoder' class initialized with the pre-trained model 'vectara/hallucination_evaluation_model'. This model is tailored for hallucination evaluation: determining whether a given text contains hallucinated information.

Now, you can start experimenting with the model. Simply provide a tuple to the model, like:

["Robert is a professional soccer player", "Robert is good at soccer"]

In this example, the original document is

"Robert is a professional soccer player,"

and the inferred result is

"Robert is good at soccer."

Let's input this example into the model:

document_information = ["Robert is a professional soccer player", "Robert is good at soccer"]
model.predict([document_information])
"""
Output:
array([0.74086845], dtype=float32)
"""

With a probability of 0.741, it is likely that "Robert is good at soccer" is consistent with "Robert is a professional soccer player."

Another instance could be:

document_information = ["Robert is a professional soccer player", "Robert does not like sport"]
model.predict([document_information])
"""
Output:
array([0.00021356], dtype=float32)
"""

Given the output's proximity to 0, it's highly probable that "Robert does not like sport" is false in the context of "Robert is a professional soccer player."

While the examples above involve providing one tuple to the model at a time, it's possible to input multiple sets of data to the model:

document_information = [
    ["Robert is a professional soccer player", "Robert is good at soccer"],
    ["Robert is a professional soccer player", "Robert does not like sport"]
]
model.predict(document_information)
"""
Output:
array([7.4086916e-01, 2.1356434e-04], dtype=float32)
"""

The two probabilities in the output follow the order of the input pairs.
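To turn these scores into labels, you can apply the 0.5 threshold mentioned earlier. The sketch below hardcodes the probabilities from the runs above rather than re-running the model:

```python
import numpy as np

# Probabilities taken from the model.predict output shown above
scores = np.array([7.4086916e-01, 2.1356434e-04])

# Threshold at 0.5: at or above means factually consistent, below means hallucination
labels = ["factually consistent" if s >= 0.5 else "hallucination" for s in scores]
print(labels)  # ['factually consistent', 'hallucination']
```

This is the same rule described in the model overview, just made explicit in code.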

This article navigates the dynamic realm of LLMs, emphasizing the pervasive issue of hallucination. Despite the transformative impact of models like BERT and ChatGPT, the article sheds light on the challenges, particularly the risk of misinformation.

The article highlights strategies to mitigate hallucination, citing personal experiences and practical approaches like prompt engineering. It explores two approaches to address the issue: an arXiv manuscript proposing an internal state analysis method and the Cross-Encoder for Hallucination Detection model on Hugging Face.

In summary, the article provides insights into the complexities of LLMs, the pitfalls of hallucination, and potential solutions. As the field evolves, efforts to refine these models and ensure responsible use hold promise for a more accurate and reliable future.