SAFE app fact-checks LLMs

Marie Donlon | April 18, 2024

Artificial intelligence (AI) specialists at Google's DeepMind have created an AI-based system that can reportedly fact-check the output of large language models (LLMs) like ChatGPT.

According to its developers, the AI-based system, dubbed Search-Augmented Factuality Evaluator (SAFE), can autonomously assess the accuracy of output from LLMs like ChatGPT, which have been used to write papers, answer questions and solve math problems, among other tasks.


Because the accuracy of the content produced by LLMs has been called into question, LLM output typically has to be checked manually, which greatly reduces its value.

To address this, the researchers at DeepMind created the AI-based application to autonomously confirm the accuracy of the answers generated by LLMs.

Taking the same approach as humans who use search engines like Google to manually fact-check LLM results, the team at DeepMind created an LLM-based system that breaks an answer produced by the original LLM down into individual claims or facts and then uses Google Search to locate sites that can verify the accuracy of each one.
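The two-step approach described above (split an answer into claims, then check each claim against search results) can be illustrated with a minimal Python sketch. This is not DeepMind's code: the function names, the sentence-splitting shortcut, the word-overlap test and the in-memory toy search index are all assumptions made for illustration. In SAFE itself, an LLM and Google Search fill those roles.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a search backend: takes a query, returns text snippets.
SearchFn = Callable[[str], List[str]]


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool


def split_into_claims(answer: str) -> List[str]:
    """Assumed simplification: treat each sentence as one claim.
    A real system would use an LLM for this step."""
    return [part.strip() for part in answer.split(".") if part.strip()]


def is_supported(claim: str, snippets: List[str]) -> bool:
    """Toy check: the claim is supported if some snippet contains every word in it."""
    claim_words = set(claim.lower().split())
    return any(claim_words <= set(snippet.lower().split()) for snippet in snippets)


def check_answer(answer: str, search: SearchFn) -> List[ClaimVerdict]:
    """Split the answer into claims, search for each one, and record a verdict."""
    verdicts = []
    for claim in split_into_claims(answer):
        snippets = search(claim)
        verdicts.append(ClaimVerdict(claim, is_supported(claim, snippets)))
    return verdicts


# Made-up "search index" so the sketch runs without network access.
TOY_INDEX = [
    "paris is the capital of france",
    "the eiffel tower is in paris",
]


def toy_search(query: str) -> List[str]:
    """Hypothetical search function: ignores the query and returns every snippet."""
    return TOY_INDEX


if __name__ == "__main__":
    answer = "Paris is the capital of France. The Eiffel Tower is in Berlin."
    for verdict in check_answer(answer, toy_search):
        print(verdict.claim, "->", "supported" if verdict.supported else "not supported")
```

Passing the search function in as a parameter keeps the sketch testable offline. Swapping in a real search client would not change the surrounding logic.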

The researchers used SAFE to verify approximately 16,000 facts drawn from answers generated by different LLMs and compared its results with those of human (crowdsourced) fact-checkers. According to their research, SAFE matched the findings of the human fact-checkers roughly 72% of the time.
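An agreement figure of this kind is the share of facts on which the automated verdict and the human verdict coincide. At the rounded numbers reported here, 72% of 16,000 works out to about 11,500 facts. The short Python sketch below shows that calculation. The function name and the five sample labels are made up for illustration and are not taken from the study.

```python
from typing import Sequence


def agreement_rate(system_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Fraction of items on which the two sets of labels agree."""
    if len(system_labels) != len(human_labels):
        raise ValueError("both label lists must have the same length")
    if not system_labels:
        raise ValueError("at least one label is required")
    matches = sum(s == h for s, h in zip(system_labels, human_labels))
    return matches / len(system_labels)


# Made-up verdicts for five facts (True = rated as supported).
system = [True, True, False, True, False]
human = [True, False, False, True, True]

rate = agreement_rate(system, human)
print(f"Agreement: {rate:.0%}")  # 3 of the 5 labels match
```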

DeepMind has made the code for SAFE available for use on the open-source site GitHub.

SAFE is detailed in the article, “Long-form factuality in large language models,” which is posted on the preprint server arXiv.

To contact the author of this article, email mdonlon@globalspec.com