The person who came up with the term RAG is the first to admit it’s not a great name.
“We definitely would have put more thought into the name had we known our work would become so widespread,” says Patrick Lewis, the lead author of the 2020 paper that coined the term. Originally an AI research scientist for Meta, he now leads a RAG team at AI startup Cohere, according to a blog post by NVIDIA. “We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea.”
Aside from being an unappealing word, "RAG" already has a meaning in IT project management, where it stands for red, amber or green and indicates the progress status of tasks. But the AI meaning of the term has stuck.
What is RAG?
But what is RAG? It stands for retrieval-augmented generation, and like so many things these days, it’s related to generative AI and how to overcome the limitations of large language models (LLMs).
“RAG provides a way to optimize the output of an LLM with targeted information without modifying the underlying model itself; that targeted information can be more up-to-date than the LLM as well as specific to a particular organization and industry,” according to Oracle. “That means the generative AI system can provide more contextually appropriate answers to prompts as well as base those answers on extremely current data.”
In other words, RAG does just what its name suggests: It improves GenAI results by augmenting them from additional data that it retrieves from someplace else.
Why is RAG Important?
But why do companies want it?
“Organizations want AI tools that use RAG, because it makes those tools aware of proprietary data without the effort and expense of custom model training,” according to GitHub. “RAG also keeps models up to date. When generating an answer without RAG, models can only draw upon data that existed when they were trained. With RAG, on the other hand, models can leverage a private database of newer information for more informed responses.”
In particular, it’s thought that RAG can reduce the likelihood that GenAI systems will “hallucinate,” or make things up. “RAG has shown promise in reducing false information outputted by LLMs in domain-specific contexts, because RAG involves retrieving the appropriate document for the LLM to base its response on,” according to the legal database blog JD Supra, noting that researchers at Stanford University previously found that legal AI tools hallucinated 58%-82% of the time on legal queries.
That said, RAG is no panacea — even with it, Stanford researchers found that hallucination still occurs in approximately 17%-34% of responses, JD Supra says.
Other benefits include enabling companies in highly regulated or security-sensitive industries, such as law, health care and defense, to use GenAI on their intellectual property without having to train an LLM on that data.
How Does RAG Work?
In its simplest form, RAG works by retrieving additional data from another source and supplying it to the LLM alongside the knowledge the model already has. However, there’s a lot more to it than that, according to AWS:
“When implementing a RAG-based application, two basic concepts must be understood: retrieval, the 'R' and generation, the 'G.' Although a lot of attention is often given to the generation part due to the popularity of LLMs, getting the most out of RAG really depends on having the best retrieval engine possible — full stop. For the retrieval part of RAG to work well, one has to carefully consider various steps and design choices for both the ingest flow and the query flow.”
That means, first, converting the data into a form the RAG system can read, and, second, breaking it up into bite-size “chunks” that the RAG system can index and retrieve individually.
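As a minimal sketch of that ingest step, assuming plain-text input and an illustrative fixed chunk size (neither detail comes from the sources quoted above):

```python
def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    """Split a document into fixed-size character chunks for indexing.

    Fixed character windows are the simplest possible scheme; real
    systems often split on sentence or token boundaries instead.
    """
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# A converted document becomes a list of bite-size chunks that the
# retrieval index can store and search individually.
doc = "RAG improves GenAI results by retrieving additional data. " * 20
chunks = chunk_text(doc)
```

Chunk size is a tuning knob: smaller chunks retrieve more precisely but each carries less surrounding context.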
Other best practices include reducing the chunk size, increasing the chunk overlap, increasing the number of retrieved chunks and reranking data to make more relevant chunks more likely to be used, according to DataCamp.
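Two of those knobs, chunk overlap and reranking, can be sketched as follows. The overlap windowing is standard practice; the shared-term scorer is a hypothetical stand-in for the learned reranking models real systems use:

```python
def chunk_with_overlap(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Overlapping windows reduce the chance that a fact is split
    across a chunk boundary and lost to retrieval."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def rerank(query: str, chunks: list[str]) -> list[str]:
    """Reorder chunks so the most relevant come first. Shared-term
    count is a toy score; production systems use a reranking model."""
    q_terms = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q_terms & set(c.lower().split())),
                  reverse=True)
```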
And whatever external data sources the RAG system is using need to be carefully vetted and regularly updated, according to AWS. Providing additional contextual information doesn’t help if the additional information is itself incorrect or outdated.
When is RAG Done?
Organizations that want to use RAG need to take some steps ahead of time to ensure that the GenAI system has access to the additional data, GitHub says. First, the organization decides what additional information it wants its GenAI to be able to use. These could include documents, which then need to be indexed in a database the RAG system can access, or internet sources, which the RAG system needs to be told about.
Then, the process works like this, according to GitHub:
- A user asks a question to the GenAI system.
- RAG uses the internal search engine to find relevant code or text from indexed files to answer that question.
- The internal search engine conducts a semantic search by analyzing the content of documents from the indexed repository, and then ranking those documents based on relevance.
- The GenAI system then uses RAG, which may also conduct a semantic search, to find and retrieve the most relevant snippets from the top-ranked documents.
- Those snippets are added to the prompt so the GenAI system can generate a relevant response for the user.
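The steps above can be sketched end to end. The “semantic search” here is approximated with bag-of-words cosine similarity, and the final LLM call is omitted; both the embedding scheme and the prompt format are illustrative assumptions, not GitHub’s actual implementation:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use a neural encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, index: list[str], top_k: int = 2) -> list[str]:
    """Rank indexed chunks by similarity to the question; keep the top k."""
    q = embed(question)
    ranked = sorted(index, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, snippets: list[str]) -> str:
    """Augment the user's question with the retrieved snippets; this
    combined prompt is what actually gets sent to the LLM."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical indexed company documents.
index = [
    "Employees accrue 20 vacation days per year.",
    "The cafeteria is open from 8am to 3pm.",
    "Vacation days roll over for one year.",
]
question = "How many vacation days do employees get?"
prompt = build_prompt(question, retrieve(question, index))
```

The model answers from the retrieved snippets in the prompt rather than from its training data alone, which is what lets the response reflect newer or proprietary information.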
Where is RAG Implemented?
Here are some of the most common use cases for RAG, according to data and AI company Databricks:
- Question and answer chatbots: Adding RAG to chatbots means they can get more accurate answers from company documents and knowledge bases.
- Search augmentation: Pairing LLMs with search engines, so that LLM-generated answers are augmented with search results, can better answer informational queries with more nuanced or up-to-date data and makes it easier for users to find the information they need to do their jobs.
- Knowledge engine: Company data, whether customer information or internal corporate information, can serve as context for LLMs, allowing employees to easily get answers to their questions, such as HR questions about benefits and policies, security and compliance questions, or questions specific to the company’s customers.