Retrieval Augmented Generation

Retrieval-Augmented Generation (RAG) is an architecture that combines retrieval and generation capabilities, particularly for natural language processing (NLP) tasks such as question answering, text summarization, and dialogue systems. Given an input query, such as a question or a text snippet, a RAG model first retrieves the most relevant information from a large knowledge base. It then feeds this information, as context, into a generative model to produce accurate, information-rich output.

How It Works

The RAG model typically follows these two main steps:

  1. Retrieval Phase: Given a query (e.g., a question or a piece of text), the model uses a retrieval system to find the most relevant documents or document fragments from a large collection of documents (e.g., Wikipedia or a specialized database). This step often uses traditional information retrieval techniques like inverted indexes, or modern vector-based similarity search techniques such as dense vector retrieval.
  2. Generation Phase: The retrieved documents or fragments, along with the original query, are fed into a pretrained language generation model (e.g., GPT or BART) to generate the final output. In this step, the generative model considers not only the information in the original query but also integrates the retrieved external knowledge, enabling it to produce more accurate and detailed outputs.
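The two phases above can be sketched end to end. This is a minimal, illustrative sketch: the bag-of-words `embed` function stands in for a real dense encoder (e.g., a sentence-transformer), and the generation phase is stubbed as prompt construction rather than an actual model call. All function names and documents here are made up for the example.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real RAG system would use a
    # dense encoder that maps text to a fixed-size vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Retrieval phase: rank all documents by similarity to the query
    # and keep the top k as context.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Generation phase (stubbed): in a real system this prompt would be
    # sent to a generative model such as GPT or BART.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RAG combines retrieval with text generation.",
    "The Eiffel Tower is in Paris.",
    "Dense vector retrieval uses embedding similarity.",
]
query = "How does retrieval augmented generation work?"
context = retrieve(query, docs)
print(build_prompt(query, context))
```

The retrieval step here is deliberately simple; swapping `embed` for a neural encoder and the ranked scan for an approximate nearest-neighbor index is what makes the same structure scale to large corpora.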

Applications

RAG models are particularly suited for tasks requiring support from external knowledge, such as:

  - Open-domain question answering
  - Text summarization
  - Dialogue systems

Advantages

  - Outputs are grounded in retrieved evidence, making them more accurate and detailed
  - The knowledge base can be updated without retraining the generative model

Challenges

  - Output quality depends on retrieval quality: irrelevant or missing documents degrade the generated answer
  - The retrieval step adds latency and infrastructure complexity

RAG models represent an important direction in the NLP field, significantly enhancing a model's ability to handle knowledge-intensive tasks by combining retrieval and generation.

(Diagram: overview of a RAG system)

A more recent extension is multimodal retrieval-augmented generation (MM-RAG), which retrieves over images and other modalities in addition to text:

https://medium.com/@bijit211987/multimodal-retrieval-augmented-generation-mm-rag-2e8f6dc59f11
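The core idea behind MM-RAG can be sketched with a shared embedding space: an encoder such as CLIP maps both text and images into the same vector space, so a single similarity search can return items from any modality. The index entries and vectors below are made-up toy values, not real encoder outputs.

```python
import math

# Toy shared embedding space. In a real MM-RAG system, a multimodal
# encoder (e.g., CLIP) would produce these vectors; here they are
# hand-picked so that the two RAG-related items lie close together.
index = {
    ("text",  "RAG architecture overview"):   [0.9, 0.1, 0.0],
    ("image", "diagram_of_rag_pipeline.png"): [0.8, 0.2, 0.1],
    ("image", "photo_of_a_cat.jpg"):          [0.0, 0.1, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec: list[float], k: int = 2):
    # Rank indexed items from every modality by similarity to the
    # query vector; text and images compete in the same ranking.
    ranked = sorted(index, key=lambda item: cosine(query_vec, index[item]),
                    reverse=True)
    return ranked[:k]

# A query vector near the RAG-related items should surface both the
# text passage and the diagram, but not the unrelated photo.
hits = retrieve([0.85, 0.15, 0.05])
print(hits)
```

The retrieved text and images would then be passed together to a multimodal generative model, mirroring the generation phase of text-only RAG.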