RAG

Retrieval-Augmented Generation (RAG) improves the responses of a large language model (LLM) by retrieving relevant documents from a database and supplying them to the model as context. Combining the LLM's language abilities with external information retrieval lets it answer queries that call for up-to-date or specialized information it was not trained on.

How does it work?

Retrieval: The system searches a vector database for the documents most relevant to the user query.
Generation: The LLM uses these documents as additional context when generating its response, which makes the answer more accurate and better grounded.
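The two stages can be sketched as a single function. This is a minimal sketch, not the product's implementation: `rag_answer`, `retrieve`, and `generate` are hypothetical names, with `retrieve` standing in for a vector-database query and `generate` for an LLM call.

```python
from typing import Callable, List, Sequence

# Hypothetical stage signatures (assumptions for this sketch):
#   retrieve(query, top_k) -> document texts, e.g. from a vector-database query
#   generate(query, documents) -> response text, e.g. from an LLM call
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str, Sequence[str]], str]


def rag_answer(query: str, retrieve: Retriever, generate: Generator, top_k: int = 3) -> str:
    # Retrieval: look up the documents most relevant to the query.
    documents = retrieve(query, top_k)
    # Generation: the LLM answers using the retrieved documents as extra context.
    return generate(query, documents)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any external service.
    corpus = [
        "Pinecone offers a managed vector database service.",
        "RAG retrieves documents before it generates a response.",
    ]

    def toy_retrieve(query: str, top_k: int) -> List[str]:
        return corpus[:top_k]

    def toy_generate(query: str, documents: Sequence[str]) -> str:
        return f"{query} -> answered from {len(documents)} document(s)"

    print(rag_answer("What is Pinecone?", toy_retrieve, toy_generate, top_k=1))
    # prints: What is Pinecone? -> answered from 1 document(s)
```

Passing the stages in as callables keeps the flow independent of any particular vector database or LLM provider.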

High level diagram

Document Integration

The retrieved documents are combined with the user's query and passed to the LLM as its input. The model can then draw on both its pre-trained knowledge and the specific information in the documents, which makes its responses better informed and more accurate in context.
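One way to combine the retrieved documents with the query is to place them in a single prompt string. The function name `build_rag_prompt` and the prompt layout (numbered context passages followed by the question) are illustrative assumptions, not a format this page prescribes.

```python
from typing import Sequence


def build_rag_prompt(query: str, documents: Sequence[str]) -> str:
    """Combine retrieved documents with the user's query into one prompt.

    The layout below is a hypothetical example; any clear structure works.
    """
    # Number each document so the passages are easy to tell apart.
    context = "\n\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    docs = [
        "Pinecone offers a managed vector database service.",
        "Vector databases store document embeddings.",
    ]
    print(build_rag_prompt("What does Pinecone offer?", docs))
```

The returned string is what would be sent to the LLM; placing the context before the question keeps the model's attention on the documents when it reaches the query.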

Connecting to a Vector Database

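Vector databases store document embeddings (numeric vectors that capture each document's meaning) and can quickly find the stored vectors most similar to a query embedding, enabling fast retrieval of relevant documents.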
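To show what a vector database does at its core, the sketch below stores embeddings in memory and ranks them by cosine similarity to a query embedding. `InMemoryVectorStore` is a hypothetical toy class and the short vectors in the usage are made up; in practice an embedding model produces the vectors, and a real vector database handles storage and search at scale.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector database (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}
        self._texts: Dict[str, str] = {}

    def upsert(self, doc_id: str, embedding: Sequence[float], text: str) -> None:
        # Insert or overwrite the embedding and original text for a document.
        self._vectors[doc_id] = list(embedding)
        self._texts[doc_id] = text

    def query(self, embedding: Sequence[float], top_k: int = 3) -> List[Tuple[str, float, str]]:
        # Score every stored document against the query embedding, best first.
        scored = [
            (doc_id, cosine_similarity(embedding, vec), self._texts[doc_id])
            for doc_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    # Made-up 3-dimensional embeddings; real ones come from an embedding model.
    store.upsert("doc-1", [0.9, 0.1, 0.0], "Pinecone offers a managed vector database service.")
    store.upsert("doc-2", [0.1, 0.8, 0.1], "RAG combines retrieval with generation.")
    for doc_id, score, text in store.query([1.0, 0.0, 0.0], top_k=1):
        print(doc_id, round(score, 3), text)
```

A linear scan like this is fine for a handful of documents; vector databases typically rely on approximate nearest-neighbour indexes to keep search fast over millions of vectors.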

Pinecone

Pinecone offers a managed vector database service. See Get started with Pinecone for more information. For a complete walkthrough, see Build a RAG Chat with Big Hummingbird.
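As a sketch of the retrieval step against Pinecone from Python, the functions below assume the `pinecone` Python SDK (v3 or later, where the client is created with `Pinecone(api_key=...)` and an index handle with `pc.Index(name)`), an existing index whose records carry the document text in a `text` metadata field, and a query embedding produced by the same embedding model used for the stored documents. The function names, the `text` field, and the example index name and namespace are hypothetical placeholders, not values from this product.

```python
from typing import Any, List, Optional, Sequence


def connect_to_index(api_key: str, index_name: str) -> Any:
    # Imported lazily so the rest of the sketch works without the SDK installed.
    from pinecone import Pinecone

    pc = Pinecone(api_key=api_key)
    return pc.Index(index_name)


def retrieve_from_pinecone(
    index: Any,
    query_embedding: Sequence[float],
    top_k: int = 3,
    namespace: Optional[str] = None,
    text_field: str = "text",  # hypothetical metadata key holding the document text
) -> List[str]:
    # Ask Pinecone for the top_k nearest vectors, including their metadata.
    response = index.query(
        vector=list(query_embedding),
        top_k=top_k,
        namespace=namespace,
        include_metadata=True,
    )
    # Keep only matches whose metadata actually contains the document text.
    return [
        match.metadata[text_field]
        for match in response.matches
        if match.metadata and text_field in match.metadata
    ]


# Example (requires a Pinecone account, an existing index, and network access;
# `embed` is a hypothetical function wrapping your embedding model):
# index = connect_to_index(api_key="YOUR_API_KEY", index_name="my-index")
# docs = retrieve_from_pinecone(index, embed("What is RAG?"), top_k=5, namespace="docs")
```

`top_k` controls how many matches Pinecone returns, and `namespace` limits the search to one partition of the index. The query embedding must come from the same embedding model that produced the stored vectors, otherwise the similarity scores are not meaningful.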
