How LLM Embeddings Work in Retrieval-Augmented Generation

User Submits a Query to the RAG Application

The interaction begins when the user asks a question. For instance, they might type, "What are effective strategies for managing remote teams?" Before sending this directly to the LLM, the application adds context and refines the prompt to enhance its clarity and focus.
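This refinement step can be sketched as a simple prompt template. The function name and instruction wording below are illustrative assumptions, not a fixed convention; real applications tailor these instructions to their domain.

```python
def refine_prompt(user_query: str) -> str:
    # Wrap the raw user question with framing instructions before it enters
    # the retrieval pipeline. The exact wording is application-specific.
    return (
        "Answer the question below for a business audience. "
        "Be specific, and ground your answer in the provided context.\n\n"
        f"Question: {user_query}"
    )

refined = refine_prompt("What are effective strategies for managing remote teams?")
```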

RAG Application Generates Query Vector and Performs Similarity Search

Next, the Retrieval-Augmented Generation (RAG) application converts the refined prompt into a numerical representation known as an embedding. An embedding model transforms the user's question into a vector, and the application then submits this query vector to a vector database.
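To make the text-to-vector step concrete, here is a toy stand-in for an embedding model. It hashes tokens into a fixed-size vector and normalises the result; a production system would instead call a learned embedding model (for example, a transformer encoder), which captures meaning rather than token identity. Everything here is an illustrative sketch.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for a real embedding model: hash each token into a slot of
    # a fixed-size vector, then L2-normalise. A learned model would produce
    # vectors where semantically similar texts land close together.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

query_vector = embed("effective strategies for managing remote teams")
```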

The vector database then conducts a semantic search, comparing the query vector against the embedding vectors of documents indexed in the database. Using a distance metric such as cosine similarity or dot product, it identifies the documents closest in meaning to the query. At scale this is typically done with approximate nearest-neighbour (ANN) search, which trades a small amount of exactness for speed. Because similarity is measured in meaning rather than exact wording, this approach can surface relevant documents that a traditional keyword search would miss.
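The similarity search can be sketched with cosine similarity over a small in-memory index. The exhaustive scan below is for clarity only; the document IDs and vectors are invented for illustration, and real vector databases use ANN index structures (such as HNSW or IVF) to avoid comparing against every stored vector.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product of the vectors divided by their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec: list[float], index: list[dict], k: int = 2) -> list[dict]:
    # Exhaustive scan for clarity; production systems use ANN indexes instead.
    ranked = sorted(
        index,
        key=lambda doc: cosine_similarity(query_vec, doc["vector"]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical mini-index of pre-embedded documents (vectors invented).
index = [
    {"id": "remote-mgmt", "vector": [0.9, 0.1, 0.0]},
    {"id": "onboarding",  "vector": [0.2, 0.8, 0.1]},
    {"id": "payroll",     "vector": [0.0, 0.1, 0.9]},
]
results = top_k([1.0, 0.2, 0.0], index, k=2)
```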

RAG Application Augments the Prompt Using Semantic Search Results

With the relevant documents retrieved, the RAG application combines them with the original user query to build an augmented prompt. The retrieved passages supply factual grounding, and the prompt is structured so that the LLM is guided to answer the question using that context.
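The augmentation step is often a straightforward template that injects the retrieved passages ahead of the question. The template wording and example passages below are illustrative assumptions, not a standard format.

```python
def augment_prompt(user_query: str, passages: list[str]) -> str:
    # Join retrieved passages into a context section, then instruct the model
    # to answer from that context. Template wording is illustrative.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Use the context below to answer the question. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )

prompt = augment_prompt(
    "What are effective strategies for managing remote teams?",
    [
        "Set explicit communication norms and meeting cadences.",
        "Measure outcomes rather than hours online.",
    ],
)
```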

LLM Generates a Response Based on the Augmented Prompt

Finally, the LLM processes the augmented prompt and generates a response. Because the prompt now includes context from the relevant documents, the LLM can provide detailed information about effective remote team management strategies, even if its training data did not specifically cover the latest best practices. The user receives a tailored and informative answer that addresses their original query effectively.