Glossary — Agentic AI

What is Retrieval-Augmented Generation (RAG)?


Retrieval-Augmented Generation (RAG) is an architecture that enhances LLM responses by retrieving relevant documents from an external knowledge base and including them in the model's context before generation.

WHY IT MATTERS

RAG solves one of the fundamental limitations of LLMs: their knowledge is frozen at training time. By retrieving relevant documents at inference time and injecting them into the prompt, RAG gives models access to current, domain-specific, and proprietary information.

The architecture is straightforward: a query is embedded, similar documents are retrieved from a vector store, and the retrieved text is added to the LLM's context. The model then generates a response grounded in the retrieved information.
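The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not a production system: the `embed` function here is a toy bag-of-words stand-in for a real embedding model, and `build_prompt` simply concatenates retrieved text into the context.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts.
    # A real system would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=2):
    # Inject the retrieved documents into the model's context.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In practice the document store is a vector database with precomputed embeddings, so only the query is embedded at request time; the ranking and prompt-assembly steps are otherwise the same.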

For financial agents, RAG is crucial. An agent managing a portfolio needs current price data, recent news, and up-to-date protocol documentation — none of which exist in the model's training data.

FREQUENTLY ASKED QUESTIONS

How is RAG different from fine-tuning?
Fine-tuning changes model weights permanently. RAG retrieves knowledge at query time without modifying the model. RAG is cheaper, easier to update, and keeps the knowledge source auditable.
What are RAG's limitations?
Retrieval quality is the bottleneck. If wrong documents are retrieved, the model generates plausible but incorrect answers. Chunking strategy and embedding quality matter enormously.
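To make the chunking point concrete, here is one common baseline: fixed-size chunks with overlap, so a sentence split at a chunk boundary still appears whole in at least one chunk. The sizes are illustrative defaults, not recommendations.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split text into word-count chunks that overlap by `overlap` words,
    # so content near a boundary is retrievable from either side.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlap trades storage and some duplicate retrieval for robustness at boundaries; more sophisticated strategies chunk along semantic units such as headings, paragraphs, or sentences instead of raw word counts.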
Can RAG eliminate hallucinations?
RAG reduces but doesn't eliminate hallucinations. The model can still generate text that contradicts the retrieved documents or blend facts incorrectly.
