RAG Patterns

RAG is not one architecture; it is a family of them. The patterns below are the recurring shapes, ordered from simple to complex. Match the pattern to the problem and resist climbing higher than you need.

Pattern 1 — Simple Q&A RAG

The baseline. One knowledge source, one retrieval step, one generation step.

Use when: a single, reasonably uniform knowledge base; questions answerable from one retrieval. Strengths: simple, fast, cheap, debuggable. Start here for almost everything — and add complexity only when evaluation proves you need it.
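As a rough illustration of the baseline, here is a minimal sketch of the retrieve-then-generate flow. Everything in it is an assumption made for the example: the DOCS list stands in for a real knowledge base, word overlap stands in for vector similarity, and the generate argument is a placeholder for whatever LLM call you use.

```python
import re

# Hypothetical in-memory knowledge base; a real system would use a vector store.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """The single retrieval step: rank docs by word overlap with the query.

    Word overlap is a toy stand-in for embedding similarity.
    """
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Number the chunks so the generator can cite them."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(query: str, generate) -> str:
    """The single generation step. `generate` is any prompt -> text callable."""
    return generate(build_prompt(query, retrieve(query, DOCS)))
```

Because there is exactly one retrieval call and one generation call, each stage can be inspected on its own, which is most of what makes this pattern easy to debug.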

Pattern 2 — Query routing

Different question types need different handling. A router classifies the query first and dispatches it.

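Use when: queries are heterogeneous, or some shouldn’t trigger retrieval at all. Watch: the router is itself a failure point. A misrouted query is searched in the wrong place, or not at all, and generation cannot recover from that, so evaluate the router separately from the end-to-end answer.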
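A small sketch of the idea, under loud assumptions: the route names, keyword rules, and handlers below are all hypothetical, and the keyword matcher stands in for whatever classifier (often an LLM or a small trained model) does the routing in practice. The router_accuracy helper shows one way to score the router in isolation.

```python
GREETINGS = {"hi", "hello", "thanks", "thank you"}
BILLING_TERMS = ("invoice", "refund", "billing")


def route(query: str) -> str:
    """Classify a query into a route name (toy keyword rules)."""
    q = query.lower().strip()
    if q in GREETINGS:
        return "no_retrieval"  # small talk should not hit the index
    if any(term in q for term in BILLING_TERMS):
        return "billing_kb"
    return "general_kb"


# Hypothetical handlers; each would run its own retrieval and generation.
HANDLERS = {
    "no_retrieval": lambda q: "answer directly, no retrieval",
    "billing_kb": lambda q: f"search billing index for: {q}",
    "general_kb": lambda q: f"search general index for: {q}",
}


def dispatch(query: str) -> str:
    """Route first, then hand the query to the matching handler."""
    return HANDLERS[route(query)](query)


def router_accuracy(labeled: list[tuple[str, str]]) -> float:
    """Fraction of (query, expected_route) pairs the router gets right."""
    correct = sum(route(q) == expected for q, expected in labeled)
    return correct / len(labeled)
```

Keeping a labeled set of queries and expected routes makes it possible to tell a routing error apart from a retrieval or generation error when an answer goes wrong.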

Pattern 3 — Multi-source retrieval

One question genuinely needs several sources at once. Retrieve from each in parallel, merge, rerank, then generate.

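Use when: complete answers span multiple repositories. Watch: relevance scores from different sources are usually not comparable, so merging needs a reranker that scores every candidate on one scale and lets the best chunks survive regardless of source; cost and latency rise with each source.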
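The retrieve-in-parallel, merge, rerank flow might look roughly like the sketch below. The SOURCES dictionary, the source names, and the word-overlap scorer are assumptions for illustration only; in practice each source would be a separate index or API and the reranker would typically be a dedicated model.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources; in practice these are separate indexes or APIs.
SOURCES = {
    "wiki": [
        "The VPN requires the corporate certificate.",
        "Lunch is served at noon.",
    ],
    "tickets": [
        "VPN fails when the certificate has expired.",
        "Printer on floor 2 is offline.",
    ],
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_source(name: str, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Per-source retrieval; returns (source_name, chunk) pairs."""
    q = tokens(query)
    ranked = sorted(SOURCES[name], key=lambda d: len(q & tokens(d)), reverse=True)
    return [(name, chunk) for chunk in ranked[:k]]


def rerank(query: str, candidates: list[tuple[str, str]], top_n: int):
    """Score every candidate with one scorer so sources compete fairly.

    Word overlap is a toy stand-in for a real reranking model.
    """
    q = tokens(query)
    ranked = sorted(candidates, key=lambda c: len(q & tokens(c[1])), reverse=True)
    return ranked[:top_n]


def multi_source_retrieve(query: str, top_n: int = 2):
    """Query all sources in parallel, merge the hits, then rerank."""
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda name: search_source(name, query), SOURCES)
        merged = [hit for hits in per_source for hit in hits]
    return rerank(query, merged, top_n)
```

Keeping the source name attached to each chunk through the merge also makes it straightforward to cite where an answer came from.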

Pattern 4 — Query transformation RAG

The raw question is a poor search query, so transform it before retrieving — rewrite, expand to multiple queries, or decompose. See Chunking & Retrieval.

Use when: conversational follow-ups (“what about the other one?”), vague questions, or multi-part questions. Watch: the transformation is an extra LLM call before retrieval, so you pay latency to gain recall.
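To make the decompose variant concrete, here is a minimal sketch. The decompose function is a deliberately naive stand-in (it just splits on " and ") for the LLM call a real system would make, and retrieve_with_transform and its parameters are hypothetical names chosen for the example.

```python
from typing import Callable


def decompose(question: str) -> list[str]:
    """Toy stand-in for an LLM decomposition call: split on ' and '."""
    parts = [part.strip(" ?") for part in question.split(" and ")]
    return [part + "?" for part in parts if part]


def retrieve_with_transform(
    question: str,
    transform: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Run retrieval once per transformed query and merge without duplicates.

    `transform` may rewrite (return one query), expand (several phrasings),
    or decompose (several sub-questions); the merge step is the same.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for sub_query in transform(question):
        for chunk in retrieve(sub_query):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

The same wrapper covers all three transformations, since each one only changes how many queries come out of the transform step.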

Pattern 5 — Agentic RAG

Retrieval becomes a tool an agent decides to use. The agent decides whether to retrieve and from where, judges whether the results suffice, and can search again.

Use when: unpredictable queries, multi-hop questions, results that need iterative refinement. Watch: multiple LLM calls — the most expensive and slowest pattern, and the hardest to make predictable. Adopt last.
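The control loop behind this pattern can be sketched as follows. All names here (Action, run_agent, scripted_policy, the tool names) are hypothetical; the decide callable stands in for an LLM that picks the next action, and the scripted policy exists only so the loop can be run without a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # "search" or "answer"
    source: str = ""   # which retrieval tool to call when kind == "search"
    query: str = ""


def run_agent(
    question: str,
    decide: Callable[[str, list[str]], Action],
    tools: dict[str, Callable[[str], list[str]]],
    max_steps: int = 4,
) -> tuple[list[str], int]:
    """Loop: decide, maybe search, re-judge. Returns (evidence, decision_calls).

    `max_steps` caps the number of decisions so cost stays bounded.
    """
    evidence: list[str] = []
    calls = 0
    for _ in range(max_steps):
        action = decide(question, evidence)
        calls += 1
        if action.kind == "answer":  # agent judges the evidence sufficient
            break
        evidence.extend(tools[action.source](action.query))
    return evidence, calls


def scripted_policy(question: str, evidence: list[str]) -> Action:
    """Scripted stand-in for the LLM: try wiki, then tickets, then answer."""
    if not evidence:
        return Action("search", "wiki", question)
    if len(evidence) < 2:
        return Action("search", "tickets", question)
    return Action("answer")
```

The step cap is one simple way to keep the cost and latency of this pattern bounded; the returned call count makes that cost visible per query.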

Cross-cutting components

Most production RAG, whatever the pattern, also includes:

Hybrid search (vector + keyword) — so exact terms aren’t lost.
Reranking — retrieve broad, rerank, keep the best few.
Metadata filtering — permissions, recency, tenant isolation.
Caching — for repeated or similar queries.
Citations — every claim traceable to a source.
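Two of these components, hybrid search and metadata filtering, can be sketched together. The example uses reciprocal rank fusion, which is one common way to combine a vector ranking with a keyword ranking and not necessarily the method any particular system uses. The function names, the metadata layout, and the constant k = 60 (a conventional default) are assumptions for illustration.

```python
def filter_by_tenant(doc_ids: list[str], metadata: dict, tenant: str) -> list[str]:
    """Metadata filter: keep only documents that belong to the tenant."""
    return [d for d in doc_ids if metadata[d]["tenant"] == tenant]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so only rank positions matter, not the raw (incomparable) scores.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def hybrid_search(
    vector_hits: list[str],
    keyword_hits: list[str],
    metadata: dict,
    tenant: str,
) -> list[str]:
    """Filter each ranking by tenant, then fuse the two rankings."""
    filtered = [
        filter_by_tenant(hits, metadata, tenant)
        for hits in (vector_hits, keyword_hits)
    ]
    return reciprocal_rank_fusion(filtered)
```

In a real deployment the tenant filter would usually be pushed down into the index query itself, so that documents a user may not see are never retrieved in the first place.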

Choosing a pattern

Situation	Pattern
One knowledge base, direct questions	Simple Q&A RAG
Distinct query types / some need no retrieval	Query routing
Answers span multiple sources	Multi-source retrieval
Vague, conversational, or multi-part queries	Query transformation
Unpredictable or multi-hop, need iteration	Agentic RAG

Key takeaways

RAG is a family of patterns of rising complexity: simple Q&A, query routing, multi-source, query transformation, and agentic RAG. Almost every system should begin with simple Q&A. Most production RAG layers in hybrid search, reranking, metadata filtering, caching, and citations regardless of pattern. Choose by the problem’s actual shape, and move to a more complex pattern only when evaluation proves the simpler one is failing.