What is Retrieval-augmented generation (RAG)?

Retrieval-augmented generation (RAG)

RAG is a technique that retrieves relevant documents at query time and adds them to the prompt so the model answers from up-to-date, specific knowledge.

Retrieval-augmented generation (RAG) connects a language model to an external knowledge source. At query time, the system embeds the question, retrieves the most relevant chunks from a vector store or search index, and includes them in the prompt so the model grounds its answer in real data.
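The query-time flow described above (embed the question, retrieve the closest chunks, place them in the prompt) can be sketched in a few lines. This is a minimal illustration, not a production design: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `CHUNKS` list stands in for a vector store or search index, and all names and sample texts here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real system would call
    # an embedding model here; this is an assumption for illustration.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    # Put the retrieved chunks in the prompt ahead of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )

# Hypothetical knowledge source, standing in for a vector store.
CHUNKS = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support is available by email on weekdays.",
]

question = "How many days do refunds take?"
top_chunks = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The resulting `prompt` string is what would be sent to the language model. Swapping in a real embedding model and index changes `embed` and `retrieve` but leaves the overall shape the same.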

RAG is the default way to give a model knowledge it wasn’t trained on (internal docs, current facts, customer-specific data) without retraining. Update the index and later answers reflect the change, because the model reads the retrieved text at query time instead of relying on what it learned during training.
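To illustrate why updating the index is enough, here is a small sketch in which the same question returns different context before and after a document is replaced. `TinyIndex`, its `upsert` and `search` methods, and the sample pricing text are all hypothetical. The word-overlap scoring is a deliberate simplification of what a real retriever does.

```python
import re
from typing import Optional

def tokens(text: str) -> set[str]:
    # Lowercased word set, used as a crude relevance signal.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class TinyIndex:
    """Hypothetical in-memory index keyed by document id."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Insert a new document or overwrite an existing one.
        self.docs[doc_id] = text

    def search(self, query: str) -> Optional[str]:
        # Return the document sharing the most words with the query,
        # or None if nothing overlaps at all.
        q = tokens(query)
        best_id, best_score = None, 0
        for doc_id, text in self.docs.items():
            score = len(q & tokens(text))
            if score > best_score:
                best_id, best_score = doc_id, score
        return self.docs[best_id] if best_id is not None else None

index = TinyIndex()
question = "What does the Pro plan cost?"

index.upsert("pricing", "The Pro plan costs 20 dollars per month.")
before = index.search(question)

# The price changes: only the index is updated, the model is untouched.
index.upsert("pricing", "The Pro plan costs 25 dollars per month.")
after = index.search(question)

print(before)
print(after)
```

The model itself never changes in this sketch. Only the text it would be shown at query time does.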

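Good RAG lives and dies on retrieval quality and citation. Poor chunking or a weak retriever puts irrelevant or incomplete context in the prompt, and the model then produces confident, wrong answers. Strong retrieval with citations reduces hallucination and lets readers check each claim against its source.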
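As a rough sketch of the two levers mentioned above, the following shows one simple chunking strategy (fixed-size word windows with overlap) and one way to label retrieved chunks so the model can cite them. The function names, the default `size` and `overlap` values, and the `policy.txt` source name are assumptions chosen for illustration. Real systems often chunk on sentence or section boundaries instead.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    # Split text into windows of `size` words, where each window repeats
    # the last `overlap` words of the previous one so that a sentence cut
    # at a boundary still appears whole in at least one chunk.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def cited_context(chunks: list[tuple[str, str]]) -> str:
    # Number each (source, text) pair so an answer can cite [1], [2], ...
    return "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )

sample = "one two three four five six seven eight nine ten"
chunks = chunk_words(sample, size=4, overlap=1)
context = cited_context([("policy.txt", c) for c in chunks])
print(context)
```

A prompt built from `context` can then instruct the model to reference the bracketed numbers, which makes each claim in the answer traceable to a specific chunk.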