RAG & Retrieval

Retrieval-Augmented Generation (RAG) adds a retrieval step, and its latency, to your LLM pipeline. These guides show how to optimize each stage of retrieval so that RAG applications stay responsive.

In This Section

Vector Database Latency: Understand and optimize vector search performance.

Vector Search Optimization: Tune ANN indexes for the right speed-accuracy tradeoff.

Hybrid Search: Combine semantic and keyword search for better retrieval.

Context Management: Manage context windows efficiently for optimal performance.

Embedding Selection: Choose embeddings that balance quality and speed.

Multi-Stage Retrieval: Design efficient retrieval pipelines with reranking.