---
title: Implement RAG Pattern Correctly
impact: HIGH
impactDescription: A correctly built RAG pipeline grounds LLM responses in retrieved documents, improving answer quality
tags: vector, rag, llm, embeddings, retrieval
description: Implement RAG Pattern Correctly
alwaysApply: true
---

## Implement RAG Pattern Correctly

Store documents alongside their embeddings, retrieve the most relevant ones for each question, and pass them to the LLM as context.

**Correct:** Full RAG pipeline with RedisVL.

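```python
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

# Assumed to exist already (not defined in this snippet):
#   index:       a SearchIndex with JSON storage and a vector field named "embedding"
#   embed_model: an embedding model whose encode() returns a NumPy array
#   llm:         an LLM client exposing generate(prompt)
#   documents:   an iterable of dicts with "content" and "source" keys

# 1. Store documents with embeddings
# Lists of floats suit JSON storage; hash storage expects the vector as bytes.
records = []
for doc in documents:
    records.append({
        "content": doc["content"],
        "embedding": embed_model.encode(doc["content"]).tolist(),
        "source": doc["source"]
    })

# Load all records in one batched call
index.load(records)

# 2. Query with vector similarity
# Convert the query embedding to a list so it matches the stored format
query_embedding = embed_model.encode(user_question).tolist()
results = index.query(VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["content", "source"],
    num_results=5
))

# 3. Pass context to LLM
context = "\n".join(r["content"] for r in results)
response = llm.generate(f"Context: {context}\n\nQuestion: {user_question}")
```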
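
The pipeline above uses an `index` without showing how it was defined. A minimal sketch of a matching schema follows, assuming JSON storage (so embeddings can be stored as lists of floats), a 384-dimensional embedding model, and a local Redis URL. The index name, key prefix, dimension, and HNSW values are illustrative assumptions rather than tuned recommendations, and `build_index` is a hypothetical helper.

```python
# Hypothetical schema for the index used in the pipeline.
# All names and numeric values here are illustrative assumptions.
schema = {
    "index": {
        "name": "rag-docs",
        "prefix": "doc",
        "storage_type": "json",  # JSON storage accepts embeddings as float lists
    },
    "fields": [
        {"name": "content", "type": "text"},
        {"name": "source", "type": "tag"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "dims": 384,  # must equal the embedding model's output size
                "distance_metric": "cosine",
                "algorithm": "hnsw",
                "datatype": "float32",
                "m": 16,
                "ef_construction": 200,
            },
        },
    ],
}


def build_index(redis_url: str = "redis://localhost:6379"):
    """Create the index in Redis (requires redisvl and a reachable server)."""
    from redisvl.index import SearchIndex  # imported lazily on purpose

    index = SearchIndex.from_dict(schema, redis_url=redis_url)
    index.create()  # by default leaves an existing index in place
    return index
```

Calling `build_index()` once at startup would yield the `index` object used in the three steps; the `dims` value has to be changed to match whichever embedding model is actually in use.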

**Best practices:**
- Match your distance metric to your embedding model; many modern text embeddings already work well with COSINE
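- Batch inserts by passing a list of records to `index.load()` instead of writing documents one at a time
- Set M and EF_CONSTRUCTION for HNSW based on dataset size; higher values generally improve recall at the cost of memory and index build time
- Use filters (for example, on a tag field such as `source`) to reduce the search space before vector comparison
- Consider chunking long documents so that each embedding covers a focused passage, which tends to improve retrieval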
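
As one way to apply the chunking advice, the sketch below splits each document into overlapping word windows before embedding. It is a simple illustration in plain Python, not a RedisVL feature: `chunk_words`, `to_chunk_records`, the `chunk_index` field, and the default sizes are all hypothetical, and a token-based or sentence-aware splitter may suit a given embedding model better.

```python
from typing import Dict, Iterable, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of up to max_words words, sharing `overlap` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


def to_chunk_records(
    documents: Iterable[Dict[str, str]],
    max_words: int = 200,
    overlap: int = 40,
) -> List[Dict[str, object]]:
    """Expand documents into one record per chunk, keeping the source label."""
    records: List[Dict[str, object]] = []
    for doc in documents:
        for i, chunk in enumerate(chunk_words(doc["content"], max_words, overlap)):
            records.append({
                "content": chunk,
                "source": doc["source"],
                "chunk_index": i,  # hypothetical field for tracing chunks back
            })
    return records
```

Each resulting record can then be given an `embedding` and passed to `index.load()` in a single list, which also keeps the insert batched.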

Reference: [Redis RAG Quickstart](https://redis.io/docs/latest/develop/get-started/rag/)