Qdrant
Quick Summary
DeepEval allows you to evaluate your Qdrant retriever and optimize retrieval hyperparameters like top-K, embedding model, and distance (similarity) function.
To get started, install Qdrant’s Python client using the following command:
pip install qdrant-client
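The examples on this page also rely on deepeval and sentence-transformers, both available on PyPI; if you haven't installed them yet:

pip install deepeval sentence-transformers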
Qdrant is a high-performance and scalable vector database optimized for semantic search and retrieval in RAG applications. It efficiently handles high-dimensional embeddings and fast similarity searches, making it a powerful choice for AI-driven applications. Learn more about Qdrant here.
This diagram illustrates how the Qdrant retriever fits into your RAG pipeline.

Setup Qdrant
First, connect to your local or cloud-hosted Qdrant instance.
import qdrant_client
import os
client = qdrant_client.QdrantClient(
    url="http://localhost:6333"  # Change this if using Qdrant Cloud
)
Next, create a Qdrant collection with the appropriate vector configuration. The collection will store document embeddings as vectors and their corresponding text as metadata.
# Define collection name
collection_name = "documents"
# Create collection if it doesn't exist
if collection_name not in [col.name for col in client.get_collections().collections]:
    client.create_collection(
        collection_name=collection_name,
        vectors_config=qdrant_client.http.models.VectorParams(
            size=384,  # Vector dimensionality (must match your embedding model)
            distance=qdrant_client.http.models.Distance.COSINE,  # Similarity function
        ),
    )
Finally, define an embedding model to convert your document chunks into vectors before indexing them in Qdrant.
# Load an embedding model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
# Example document chunks
document_chunks = [
    "Qdrant is an efficient vector database.",
    "RAG improves AI-generated responses with retrieved context.",
    "Vector search enables high-precision semantic retrieval.",
    ...
]
# Store chunks with embeddings
for i, chunk in enumerate(document_chunks):
    embedding = model.encode(chunk).tolist()  # Convert text to vector
    client.upsert(
        collection_name=collection_name,
        points=[
            qdrant_client.http.models.PointStruct(
                id=i, vector=embedding, payload={"text": chunk}
            )
        ],
    )
To use Qdrant as part of your RAG pipeline, retrieve relevant contexts using similarity search and insert them into your prompt template. This ensures your model has the necessary context to generate accurate and informed responses.
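For illustration, a minimal sketch of this retrieve-then-generate flow might look like the following (the example question is arbitrary, and generate is a hypothetical stand-in for your own LLM call); the next section walks through the same steps in more detail.

# Retrieve the most similar chunks for a user question
user_question = "What is Qdrant used for?"
question_embedding = model.encode(user_question).tolist()

hits = client.search(
    collection_name=collection_name,
    query_vector=question_embedding,
    limit=3,  # top-K
)
retrieved_chunks = [hit.payload["text"] for hit in hits]

# Insert the retrieved chunks into a prompt template and generate a response
context_block = "\n".join(retrieved_chunks)
answer = generate(
    f"Answer the question using this context:\n{context_block}\n\nQuestion: {user_question}"
)  # hypothetical function, replace with your own LLM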
Evaluating Qdrant Retrieval
Evaluating your Qdrant retriever consists of 2 steps:
- Preparing an input query along with the expected LLM response, and using the input to generate a response from your RAG pipeline to create an LLMTestCase containing the input, actual output, expected output, and retrieval context.
- Evaluating the test case using a selection of retrieval metrics.
An LLMTestCase allows you to create unit tests for your LLM applications, helping you identify specific weaknesses in your RAG application.
Preparing your Test Case
Since the first step in generating a response from your RAG pipeline is retrieving the relevant retrieval_context from your Qdrant collection, first perform this retrieval for your input query.
def search(query, top_k=3):
    query_embedding = model.encode(query).tolist()
    search_results = client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        limit=top_k,  # Retrieve the top-K most similar results
    )
    return [hit.payload["text"] for hit in search_results] if search_results else None
query = "How does Qdrant work?"
retrieval_context = search(query)
Next, pass the retrieved context into your LLM's prompt template to generate a response.
prompt = """
Answer the user question based on the supporting context
User Question:
{input}
Supporting Context:
{retrieval_context}
"""
actual_output = generate(prompt) # hypothetical function, replace with your own LLM
print(actual_output)
Let's examine the actual_output generated by our RAG pipeline:
Qdrant is a scalable vector database optimized for high-performance semantic retrieval.
Finally, create an LLMTestCase using the input and expected output you prepared, along with the actual output and retrieval context you generated.
from deepeval.test_case import LLMTestCase
test_case = LLMTestCase(
    input=query,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="Qdrant is a powerful vector database optimized for semantic search and retrieval.",
)
Running Evaluations
To run evaluations on the LLMTestCase, we first need to define relevant deepeval metrics to evaluate the Qdrant retriever: contextual recall, contextual precision, and contextual relevancy.
These contextual metrics help assess your retriever. For more retriever evaluation details, check out this guide.
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)

contextual_recall = ContextualRecallMetric()
contextual_precision = ContextualPrecisionMetric()
contextual_relevancy = ContextualRelevancyMetric()
Finally, pass the test case and metrics into the evaluate function to begin the evaluation.
from deepeval import evaluate
evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy],
)
Improving Qdrant Retrieval
Below is a table outlining the hypothetical metric scores for your evaluation run.
| Metric | Score |
| --- | --- |
| Contextual Precision | 0.85 |
| Contextual Recall | 0.92 |
| Contextual Relevancy | 0.44 |
Each contextual metric evaluates a specific hyperparameter. To learn more about this, read this guide on RAG evaluation.
To improve your Qdrant retriever, you'll need to experiment with various hyperparameters and prepare LLMTestCases using generations from different retriever versions.
Ultimately, analyzing improvements and regressions in contextual metric scores (the three metrics defined above) will help you determine the optimal hyperparameter combination for your Qdrant retriever.
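For example, a rough sketch of such an experiment might sweep over candidate top-K values (the values below are arbitrary), rebuild the test case for each retriever version, and compare the resulting contextual metric scores; varying the embedding model or distance function works the same way, by rebuilding the collection with those settings.

# Hypothetical top-K sweep, reusing search(), prompt_template, and the metrics defined above
for top_k in [1, 3, 5, 10]:
    retrieval_context = search(query, top_k=top_k)
    prompt = prompt_template.format(
        input=query, retrieval_context="\n".join(retrieval_context)
    )
    actual_output = generate(prompt)  # hypothetical function, replace with your own LLM

    test_case = LLMTestCase(
        input=query,
        actual_output=actual_output,
        retrieval_context=retrieval_context,
        expected_output="Qdrant is a powerful vector database optimized for semantic search and retrieval.",
    )
    evaluate(
        [test_case],
        metrics=[contextual_recall, contextual_precision, contextual_relevancy],
    )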
For a more detailed guide on tuning your retriever’s hyperparameters, check out this guide.