Retrieval
Retrieval is the core of RAG: finding the most relevant chunks from your knowledge base to include in the LLM prompt. Airbeeps implements a configurable, multi-stage retrieval pipeline with several interchangeable strategies.
Retrieval pipeline
Query → Transform → Retrieve → Rerank → Return
- Transform — Expand or reformulate the query for better recall
- Retrieve — Find candidate chunks via dense and/or sparse search
- Rerank — Re-score candidates for precision
Each stage is independently configurable.
Query transformation
Transform the user's query before retrieval to improve recall. Configured via RAG_QUERY_TRANSFORM_TYPE.
Multi-query (default)
Generate alternative phrasings to capture different aspects of the query:
- With LLM: Generates diverse query variants using the chat model
- Without LLM: Deterministic text manipulations (punctuation removal, sentence fragments)
Results from all queries are merged and deduplicated.
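The merge-and-deduplicate step can be sketched as follows (a minimal illustration, not the Airbeeps implementation; chunks are identified here by plain ids):

```python
def merge_results(result_lists):
    """Merge ranked chunk lists from several query variants, deduplicating
    while letting every variant contribute its top hits first (round-robin)."""
    seen = set()
    merged = []
    for rank in range(max(len(r) for r in result_lists)):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

# Results for the original query and two generated variants:
merged = merge_results([
    ["chunk-a", "chunk-b"],
    ["chunk-b", "chunk-c"],
    ["chunk-a", "chunk-d"],
])
```

Round-robin interleaving is one reasonable merge order; rank-based fusion (as in hybrid search below) is another.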
HyDE (Hypothetical Document Embeddings)
Instead of searching with the query, generate a hypothetical answer first, then search for documents similar to that answer. Requires an LLM.
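In outline, HyDE just swaps what gets embedded. A minimal sketch, where `llm` and `embed` are hypothetical callables standing in for the chat model and the embedding model:

```python
def hyde_query_embedding(query, llm, embed):
    """Embed a hypothetical answer instead of the raw query (HyDE)."""
    prompt = (
        "Write a short passage that plausibly answers the question.\n"
        f"Question: {query}\nPassage:"
    )
    hypothetical_answer = llm(prompt)
    # Answer-shaped text tends to sit closer to real answer passages
    # in embedding space than the bare question does.
    return embed(hypothetical_answer)
```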
Query: "How do I reset my password?"
→ HyDE generates: "To reset your password, navigate to Settings > Account..."
→ Search uses the hypothetical answer as the embedding query
Step-back prompting
Generate a more general question that addresses the broader concept behind the original query. Useful when queries are too specific.
Query: "What's the default port for the Qdrant vector store?"
→ Step-back: "How is vector store connectivity configured?"
None
Disable query transformation entirely. The original query is used as-is.
Retrieval modes
Dense vector search
Uses embedding similarity to find semantically related chunks:
- Embed the user query with the same model used for documents
- Find nearest neighbors in the vector store
- Return top-k chunks above the similarity threshold
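The steps above reduce to a nearest-neighbour search with a score cutoff. A brute-force sketch using cosine similarity (real deployments delegate this to the vector store's ANN index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dense_search(query_vec, chunks, k=5, score_threshold=0.0):
    """chunks: list of (chunk_id, embedding). Returns top-k above threshold."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunks]
    scored = [s for s in scored if s[1] >= score_threshold]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

hits = dense_search(
    [1.0, 0.0],
    [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])],
    k=2,
    score_threshold=0.1,
)
```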
Hybrid search (dense + BM25)
Combine semantic search with lexical matching using Reciprocal Rank Fusion (RRF):
- Run dense vector search
- Run BM25 keyword search on the same corpus
- Merge results using RRF score fusion
RRF score = Σ (1 / (k + rank)) for each retriever
The alpha parameter controls the weighting: 0 = sparse only, 1 = dense only, 0.5 = equal weight.
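The fusion step can be sketched directly from the formula. Applying alpha as a per-retriever weight on the RRF contributions is one plausible reading of RAG_HYBRID_ALPHA and is an assumption here, not necessarily how Airbeeps applies it:

```python
def rrf_fuse(dense_ranked, sparse_ranked, alpha=0.5, k=60):
    """Fuse two ranked lists of chunk ids with weighted Reciprocal Rank
    Fusion. k=60 is the conventional RRF damping constant."""
    scores = {}
    for rank, cid in enumerate(dense_ranked):
        scores[cid] = scores.get(cid, 0.0) + alpha * (1.0 / (k + rank + 1))
    for rank, cid in enumerate(sparse_ranked):
        scores[cid] = scores.get(cid, 0.0) + (1 - alpha) * (1.0 / (k + rank + 1))
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks well in both retrievers, so it wins under equal weighting:
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"], alpha=0.5)
```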
| Setting | Default | Description |
|---|---|---|
| RAG_HYBRID_ALPHA | 0.5 | Dense vs sparse weight |
| RAG_BM25_K1 | 1.5 | BM25 term frequency saturation |
| RAG_BM25_B | 0.75 | BM25 document length normalization |
Enable/disable: AIRBEEPS_RAG_ENABLE_HYBRID_SEARCH=true
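The two BM25 parameters enter the score as follows; a minimal single-term sketch (the IDF factor is omitted for brevity):

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.5, b=0.75):
    """BM25 per-term weight (before the IDF factor).
    k1 caps the effect of repeated terms; b scales length normalization."""
    norm = 1 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + k1 * norm)

# Repeating a term saturates: weight grows sublinearly with term frequency.
w1 = bm25_term_weight(1, doc_len=100, avg_doc_len=100)
w5 = bm25_term_weight(5, doc_len=100, avg_doc_len=100)
```

Raising k1 lets repeated terms keep earning score for longer; raising b penalizes long documents more aggressively.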
Auto-merging retriever
For hierarchical chunks, the auto-merging retriever can merge child chunks back into parent chunks when enough children are retrieved from the same parent. This provides better context while maintaining retrieval precision.
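The merge decision can be sketched as: if enough of a parent's children were retrieved, return the parent instead. The 0.5 ratio below is an illustrative assumption, not the Airbeeps default:

```python
def auto_merge(retrieved_child_ids, parents, merge_ratio=0.5):
    """parents: dict parent_id -> list of all child ids under that parent.
    Replaces groups of retrieved children with their parent chunk when at
    least merge_ratio of the parent's children were retrieved."""
    retrieved = set(retrieved_child_ids)
    out = []
    merged_children = set()
    for pid, children in parents.items():
        hit = retrieved & set(children)
        if len(hit) / len(children) >= merge_ratio:
            out.append(pid)
            merged_children |= hit
    # Children whose parent did not qualify are kept as-is.
    out.extend(c for c in retrieved_child_ids if c not in merged_children)
    return out

result = auto_merge(
    ["c1", "c2", "c5"],
    {"p1": ["c1", "c2", "c3"], "p2": ["c4", "c5", "c6"]},
)
```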
Reranking
Re-score top candidates using a cross-encoder or other model for improved precision.
Supported rerankers
| Type | Model | Description |
|---|---|---|
| BGE (default) | BAAI/bge-reranker-v2-m3 | Local cross-encoder, free |
| Cohere | rerank-english-v3.0 | API-based, requires Cohere key |
| ColBERT | Various | Late-interaction reranking |
| Sentence Transformer | Various | Cross-encoder models via sentence-transformers |
| Embedding | — | Lightweight cosine similarity fallback |
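The embedding fallback in the last row amounts to re-scoring candidates by query-chunk cosine similarity; a sketch, with embeddings supplied by the caller:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def embedding_rerank(query_vec, candidates, top_n=5):
    """candidates: list of (chunk_id, embedding).
    Re-orders by cosine similarity and keeps the top_n."""
    rescored = sorted(candidates, key=lambda c: cosine(query_vec, c[1]),
                      reverse=True)
    return [cid for cid, _ in rescored[:top_n]]

order = embedding_rerank([1.0, 0.0],
                         [("a", [0.0, 1.0]), ("b", [1.0, 0.1])], top_n=2)
```

Cross-encoders score the query and chunk text jointly and are usually more accurate; this fallback only needs vectors you already have.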
Ensemble reranking
Combine multiple rerankers using score fusion:
| Fusion method | Description |
|---|---|
| RRF | Reciprocal Rank Fusion — good for combining rankings |
| Weighted average | Weighted average of normalized scores |
| Max | Take the maximum score from any reranker |
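Weighted-average and max fusion can be sketched as below. Min-max normalizing each reranker's scores first is one common choice and an assumption here:

```python
def normalize(scores):
    """Min-max normalize a dict of chunk_id -> raw score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {cid: (s - lo) / span for cid, s in scores.items()}

def ensemble_fuse(per_reranker_scores, method="weighted", weights=None):
    """per_reranker_scores: list of dicts chunk_id -> score, one per reranker.
    Returns chunk ids ranked by the fused score."""
    normed = [normalize(s) for s in per_reranker_scores]
    weights = weights or [1.0 / len(normed)] * len(normed)
    fused = {}
    for w, scores in zip(weights, normed):
        for cid, s in scores.items():
            if method == "max":
                fused[cid] = max(fused.get(cid, 0.0), s)
            else:  # weighted average of normalized scores
                fused[cid] = fused.get(cid, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

# Two rerankers disagree; the 0.7-weighted one wins:
ranked = ensemble_fuse([{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}],
                       method="weighted", weights=[0.7, 0.3])
```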
| Setting | Default | Description |
|---|---|---|
| RAG_ENABLE_RERANKING | true | Enable reranking |
| RAG_RERANKER_MODEL | BAAI/bge-reranker-v2-m3 | Default reranker model |
| RAG_RERANKER_TOP_N | 5 | Results after reranking |
API parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| k | int | Number of chunks to return | 5 |
| fetch_k | int | Candidates to fetch before reranking | k * 3 |
| score_threshold | float | Minimum similarity score | 0.0 |
| use_hybrid | bool | Enable hybrid search | from settings |
| use_rerank | bool | Enable reranking | from settings |
| query_transform | string | Transform type: none, hyde, multi_query, step_back | from settings |
| rerank_top_k | int | Chunks to rerank | RAG_RERANKER_TOP_N |
| rerank_model_id | string | Override reranker model | RAG_RERANKER_MODEL |
| hybrid_alpha | float | Dense vs sparse weight | 0.5 |
Choosing a strategy
Simple Q&A
Default settings work well:
k: 5
use_hybrid: true
use_rerank: true
query_transform: multi_query
Diverse results
Step-back prompting with hybrid search:
k: 5
query_transform: step_back
use_hybrid: true
use_rerank: true
Maximum recall
HyDE with hybrid search and aggressive reranking:
k: 8
query_transform: hyde
use_hybrid: true
use_rerank: true
rerank_top_k: 15
Platform defaults
Configure default RAG settings in the Admin UI or via configuration:
- Retrieval count (k)
- Similarity threshold
- Hybrid search toggle
- Reranking toggle
- Query transform type
- Reranker model
Individual assistants can override these settings.
RAG evaluation
Airbeeps includes a RAGAS-based evaluation module for measuring retrieval and generation quality:
| Metric | Description |
|---|---|
| Faithfulness | Is the answer faithful to the retrieved context? |
| Answer relevancy | Is the answer relevant to the question? |
| Context recall | Were all relevant pieces of information retrieved? |
| Context precision | How precise is the retrieved context? |
When RAGAS is not installed, a lightweight fallback evaluation using word overlap heuristics is used.
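A word-overlap heuristic of this kind can be sketched as below; this illustrates the idea and is not the module's actual scoring:

```python
def overlap_score(answer, context):
    """Crude faithfulness proxy: fraction of answer words found in context."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

score = overlap_score(
    "reset via settings page",
    "reset your password from the settings page",
)
```

Such heuristics are cheap but blind to paraphrase, so treat fallback scores as rough signals, not RAGAS-comparable metrics.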
Debugging retrieval
Use the search endpoint to test retrieval without chat:
POST /api/v1/rag/knowledge-bases/{kb_id}/search
{
"query": "your question",
"k": 5,
"score_threshold": 0.5
}
Check returned scores and metadata to tune parameters.