Retrieval

Retrieval is the core of RAG: finding the most relevant chunks from your knowledge base to include in the LLM prompt. Airbeeps implements a multi-stage retrieval pipeline that combines query transformation, dense and hybrid search, and reranking.

Retrieval pipeline

Query → Transform → Retrieve → Rerank → Return

Transform — Expand or reformulate the query for better recall
Retrieve — Find candidate chunks via dense and/or sparse search
Rerank — Re-score candidates for precision

Each stage is independently configurable.
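The stage order above can be sketched as a function that receives each stage as a callable, which is one way to keep stages independently swappable. This is an illustrative sketch, not the Airbeeps implementation: `run_pipeline`, the `Chunk` shape, and the stage signatures are hypothetical.

```python
from typing import Callable, Optional

Chunk = tuple[str, float]  # (chunk text, score); hypothetical shape for this sketch


def run_pipeline(
    query: str,
    retrieve: Callable[[str], list[Chunk]],
    transform: Optional[Callable[[str], list[str]]] = None,
    rerank: Optional[Callable[[str, list[Chunk]], list[Chunk]]] = None,
    k: int = 5,
) -> list[Chunk]:
    """Query -> Transform -> Retrieve -> Rerank -> Return."""
    # Transform: one query may become several; skipping the stage keeps the original.
    queries = transform(query) if transform else [query]

    # Retrieve: gather candidates for every query, keeping the best score per chunk.
    best: dict[str, float] = {}
    for q in queries:
        for text, score in retrieve(q):
            if text not in best or score > best[text]:
                best[text] = score
    candidates = sorted(best.items(), key=lambda item: item[1], reverse=True)

    # Rerank: optional re-scoring against the original query.
    if rerank:
        candidates = rerank(query, candidates)
    return candidates[:k]
```

Passing `transform=None` or `rerank=None` skips that stage without touching the others.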

Query transformation

Transform the user's query before retrieval to improve recall. Configured via RAG_QUERY_TRANSFORM_TYPE.

Multi-query (default)

Generate alternative phrasings to capture different aspects of the query:

With LLM: Generates diverse query variants using the chat model
Without LLM: Deterministic text manipulations (punctuation removal, sentence fragments)

Results from all queries are merged and deduplicated.
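A minimal sketch of the LLM-free path and the merge step follows. The exact text manipulations Airbeeps applies are not listed beyond punctuation removal and sentence fragments, so `deterministic_variants` and `merge_results` are hypothetical names, and keeping the highest score per chunk is an assumed merge rule.

```python
import re


def deterministic_variants(query: str, max_variants: int = 4) -> list[str]:
    """Build query variants without an LLM: the original, a punctuation-free
    form, and sentence fragments. Order is preserved; duplicates are dropped."""
    candidates = [query.strip()]
    # Punctuation removal: replace everything except word characters and whitespace.
    candidates.append(re.sub(r"[^\w\s]", " ", query))
    # Sentence fragments: split on sentence-ending punctuation.
    candidates.extend(re.split(r"[.?!]+", query))

    seen: set[str] = set()
    variants: list[str] = []
    for cand in candidates:
        normalized = " ".join(cand.split())  # collapse runs of whitespace
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            variants.append(normalized)
    return variants[:max_variants]


def merge_results(result_lists: list[list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Merge (chunk_id, score) lists from all variants, keeping the highest
    score seen for each chunk id (assumed deduplication rule)."""
    best: dict[str, float] = {}
    for results in result_lists:
        for chunk_id, score in results:
            best[chunk_id] = max(score, best.get(chunk_id, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```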

HyDE (Hypothetical Document Embeddings)

Instead of searching with the query, generate a hypothetical answer first, then search for documents similar to that answer. Requires an LLM.

Query: "How do I reset my password?"
  → HyDE generates: "To reset your password, navigate to Settings > Account..."
  → Search uses the hypothetical answer as the embedding query
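The flow above can be sketched with the LLM and the embedding model passed in as plain callables. The prompt wording, the `hyde_query_vector` name, and both callables are assumptions for illustration, not the Airbeeps API.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def hyde_query_vector(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Vector],
) -> Vector:
    """Return the embedding of a hypothetical answer instead of the query."""
    # Hypothetical prompt; the real prompt text is not documented here.
    prompt = (
        "Write a short passage that answers the question below.\n"
        f"Question: {query}\nPassage:"
    )
    hypothetical_answer = generate(prompt)
    # The vector store is then searched with this vector, not with embed(query).
    return embed(hypothetical_answer)
```

Because the hypothetical answer is written in the style of a document rather than a question, its embedding tends to land closer to real answer passages.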

Step-back prompting

Generate a more general question that addresses the broader concept behind the original query. Useful when queries are too specific.

Query: "What's the default port for the Qdrant vector store?"
  → Step-back: "How is vector store connectivity configured?"

None

Disable query transformation entirely. The original query is used as-is.

Retrieval modes

Dense vector search

Uses embedding similarity to find semantically related chunks:

Embed the user query with the same model used for documents
Find nearest neighbors in the vector store
Return top-k chunks above the similarity threshold
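The three steps can be illustrated with a brute-force search over an in-memory index. A real vector store uses an approximate nearest-neighbor index instead; `dense_search` and the dictionary index are hypothetical, cosine similarity is an assumed metric, and treating the threshold as inclusive is also an assumption.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_search(
    query_vec: Sequence[float],
    index: dict[str, Sequence[float]],
    k: int = 5,
    score_threshold: float = 0.0,
) -> list[tuple[str, float]]:
    """Score every chunk, drop those below the threshold, return the top k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    kept = [(chunk_id, score) for chunk_id, score in scored if score >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]
```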

Hybrid search (dense + BM25)

Combine semantic search with lexical matching using Reciprocal Rank Fusion (RRF):

Run dense vector search
Run BM25 keyword search on the same corpus
Merge results using RRF score fusion

RRF score = Σ (1 / (k + rank)) for each retriever

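In the formula, k is the RRF smoothing constant (not the number of chunks to return) and rank is a chunk's position in each retriever's result list. The alpha parameter controls the weighting between the two retrievers: 0 = sparse only, 1 = dense only, 0.5 = equal weight.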
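How alpha enters the RRF sum is not spelled out here. One plausible reading, sketched below, scales the dense term by alpha and the sparse term by 1 - alpha. The `weighted_rrf` name is hypothetical, and `rrf_k = 60` is a commonly used value for the constant rather than a documented Airbeeps default.

```python
def weighted_rrf(
    dense_ranked: list[str],
    sparse_ranked: list[str],
    alpha: float = 0.5,
    rrf_k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse two ranked lists of chunk ids.

    score(chunk) = alpha / (rrf_k + dense_rank) + (1 - alpha) / (rrf_k + sparse_rank)
    A chunk missing from one list contributes nothing for that retriever.
    """
    scores: dict[str, float] = {}
    for weight, ranking in ((alpha, dense_ranked), (1.0 - alpha, sparse_ranked)):
        for rank, chunk_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because only ranks are used, the two retrievers' raw scores never need to be on the same scale.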

Setting	Default	Description
`RAG_HYBRID_ALPHA`	`0.5`	Dense vs sparse weight
`RAG_BM25_K1`	`1.5`	BM25 term frequency saturation
`RAG_BM25_B`	`0.75`	BM25 document length normalization

Enable or disable hybrid search with AIRBEEPS_RAG_ENABLE_HYBRID_SEARCH=true (or false).
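To show what the two BM25 settings do, here is a minimal Okapi BM25 scorer. It is a sketch under assumptions: whitespace tokenization and the Lucene-style IDF variant are choices made for this example, and `bm25_scores` is a hypothetical name, not the Airbeeps implementation.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # b scales the penalty for documents longer than average.
            length_norm = 1.0 - b + b * len(toks) / avg_len
            # k1 controls how quickly repeated occurrences stop adding score.
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

With `b = 0` document length is ignored entirely. A lower `k1` makes the score saturate after fewer repetitions of a term.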

Auto-merging retriever

For hierarchical chunks, the auto-merging retriever can merge child chunks back into parent chunks when enough children are retrieved from the same parent. This provides better context while maintaining retrieval precision.
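The merge rule can be sketched as follows. How many children count as "enough" is not stated here, so the `merge_ratio` threshold, the `auto_merge` name, and the two lookup dictionaries are assumptions for illustration.

```python
from collections import defaultdict


def auto_merge(
    retrieved_children: list[str],
    parent_of: dict[str, str],
    children_of: dict[str, list[str]],
    merge_ratio: float = 0.5,
) -> list[str]:
    """Replace child chunks with their parent when more than `merge_ratio`
    of that parent's children were retrieved. Returns ids in first-seen order."""
    # Count which children were hit under each parent.
    hits: dict[str, set[str]] = defaultdict(set)
    for child in retrieved_children:
        hits[parent_of[child]].add(child)

    merged: list[str] = []
    for child in retrieved_children:
        parent = parent_of[child]
        ratio = len(hits[parent]) / len(children_of[parent])
        chosen = parent if ratio > merge_ratio else child
        if chosen not in merged:  # a merged parent appears only once
            merged.append(chosen)
    return merged
```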

Reranking

Re-score top candidates using a cross-encoder or other model for improved precision.

Supported rerankers

Type	Model	Description
BGE (default)	`BAAI/bge-reranker-v2-m3`	Local cross-encoder, free
Cohere	`rerank-english-v3.0`	API-based, requires Cohere key
ColBERT	Various	Late-interaction reranking
Sentence Transformer	Various	Cross-encoder models via sentence-transformers
Embedding	—	Lightweight cosine similarity fallback
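Whatever model backs it, a reranker scores (query, chunk) pairs and keeps the best ones. The sketch below takes the scorer as a callable so it runs without a model download. `rerank` and `score_pairs` are hypothetical names, not the Airbeeps interface.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    chunks: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Score each (query, chunk) pair and return the top_n chunks by score."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)  # one relevance score per pair, in order
    ranked = sorted(zip(chunks, scores), key=lambda item: item[1], reverse=True)
    return [(chunk, float(score)) for chunk, score in ranked[:top_n]]
```

With sentence-transformers installed, `CrossEncoder("BAAI/bge-reranker-v2-m3").predict` accepts a list of (query, passage) pairs and could be passed as `score_pairs`. Whether Airbeeps loads the model this way is not stated here.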

Ensemble reranking

Combine multiple rerankers using score fusion:

Fusion method	Description
RRF	Reciprocal Rank Fusion — good for combining rankings
Weighted average	Weighted average of normalized scores
Max	Take the maximum score from any reranker
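The two score-based methods in the table can be sketched as below. This is an illustration under assumptions: min-max normalization, treating a chunk that a reranker did not score as 0.0, and applying the max to normalized scores are choices made for this example, and `fuse` is a hypothetical name.

```python
from typing import Optional


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one reranker's scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def fuse(
    per_reranker: list[dict[str, float]],
    method: str = "weighted_average",
    weights: Optional[list[float]] = None,
) -> dict[str, float]:
    """Combine per-reranker scores into one score per chunk id."""
    normalized = [min_max(scores) for scores in per_reranker]
    weights = weights or [1.0] * len(normalized)
    fused: dict[str, float] = {}
    for cid in set().union(*normalized):
        # A chunk a reranker did not score counts as 0.0 for that reranker.
        values = [scores.get(cid, 0.0) for scores in normalized]
        if method == "max":
            fused[cid] = max(values)
        elif method == "weighted_average":
            fused[cid] = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        else:
            raise ValueError(f"unknown fusion method: {method}")
    return fused
```

Normalizing first matters because rerankers emit scores on different scales, so raw values cannot be averaged or compared directly.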

Setting	Default	Description
`RAG_ENABLE_RERANKING`	`true`	Enable reranking
`RAG_RERANKER_MODEL`	`BAAI/bge-reranker-v2-m3`	Default reranker model
`RAG_RERANKER_TOP_N`	`5`	Results after reranking

API parameters

Parameter	Type	Description	Default
`k`	int	Number of chunks to return	`5`
`fetch_k`	int	Candidates to fetch before reranking	`k * 3`
`score_threshold`	float	Minimum similarity score	`0.0`
`use_hybrid`	bool	Enable hybrid search	from settings
`use_rerank`	bool	Enable reranking	from settings
`query_transform`	string	Transform type: `none`, `hyde`, `multi_query`, `step_back`	from settings
`rerank_top_k`	int	Chunks to rerank	`RAG_RERANKER_TOP_N`
`rerank_model_id`	string	Override reranker model	`RAG_RERANKER_MODEL`
`hybrid_alpha`	float	Dense vs sparse weight	`0.5`
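As a sketch of how a client might assemble these parameters, the dataclass below mirrors the defaults in the table. `SearchParams` is a hypothetical helper, not part of Airbeeps. It assumes that omitting a field lets the server fall back to its settings, and it covers only a subset of the parameters.

```python
from dataclasses import asdict, dataclass
from typing import Optional

ALLOWED_TRANSFORMS = ("none", "hyde", "multi_query", "step_back")


@dataclass
class SearchParams:
    query: str
    k: int = 5
    fetch_k: Optional[int] = None  # None means "use k * 3"
    score_threshold: float = 0.0
    # None means "omit the field and let the server use its settings" (assumption).
    use_hybrid: Optional[bool] = None
    use_rerank: Optional[bool] = None
    query_transform: Optional[str] = None
    hybrid_alpha: float = 0.5

    def to_payload(self) -> dict:
        """Validate the transform type and build a request body."""
        if self.query_transform is not None and self.query_transform not in ALLOWED_TRANSFORMS:
            raise ValueError(f"unknown query_transform: {self.query_transform}")
        payload = {name: value for name, value in asdict(self).items() if value is not None}
        payload.setdefault("fetch_k", self.k * 3)
        return payload
```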

Choosing a strategy

Simple Q&A

Default settings work well:

```yaml
k: 5
use_hybrid: true
use_rerank: true
query_transform: multi_query
```

Diverse results

Step-back prompting with hybrid search:

```yaml
k: 5
query_transform: step_back
use_hybrid: true
use_rerank: true
```

Maximum recall

HyDE with hybrid search and aggressive reranking:

```yaml
k: 8
query_transform: hyde
use_hybrid: true
use_rerank: true
rerank_top_k: 15
```

Platform defaults

Configure default RAG settings in the Admin UI or via configuration:

Retrieval count (k)
Similarity threshold
Hybrid search toggle
Reranking toggle
Query transform type
Reranker model

Individual assistants can override these settings.

RAG evaluation

Airbeeps includes a RAGAS-based evaluation module for measuring retrieval and generation quality:

Metric	Description
Faithfulness	Is the answer faithful to the retrieved context?
Answer relevancy	Is the answer relevant to the question?
Context recall	Were all relevant pieces of information retrieved?
Context precision	How precise is the retrieved context?

When RAGAS is not installed, Airbeeps falls back to a lightweight evaluation based on word-overlap heuristics.
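The fallback's exact heuristics are not documented here. As a rough idea of what a word-overlap measure looks like, the sketch below computes the share of distinct words in a text that also occur in the retrieved context. `overlap_ratio` is a hypothetical function, not the Airbeeps one.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_ratio(text: str, contexts: list[str]) -> float:
    """Fraction of distinct words in `text` that also appear in `contexts`."""
    text_tokens = _tokens(text)
    if not text_tokens:
        return 0.0
    context_tokens = set().union(*(_tokens(c) for c in contexts))
    return len(text_tokens & context_tokens) / len(text_tokens)
```

Applied to a generated answer, the ratio is a crude stand-in for faithfulness. Applied to a reference answer, it approximates context recall. Neither captures meaning the way the RAGAS metrics do.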

Debugging retrieval

Use the search endpoint to test retrieval without chat:

```bash
POST /api/v1/rag/knowledge-bases/{kb_id}/search
{
  "query": "your question",
  "k": 5,
  "score_threshold": 0.5
}
```

Check returned scores and metadata to tune parameters.
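The same request can be built from Python with the standard library. This is a sketch: the base URL is a placeholder, any authentication header your deployment requires is not shown because it is not documented here, and `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(
    base_url: str,
    kb_id: str,
    query: str,
    k: int = 5,
    score_threshold: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the search endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/rag/knowledge-bases/{kb_id}/search"
    body = json.dumps({"query": query, "k": k, "score_threshold": score_threshold})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Sending it (placeholder host and knowledge base id; needs a running server):
# request = build_search_request("http://localhost:8000", "my-kb", "your question")
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Try a few `score_threshold` values with the same query to see where relevant chunks start to drop out.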
