
Retrieval

Retrieval is the core of RAG — finding the most relevant chunks from your knowledge base to include in the LLM prompt. Airbeeps implements a state-of-the-art retrieval pipeline with multiple strategies.

Retrieval pipeline

Query → Transform → Retrieve → Rerank → Return
  1. Transform — Expand or reformulate the query for better recall
  2. Retrieve — Find candidate chunks via dense and/or sparse search
  3. Rerank — Re-score candidates for precision

Each stage is independently configurable.

Query transformation

Transform the user's query before retrieval to improve recall. Configured via RAG_QUERY_TRANSFORM_TYPE.

Multi-query (default)

Generate alternative phrasings to capture different aspects of the query:

  • With LLM: Generates diverse query variants using the chat model
  • Without LLM: Deterministic text manipulations (punctuation removal, sentence fragments)

Results from all queries are merged and deduplicated.
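The merge-and-dedupe step above can be sketched as follows. This is an illustrative sketch, not the Airbeeps implementation: the `Chunk` type and `merge_and_dedupe` helper are assumptions, and a real pipeline would carry full metadata.

```python
# Sketch of multi-query retrieval merging: run every query variant, then
# merge the hit lists and deduplicate by chunk id, keeping the best score.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    score: float

def merge_and_dedupe(result_lists):
    """Merge per-query result lists, keeping each chunk's highest score."""
    best = {}
    for results in result_lists:
        for chunk in results:
            prev = best.get(chunk.chunk_id)
            if prev is None or chunk.score > prev.score:
                best[chunk.chunk_id] = chunk
    # Highest-scoring chunks first
    return sorted(best.values(), key=lambda c: c.score, reverse=True)

# Two query variants returned overlapping results
a = [Chunk("c1", "reset via Settings", 0.91), Chunk("c2", "password policy", 0.72)]
b = [Chunk("c1", "reset via Settings", 0.85), Chunk("c3", "account recovery", 0.60)]
merged = merge_and_dedupe([a, b])
```

Note that `c1` appears once in the merged list, with its better score (0.91) retained.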

HyDE (Hypothetical Document Embeddings)

Instead of searching with the query, generate a hypothetical answer first, then search for documents similar to that answer. Requires an LLM.

Query: "How do I reset my password?"
  → HyDE generates: "To reset your password, navigate to Settings > Account..."
  → Search uses the hypothetical answer as the embedding query
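The HyDE flow can be sketched like this. Both `generate_answer` and `embed` are stand-in stubs (not Airbeeps functions) so the control flow is runnable; in practice the chat model writes the hypothetical answer and the real embedding model encodes it.

```python
# HyDE sketch: embed a hypothetical answer instead of the raw query.

def generate_answer(query: str) -> str:
    """Stub for the LLM call that drafts a plausible answer to the query."""
    return "To reset your password, navigate to Settings > Account..."

def embed(text: str) -> list[float]:
    """Stub embedding: hash-based toy vector, just to make the flow runnable."""
    return [float((hash(word) % 100) / 100) for word in text.lower().split()[:8]]

def hyde_query_vector(query: str) -> list[float]:
    hypothetical = generate_answer(query)  # 1. draft an answer with the LLM
    return embed(hypothetical)             # 2. search with the answer's embedding

vec = hyde_query_vector("How do I reset my password?")
```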

Step-back prompting

Generate a more general question that addresses the broader concept behind the original query. Useful when queries are too specific.

Query: "What's the default port for the Qdrant vector store?"
  → Step-back: "How is vector store connectivity configured?"

None

Disable query transformation entirely. The original query is used as-is.

Retrieval modes

Dense vector search

Uses embedding similarity to find semantically related chunks:

  1. Embed the user query with the same model used for documents
  2. Find nearest neighbors in the vector store
  3. Return top-k chunks above the similarity threshold
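The three steps above can be sketched in pure Python. In practice the vector store performs the nearest-neighbor search; this toy `dense_search` helper (an illustration, not an Airbeeps API) just makes the top-k-with-threshold logic concrete.

```python
# Sketch of dense retrieval: cosine similarity against stored chunk vectors,
# then keep the top-k results above the score threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dense_search(query_vec, index, k=5, score_threshold=0.0):
    """index: dict of chunk_id -> embedding. Returns [(chunk_id, score)]."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in index.items()]
    scored = [(cid, s) for cid, s in scored if s >= score_threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

index = {"c1": [1.0, 0.0], "c2": [0.7, 0.7], "c3": [0.0, 1.0]}
hits = dense_search([1.0, 0.1], index, k=2, score_threshold=0.3)
```

Here `c3` is filtered out by the threshold, and the two remaining chunks come back best-first.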

Hybrid search (dense + BM25)

Combine semantic search with lexical matching using Reciprocal Rank Fusion (RRF):

  1. Run dense vector search
  2. Run BM25 keyword search on the same corpus
  3. Merge results using RRF score fusion

RRF score = Σ 1 / (k + rank), summed over retrievers, where rank is the chunk's position in each retriever's results and k is a rank-smoothing constant (commonly 60; distinct from the top-k parameter).

The alpha parameter controls the weighting: 0 = sparse only, 1 = dense only, 0.5 = equal weight.
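One plausible reading of alpha-weighted RRF is sketched below; the exact weighting inside Airbeeps may differ, and the function name is illustrative.

```python
# Sketch of alpha-weighted Reciprocal Rank Fusion over a dense and a sparse
# ranking. alpha=1 counts only the dense ranking, alpha=0 only the sparse one.

def rrf_fuse(dense_ranked, sparse_ranked, alpha=0.5, k=60):
    """dense_ranked / sparse_ranked: chunk ids, best first. k smooths low ranks."""
    scores = {}
    for rank, cid in enumerate(dense_ranked, start=1):
        scores[cid] = scores.get(cid, 0.0) + alpha * (1.0 / (k + rank))
    for rank, cid in enumerate(sparse_ranked, start=1):
        scores[cid] = scores.get(cid, 0.0) + (1 - alpha) * (1.0 / (k + rank))
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["c1", "c2", "c3"], ["c3", "c1", "c4"], alpha=0.5)
```

Chunks found by both retrievers (`c1`, `c3`) accumulate score from both sums, so they rise above single-retriever hits.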

| Setting | Default | Description |
|---|---|---|
| RAG_HYBRID_ALPHA | 0.5 | Dense vs sparse weight |
| RAG_BM25_K1 | 1.5 | BM25 term frequency saturation |
| RAG_BM25_B | 0.75 | BM25 document length normalization |

Enable/disable: AIRBEEPS_RAG_ENABLE_HYBRID_SEARCH=true

Auto-merging retriever

For hierarchical chunks, the auto-merging retriever can merge child chunks back into parent chunks when enough children are retrieved from the same parent. This provides better context while maintaining retrieval precision.
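The merge decision can be sketched as below. The 0.5 merge ratio and the data structures are assumptions for illustration, not Airbeeps defaults.

```python
# Sketch of auto-merging: if enough of a parent's children are retrieved,
# swap the children for the parent chunk; otherwise keep the child as-is.

def auto_merge(retrieved_ids, parent_of, children_of, merge_ratio=0.5):
    """retrieved_ids: retrieved child chunk ids. parent_of: child -> parent.
    children_of: parent -> list of all of its child ids."""
    retrieved = set(retrieved_ids)
    merged, emitted_parents = [], set()
    for cid in retrieved_ids:
        parent = parent_of.get(cid)
        if parent is None:
            merged.append(cid)  # top-level chunk: nothing to merge into
            continue
        hit = sum(1 for c in children_of[parent] if c in retrieved)
        if hit / len(children_of[parent]) >= merge_ratio:
            if parent not in emitted_parents:  # emit each parent only once
                emitted_parents.add(parent)
                merged.append(parent)
        else:
            merged.append(cid)  # too few siblings retrieved: keep the child
    return merged

parent_of = {"c1": "p1", "c2": "p1", "c3": "p1", "c4": "p2"}
children_of = {"p1": ["c1", "c2", "c3"], "p2": ["c4", "c5", "c6"]}
result = auto_merge(["c1", "c2", "c4"], parent_of, children_of)
```

Two of `p1`'s three children were retrieved, so they merge into `p1`; `c4` stands alone because its siblings were not retrieved.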

Reranking

Re-score top candidates using a cross-encoder or other model for improved precision.

Supported rerankers

| Type | Model | Description |
|---|---|---|
| BGE (default) | BAAI/bge-reranker-v2-m3 | Local cross-encoder, free |
| Cohere | rerank-english-v3.0 | API-based, requires Cohere key |
| ColBERT | Various | Late-interaction reranking |
| Sentence Transformer | Various | Cross-encoder models via sentence-transformers |
| Embedding | (n/a) | Lightweight cosine similarity fallback |

Ensemble reranking

Combine multiple rerankers using score fusion:

| Fusion method | Description |
|---|---|
| RRF | Reciprocal Rank Fusion — good for combining rankings |
| Weighted average | Weighted average of normalized scores |
| Max | Take the maximum score from any reranker |

| Setting | Default | Description |
|---|---|---|
| RAG_ENABLE_RERANKING | true | Enable reranking |
| RAG_RERANKER_MODEL | BAAI/bge-reranker-v2-m3 | Default reranker model |
| RAG_RERANKER_TOP_N | 5 | Results after reranking |
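A sketch of the two score-fusion methods that operate on scores rather than ranks (weighted average and max): each reranker's raw scores are min-max normalized so they are comparable, then combined. Function names and shapes are illustrative, not the Airbeeps API.

```python
# Sketch of ensemble score fusion across rerankers.

def normalize(scores):
    """Min-max normalize one reranker's {chunk_id: raw_score} dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {cid: (s - lo) / span for cid, s in scores.items()}

def fuse(per_reranker_scores, weights=None, method="weighted_average"):
    """per_reranker_scores: list of {chunk_id: raw_score} dicts, one per reranker."""
    normed = [normalize(s) for s in per_reranker_scores]
    weights = weights or [1.0 / len(normed)] * len(normed)
    fused = {}
    for w, scores in zip(weights, normed):
        for cid, s in scores.items():
            if method == "max":
                fused[cid] = max(fused.get(cid, 0.0), s)
            else:  # weighted average of normalized scores
                fused[cid] = fused.get(cid, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

bge = {"c1": 0.9, "c2": 0.2, "c3": 0.5}
cohere = {"c1": 0.6, "c2": 0.8, "c3": 0.1}
ranking = fuse([bge, cohere], weights=[0.6, 0.4])
```

Normalization matters because different rerankers emit scores on different scales; without it, the reranker with the widest raw range would dominate the average.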

API parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| k | int | Number of chunks to return | 5 |
| fetch_k | int | Candidates to fetch before reranking | k * 3 |
| score_threshold | float | Minimum similarity score | 0.0 |
| use_hybrid | bool | Enable hybrid search | from settings |
| use_rerank | bool | Enable reranking | from settings |
| query_transform | string | Transform type: none, hyde, multi_query, step_back | from settings |
| rerank_top_k | int | Chunks to rerank | RAG_RERANKER_TOP_N |
| rerank_model_id | string | Override reranker model | RAG_RERANKER_MODEL |
| hybrid_alpha | float | Dense vs sparse weight | 0.5 |

Choosing a strategy

Simple Q&A

Default settings work well:

```yaml
k: 5
use_hybrid: true
use_rerank: true
query_transform: multi_query
```

Diverse results

Step-back prompting with hybrid search:

```yaml
k: 5
query_transform: step_back
use_hybrid: true
use_rerank: true
```

Maximum recall

HyDE with hybrid search and aggressive reranking:

```yaml
k: 8
query_transform: hyde
use_hybrid: true
use_rerank: true
rerank_top_k: 15
```

Platform defaults

Configure default RAG settings in the Admin UI or via configuration:

  • Retrieval count (k)
  • Similarity threshold
  • Hybrid search toggle
  • Reranking toggle
  • Query transform type
  • Reranker model

Individual assistants can override these settings.

RAG evaluation

Airbeeps includes a RAGAS-based evaluation module for measuring retrieval and generation quality:

| Metric | Description |
|---|---|
| Faithfulness | Is the answer faithful to the retrieved context? |
| Answer relevancy | Is the answer relevant to the question? |
| Context recall | Were all relevant pieces of information retrieved? |
| Context precision | How precise is the retrieved context? |

When RAGAS is not installed, a lightweight fallback evaluation using word overlap heuristics is used.

Debugging retrieval

Use the search endpoint to test retrieval without chat:

```http
POST /api/v1/rag/knowledge-bases/{kb_id}/search

{
  "query": "your question",
  "k": 5,
  "score_threshold": 0.5
}
```

Check returned scores and metadata to tune parameters.
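A minimal Python sketch of calling this endpoint, using only the standard library. The endpoint path comes from the docs above; the base URL, knowledge-base id, and lack of auth headers are placeholder assumptions for your deployment.

```python
# Sketch: build a POST request for the search endpoint to test retrieval.
import json
import urllib.request

def build_search_request(base_url, kb_id, query, k=5, score_threshold=0.5):
    url = f"{base_url}/api/v1/rag/knowledge-bases/{kb_id}/search"
    payload = {"query": query, "k": k, "score_threshold": score_threshold}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "kb-123" and localhost are placeholders for your deployment.
req = build_search_request("http://localhost:8000", "kb-123", "your question")
# urllib.request.urlopen(req) would return the JSON list of chunks,
# whose scores and metadata you can inspect to tune the parameters above.
```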
