Building a Production-Ready RAG Pipeline: TF-IDF, HNSW, LSH, CAG, Guardrails and More

TL;DR

This post outlines an approach to answering user queries with a Retrieval-Augmented Generation (RAG) pipeline protected by a 10-guardrail safety system. The pipeline combines Cache-Augmented Generation (CAG) with context engineering, semantic search, embeddings, chunking, page indexing, a web chat user interface, and large language models (Gemini, or local models served through Ollama). It also incorporates Hugging Face’s Chain and the MCP Server for Claude Desktop.

Standard RAG Pipeline

  • Raw Documents → /data/
  • Agentic Chunking + TF-IDF (semantic boundaries + vocabulary scoring)
  • Sentence Transformers — BGE model, dim=384, normalize
  • ChromaDB + HNSW + LSH — O(log n) ANN with the layer graph visualised
  • CAG (Redis) + Context Engine (7 steps) + LLM (Gemini/Ollama)
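
To make the flow of the stages above concrete, here is a minimal, standard-library-only sketch. It is not the post's implementation, and every name in it (`TinyRetriever`, `build_idf`, `tfidf_vector`, `cosine`) is hypothetical. It also makes three simplifying assumptions: L2-normalized TF-IDF vectors stand in for the BGE sentence embeddings, a brute-force scan stands in for the ChromaDB HNSW/LSH index, and a plain dictionary stands in for the Redis-backed CAG layer.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]

def build_idf(chunks):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(chunks)
    df = Counter()
    for chunk in chunks:
        df.update(set(tokenize(chunk)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector, L2-normalized so a dot product equals cosine similarity."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

class TinyRetriever:
    """Toy retriever: exact scan instead of an ANN index, dict instead of Redis."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.idf = build_idf(chunks)
        self.vectors = [tfidf_vector(c, self.idf) for c in chunks]
        self.cache = {}       # assumption: stand-in for the Redis cache
        self.cache_hits = 0

    def search(self, query, k=2):
        key = (query.strip().lower(), k)
        if key in self.cache:  # repeated query: skip scoring entirely
            self.cache_hits += 1
            return self.cache[key]
        q = tfidf_vector(query, self.idf)
        scored = sorted(
            ((cosine(q, v), i) for i, v in enumerate(self.vectors)),
            key=lambda pair: (-pair[0], pair[1]),
        )
        result = [(self.chunks[i], round(s, 4)) for s, i in scored[:k] if s > 0]
        self.cache[key] = result
        return result

if __name__ == "__main__":
    docs = [
        "HNSW builds a layered graph for approximate nearest neighbour search.",
        "Redis caches answers to repeated questions.",
        "Guardrails filter unsafe prompts before they reach the model.",
    ]
    retriever = TinyRetriever(docs)
    print(retriever.search("how does the layered graph search work", k=1))
```

The exact scan costs O(n) per query, which is the cost an HNSW index is meant to avoid on large collections. The retrieved chunks would then be passed to the context engine and the LLM, which this sketch leaves out.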

References are available at: