Building a Production-Ready RAG Pipeline: TF-IDF, HNSW, LSH, CAG, Guardrails, and More
Published:
TL;DR
This post outlines an approach to answering user queries with a Retrieval-Augmented Generation (RAG) pipeline and a 10-guardrail safety system. The proposed solution combines Cache-Augmented Generation (CAG) with Context Engineering, Semantic Search, Embeddings, Chunking, Page Indexing, a Web Chat User Interface, and large language model backends such as Ollama and Gemini. It also incorporates Hugging Face’s Chain and the MCP Server for Claude Desktop.
Standard RAG Pipeline
- Raw Documents → /data/
- Agentic Chunking + TF-IDF (semantic boundaries + vocabulary scoring)
- Sentence Transformers — BGE model, dim=384, normalize
- ChromaDB + HNSW + LSH — O(log n) ANN with the layer graph visualised
- CAG (Redis) + Context Engine (7 steps) + LLM (Gemini/Ollama)
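The stages above can be sketched end to end with the standard library alone. This is a minimal illustration under loud assumptions, not the post's implementation: sentence splitting stands in for agentic chunking, L2-normalized TF-IDF vectors stand in for the BGE embeddings, a brute-force cosine scan stands in for the ChromaDB/HNSW/LSH index, an in-process dict stands in for the Redis cache, and no LLM is called. Every name here (`TinyRAG`, `chunk`, `embed`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text):
    """Naive stand-in for agentic chunking: split on sentence boundaries."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(chunks):
    """Smoothed inverse document frequency over the chunk collection."""
    n = len(chunks)
    df = Counter()
    for c in chunks:
        df.update(set(tokenize(c)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def embed(text, idf):
    """Sparse TF-IDF vector, L2-normalized so a dot product is cosine similarity."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class TinyRAG:
    def __init__(self, documents):
        self.chunks = [c for d in documents for c in chunk(d)]
        self.idf = build_idf(self.chunks)
        self.vectors = [embed(c, self.idf) for c in self.chunks]
        self.cache = {}      # assumption: a dict in place of Redis
        self.cache_hits = 0

    def retrieve(self, query, k=2):
        """Return up to k chunks ranked by cosine similarity, cached per query."""
        key = (query.strip().lower(), k)
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        q = embed(query, self.idf)
        # Exact O(n) scan; an ANN index such as HNSW would replace this step.
        scored = sorted(
            ((dot(q, v), c) for v, c in zip(self.vectors, self.chunks)),
            key=lambda pair: pair[0],
            reverse=True,
        )
        result = [c for score, c in scored[:k] if score > 0]
        self.cache[key] = result
        return result

    def build_prompt(self, query, k=2):
        """Assemble retrieved context and the question into one prompt string."""
        context = "\n".join(f"- {c}" for c in self.retrieve(query, k))
        return f"Context:\n{context}\n\nQuestion: {query}"
```

Constructing `TinyRAG` with a list of document strings and calling `retrieve(query, k)` returns the best-matching chunks; repeating the same query is served from the cache, which is the behavior the CAG layer is meant to provide. Normalizing the vectors up front mirrors the `normalize` step in the list, since it lets the index compare vectors with a plain dot product.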
References are available at: