Building a Production-Ready Agentic RAG Pipeline: TF-IDF, HNSW, LSH, CAG, Guardrails, and More
Published:
TL;DR
This post outlines an approach to answering user queries that combines an Agentic Retrieval-Augmented Generation (RAG) strategy with a 10-guardrail safety system. The proposed solution uses Cache-Augmented Generation (CAG) alongside context engineering, semantic search, embeddings, chunking, page indexing, a web chat user interface, and large language model backends such as Ollama and Gemini. It also incorporates Hugging Face’s Chain and the MCP Server for Claude Desktop.
Agentic RAG Pipeline
- User Query → Agent Brain (LLM as orchestrator)
- 5-tool belt: search_chunks, search_pages, filter_by_file, summarise_doc, answer
- ToolExecutor → ChromaDB → Observation
- Self-Reflection: “Enough? YES → answer / NO → loop back”
- Loop-back arrow on the right showing max 5 iterations
- Real example trace at the bottom showing a 3-iteration comparison query
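The loop outlined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the post's actual implementation: the in-memory `CORPUS` stands in for ChromaDB, keyword matching stands in for vector search, and `choose_tool` and `covered` are hypothetical placeholders for the LLM orchestrator and its self-reflection step. Only the five tool names, the `ToolExecutor` name, and the 5-iteration cap come from the outline.

```python
from typing import Callable, Dict, List, Tuple

MAX_ITERATIONS = 5  # loop cap taken from the pipeline outline

# Hypothetical in-memory corpus standing in for the ChromaDB collection.
CORPUS = [
    {"file": "hnsw.md", "page": 1, "text": "HNSW searches a layered proximity graph."},
    {"file": "lsh.md", "page": 1, "text": "LSH hashes similar vectors into shared buckets."},
]


def search_chunks(arg: str) -> List[str]:
    # Keyword match as a placeholder for a real vector similarity search.
    return [c["text"] for c in CORPUS if arg.lower() in c["text"].lower()]


def search_pages(arg: str) -> List[str]:
    return [f'{c["file"]} p.{c["page"]}' for c in CORPUS if arg.lower() in c["text"].lower()]


def filter_by_file(arg: str) -> List[str]:
    return [c["text"] for c in CORPUS if c["file"] == arg]


def summarise_doc(arg: str) -> List[str]:
    # Placeholder "summary": the first 40 characters of each chunk in the file.
    return [c["text"][:40] for c in CORPUS if c["file"] == arg]


def answer(arg: str) -> List[str]:
    return [f"ANSWER based on: {arg}"]


class ToolExecutor:
    """Dispatches a tool name chosen by the agent to the matching function."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[[str], List[str]]] = {
            f.__name__: f
            for f in (search_chunks, search_pages, filter_by_file, summarise_doc, answer)
        }

    def execute(self, tool: str, arg: str) -> List[str]:
        if tool not in self.tools:
            return [f"unknown tool: {tool}"]
        return self.tools[tool](arg)


def covered(topic: str, observations: List[str]) -> bool:
    # Hypothetical self-reflection check: is there any evidence for this topic?
    return any(topic.lower() in o.lower() for o in observations)


def choose_tool(missing: List[str]) -> Tuple[str, str]:
    # Stand-in for the LLM "Agent Brain": always search for the first missing topic.
    return "search_chunks", missing[0]


def run_agent(topics: List[str]) -> Tuple[str, List[str]]:
    executor = ToolExecutor()
    observations: List[str] = []
    trace: List[str] = []
    for _ in range(MAX_ITERATIONS):
        missing = [t for t in topics if not covered(t, observations)]
        if not missing:  # Enough? YES -> answer
            break
        tool, arg = choose_tool(missing)  # NO -> loop back with another tool call
        observations.extend(executor.execute(tool, arg))
        trace.append(f"{tool}({arg})")
    # Either enough evidence was gathered or the iteration cap was reached.
    final = executor.execute("answer", " | ".join(observations))[0]
    return final, trace
```

Calling `run_agent(["HNSW", "LSH"])` on this toy corpus issues one search per topic and then answers, while a topic with no matching chunk keeps the loop going until the cap stops it. In a real pipeline the scripted `choose_tool` would be replaced by an LLM call that picks among all five tools based on the observations so far.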
References are available at: