Building a Production-Ready Agentic RAG Pipeline: TF-IDF, HNSW, LSH, CAG, Guardrails, and More

Published:

TL;DR

This post outlines an approach to answering user queries that combines an Agentic Retrieval-Augmented Generation (RAG) strategy with a 10-guardrail safety system. The proposed solution uses Cache-Augmented Generation (CAG) alongside Context Engineering, Semantic Search, Embeddings, Chunking, Page Indexing, a Web Chat User Interface, and large language model backends such as Ollama and Gemini. It also incorporates Hugging Face’s Chain and the MCP Server for Claude Desktop.

Agentic RAG Pipeline

  • User Query → Agent Brain (LLM as orchestrator)
  • 5-tool belt: search_chunks, search_pages, filter_by_file, summarise_doc, answer
  • ToolExecutor → ChromaDB → Observation
  • Self-Reflection: “Enough? YES → answer / NO → loop back”
  • Loop-back path (arrow on the right) capped at a maximum of 5 iterations
  • Real example trace at the bottom showing a 3-iteration comparison query
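
The loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the post's actual implementation. The in-memory `CORPUS` stands in for ChromaDB, `scripted_brain` stands in for the LLM orchestrator, and the tool bodies, the `run_agent` function, and the fallback message returned when the cap is hit are all assumptions made for the example. Only the five tool names, the `ToolExecutor` name, and the 5-iteration cap come from the list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MAX_ITERATIONS = 5  # loop cap from the pipeline description

# Hypothetical in-memory corpus standing in for ChromaDB.
CORPUS: Dict[str, List[str]] = {
    "a.pdf": ["Model A uses HNSW for vector search.", "Model A caches prompts."],
    "b.pdf": ["Model B uses LSH for vector search."],
}


def search_chunks(query: str) -> str:
    """Toy keyword match over chunks (a real system would use embeddings)."""
    words = set(query.lower().split())
    hits = [c for chunks in CORPUS.values() for c in chunks
            if words & set(c.lower().split())]
    return " ".join(hits) or "no match"


def search_pages(query: str) -> str:
    """Toy page-level search: each file is treated as a single page."""
    words = set(query.lower().split())
    hits = [name for name, chunks in CORPUS.items()
            if any(words & set(c.lower().split()) for c in chunks)]
    return ", ".join(hits) or "no match"


def filter_by_file(name: str) -> str:
    """Return every chunk belonging to one file."""
    return " ".join(CORPUS.get(name, [])) or "no such file"


def summarise_doc(name: str) -> str:
    """Toy summary: the first chunk of the file."""
    return CORPUS.get(name, ["no such file"])[0]


@dataclass
class ToolExecutor:
    tools: Dict[str, Callable[[str], str]]

    def run(self, name: str, arg: str) -> str:
        if name not in self.tools:
            return f"unknown tool: {name}"
        return self.tools[name](arg)


Brain = Callable[[str, List[str]], Tuple[str, str]]


def scripted_brain(query: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: choose the next tool from what was observed so far."""
    needed = ["a.pdf", "b.pdf"]
    if len(observations) < len(needed):         # Enough? NO -> loop back
        return "filter_by_file", needed[len(observations)]
    return "answer", " vs ".join(observations)  # Enough? YES -> answer


def run_agent(query: str, brain: Brain, executor: ToolExecutor,
              max_iterations: int = MAX_ITERATIONS) -> Tuple[str, int]:
    """Return (answer, number of tool calls made before answering)."""
    observations: List[str] = []
    for _ in range(max_iterations):
        tool, arg = brain(query, observations)
        if tool == "answer":                    # terminal tool ends the loop
            return arg, len(observations)
        observations.append(executor.run(tool, arg))
    # Assumed fallback: when the cap is hit, return the evidence gathered so far.
    return "Iteration limit reached: " + " | ".join(observations), len(observations)


executor = ToolExecutor({
    "search_chunks": search_chunks,
    "search_pages": search_pages,
    "filter_by_file": filter_by_file,
    "summarise_doc": summarise_doc,
})
result, tool_calls = run_agent("Compare a.pdf and b.pdf", scripted_brain, executor)
print(tool_calls, result)
```

Treating `answer` as a terminal tool keeps the self-reflection step inside the orchestrator's decision: the loop only needs to check which tool was chosen, and the iteration cap guards against an orchestrator that never decides it has enough.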

References are available at: