[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-build-production-rag-with-langchain-in-8-steps-en":3,"article-related-build-production-rag-with-langchain-in-8-steps-en":31,"series-ai-agent-1b25f514-9ed1-4c6f-b9d7-f56eb34033f5":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"1b25f514-9ed1-4c6f-b9d7-f56eb34033f5","build-production-rag-with-langchain-in-8-steps-en","Build Production RAG with LangChain in 8 Steps","\u003Cp data-speakable=\"summary\">Build a production-ready RAG pipeline with \u003Ca href=\"\u002Ftag\u002Flangchain\">LangChain\u003C\u002Fa>, \u003Ca href=\"\u002Fnews\u002F5-turboquant-lessons-for-vector-search-teams-en\">vector search\u003C\u002Fa>, and observability.\u003C\u002Fp>\u003Cp>This guide is for developers who have already tried a basic retrieval-augmented generation app and now need a version that can survive real traffic, real debugging, and real security reviews. 
By the end, you will have a clear path from document ingestion to indexed retrieval, hybrid search, observability, and a deployable \u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa>.\u003C\u002Fp>\u003Cp>The workflow below follows the same production concerns highlighted in the freeCodeCamp course and points to the core tools on their first mention: \u003Ca href=\"https:\u002F\u002Fpython.langchain.com\u002Fdocs\u002F\" target=\"_blank\" rel=\"noopener noreferrer\">LangChain docs\u003C\u002Fa>, the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain\" target=\"_blank\" rel=\"noopener noreferrer\">LangChain GitHub repo\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fwww.trychroma.com\u002F\" target=\"_blank\" rel=\"noopener noreferrer\">Chroma\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fchroma-core\u002Fchroma\" target=\"_blank\" rel=\"noopener noreferrer\">Chroma GitHub repo\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fsupabase.com\u002Fdocs\u002Fguides\u002Fdatabase\u002Fextensions\u002Fpgvector\" target=\"_blank\" rel=\"noopener noreferrer\">Supabase pgvector docs\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fdocs.langchain.com\u002Flangsmith\u002F\" target=\"_blank\" rel=\"noopener noreferrer\">LangSmith docs\u003C\u002Fa>, and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flanggraph\" target=\"_blank\" rel=\"noopener noreferrer\">LangGraph GitHub repo\u003C\u002Fa>.\u003C\u002Fp>\u003Ch2>Before you start\u003C\u002Fh2>\u003Cul>\u003Cli>Python 3.11+\u003C\u002Fli>\u003Cli>Node.js 20+ if you plan to run a separate frontend or test harness\u003C\u002Fli>\u003Cli>Docker 24+ for local vector database and API services\u003C\u002Fli>\u003Cli>A LangSmith account and API key\u003C\u002Fli>\u003Cli>A Supabase account and project if you want managed Postgres with pgvector\u003C\u002Fli>\u003Cli>OpenAI API key or another embedding and chat model provider key\u003C\u002Fli>\u003Cli>Git 
2.40+\u003C\u002Fli>\u003Cli>Basic familiarity with RAG, embeddings, and vector search\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Prepare the RAG workspace\u003C\u002Fh2>\u003Cp>Your first goal is to create a clean project that can hold ingestion, retrieval, and API code without turning into a notebook prototype. A production RAG system becomes much easier to debug when document processing, indexing, and serving live in separate modules.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780178601812-0o68.png\" alt=\"Build Production RAG with LangChain in 8 Steps\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cpre>\u003Ccode>mkdir production-rag && cd production-rag\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate\n# The langchain-* integration packages are installed separately from langchain itself\npip install langchain langchain-community langchain-openai langchain-text-splitters chromadb fastapi uvicorn langgraph langsmith supabase \"psycopg[binary]\" python-dotenv\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see a virtual environment activated and the packages installed without resolver errors. If you run \u003Ccode>python -c \"import langchain, langchain_openai, chromadb, fastapi\"\u003C\u002Fcode>, you should get no output and no import failure.\u003C\u002Fp>\u003Ch2>Step 2: Ingest and chunk source documents\u003C\u002Fh2>\u003Cp>Your next outcome is a repeatable document pipeline that turns raw files into retrieval-friendly chunks. 
This is where you decide loaders, chunk sizes, and metadata fields so later retrieval can trace answers back to source content.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780178601020-lhl5.png\" alt=\"Build Production RAG with LangChain in 8 Steps\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cpre>\u003Ccode>from langchain_community.document_loaders import DirectoryLoader, TextLoader\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\n\n# TextLoader reads Markdown as plain text; the default loader needs the unstructured package\nloader = DirectoryLoader(\".\u002Fdocs\", glob=\"**\u002F*.md\", loader_cls=TextLoader)\ndocs = loader.load()  # each Document keeps its file path in metadata[\"source\"]\n\n# Sizes are measured in characters (the splitter uses len() by default)\nsplitter = RecursiveCharacterTextSplitter(\n    chunk_size=800,\n    chunk_overlap=120,\n)\nchunks = splitter.split_documents(docs)\nprint(len(docs), len(chunks))\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>As long as some files are longer than the 800-character chunk size, you should see the number of chunks exceed the number of source documents. That tells you the splitter is creating smaller retrieval units instead of indexing whole files.\u003C\u002Fp>\u003Ch2>Step 3: Index embeddings in a vector database\u003C\u002Fh2>\u003Cp>Now you want a searchable index that stores embeddings and metadata in a \u003Ca href=\"\u002Ftag\u002Fvector-database\">vector database\u003C\u002Fa>. For local development, Chroma is a simple starting point. 
For production, Supabase with pgvector gives you a managed Postgres path that is easier to secure and operate.\u003C\u002Fp>\u003Cpre>\u003Ccode>from langchain_openai import OpenAIEmbeddings\nfrom langchain_community.vectorstores import Chroma\n\nembeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n# With chromadb 0.4+, writes to persist_directory are saved automatically; no persist() call is needed\nvectorstore = Chroma.from_documents(\n    documents=chunks,\n    embedding=embeddings,\n    persist_directory=\".\u002Fchroma_db\",\n)\n# Quick sanity check: the top hit should be semantically related to the query\nprint(vectorstore.similarity_search(\"hybrid search\", k=1)[0].page_content[:200])\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see a local persistence directory created, or a successful insert into your managed pgvector table. The sanity-check search should print a chunk that is semantically related to the query instead of random text.\u003C\u002Fp>\u003Ch2>Step 4: Build retrieval and answer generation\u003C\u002Fh2>\u003Cp>Your goal here is a basic RAG chain that retrieves context, injects it into a prompt, and returns an answer with source-aware grounding. This step proves the core product behavior before you add optimization or orchestration.\u003C\u002Fp>\u003Cpre>\u003Ccode>from langchain_openai import ChatOpenAI\nretriever = vectorstore.as_retriever(search_kwargs={\"k\": 4})\nquery = \"What is hybrid search in RAG?\"\nresults = retriever.invoke(query)\n# Inject the retrieved chunks into the prompt, tagged with their source file for grounding\ncontext = \"\\n\\n\".join(f\"[{d.metadata.get('source')}] {d.page_content}\" for d in results)\nllm = ChatOpenAI(model=\"gpt-4o-mini\")  # assumed model name; any chat model works\nprint(llm.invoke(f\"Answer only from this context and cite sources:\\n{context}\\n\\nQuestion: {query}\").content)\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see an answer that cites the source files of the retrieved chunks. If the answer and its cited chunks clearly match the question, your embedding model, chunking strategy, and index are aligned well enough for the next step.\u003C\u002Fp>\u003Ch2>Step 5: Add observability and debugging traces\u003C\u002Fh2>\u003Cp>At production scale, retrieval failures are often invisible unless you trace them. 
Your outcome in this step is a visible request trail in LangSmith so you can inspect prompts, retrieved chunks, latency, and bad answers.\u003C\u002Fp>\u003Cpre>\u003Ccode>import os\nfrom dotenv import load_dotenv\n\nload_dotenv()  # read LANGSMITH_API_KEY from a local .env file instead of hardcoding it\nos.environ[\"LANGSMITH_TRACING\"] = \"true\"\nos.environ[\"LANGSMITH_PROJECT\"] = \"production-rag\"\nassert os.getenv(\"LANGSMITH_API_KEY\"), \"Set LANGSMITH_API_KEY in .env\"\n\n# Run your chain once, then inspect the trace in LangSmith.\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see a new run in the LangSmith dashboard with the full chain execution. That trace should show the retriever output and the final model response, which makes debugging much faster than reading logs alone.\u003C\u002Fp>\u003Ch2>Step 6: Secure and serve the RAG API\u003C\u002Fh2>\u003Cp>Now you want a deployable service that exposes retrieval through FastAPI and protects it with a security layer. This is the point where production concerns like auth, input validation, rate limits, and environment-based secrets become non-negotiable.\u003C\u002Fp>\u003Cpre>\u003Ccode>import os\nimport secrets\n\nfrom fastapi import FastAPI, Header, HTTPException\n\napp = FastAPI()\nAPI_TOKEN = os.getenv(\"RAG_API_TOKEN\")\n\n@app.get(\"\u002Fanswer\")\ndef answer(q: str, authorization: str = Header(default=\"\")):\n    # Fail closed: if RAG_API_TOKEN is unset, \"Bearer None\" would otherwise be accepted\n    if not API_TOKEN:\n        raise HTTPException(status_code=500, detail=\"Server token not configured\")\n    # Constant-time comparison so response timing does not leak the token\n    if not secrets.compare_digest(authorization.encode(), f\"Bearer {API_TOKEN}\".encode()):\n        raise HTTPException(status_code=401, detail=\"Unauthorized\")\n    return {\"query\": q, \"status\": \"ok\"}\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>With \u003Ccode>RAG_API_TOKEN\u003C\u002Fcode> set, you should be able to start the server with Uvicorn (for example, \u003Ccode>uvicorn main:app\u003C\u002Fcode> if the code lives in \u003Ccode>main.py\u003C\u002Fcode>). A request without the \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> should return a 401, and a request with the token should return a 200. That confirms the API is gated before any retrieval work happens.\u003C\u002Fp>\u003Ch2>Step 7: Tune hybrid search and token budget\u003C\u002Fh2>\u003Cp>Your next outcome is a more reliable retriever that balances semantic search, keyword signals, and prompt size. 
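Hybrid search helps when embeddings miss exact terms, while token budgeting prevents context overflow and wasted model spend.\u003C\u002Fp>\u003Cpre>\u003Ccode>from itertools import accumulate\nfrom langchain_community.retrievers import BM25Retriever  # requires: pip install rank_bm25\nquery = \"What is hybrid search in RAG?\"\nbm25 = BM25Retriever.from_documents(chunks, k=8)  # keyword index over the same chunks\nscores = {}\nfor hits in (vectorstore.similarity_search(query, k=8), bm25.invoke(query)):  # 1. vector, 2. BM25\n    for rank, doc in enumerate(hits, start=1):  # 3. merge and rerank with reciprocal rank fusion (k=60)\n        scores[doc.page_content] = scores.get(doc.page_content, 0.0) + 1 \u002F (60 + rank)\nranked = sorted(scores, key=scores.get, reverse=True)\ncontext = [t for t, total in zip(ranked, accumulate(map(len, ranked))) if 12_000 >= total]  # 4. ~3k tokens at ~4 chars\u002Ftoken\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see better answers on queries that contain product names, code symbols, or rare terms, because BM25 matches those exact strings even when embeddings do not. The four-characters-per-token ratio is only a rough estimate for English text, so count with your model's tokenizer if you need a strict limit. If the final prompt stays under your model context limit, your budget logic is working.\u003C\u002Fp>\u003Ch2>Step 8: Orchestrate agentic and multimodal flows\u003C\u002Fh2>\u003Cp>The final outcome is an architecture that can route between standard RAG, self-correcting retrieval, graph-based reasoning, and multimodal document understanding. LangGraph is a strong fit when you need conditional steps, retries, and multi-hop reasoning instead of a single linear chain.\u003C\u002Fp>\u003Cpre>\u003Ccode>from typing import TypedDict\nfrom langgraph.graph import StateGraph, START, END\nState = TypedDict(\"State\", {\"question\": str, \"docs\": list})\ngraph = StateGraph(State)\ngraph.add_node(\"retrieve\", lambda s: {\"docs\": retriever.invoke(s[\"question\"])})\ngraph.add_node(\"fallback\", lambda s: {\"docs\": []})  # placeholder for fallback search or multimodal handling\ngraph.add_edge(START, \"retrieve\")\ngraph.add_conditional_edges(\"retrieve\", lambda s: END if s[\"docs\"] else \"fallback\")  # simplified grade step\ngraph.add_edge(\"fallback\", END)\nrag_graph = graph.compile()  # add refine and answer nodes the same way, then test each route separately\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see different paths fire for different query types, such as a normal factual question, a multi-hop question, or a document image request. 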
That means the system is ready for advanced production behavior instead of only one retrieval pattern.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Before\u002FBaseline\u003C\u002Fth>\u003Cth>After\u002FResult\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Answer traceability\u003C\u002Ftd>\u003Ctd>Manual log hunting\u003C\u002Ftd>\u003Ctd>LangSmith request traces with retrieved chunks\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Retrieval quality\u003C\u002Ftd>\u003Ctd>Single vector search only\u003C\u002Ftd>\u003Ctd>Hybrid search with keyword plus semantic signals\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Deployment readiness\u003C\u002Ftd>\u003Ctd>Notebook prototype\u003C\u002Ftd>\u003Ctd>FastAPI service with token-based access control\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Common mistakes\u003C\u002Fh2>\u003Cul>\u003Cli>Using chunk sizes that are too large. Fix it by shrinking chunks to fit retrieval granularity, then re-test with a few real queries.\u003C\u002Fli>\u003Cli>Skipping observability. Fix it by enabling LangSmith traces before tuning prompts so you can see which step fails.\u003C\u002Fli>\u003Cli>Sending every chunk into the prompt. Fix it by applying top-k retrieval, reranking, and a strict token budget.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What's next\u003C\u002Fh2>\u003Cp>From here, expand into reranking, evaluation sets, multimodal retrieval, and \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa>-based fallbacks so your RAG stack can handle harder questions and noisier data. 
If you want the full walkthrough, compare your implementation against the freeCodeCamp course sections on scaling, production hosting, security, GraphRAG, and ColPali-style multimodal retrieval.\u003C\u002Fp>","Build a production-ready RAG pipeline with LangChain, vector search, and observability.","www.freecodecamp.org","https:\u002F\u002Fwww.freecodecamp.org\u002Fnews\u002Fproduction-rag-with-langchain-vector-databases\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780178601812-0o68.png","ai-agent","en","37a5e429-4235-439c-9b05-bb377085462c",[17,18,19,20,21,22],"LangChain","Chroma","pgvector","LangSmith","LangGraph","FastAPI",[24,25,26],"Production RAG needs ingestion, indexing, retrieval, observability, and security as separate concerns.","Hybrid search and token budgeting improve answer quality and control cost in real deployments.","LangSmith and LangGraph help teams debug, route, and scale complex RAG workflows.",2,"2026-05-30T22:02:48.810322+00:00","2026-05-30T22:02:48.783+00:00","c58956f2-0e6f-4be5-b68a-39eda67428b3",{"tags":32,"relatedLang":42,"relatedPosts":46},[33,35,37,39,41],{"name":18,"slug":34},"chroma",{"name":20,"slug":36},"langsmith",{"name":21,"slug":38},"langgraph",{"name":17,"slug":40},"langchain",{"name":19,"slug":19},{"id":15,"slug":43,"title":44,"language":45},"8-steps-build-production-rag-with-langchain-zh","8 步驟打造可上線的 LangChain RAG","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"5efa67dd-b9f7-4a2f-8c68-3a4bc6a6b7d9","claude-code-dynamic-workflow-ai-harness-en","Claude Code 动态工作流：AI 自写 
Harness","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781035372495-9czj.png","2026-06-09T20:02:22.33375+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"2bd28e0e-0f4b-4987-a961-28763c1e1926","agent-orchestration-enterprise-ai-layer-en","Agent orchestration is the missing layer for enterprise AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780984981174-08mj.png","2026-06-09T06:02:31.384174+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"95684312-23dc-4a78-a917-df14d132c5fa","ai-agents-use-blockchain-trust-layer-en","AI agents use blockchain as a trust layer","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780980506080-ki4s.png","2026-06-09T04:48:01.710214+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"0208e47f-7d4c-4473-a0f9-4cd193b5c139","8-rag-patterns-demos-into-prod-en","8 RAG patterns that turn demos into prod","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780971552707-qpl7.png","2026-06-09T02:18:36.760049+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"b413d484-6786-4c32-abdc-77f010ac7eba","fine-tuning-beats-rag-style-not-facts-en","Fine-tuning beats RAG when the goal is style, not facts","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780924681800-5xji.png","2026-06-08T13:17:25.701649+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"57beb8b4-c233-400f-b95b-a97be1cf9d02","openclaw-small-business-ai-staff-en","OpenClaw shows how 
small businesses use AI staff","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780904882032-yp13.png","2026-06-08T07:47:27.730921+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, Decoded","2026-03-26T11:15:23.046616+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm bet","2026-03-28T03:15:27.849766+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent 
Reliability","2026-03-31T06:36:55.648751+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"116d5ee9-a4f1-4b5a-aac5-5d035dd22bbe","amazon-bedrock-agents-multi-agent-workflows-en","Amazon Bedrock Agents Gets Multi-Agent Workflows","2026-04-01T09:30:30.197685+00:00"]