[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-how-to-build-a-rag-pipeline-in-5-steps-en":3,"tags-how-to-build-a-rag-pipeline-in-5-steps-en":34,"related-lang-how-to-build-a-rag-pipeline-in-5-steps-en":44,"related-posts-how-to-build-a-rag-pipeline-in-5-steps-en":48,"series-ai-agent-95ec8193-dee3-4ec5-93db-89f285d07612":85},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":30,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"95ec8193-dee3-4ec5-93db-89f285d07612","How to Build a RAG Pipeline in 5 Steps","\u003Cp data-speakable=\"summary\">Build a retrieval-augmented generation pipeline that grounds AI answers in your own data.\u003C\u002Fp>\u003Cp>This guide is for developers who want to make an \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> answer from trusted documents instead of relying only on model memory. 
After you follow the steps, you will have a working \u003Ca href=\"\u002Ftag\u002Frag\">RAG\u003C\u002Fa> flow that ingests documents, creates embeddings, retrieves relevant chunks, and generates grounded answers.\u003C\u002Fp>\u003Cp>You will also know how to verify each stage so you can debug quality, latency, and freshness before you ship to users.\u003C\u002Fp>\u003Ch2>Before you start\u003C\u002Fh2>\u003Cul>\u003Cli>Node 20+ or Python 3.11+\u003C\u002Fli>\u003Cli>An OpenAI API key or another chat model API key\u003C\u002Fli>\u003Cli>An embedding model account or local embedding runtime\u003C\u002Fli>\u003Cli>A vector database such as Pinecone, Weaviate, Chroma, or pgvector\u003C\u002Fli>\u003Cli>A document source such as PDFs, Markdown files, web pages, or a database\u003C\u002Fli>\u003Cli>Access to the \u003Ca href=\"https:\u002F\u002Fplatform.openai.com\u002Fdocs\">OpenAI docs\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain\">LangChain GitHub repo\u003C\u002Fa> if you want the sample stack used here\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Prepare your document corpus\u003C\u002Fh2>\u003Cp>Goal: create a clean source of truth that the retriever can search later.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777959054423-dgs9.png\" alt=\"How to Build a RAG Pipeline in 5 Steps\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Start by collecting the content you want the model to trust. For a support bot, that may be product docs and FAQs. For a legal or medical tool, use approved internal material only. 
Convert everything into text, remove duplicates, and split large files into smaller chunks so the retriever can return precise passages rather than huge pages.\u003C\u002Fp>\u003Cpre>\u003Ccode>from langchain_text_splitters import RecursiveCharacterTextSplitter\n\ntext_splitter = RecursiveCharacterTextSplitter(\n    chunk_size=800,\n    chunk_overlap=120,\n)\nchunks = text_splitter.split_text(long_document_text)\nprint(len(chunks))\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see a chunk count greater than 1, and each chunk should read like a coherent paragraph. If the chunks are too large, retrieval becomes noisy; if they are too small, the answer may lose context.\u003C\u002Fp>\u003Ch2>Step 2: Create embeddings for each chunk\u003C\u002Fh2>\u003Cp>Goal: turn text into vectors that capture meaning, not just keywords.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777959061522-d315.png\" alt=\"How to Build a RAG Pipeline in 5 Steps\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Send every chunk through an embedding model so semantically similar passages end up close together in vector space. This is the foundation of retrieval quality. Keep the same embedding model for both indexing and querying, or the similarity search will become unreliable.\u003C\u002Fp>\u003Cp>Store the output vectors alongside the original chunk text and metadata such as title, URL, section, and timestamp. That metadata helps you trace answers back to the source and filter results by document type or freshness.\u003C\u002Fp>\u003Cp>You should see a fixed-length vector for every chunk, often a list of numbers or a float array. 
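As a sanity check on what those vectors buy you, here is a minimal, self-contained sketch of the cosine-similarity comparison a vector store runs over embeddings. The 3-dimensional vectors are hand-written stand-ins for real embedding output (real models emit hundreds or thousands of dimensions), and the chunk names are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); higher means closer in meaning.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; a real model returns a fixed-length float array per chunk.
chunk_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.2],
}
query_vector = [0.85, 0.15, 0.05]  # pretend this is the embedding of "how do refunds work?"

# Retrieval picks the chunk whose vector points most nearly the same way as the query.
best = max(chunk_vectors, key=lambda k: cosine_similarity(query_vector, chunk_vectors[k]))
print(best)  # -> refund policy
```

The same comparison only behaves this way when query and chunks come from the same embedding model, which is why mixing models between indexing and querying silently breaks retrieval.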
If the embedding step fails, check that your text is not empty and that the model dimensions match what your \u003Ca href=\"\u002Ftag\u002Fvector-database\">vector database\u003C\u002Fa> expects.\u003C\u002Fp>\u003Ch2>Step 3: Index vectors in a vector database\u003C\u002Fh2>\u003Cp>Goal: make your knowledge base searchable by similarity.\u003C\u002Fp>\u003Cp>Load the chunk embeddings into your vector database and create an index optimized for nearest-neighbor search. This lets the system compare a user question against stored content in milliseconds instead of scanning every document. Add filters for document source, language, or tenant if your application serves multiple teams.\u003C\u002Fp>\u003Cp>Example flow: upsert each chunk with its vector, text, and metadata, then confirm the index is ready for queries. If you are using pgvector, create the vector column and similarity index first. If you are using a hosted service, verify the namespace or collection name before you ingest data.\u003C\u002Fp>\u003Cp>You should see the index contain the same number of records as your prepared chunks. A quick test query should return the most relevant passages instead of random text.\u003C\u002Fp>\u003Ch2>Step 4: Retrieve relevant context for a user query\u003C\u002Fh2>\u003Cp>Goal: fetch the best supporting passages before the LLM writes an answer.\u003C\u002Fp>\u003Cp>When a user asks a question, embed the query with the same model, then run similarity search against the vector index. Return the top-k chunks, usually 3 to 8, and optionally rerank them with a cross-encoder or an LLM-based scorer if precision matters more than speed.\u003C\u002Fp>\u003Cp>Keep an eye on retrieval quality. If the top result is only loosely related, improve chunking, add metadata filters, or enrich the document corpus. 
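The index-and-retrieve loop of Steps 3 and 4 can be sketched without any external service: a brute-force scan over an in-memory list is exactly the operation a vector database accelerates with nearest-neighbor indexes. The record shapes, texts, and vectors below are illustrative stand-ins, not any particular database's API.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): the similarity score used to rank chunks.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# "Index": each record pairs a vector with its chunk text and metadata,
# the same shape you would upsert into Pinecone, Chroma, or pgvector.
index = [
    {"vector": [0.9, 0.1], "text": "Refunds are issued within 14 days.", "source": "faq.md"},
    {"vector": [0.1, 0.9], "text": "Orders ship in 2-3 business days.", "source": "shipping.md"},
    {"vector": [0.8, 0.3], "text": "Refund requests need an order number.", "source": "faq.md"},
]

def retrieve(query_vector, k=2):
    # Brute-force nearest-neighbor search: score every record, return the top k.
    scored = sorted(index, key=lambda r: cosine_similarity(query_vector, r["vector"]), reverse=True)
    return scored[:k]

top = retrieve([0.85, 0.2], k=2)  # pretend this embeds a refund question
for record in top:
    print(record["source"], "->", record["text"])  # both hits come from faq.md
```

Keeping the metadata on each record is what lets you filter by source or freshness and cite the passage the answer came from.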
Retrieval is the most important quality lever in a RAG system because the generator can only ground its answer in what it receives.\u003C\u002Fp>\u003Cp>You should see a short list of passages that clearly match the user intent. If the passages are off-topic, the model will likely produce a weak answer even if the generation step is strong.\u003C\u002Fp>\u003Ch2>Step 5: Augment the prompt and generate the answer\u003C\u002Fh2>\u003Cp>Goal: produce a grounded response that cites the retrieved context.\u003C\u002Fp>\u003Cp>Build a prompt that includes the user question, the retrieved chunks, and clear instructions to answer only from the provided context when possible. Then send that prompt to the LLM. Ask it to say when the context is insufficient instead of inventing facts. This reduces hallucinations and makes the system easier to trust.\u003C\u002Fp>\u003Cpre>\u003Ccode>prompt = f\"\"\"\nUse only the context below to answer the question.\nIf the context is insufficient, say so.\n\nContext:\n{retrieved_context}\n\nQuestion:\n{user_question}\n\"\"\"\n\nresponse = llm.invoke(prompt)\nprint(response.content)\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see an answer that reflects the retrieved passages and, ideally, mentions the source or quotes key facts. Test with a question that is definitely covered by your corpus and one that is not. The first should be accurate, and the second should politely say the system lacks enough context.\u003C\u002Fp>\u003Ch2>Common mistakes\u003C\u002Fh2>\u003Cul>\u003Cli>Using the wrong embedding model for queries and documents. Fix: keep one embedding model for indexing and retrieval, and re-embed the corpus if you change models.\u003C\u002Fli>\u003Cli>Making chunks too large or too small. Fix: start around 500 to 1,000 characters with overlap, then tune based on retrieval results.\u003C\u002Fli>\u003Cli>Skipping freshness updates. 
Fix: add a scheduled re-index job or incremental updater so new documents and edits reach the vector store.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What's next\u003C\u002Fh2>\u003Cp>Once the basic pipeline works, add citations, reranking, cache layers, and evaluation tests for answer faithfulness and retrieval recall. From there, you can extend the same pattern into chat memory, tool use, or domain-specific assistants for support, search, or internal knowledge bases.\u003C\u002Fp>","Build a retrieval-augmented generation pipeline that grounds AI answers in your own data.","www.geeksforgeeks.org","https:\u002F\u002Fwww.geeksforgeeks.org\u002Fnlp\u002Fwhat-is-retrieval-augmented-generation-rag\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777959054423-dgs9.png",[13,14,15,16,17],"RAG","embeddings","vector database","LangChain","LLM","en",1,false,"2026-05-05T05:30:32.335273+00:00","2026-05-05T05:30:32.322+00:00","done","b650511e-86f0-4505-9a39-e6f94a51f16e","how-to-build-a-rag-pipeline-in-5-steps-en","ai-agent","e133ed69-fb56-495d-96f6-1e14d7ac3242","published","2026-05-05T09:00:17.647+00:00",[31,32,33],"RAG improves answer quality by combining retrieval with generation.","Chunking, embeddings, and vector search are the core plumbing of a RAG system.","Prompt grounding and freshness updates help reduce hallucinations and stale answers.",[35,37,39,41,42],{"name":13,"slug":36},"rag",{"name":16,"slug":38},"langchain",{"name":17,"slug":40},"llm",{"name":14,"slug":14},{"name":15,"slug":43},"vector-database",{"id":27,"slug":45,"title":46,"language":47},"how-to-build-a-rag-pipeline-in-5-steps-zh","5 步完成 RAG 
管線","zh",[49,55,61,67,73,79],{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":26},"c5d4bc11-1f4d-438c-b644-a8498826e1ab","claude-agent-dreaming-outcomes-multiagent-en","Claude Adds a 'Dreaming' Feature to Agents","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778868649463-f5qv.png","2026-05-15T18:10:25.29539+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":26},"fda44d24-7baf-4d91-a7f9-bbfecae20a27","switch-ai-outputs-markdown-to-html-en","How to Switch AI Outputs from Markdown to HTML","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778743249827-wmsr.png","2026-05-14T07:20:22.631724+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":26},"064275f5-4282-47c3-8e4a-60fe8ac99246","anthropic-cat-wu-proactive-ai-assistants-en","Anthropic’s Cat Wu on proactive AI assistants","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778735465548-a92i.png","2026-05-14T05:10:31.723441+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":26},"423ac8ad-2886-42a9-8dd8-78e5d43a1574","how-to-run-hermes-agent-on-discord-en","How to Run Hermes Agent on Discord","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778724656141-i30t.png","2026-05-14T02:10:35.727086+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":26},"776a562c-99a6-4a6b-93a0-9af40300f3f2","why-ragflow-is-the-right-open-source-rag-engine-to-self-host-en","Why RAGFlow is the right open-source RAG engine to 
self-host","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778674254587-0pxn.png","2026-05-13T12:10:25.721583+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":26},"322ec8bc-61d3-4c80-bb9e-a19941e137c6","how-to-add-temporal-rag-in-production-en","How to Add Temporal RAG in Production","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778667085221-0mox.png","2026-05-13T10:10:31.619892+00:00",[86,91,96,101,106,111,116,121,126,131],{"id":87,"slug":88,"title":89,"created_at":90},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, Decoded","2026-03-26T11:15:23.046616+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm 
bet","2026-03-28T03:15:27.849766+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","2026-03-31T06:36:55.648751+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"67dc66da-ca46-4aa5-970b-e997a39fe109","openai-codex-plugin-claude-code-en","OpenAI puts Codex inside Claude Code","2026-04-01T09:21:55.381386+00:00",{"id":132,"slug":133,"title":134,"created_at":135},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00"]