[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-how-to-build-milvus-rag-stack-in-13-steps-en":3,"article-related-how-to-build-milvus-rag-stack-in-13-steps-en":31,"series-tools-5cdb0497-b424-4a06-a985-9f4543c4db36":78},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"5cdb0497-b424-4a06-a985-9f4543c4db36","how-to-build-milvus-rag-stack-in-13-steps-en","How to Build a Milvus RAG Stack in 13 Steps","\u003Cp data-speakable=\"summary\">Build a production-ready Milvus \u003Ca href=\"\u002Ftag\u002Frag\">RAG\u003C\u002Fa> stack with embeddings, hybrid search, and \u003Ca href=\"\u002Ftag\u002Flangchain\">LangChain\u003C\u002Fa>.\u003C\u002Fp>\u003Cp>This guide is for developers who want to turn a clean Python project into a working retrieval-augmented generation stack on Milvus. 
After following the steps, you will have a local Milvus 2.6 environment, a document collection with vector and metadata fields, a hybrid search path, and a RAG chain you can adapt for production.\u003C\u002Fp>\u003Cp>You will also know how to verify each stage, from server connection to indexed search, so you can catch setup issues early.\u003C\u002Fp>\u003Ch2>Before you start\u003C\u002Fh2>\u003Cul>\u003Cli>Python 3.10, 3.11, or 3.12\u003C\u002Fli>\u003Cli>Docker Engine 24.x or later with Compose V2\u003C\u002Fli>\u003Cli>8 GB RAM minimum, 16 GB recommended\u003C\u002Fli>\u003Cli>20 GB free disk space\u003C\u002Fli>\u003Cli>Milvus 2.6.x server image (see the \u003Ca href=\"https:\u002F\u002Fmilvus.io\u002Fdocs\" target=\"_blank\" rel=\"noopener noreferrer\">Milvus docs\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmilvus-io\u002Fmilvus\" target=\"_blank\" rel=\"noopener noreferrer\">milvus-io\u002Fmilvus\u003C\u002Fa> repository)\u003C\u002Fli>\u003Cli>pymilvus 2.6.x\u003C\u002Fli>\u003Cli>sentence-transformers (latest stable release)\u003C\u002Fli>\u003Cli>LangChain 0.3+\u003C\u002Fli>\u003Cli>An OpenAI API key if you plan to use a hosted LLM for generation\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Create the Python project\u003C\u002Fh2>\u003Cp>Goal: set up a clean, isolated workspace with the exact client libraries needed for Milvus RAG.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779511587286-dsm4.png\" alt=\"How to Build a Milvus RAG Stack in 13 Steps\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Start with a virtual environment, upgrade packaging tools, and install the core dependencies for Milvus, embeddings, and orchestration.\u003C\u002Fp>\u003Cpre>\u003Ccode>mkdir milvus-rag-tutorial\ncd milvus-rag-tutorial\npython3.11 -m venv .venv\nsource 
.venv\u002Fbin\u002Factivate\npython -m pip install --upgrade pip wheel\npip install pymilvus sentence-transformers langchain langchain-community langchain-openai tiktoken python-dotenv\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: run \u003Ccode>python -c \"import pymilvus; print(pymilvus.__version__)\"\u003C\u002Fcode> and you should see a 2.6.x version string.\u003C\u002Fp>\u003Ch2>Step 2: Start Milvus Standalone with Docker Compose\u003C\u002Fh2>\u003Cp>Goal: launch a local Milvus server that behaves close to production without needing a cluster.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779511584901-2x6v.png\" alt=\"How to Build a Milvus RAG Stack in 13 Steps\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Create a \u003Ca href=\"\u002Ftag\u002Fdocker\">Docker\u003C\u002Fa> Compose file with Milvus, etcd, and MinIO, then pin the Milvus image to a specific patch release so your setup stays reproducible.\u003C\u002Fp>\u003Cpre>\u003Ccode>docker compose up -d\ndocker compose logs -f standalone | head -60\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: you should see messages like \u003Ccode>Proxy successfully started, listen on: [::]:19530\u003C\u002Fcode> and the container should stay healthy.\u003C\u002Fp>\u003Ch2>Step 3: Connect the client to Milvus\u003C\u002Fh2>\u003Cp>Goal: confirm the Python client can reach the server and read its version.\u003C\u002Fp>\u003Cp>Create a small connection helper with \u003Ccode>MilvusClient\u003C\u002Fcode>, then point it at \u003Ccode>http:\u002F\u002Flocalhost:19530\u003C\u002Fcode> with the default root \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> for local testing.\u003C\u002Fp>\u003Cpre>\u003Ccode>from pymilvus import MilvusClient\n\nclient = MilvusClient(uri=\"http:\u002F\u002Flocalhost:19530\", 
token=\"root:Milvus\")\nprint(client.get_server_version())\nprint(client.list_collections())\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: you should see the Milvus server version and an empty collection list.\u003C\u002Fp>\u003Ch2>Step 4: Define the RAG collection schema\u003C\u002Fh2>\u003Cp>Goal: create a collection that can store document text, embeddings, and metadata for filtering.\u003C\u002Fp>\u003Cp>Design fields for an ID, chunk text, a dense vector, and metadata such as source, title, or tenant. Keep the schema stable so it can move from local development to production without redesign. The quick-setup call below declares only the ID and vector fields; text and metadata are stored through the dynamic field, which quick setup enables by default. The dimension must match your embedding model: \u003Ccode>all-MiniLM-L6-v2\u003C\u002Fcode>, used in Step 5, produces 384-dimensional vectors.\u003C\u002Fp>\u003Cpre>\u003Ccode>from pymilvus import MilvusClient\n\nclient = MilvusClient(uri=\"http:\u002F\u002Flocalhost:19530\", token=\"root:Milvus\")\nclient.create_collection(\n    collection_name=\"rag_docs\",\n    dimension=384,  # must equal the embedding model's output size\n    primary_field_name=\"id\",\n    vector_field_name=\"embedding\",\n    id_type=\"int64\",\n    metric_type=\"COSINE\",\n)\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: listing collections should now show \u003Ccode>rag_docs\u003C\u002Fcode>.\u003C\u002Fp>\u003Ch2>Step 5: Generate embeddings for your documents\u003C\u002Fh2>\u003Cp>Goal: turn raw text into dense vectors that Milvus can index and search.\u003C\u002Fp>\u003Cp>Load a sentence-transformers model, embed your chunks, and keep the model choice consistent across ingestion and query time.\u003C\u002Fp>\u003Cpre>\u003Ccode>from sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\nvectors = model.encode([\"Milvus is a vector database.\", \"RAG uses retrieval before generation.\"])\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: you should get one vector per input string, and each vector should have the same dimension as your collection schema (384 for this model).\u003C\u002Fp>\u003Ch2>Step 6: Chunk and ingest a document corpus\u003C\u002Fh2>\u003Cp>Goal: load real content into Milvus so search results come from your own 
data.\u003C\u002Fp>\u003Cp>Split documents into overlapping chunks, attach metadata, generate embeddings, and insert the rows into the collection in batches.\u003C\u002Fp>\u003Cpre>\u003Ccode>rows = [\n  {\"id\": 1, \"text\": \"...\", \"embedding\": vectors[0], \"source\": \"docs\", \"title\": \"Intro\"}\n]\nclient.insert(collection_name=\"rag_docs\", data=rows)\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: the insert call should return inserted IDs, and the collection row count should increase.\u003C\u002Fp>\u003Ch2>Step 7: Create and load the vector index\u003C\u002Fh2>\u003Cp>Goal: make search fast by building an ANN index on the embedding field.\u003C\u002Fp>\u003Cp>Start with AUTOINDEX for simplicity, or choose HNSW, IVF_FLAT, or DISKANN when you need explicit control over latency and memory tradeoffs. Note that the quick-setup call in Step 4 already builds an index and loads the collection, so an explicit index matters mainly when you define the schema yourself or want a different index type.\u003C\u002Fp>\u003Cpre>\u003Ccode># MilvusClient.create_index expects an IndexParams object, not a plain dict\nindex_params = client.prepare_index_params()\nindex_params.add_index(\n    field_name=\"embedding\",\n    index_type=\"AUTOINDEX\",\n    metric_type=\"COSINE\",\n)\nclient.create_index(collection_name=\"rag_docs\", index_params=index_params)\nclient.load_collection(collection_name=\"rag_docs\")\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: the collection should report as loaded, and search requests should no longer fail with an unloaded-collection error.\u003C\u002Fp>\u003Ch2>Step 8: Run your first semantic search\u003C\u002Fh2>\u003Cp>Goal: confirm the retrieval layer returns relevant chunks for a natural-language query.\u003C\u002Fp>\u003Cp>Encode the query with the same embedding model, then search the collection and inspect the top results and scores.\u003C\u002Fp>\u003Cpre>\u003Ccode>query_vec = model.encode([\"What is Milvus used for?\"])[0]\nresults = client.search(\n    collection_name=\"rag_docs\",\n    data=[query_vec],\n    limit=5,\n    output_fields=[\"text\", \"title\", \"source\"],\n)\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: you should see the most relevant chunks near the top, with scores that make sense for your metric type.\u003C\u002Fp>\u003Ch2>Step 9: 
Add metadata filtering for hybrid retrieval\u003C\u002Fh2>\u003Cp>Goal: narrow results to the right document set before generation.\u003C\u002Fp>\u003Cp>Use scalar fields such as \u003Ccode>source\u003C\u002Fcode>, \u003Ccode>tenant\u003C\u002Fcode>, or \u003Ccode>category\u003C\u002Fcode> to filter the candidate set and reduce noise in the final answer.\u003C\u002Fp>\u003Cpre>\u003Ccode>results = client.search(\n    collection_name=\"rag_docs\",\n    data=[query_vec],\n    filter='source == \"docs\"',\n    limit=5,\n    output_fields=[\"text\", \"title\", \"source\"],\n)\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: the returned rows should only match the filter you set, which confirms the metadata path is working.\u003C\u002Fp>\u003Ch2>Step 10: Wire Milvus into a LangChain RAG chain\u003C\u002Fh2>\u003Cp>Goal: connect retrieval results to a prompt and \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> so the system can answer questions end to end.\u003C\u002Fp>\u003Cp>Wrap the Milvus search call in a retriever interface, pass the retrieved context into your prompt, and generate the final answer with your model of choice.\u003C\u002Fp>\u003Cpre>\u003Ccode># Pseudocode pattern\n# query -> Milvus search -> top chunks -> prompt template -> LLM answer\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: ask a question about your corpus and you should get a grounded answer that cites the retrieved content.\u003C\u002Fp>\u003Ch2>Step 11: Add BM25 hybrid search for rare terms\u003C\u002Fh2>\u003Cp>Goal: improve recall when users search for exact names, codes, or uncommon words.\u003C\u002Fp>\u003Cp>Combine dense vectors with sparse retrieval so semantic search and keyword search work together instead of competing.\u003C\u002Fp>\u003Cpre>\u003Ccode># Configure sparse retrieval alongside dense retrieval\n# Then merge or rerank results before generation\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: queries with rare tokens should return better matches 
than dense search alone.\u003C\u002Fp>\u003Ch2>Step 12: Benchmark and tune the index\u003C\u002Fh2>\u003Cp>Goal: choose the right index settings for your latency, memory, and recall target.\u003C\u002Fp>\u003Cp>Test HNSW for low-latency in-memory search, IVF_FLAT for balanced performance, DISKANN for large disk-backed collections, and AUTOINDEX when you want Milvus to choose the \u003Ca href=\"\u002Fnews\u002Fwhy-llama-cpp-should-treat-turboquant-as-default-en\">default path\u003C\u002Fa>.\u003C\u002Fp>\u003Cp>Verification: after tuning, you should see faster search or lower memory use without a major drop in result quality.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Before\u002FBaseline\u003C\u002Fth>\u003Cth>After\u002FResult\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>BM25 search speed\u003C\u002Ftd>\u003Ctd>Elasticsearch baseline\u003C\u002Ftd>\u003Ctd>Milvus team claims 400% faster in 2.6\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Embedding memory footprint\u003C\u002Ftd>\u003Ctd>FP32 vectors\u003C\u002Ftd>\u003Ctd>About half with FP16\u002FBF16 conversion\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Deployment size\u003C\u002Ftd>\u003Ctd>Cluster-only setup\u003C\u002Ftd>\u003Ctd>Lite, Standalone, or Distributed options\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Step 13: Secure and ship to production\u003C\u002Fh2>\u003Cp>Goal: harden the stack so it can run outside the laptop.\u003C\u002Fp>\u003Cp>Rotate the default root token, separate query and insert traffic, back up volumes, and move from local Docker to a managed or Helm-based deployment when your workload grows.\u003C\u002Fp>\u003Cpre>\u003Ccode># Production checklist\n# - change credentials\n# - snapshot data volumes\n# - monitor query latency\n# - test restore procedures\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Verification: you should be able to restart the stack, reconnect, and still list 
the same collections and row counts.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Mismatched client and server versions:\u003C\u002Fstrong> if \u003Ccode>pymilvus\u003C\u002Fcode> and Milvus differ in minor version, reinstall the matching client and rerun the connection check.\u003C\u002Fli>\u003Cli>\u003Cstrong>Unloaded collection errors:\u003C\u002Fstrong> if search fails after ingest, call \u003Ccode>load_collection\u003C\u002Fcode> before querying and confirm the load state.\u003C\u002Fli>\u003Cli>\u003Cstrong>Wrong embedding dimension:\u003C\u002Fstrong> if inserts fail, make sure the model output dimension matches the collection schema exactly.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>What’s next: extend this stack with reranking, multi-tenant partitions, observability, and a production deployment on Kubernetes or Zilliz Cloud.\u003C\u002Fp>\u003Cul>\u003Cli>Use \u003Ccode>docker compose down -v\u003C\u002Fcode> only when you want a full reset.\u003C\u002Fli>\u003Cli>Keep the same embedding model for both ingestion and query time.\u003C\u002Fli>\u003Cli>Pin image tags and Python package versions for repeatable builds.\u003C\u002Fli>\u003C\u002Ful>","Build a production-ready Milvus RAG stack with embeddings, hybrid search, and LangChain.","tech-insider.org","https:\u002F\u002Ftech-insider.org\u002Fmilvus-vector-database-tutorial-rag-13-steps-2026\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779511587286-dsm4.png","tools","en","641a3d14-6c0b-492d-b295-89edb4355765",[17,18,19,20,21,22],"Milvus","pymilvus","LangChain","RAG","sentence-transformers","Docker Compose",[24,25,26],"Milvus 2.6 gives you a practical path from local development to production RAG.","A stable schema, matching embeddings, and proper indexing are the core setup steps.","Hybrid retrieval and metadata filters improve recall and answer 
quality.",6,"2026-05-23T04:45:56.748027+00:00","2026-05-23T04:45:56.738+00:00","bd2d8352-b862-4c8f-af39-d598d22ae929",{"tags":32,"relatedLang":37,"relatedPosts":41},[33,35],{"name":20,"slug":34},"rag",{"name":19,"slug":36},"langchain",{"id":15,"slug":38,"title":39,"language":40},"13-bu-jian-chu-milvus-rag-dui-die-zh","13 步建出 Milvus RAG 堆疊","zh",[42,48,54,60,66,72],{"id":43,"slug":44,"title":45,"cover_image":46,"image_url":46,"created_at":47,"category":13},"96d5d6ba-05e8-47cb-a87b-01e6ef03e840","coding-plan-pro-integration-guide-en","Coding Plan Pro 接入完整指南","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781630272181-s6hg.png","2026-06-16T17:17:24.543206+00:00",{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"7fba3c18-f82c-48d9-80ba-a0209898c80b","windsurf-turns-coding-into-agent-driven-editing-en","Windsurf turns coding into agent-driven editing","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781568204816-laij.png","2026-06-16T00:02:57.636389+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"6c73d853-b09f-4d14-ab64-549e19726135","cursors-latest-update-ide-workflow-tools-en","Cursor’s latest update proves IDEs must become workflow tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781491673281-ub6v.png","2026-06-15T02:47:20.88317+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"33220b48-098e-4417-90f2-681787bbb128","cursor-bugbot-before-push-not-pr-en","Cursor’s Bugbot belongs before the push, not in the 
PR","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781490763751-pnh5.png","2026-06-15T02:32:16.801116+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"6997fa46-16f8-48bd-80dc-fe20f08815a2","prompt-engineering-writing-skill-not-magic-trick-en","Prompt engineering is a writing skill, not a magic trick","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781470978720-rxo2.png","2026-06-14T21:02:28.362525+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"50c2cc6b-fdf4-425a-aa80-05be0dee9815","open-notebook-turns-notebooklm-into-open-source-en","Open-Notebook turns NotebookLM into open source","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781450301942-cx4t.png","2026-06-14T15:17:50.526134+00:00",[79,84,89,94,99,104,109,114,119,124],{"id":80,"slug":81,"title":82,"created_at":83},"8008f1a9-7a00-4bad-88c9-3eedc9c6b4b1","surepath-ai-mcp-policy-controls-en","SurePath AI's New MCP Policy Controls Enhance AI Security","2026-03-26T01:26:52.222015+00:00",{"id":85,"slug":86,"title":87,"created_at":88},"27e39a8f-b65d-4f7b-a875-859e2b210156","mcp-standard-ai-tools-2026-en","MCP Standard in 2026: Integrating AI Tools","2026-03-26T01:27:43.127519+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"165f9a19-c92d-46ba-b3f0-7125f662921d","rag-2026-transforming-enterprise-ai-en","How RAG in 2026 is Transforming Enterprise AI","2026-03-26T01:28:11.485236+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"6a2a8e6e-b956-49d8-be12-cc47bdc132b2","mastering-ai-prompts-2026-guide-en","Mastering AI Prompts: A 2026 Guide for 
Developers","2026-03-26T01:29:07.835148+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"3ab2c67e-4664-4c67-a013-687a2f605814","garry-tan-open-sources-claude-code-toolkit-en","Garry Tan Open-Sources a Claude Code Toolkit","2026-03-26T08:26:20.245934+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"66a7cbf8-7e76-41d4-9bbf-eaca9761bf69","github-ai-projects-to-watch-in-2026-en","20 GitHub AI Projects to Watch in 2026","2026-03-26T08:28:09.752027+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"9f332fda-eace-448a-a292-2283951eee71","practical-github-guide-learning-ml-2026-en","A Practical GitHub Guide to Learning ML in 2026","2026-03-27T01:16:50.125678+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"1b1f637d-0f4d-42bd-974b-07b53829144d","aiml-2026-student-ai-ml-lab-repo-review-en","AIML-2026 Is a Bare-Bones Student Lab Repo","2026-03-27T01:21:51.661231+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"6d1bf3f6-e191-4d30-b55b-8a0722fa6afe","ai-trending-github-repos-and-research-feeds-en","AI Trending Tracks Repos and Research Feeds","2026-03-27T01:31:35.709532+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"010539a1-4c3a-4bd3-937a-26616422ee0d","awesome-ai-for-science-research-tools-map-en","Awesome AI for Science Is Becoming a Real Research Map","2026-03-27T01:46:50.89513+00:00"]