[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-ibm-100b-vector-database-single-server-zh":3,"tags-ibm-100b-vector-database-single-server-zh":33,"related-lang-ibm-100b-vector-database-single-server-zh":48,"related-posts-ibm-100b-vector-database-single-server-zh":52,"series-research-6510a804-74fd-4073-9c73-a1b4d3dc491c":89},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":21,"translated_content":10,"views":22,"is_premium":23,"created_at":24,"updated_at":24,"cover_image":11,"published_at":25,"rewrite_status":26,"rewrite_error":10,"rewritten_from_id":27,"slug":28,"category":29,"related_article_id":30,"status":31,"google_indexed_at":32,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":23},"6510a804-74fd-4073-9c73-a1b4d3dc491c","IBM 單機塞進 1000 億向量","\u003Cp>IBM 這次丟出的數字很硬。單一伺服器，1000 億向量，平均查詢延遲 694 毫秒，召回率超過 90%。說真的，這不是一般簡報會拿來唬人的那種規格。\u003C\u002Fp>\u003Cp>重點不只在向量數字大。它想做的是把 RAG 的一部分，直接塞進儲存層。講白了，就是少一層中介，少一些伺服器，也少一些整合地獄。\u003C\u002Fp>\u003Cp>這個原型來自 \u003Ca href=\"https:\u002F\u002Fresearch.ibm.com\u002Fblog\u002Fcas-100-billion-vector-storage-ai\" target=\"_blank\" rel=\"noopener\">IBM Research\u003C\u002Fa> 的 content-aware storage，簡稱 CAS。它把文件切塊、嵌入、索引，盡量往儲存系統裡面放。這種做法，對企業資料量大的場景特別有感。\u003C\u002Fp>\u003Ch2>IBM 到底做了什麼\u003C\u002Fh2>\u003Cp>CAS 的核心概念很直接。資料進到儲存系統後，不是先丟到外部向量資料庫，再走一輪複雜管線。它想在儲存層就把文件轉成向量，讓檢索更靠近資料本體。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776125936277-ct7n.png\" alt=\"IBM 單機塞進 1000 億向量\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>IBM 說，單一文件切成多段後，可能變成數百個向量。企業一旦有數十萬份文件，向量數量就會爆開。這時候還靠傳統 scale-out 
架構，就會開始燒錢。\u003C\u002Fp>\u003Cp>這套原型用了分層索引、GPU 加速，還把查詢運算和儲存拆開。硬體部分，IBM 是跟 \u003Ca href=\"https:\u002F\u002Fwww.samsung.com\u002Fsemiconductor\u002F\" target=\"_blank\" rel=\"noopener\">Samsung Semiconductor\u003C\u002Fa> 和 \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\" target=\"_blank\" rel=\"noopener\">NVIDIA\u003C\u002Fa> 合作，跑在 \u003Ca href=\"https:\u002F\u002Fwww.ibm.com\u002Fproducts\u002Fstorage-scale-system-6000\" target=\"_blank\" rel=\"noopener\">IBM Storage Scale System 6000\u003C\u002Fa> 上。\u003C\u002Fp>\u003Cul>\u003Cli>向量規模：1000 億\u003C\u002Fli>\u003Cli>向量維度：384，full precision float\u003C\u002Fli>\u003Cli>儲存占用：153 TiB\u003C\u002Fli>\u003Cli>平均查詢延遲：694 毫秒\u003C\u002Fli>\u003Cli>召回率：超過 90%\u003C\u002Fli>\u003Cli>建索引硬體：6 張 NVIDIA H200 GPU\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>為什麼 RAG 一直卡在儲存\u003C\u002Fh2>\u003Cp>現在很多企業做 AI，第一個想到的就是 RAG。原因很簡單。你不用把所有內部文件重新訓練進模型。你只要把文件嵌入後存起來，查詢時抓相關片段就好。\u003C\u002Fp>\u003Cp>問題是，資料一大，這套流程就開始變重。索引要時間，重建索引更花時間。等你把資料、向量庫、搜尋層、模型服務全部串好，維運成本也跟著上來。\u003C\u002Fp>\u003Cp>IBM 的說法是，現在很多向量資料庫要靠數十台，甚至數百台伺服器，才撐得住十億級向量。這對雲端預算很不友善。對內部 IT 團隊來說，也很像在養一隻越長越大的怪獸。\u003C\u002Fp>\u003Cp>IBM 想把更多工作往儲存層下放，再用 GPU 做最吃重的部分。它說，如果只用 2-socket Intel CPU，建索引大概要 120 天。換成 6 張 NVIDIA H200 GPU，時間降到 4 天。前面還要先花 9 天做載入和分割。\u003C\u002Fp>\u003Cul>\u003Cli>傳統向量資料庫常要橫向擴到數十到數百台\u003C\u002Fli>\u003Cli>IBM 說 CPU 建索引要約 120 天\u003C\u002Fli>\u003Cli>6 張 NVIDIA H200 GPU 可壓到 4 天\u003C\u002Fli>\u003Cli>資料載入與分割還要 9 天\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>IBM 高層想講的故事\u003C\u002Fh2>\u003Cp>IBM 這次不是只在秀硬體數字。它也在講企業價值。\u003Ca href=\"https:\u002F\u002Fwww.ibm.com\" target=\"_blank\" rel=\"noopener\">IBM\u003C\u002Fa> Storage GM Sam Werner 的意思很明白。很多文件早就躺在儲存系統裡，只是企業一直沒把它們吃乾抹淨。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776125931294-l8c0.png\" alt=\"IBM 單機塞進 1000 億向量\" 
class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Sam Werner 說：「Enterprises can derive unprecedented insights from all of their documents in storage systems.」這句話很像行銷稿，但意思很實際。資料都在那裡了，為什麼還要多搬一層？\u003C\u002Fp>\u003Cp>IBM Storage CTO Vincent Hsu 則把焦點放在基礎設施。企業資料集變大很快，不能等到最後才想擴充策略。Daniel Waddington 也提到維運問題。系統不只要跑得動，還要能持續更新。\u003C\u002Fp>\u003Cp>IBM 還放了一句很直白的說法。它說安全性已經內建在向量資料庫裡，現在要做的是在不拉高基礎設施 footprint 的前提下擴大規模。這句話很像賣點，但也很像企業真實痛點。\u003C\u002Fp>\u003Ch2>跟一般做法比，差在哪\u003C\u002Fh2>\u003Cp>現在多數 RAG 架構都很碎。資料進來先做 ingestion，再丟向量資料庫，旁邊還有物件儲存、快取、模型服務。每一層都能出問題。每一層都要維運。\u003C\u002Fp>\u003Cp>IBM 想做的是把這些層壓扁。儲存不再只是放資料。它也要參與檢索。這種設計很像把倉庫直接改造成半個搜尋引擎。\u003C\u002Fp>\u003Cp>從數字看，IBM 這次的 demo 已經不是小打小鬧。1000 億向量、694 毫秒、90% 以上召回率，這組數字至少證明一件事。向量檢索的戰場，已經從「能不能做」變成「怎麼做得划算」。\u003C\u002Fp>\u003Cul>\u003Cli>一般大型向量 DB：十億級向量，常要數十到數百台\u003C\u002Fli>\u003Cli>IBM CAS 原型：單機 1000 億向量\u003C\u002Fli>\u003Cli>常見 CPU 索引：可能拖到數月\u003C\u002Fli>\u003Cli>IBM GPU 索引：4 天完成，前置載入 9 天\u003C\u002Fli>\u003Cli>傳統 RAG：切成多層管線\u003C\u002Fli>\u003Cli>IBM CAS：更多流程放進儲存層\u003C\u002Fli>\u003C\u002Ful>\u003Cp>IBM 和 NVIDIA 也在推 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FcuVS\" target=\"_blank\" rel=\"noopener\">cuVS\u003C\u002Fa> 相關的向量索引工作。它們的目標很明確。1000 億以上向量，索引時間壓到 1 天內，載入時間從 9 天壓到 1 天，搜尋延遲往 50 到 100 毫秒靠近，召回率維持 90%。\u003C\u002Fp>\u003Cp>這組目標很誠實。它沒有說要把一切變魔法。它只是在告訴你，瓶頸在哪裡。現在不是向量檢索能不能用。是它能不能在企業裡面活得久、活得便宜。\u003C\u002Fp>\u003Ch2>這波對產業代表什麼\u003C\u002Fh2>\u003Cp>這件事不只是在比誰能塞更多向量。它也在改寫儲存廠商的角色。以前大家談 AI 基礎設施，主角常是 GPU、模型、API、向量資料庫。儲存廠商常站在後面。\u003C\u002Fp>\u003Cp>現在 IBM 想把自己往前推。它的邏輯是，既然企業資料本來就放在儲存系統，那檢索也可以從那裡開始。這對有大量內部文件的公司，像金融、製造、醫療、法務，都很有吸引力。\u003C\u002Fp>\u003Cp>我覺得這也會逼其他廠商重新想架構。像 \u003Ca href=\"https:\u002F\u002Fwww.pinecone.io\" 
target=\"_blank\" rel=\"noopener\">Pinecone\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fweaviate.io\" target=\"_blank\" rel=\"noopener\">Weaviate\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fmilvus.io\" target=\"_blank\" rel=\"noopener\">Milvus\u003C\u002Fa> 這類向量資料庫，強項還是在搜尋與索引。IBM 走的是另一條路，直接把儲存層拉進來打。\u003C\u002Fp>\u003Cp>這裡的競爭點很清楚。不是誰的 ANN 演算法名字比較炫。是誰能把整體成本壓下來，還能維持可維運性。\u003C\u002Fp>\u003Cp>如果你看企業採購，這件事更現實。很多團隊不是買不起 GPU，而是養不起一整套分散式檢索堆疊。少一層服務，就少一份故障點。少一份故障點，就少一次半夜被叫醒。\u003C\u002Fp>\u003Ch2>接下來該看什麼\u003C\u002Fh2>\u003Cp>IBM 這次的 demo，最有意思的地方不是 1000 億這個數字本身。是它把「向量檢索」從獨立服務，往儲存系統裡面推了一步。這件事如果做順，企業 RAG 的架構會簡單很多。\u003C\u002Fp>\u003Cp>但我也不會把它說得太神。694 毫秒平均延遲，對某些即時互動場景還是偏慢。它比較像大規模企業檢索的工程解，而不是聊天機器人秒回的理想答案。\u003C\u002Fp>\u003Cp>接下來最該盯的，是 IBM 能不能把索引時間從 4 天再壓下去，還有搜尋延遲能不能往 100 毫秒內靠攏。如果做得到，這套 CAS 才真的有機會進到正式部署清單。\u003C\u002Fp>\u003Cp>我的判斷很直接。下一波企業 RAG 競爭，不會只看誰的 LLM 比較會講。會看誰能把資料、索引、儲存、GPU 串得更省錢。你如果正在規劃內部知識庫，現在就該問一句：你要再養一個向量叢集，還是讓儲存系統多做一點事？\u003C\u002Fp>","IBM 宣稱 CAS 原型在單一伺服器上索引 1000 億向量，平均延遲 694 毫秒、召回率超過 90%。這篇拆解它怎麼做、跟一般向量資料庫差在哪、以及對企業 RAG 架構的影響。","research.ibm.com","https:\u002F\u002Fresearch.ibm.com\u002Fblog\u002Fcas-100-billion-vector-storage-ai",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776125936277-ct7n.png",[13,14,15,16,17,18,19,20],"IBM","向量資料庫","RAG","CAS","AI儲存","NVIDIA H200","企業AI","向量檢索","zh",0,false,"2026-04-14T00:18:35.333469+00:00","2026-04-14T00:18:35.157+00:00","done","7c2261ad-ae44-4600-99dd-f3c255c78b3d","ibm-100b-vector-database-single-server-zh","research","10619d9e-17e5-426e-8139-5ad963627565","published","2026-04-14T09:00:10.651+00:00",[34,36,38,40,41,43,45,47],{"name":18,"slug":35},"nvidia-h200",{"name":15,"slug":37},"rag",{"name":16,"slug":39},"cas",{"name":20,"slug":20},{"name":19,"slug":42},"企業ai",{"name":13,"slug":44},"ibm",{"name":17,"slug":46},"ai儲存",{"name":14,"slug":14},{"id":30,"slug":49,"title":50,"language":51},"ibm-100b-vector-database-single-server-en","IBM 
hits 100B vectors on one server","en",[53,59,65,71,77,83],{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":29},"667b72b6-e821-4d68-80a1-e03340bc85f1","turboquant-seo-shift-small-sites-zh","TurboQuant 與小站 SEO 變化","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778840440690-kcw9.png","2026-05-15T10:20:27.319472+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":29},"381fb6c6-6da7-4444-831f-8c5eed8d685c","turboquant-vllm-comparison-fp8-kv-cache-zh","TurboQuant 與 FP8 實測結果","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839867551-4v9g.png","2026-05-15T10:10:36.034569+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":29},"c15f45ee-a548-4dbf-8152-91de159c1a11","llmbda-calculus-agent-safety-rules-zh","LLMbda 演算替 AI 代理人立安全規則","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778825503412-mlbf.png","2026-05-15T06:10:34.832664+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":29},"0c02225c-d6ff-44f8-bc92-884c8921c4a3","low-complexity-beamspace-denoiser-mmwave-mimo-zh","更簡單的毫米波波束域去噪器","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778814650361-xtc2.png","2026-05-15T03:10:30.06639+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":29},"9d27f967-62cc-433f-8cdb-9300937ade13","ai-benchmark-wins-cyber-scare-defenders-zh","為什麼 AI 
基準賽在資安領域的勝利，應該讓防守方警醒","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778807450006-nofx.png","2026-05-15T01:10:29.379041+00:00",{"id":84,"slug":85,"title":86,"cover_image":87,"image_url":87,"created_at":88,"category":29},"bc402dc6-5da6-46fc-9d66-d09cb215f72b","why-linux-security-needs-patch-wave-mindset-zh","為什麼 Linux 安全需要「補丁浪潮」思維","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778741449813-s2wn.png","2026-05-14T06:50:24.052583+00:00",[90,95,100,105,110,115,120,125,130,135],{"id":91,"slug":92,"title":93,"created_at":94},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"9f50561b-aebd-46ba-94a8-363198aa7091","openclaw-agents-manipulated-self-sabotage-zh","OpenClaw Agent 會自己搞砸自己","2026-03-28T03:03:18.786425+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"11f22e92-7066-4978-a544-31f5f2156ec6","vega-learning-to-drive-with-natural-language-instructions-zh","Vega：使用自然語言指示進行自駕車控制","2026-03-28T14:54:04.847912+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"a4c7cfec-8d0e-4fec-93cf-1b9699a530b8","drive-my-way-en-zh","Drive My Way：個性化自駕車風格的實現","2026-03-28T14:54:26.207495+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"dec02f89-fd39-41ba-8e4d-11ede93a536d","training-knowledge-bases-with-writeback-rag-zh","用 WriteBack-RAG 
強化知識庫提升檢索效能","2026-03-28T14:54:45.775606+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"3886be5c-a137-40cc-b9e2-0bf18430c002","packforcing-efficient-long-video-generation-method-zh","PackForcing：短影片訓練也能生成長影片","2026-03-28T14:55:02.688141+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"72b90667-d930-4cc9-8ced-aaa0f8968d44","pixelsmile-toward-fine-grained-facial-expression-editing-zh","PixelSmile：提升精細臉部表情編輯的新方法","2026-03-28T14:55:20.678181+00:00",{"id":136,"slug":137,"title":138,"created_at":139},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00"]