[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-harness-engineering-long-running-multi-agent-systems-zh":3,"tags-harness-engineering-long-running-multi-agent-systems-zh":32,"related-lang-harness-engineering-long-running-multi-agent-systems-zh":47,"related-posts-harness-engineering-long-running-multi-agent-systems-zh":51,"series-ai-agent-35b17db6-a915-4a3c-87c6-733fbb7f5a31":88},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":20,"translated_content":10,"views":21,"is_premium":22,"created_at":23,"updated_at":23,"cover_image":11,"published_at":24,"rewrite_status":25,"rewrite_error":10,"rewritten_from_id":26,"slug":27,"category":28,"related_article_id":29,"status":30,"google_indexed_at":31,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":22},"35b17db6-a915-4a3c-87c6-733fbb7f5a31","長跑型多代理系統的 Harness 設計","\u003Cp>長跑型多代理系統，最常死在小地方。記憶污染、提示詞漂移、上一個 agent 的半成品推理，全都會混進下一步。\u003C\u002Fp>\u003Cp>這篇講的 Harness Engineering，做法很直白。每次都開新的 \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa> process。再把 Planner 的輸出轉成 JSON，才交給 Generator。\u003C\u002Fp>\u003Cp>聽起來很樸素，但很實用。因為它把「思考」和「執行」切開了。長時間跑任務時，這種切法比花俏 prompt 更重要。\u003C\u002Fp>\u003Ch2>為什麼要重置 context\u003C\u002Fh2>\u003Cp>很多多代理系統，會把對話歷史當共享工作區。短任務還行。任務一長，就開始出事。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775629509415-9j0g.png\" alt=\"長跑型多代理系統的 Harness 設計\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Planner 會試很多方向。它也可能改主意。那些試探性推理，根本不該流到執行層。可是一旦共用 context，Generator 就可能吃到這些髒資料。\u003C\u002Fp>\u003Cp>重置 context 的核心，就是讓每個 
agent 各跑各的。Generator 不繼承 Planner 的內心戲。它只拿到一個任務 payload，然後照著做。這樣出錯時，也比較好查。\u003C\u002Fp>\u003Cul>\u003Cli>每次執行都從新 process 開始\u003C\u002Fli>\u003Cli>Planner 輸出先變成 JSON\u003C\u002Fli>\u003Cli>Generator 只看最終任務定義\u003C\u002Fli>\u003Cli>上下文污染會少很多\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這不是什麼記憶魔法。這是介面設計。講白了，就是把 Planner 當上游編譯器，把 Generator 當下游 runtime。\u003C\u002Fp>\u003Ch2>從對話變成任務規格\u003C\u002Fh2>\u003Cp>這個設計最漂亮的地方，就是把 Planner 的輸出先整理成 JSON。Planner 可以自由思考。真正送到下一步的，只保留必要資訊。\u003C\u002Fp>\u003Cp>這樣比直接傳文字乾淨很多。因為 JSON 會逼你把欄位講清楚。目標、限制、輸入、輸出，全部都能明確命名。Generator 看到的不是聊天紀錄，而是任務規格。\u003C\u002Fp>\u003Cp>對做 agent 系統的團隊來說，這還有一個好處。log 比較好看。你可以直接 diff 任務物件，也能先驗證欄位，再決定要不要送去執行。\u003C\u002Fp>\u003Cul>\u003Cli>Planner 輸出會先被過濾\u003C\u002Fli>\u003Cli>JSON 讓任務邊界很明確\u003C\u002Fli>\u003Cli>結構化欄位比自由文字好驗證\u003C\u002Fli>\u003Cli>執行紀錄更容易稽核\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這種設計一開始不太炫。真的。可是系統一拉長，你就會發現，介面品質比 prompt 內容更值錢。\u003C\u002Fp>\u003Ch2>Anthropic 怎麼看這件事\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Anthropic\u003C\u002Fa> 在 \u003Ca href=\"https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code 文件\u003C\u002Fa>裡講得很直接。它說，Claude Code 是一個直接在 terminal 運作的 agentic coding tool。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775629492110-w9a8.png\" alt=\"長跑型多代理系統的 Harness 設計\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cblockquote>“Claude Code is an agentic coding tool that operates directly in your terminal.” — Anthropic\u003C\u002Fblockquote>\u003Cp>這句話很有意思。因為它把產品定位成執行工具，不是聊天玩具。既然是執行工具，外層 harness 就該幫它隔離雜訊。\u003C\u002Fp>\u003Cp>新 process 和結構化 prompt，正好就是這個用途。它們不是在加戲。它們是在保護執行層，別被上游的混亂狀態拖下水。\u003C\u002Fp>\u003Cp>說真的，這對長時間任務很重要。你跑個幾分鐘還好。跑幾小時、幾十個 job 之後，沒有這層保護，系統很容易開始鬼打牆。\u003C\u002Fp>\u003Ch2>跟其他 
agent 架構比起來\u003C\u002Fh2>\u003Cp>很多框架會讓多個步驟共用同一條 conversation thread。做 demo 很方便。做正式系統，就很危險。\u003C\u002Fp>\u003Cp>因為一個 agent 的半成品推理，會變成下一個 agent 的背景知識。系統慢慢就不像 pipeline，比較像群組聊天室。這種架構很難保證穩定。\u003C\u002Fp>\u003Cp>相對地，context-reset 模型把每一步當獨立計算。Planner 先產出 task description。Generator 只執行 task。兩者之間沒有隱性記憶。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa> 的新 process 模式，隔離性比較好\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fintroducing-openai-codex\u002F\" target=\"_blank\" rel=\"noopener\">OpenAI Codex\u003C\u002Fa> 類型的 loop，適合快速原型\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flanggraph\" target=\"_blank\" rel=\"noopener\">LangGraph\u003C\u002Fa> 適合狀態圖很複雜的 orchestration\u003C\u002Fli>\u003Cli>JSON task spec 比自由 prompt chain 更好測試\u003C\u002Fli>\u003C\u002Ful>\u003Cp>當然，重置 context 不是沒代價。你會失去一些對話連續性。可是對長跑系統來說，這個代價通常值得付。穩定執行，比聰明地重用記憶更重要。\u003C\u002Fp>\u003Ch2>這代表什麼產業脈絡\u003C\u002Fh2>\u003Cp>現在很多團隊都在做 agentic workflow。從內部自動化，到 code generation，再到研究助理，大家都想把 LLM 接進工作流。\u003C\u002Fp>\u003Cp>問題是，很多人先想「怎麼讓 agent 記得更多」。我反而覺得，先想「哪些東西不該記住」更實際。當任務變長，少記一點，常常比多記一點更可靠。\u003C\u002Fp>\u003Cp>這也解釋了為什麼結構化資料越來越重要。不是每一步都要聊天。很多時候，直接傳 JSON、schema、或 task object 就夠了。這跟傳統軟體工程很像。介面定義清楚，系統才跑得久。\u003C\u002Fp>\u003Cp>我覺得接下來一年，做得好的多代理系統，會更像 typed pipeline，而不是一串 prompt。誰能把 agent 之間的邊界定清楚，誰就比較少踩 prompt drift 的坑。\u003C\u002Fp>\u003Ch2>你可以怎麼用這套思路\u003C\u002Fh2>\u003Cp>如果你自己也在做 Harness，我會先問一個問題。每個 step 真的都需要 conversational memory 嗎？如果不需要，就把 context reset 掉。\u003C\u002Fp>\u003Cp>再來，把 Planner 的輸出改成結構化資料。目標、限制、輸入、驗收條件，都寫進 JSON。Generator 只吃這份資料，不吃聊天紀錄。這樣做，debug 會輕鬆很多。\u003C\u002Fp>\u003Cp>最後，先在介面層做驗證。不要等到 LLM 亂寫完才補救。你可以在送進執行層前檢查欄位、型別、長度，甚至先做 
schema validation。這些工作看起來土，卻很值。\u003C\u002Fp>\u003Cp>我的預測很簡單。接下來的 agent 系統，會越來越少靠長對話記憶。更多團隊會改用短生命週期的 process，加上 JSON 任務流。你如果現在就在做這件事，後面會少掉很多 debug 地獄。\u003C\u002Fp>\u003Cp>如果你在設計自己的 multi-agent workflow，先別急著加更多 memory。先把邊界切乾淨。這件事，通常比加一個更大的 prompt 有用得多。\u003C\u002Fp>","長跑型多代理系統最怕記憶污染。這篇看 Harness Engineering 怎麼用新 process、JSON 任務與 Claude Code，切開 Planner 和 Generator。","zhuanlan.zhihu.com","https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2021965663103665310",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775629509415-9j0g.png",[13,14,15,16,17,18,19],"Harness Engineering","multi-agent systems","Claude Code","context reset","JSON task spec","agent orchestration","LLM workflow","zh",1,false,"2026-04-08T06:24:33.356723+00:00","2026-04-08T06:24:33.234+00:00","done","6c3c5580-d728-4258-a615-c26f7ea6a5d3","harness-engineering-long-running-multi-agent-systems-zh","ai-agent","ed53d5e3-fe2f-4824-91f7-9ab3cfb89bed","published","2026-04-08T09:00:47.18+00:00",[33,35,37,39,41,43,45],{"name":13,"slug":34},"harness-engineering",{"name":17,"slug":36},"json-task-spec",{"name":15,"slug":38},"claude-code",{"name":14,"slug":40},"multi-agent-systems",{"name":19,"slug":42},"llm-workflow",{"name":18,"slug":44},"agent-orchestration",{"name":16,"slug":46},"context-reset",{"id":29,"slug":48,"title":49,"language":50},"harness-engineering-long-running-multi-agent-systems-en","Harness Engineering for Long-Running Multi-Agent Systems","en",[52,58,64,70,76,82],{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":28},"e7874ed9-592f-4e06-b7b7-ab733fe779db","claude-agent-dreaming-outcomes-multiagent-zh","Claude 幫 Agent 
加了做夢功能","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778868642412-7woy.png","2026-05-15T18:10:24.427608+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":28},"38406a12-f833-4c69-ae22-99c31f03dd52","switch-ai-outputs-markdown-to-html-zh","怎麼把 AI 輸出改成 HTML","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778743243861-8901.png","2026-05-14T07:20:21.545364+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":28},"c7c69fe4-97e3-4edf-a9d6-a79d0c4495b4","anthropic-cat-wu-proactive-ai-assistants-zh","Cat Wu 談 Claude 的主動式 AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778735455993-gnw7.png","2026-05-14T05:10:30.453046+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":28},"e1d6acda-fa49-4514-aa75-709504be9f93","how-to-run-hermes-agent-on-discord-zh","如何在 Discord 執行 Hermes Agent","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778724655796-cjul.png","2026-05-14T02:10:34.362605+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":28},"4104fa5f-d95f-45c5-9032-99416cf0365c","why-ragflow-is-the-right-open-source-rag-engine-to-self-host-zh","為什麼 RAGFlow 是最適合自架的開源 RAG 引擎","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778674262278-1630.png","2026-05-13T12:10:23.762632+00:00",{"id":83,"slug":84,"title":85,"cover_image":86,"image_url":86,"created_at":87,"category":28},"7095f05c-34f5-469f-a044-2525d2010ce9","how-to-add-temporal-rag-in-production-zh","如何在正式環境加入 Temporal 
RAG","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778667053844-osvs.png","2026-05-13T10:10:30.930982+00:00",[89,94,99,104,109,114,119,124,129,134],{"id":90,"slug":91,"title":92,"created_at":93},"4ae1e197-1d3d-4233-8733-eafe9cb6438b","claude-now-uses-your-pc-to-finish-tasks-zh","Claude 開始幫你操作電腦","2026-03-26T07:20:48.457387+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"5bede67f-e21c-413d-9ab8-54a3c3d26227","googles-2026-ai-agent-report-decoded-zh","Google 2026 AI Agent 報告解讀","2026-03-26T11:15:22.651956+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"2987d097-563f-46c7-b76f-b558d8ef7c2b","kimi-k25-review-stronger-still-not-legend-zh","Kimi K2.5 評測：更強，但還不是神作","2026-03-27T07:15:55.277513+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"95c9053b-e3f4-4cb5-aace-5c54f4c9e044","claude-code-controls-mac-desktop-zh","Claude Code 也能操控 Mac 了","2026-03-28T03:01:58.58121+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"dc58e153-e3a8-4c06-9b96-1aa64eabbf5f","cloudflare-100x-faster-ai-agent-sandbox-zh","Cloudflare 的 AI 沙箱跑超快","2026-03-28T03:09:44.142236+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"1c8afc56-253f-47a2-979f-1065ff072f2a","openai-backs-isara-agent-swarm-bet-zh","OpenAI 挺 Isara 的 agent swarm …","2026-03-28T03:15:27.513155+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"7379b422-576e-45df-ad5a-d57a0d9dd467","openai-plan-automated-ai-researcher-zh","OpenAI 想做自動化 AI 研究員","2026-03-28T03:17:42.090548+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"48c9889e-86df-450b-a356-e4a4b7c83c5b","harness-engineering-ai-agent-reliability-2026-zh","駕馭工程：從「馬具」到「作業系統」，AI Agent 可靠性的終極密碼","2026-03-31T06:42:53.556721+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"e41546b8-ba9e-455f-9159-88d4614ad711","openai-codex-plugin-claude-code-zh","OpenAI 把 Codex 放進 Claude 
Code","2026-04-01T09:21:54.687617+00:00",{"id":135,"slug":136,"title":137,"created_at":138},"96d8e8c8-1edd-475d-9145-b1e7a1b02b65","mcp-explained-from-prompts-to-production-zh","MCP 怎麼把提示詞變工作流","2026-04-01T09:24:39.321274+00:00"]