[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-how-to-build-harness-for-ai-agents-en":3,"article-related-how-to-build-harness-for-ai-agents-en":31,"series-ai-agent-171dca3d-20d5-4205-a20d-406cd426fc6d":85},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"171dca3d-20d5-4205-a20d-406cd426fc6d","how-to-build-harness-for-ai-agents-en","How to Build a Harness for AI Agents","\u003Cp data-speakable=\"summary\">\u003Ca href=\"\u002Ftag\u002Fharness-engineering\">Harness engineering\u003C\u002Fa> defines the control system that lets an \u003Ca href=\"\u002Ftag\u002Fai-agent\">AI agent\u003C\u002Fa> perceive, act, and verify output.\u003C\u002Fp>\u003Cp>This guide is for developers who want to move beyond prompt tuning and build a dependable \u003Ca href=\"\u002Ftag\u002Fagent\">agent\u003C\u002Fa> loop with clear inputs, actions, and checks. After following the steps, you will have a simple harness design you can adapt to any model, plus a working mental model for where the model ends and the control logic begins.\u003C\u002Fp>\u003Cp>The core idea is simple: an agent is not just a model, it is a model wrapped in a harness that decides what it can see, what it can do, and how its results are validated. 
That separation is what makes agent behavior easier to debug, safer to run, and more consistent in production.\u003C\u002Fp>\u003Ch2>Before you start\u003C\u002Fh2>\u003Cul>\u003Cli>OpenAI or Anthropic API account with a valid API key.\u003C\u002Fli>\u003Cli>Node.js 20+ or Python 3.11+.\u003C\u002Fli>\u003Cli>Git 2.40+.\u003C\u002Fli>\u003Cli>A terminal and a code editor.\u003C\u002Fli>\u003Cli>Basic familiarity with JSON, HTTP requests, and function calling.\u003C\u002Fli>\u003Cli>Optional: access to the \u003Ca href=\"https:\u002F\u002Fplatform.openai.com\u002Fdocs\">OpenAI docs\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-openapi\">OpenAI GitHub repo\u003C\u002Fa>, or the equivalent docs and SDK for your chosen model provider.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Define the agent boundary\u003C\u002Fh2>\u003Cp>Your first goal is to separate the model from the harness so you can reason about each part independently. The model should generate candidate actions or answers, while the harness owns state, tool access, retries, and validation.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779844562168-ar28.png\" alt=\"How to Build a Harness for AI Agents\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Write down three boxes: \u003Cstrong>input\u003C\u002Fstrong> from the environment, \u003Cstrong>policy\u003C\u002Fstrong> inside the harness, and \u003Cstrong>output\u003C\u002Fstrong> back to the user or tool. A minimal boundary definition looks like this:\u003C\u002Fp>\u003Cpre>\u003Ccode>Environment -> Harness -> Model -> Harness -> Tools\u002FChecks -> Final Output\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should be able to explain, in one sentence, which component is allowed to call APIs, which component stores memory, and which component decides whether a response is acceptable. 
If that is unclear, the boundary is still too loose.\u003C\u002Fp>\u003Ch2>Step 2: Model the observation schema\u003C\u002Fh2>\u003Cp>The next goal is to control what the agent can perceive. In harness engineering, the observation schema is the structured view of the world that you pass into the model, such as user intent, recent messages, tool results, and constraints.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779844558611-2jk6.png\" alt=\"How to Build a Harness for AI Agents\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Use a JSON shape so the model sees stable fields instead of an unstructured blob. For example, keep observations small and explicit:\u003C\u002Fp>\u003Cpre>\u003Ccode>{\n  \"goal\": \"summarize invoice\",\n  \"messages\": [\"...\"],\n  \"tool_results\": [],\n  \"constraints\": [\"no PII\", \"return JSON\"]\n}\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see the same keys on every turn, even when the values change. That consistency makes prompt updates safer and makes it easier to test whether the harness, not the model, is responsible for bad behavior.\u003C\u002Fp>\u003Ch2>Step 3: Register allowed actions\u003C\u002Fh2>\u003Cp>Your next goal is to restrict what the agent can do. Instead of letting the model improvise, define a small action set such as search, fetch, calculate, write, or escalate. The harness should translate those actions into real \u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa> calls.\u003C\u002Fp>\u003Cp>Create a tool registry with names, input schemas, and permission rules. 
A simple registry might look like this:\u003C\u002Fp>\u003Cpre>\u003Ccode>tools:\n  - name: search_docs\n    input: { query: string }\n  - name: fetch_record\n    input: { id: string }\n  - name: submit_answer\n    input: { text: string }\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should be able to reject any action that is not in the registry. If the model asks for an unsupported tool, the harness should return a controlled error instead of executing anything unexpected.\u003C\u002Fp>\u003Ch2>Step 4: Add validation and retry logic\u003C\u002Fh2>\u003Cp>Your goal here is to make the harness verify output before it is accepted. This is where harness engineering differs from \u003Ca href=\"\u002Ftag\u002Fprompt-engineering\">prompt engineering\u003C\u002Fa>, because the harness can inspect structure, policy, and confidence before it commits a result.\u003C\u002Fp>\u003Cp>Implement checks for schema validity, forbidden content, and task-specific rules. Then add a retry path that feeds the failure reason back to the model. A practical pattern is:\u003C\u002Fp>\u003Cpre>\u003Ccode>if not valid_json(output) or not passes_policy(output):\n    # Feed the failure reason back to the model\n    retry_with_error_context()\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>You should see fewer malformed responses and fewer silent failures. A good verification step is to log accepted versus rejected outputs so you can tell whether the harness is catching issues early.\u003C\u002Fp>\u003Ch2>Step 5: Close the loop with state and memory\u003C\u002Fh2>\u003Cp>The final goal is to preserve only the state that helps the agent do its job. The harness should decide what to remember, what to summarize, and what to discard between turns.\u003C\u002Fp>\u003Cp>Store durable state separately from transient context, and update it only after a successful validation step. 
That can be as simple as a session record with current task, tool history, and last confirmed result.\u003C\u002Fp>\u003Cp>You should see the agent behave more consistently across multiple turns because the harness is now carrying the state instead of relying on the model to reconstruct everything from raw chat history. At this point, you have the basic agent equation in place: model for reasoning, harness for control.\u003C\u002Fp>\u003Ch2>Common mistakes\u003C\u002Fh2>\u003Cul>\u003Cli>\u003Cstrong>Letting the prompt do all the work.\u003C\u002Fstrong> Fix: move tool choice, schema checks, and retries into the harness so the prompt only supplies reasoning context.\u003C\u002Fli>\u003Cli>\u003Cstrong>Giving the model too much context.\u003C\u002Fstrong> Fix: pass a compact observation schema and summarize older state before each turn.\u003C\u002Fli>\u003Cli>\u003Cstrong>Trusting the first answer.\u003C\u002Fstrong> Fix: validate structure and policy before acceptance, then retry or escalate when checks fail.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What's next\u003C\u002Fh2>\u003Cp>Once your harness works, the next step is to add evaluation scripts, tracing, and sandboxed tool execution so you can measure reliability over time and harden the agent for production use.\u003C\u002Fp>","Harness engineering defines the control system that lets an AI agent perceive, act, and verify output.","zhuanlan.zhihu.com","https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2036738130649330427",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779844562168-ar28.png","ai-agent","en","97bb6252-5422-45e5-ad39-8e541ce6a4ae",[17,18,19,20,21,22],"harness engineering","AI agents","prompt engineering","function calling","tool registry","output validation",[24,25,26],"Harness engineering separates model reasoning from control logic.","A good harness defines observations, actions, validation, and memory.","Validation and 
retries make agent behavior safer and easier to debug.",2,"2026-05-27T01:15:29.046272+00:00","2026-05-27T01:15:29.032+00:00","a9bee732-b07c-4e5b-a0e6-3048577e32a7",{"tags":32,"relatedLang":44,"relatedPosts":48},[33,35,38,40,42],{"name":19,"slug":34},"prompt-engineering",{"name":36,"slug":37},"Harness Engineering","harness-engineering",{"name":21,"slug":39},"tool-registry",{"name":20,"slug":41},"function-calling",{"name":18,"slug":43},"ai-agents",{"id":15,"slug":45,"title":46,"language":47},"how-to-build-harness-for-ai-agents-zh","如何打造 AI Agent Harness","zh",[49,55,61,67,73,79],{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":13},"5efa67dd-b9f7-4a2f-8c68-3a4bc6a6b7d9","claude-code-dynamic-workflow-ai-harness-en","Claude Code 动态工作流：AI 自写 Harness","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781035372495-9czj.png","2026-06-09T20:02:22.33375+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":13},"2bd28e0e-0f4b-4987-a961-28763c1e1926","agent-orchestration-enterprise-ai-layer-en","Agent orchestration is the missing layer for enterprise AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780984981174-08mj.png","2026-06-09T06:02:31.384174+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":13},"95684312-23dc-4a78-a917-df14d132c5fa","ai-agents-use-blockchain-trust-layer-en","AI agents use blockchain as a trust layer","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780980506080-ki4s.png","2026-06-09T04:48:01.710214+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":13},"0208e47f-7d4c-4473-a0f9-4cd193b5c139","8-rag-patterns-demos-into-prod-en","8 RAG patterns that turn demos into 
prod","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780971552707-qpl7.png","2026-06-09T02:18:36.760049+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":13},"b413d484-6786-4c32-abdc-77f010ac7eba","fine-tuning-beats-rag-style-not-facts-en","Fine-tuning beats RAG when the goal is style, not facts","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780924681800-5xji.png","2026-06-08T13:17:25.701649+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":13},"57beb8b4-c233-400f-b95b-a97be1cf9d02","openclaw-small-business-ai-staff-en","OpenClaw shows how small businesses use AI staff","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780904882032-yp13.png","2026-06-08T07:47:27.730921+00:00",[86,91,96,101,106,111,116,121,126,131],{"id":87,"slug":88,"title":89,"created_at":90},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, Decoded","2026-03-26T11:15:23.046616+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac 
desktop","2026-03-28T03:01:59.384091+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm bet","2026-03-28T03:15:27.849766+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","2026-03-31T06:36:55.648751+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00",{"id":132,"slug":133,"title":134,"created_at":135},"116d5ee9-a4f1-4b5a-aac5-5d035dd22bbe","amazon-bedrock-agents-multi-agent-workflows-en","Amazon Bedrock Agents Gets Multi-Agent Workflows","2026-04-01T09:30:30.197685+00:00"]