[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-agent-harness-ai-engineering-2026-en":3,"tags-agent-harness-ai-engineering-2026-en":30,"related-lang-agent-harness-ai-engineering-2026-en":41,"related-posts-agent-harness-ai-engineering-2026-en":45,"series-industry-a9b22aa6-768c-44b1-967a-1b4ea3c28ce9":82},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"a9b22aa6-768c-44b1-967a-1b4ea3c28ce9","Agent Harness Is Quietly Defining AI Engineering","\u003Cp>In February 2026, Martin Fowler put a name on something AI teams had already been building in pieces: \u003Ca href=\"https:\u002F\u002Fmartinfowler.com\u002F\" target=\"_blank\" rel=\"noopener\">Harness Engineering\u003C\u002Fa>. Around the same time, \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002F\" target=\"_blank\" rel=\"noopener\">Anthropic\u003C\u002Fa> published its guide to effective harnesses for long-running agents, and \u003Ca href=\"https:\u002F\u002Fopenai.com\u002F\" target=\"_blank\" rel=\"noopener\">OpenAI\u003C\u002Fa> said its Codex team had generated more than 1 million lines of production code with zero manual input. The common thread is simple: the model matters, but the system around it decides whether the work holds up.\u003C\u002Fp>\u003Cp>If you build with agents, this is the part worth paying attention to. 
A good \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa>-style workflow or a custom agent stack can make a capable model feel dependable, while a weak wrapper can turn a strong model into an expensive source of retries, hallucinations, and half-finished tasks.\u003C\u002Fp>\u003Cp>That gap is why “agent harness” is becoming one of the most useful phrases in AI engineering. It describes the scaffolding that keeps an agent on task: memory, tools, checkpoints, retries, evaluation, permissioning, and recovery when the model drifts.\u003C\u002Fp>\u003Ch2>What an agent harness actually is\u003C\u002Fh2>\u003Cp>An agent harness is the control layer around an LLM agent. It is the code that decides what the agent can see, what it can do, when it should pause, and how it should recover after a mistake. Think of the model as the reasoning engine and the harness as the \u003Ca href=\"\u002Fnews\u002Fharness-engineering-ai-agent-reliability-2026\">operating system\u003C\u002Fa> around it.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775057900119-b2du.png\" alt=\"Agent Harness Is Quietly Defining AI Engineering\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That distinction matters because raw model output is rarely enough for production work. 
A model can draft code, summarize a document, or plan a task, but a harness turns those outputs into a repeatable workflow with guardrails and feedback loops.\u003C\u002Fp>\u003Cp>In practice, the harness often includes a few recurring pieces:\u003C\u002Fp>\u003Cul>\u003Cli>Tool calling for file access, search, API requests, or code execution\u003C\u002Fli>\u003Cli>State management so the agent remembers task progress across steps\u003C\u002Fli>\u003Cli>Validation checkpoints that test output before the agent continues\u003C\u002Fli>\u003Cli>Retry logic for failed actions, timeouts, and partial tool errors\u003C\u002Fli>\u003Cli>Permission controls that limit risky actions in production environments\u003C\u002Fli>\u003Cli>Logging and traces that help engineers inspect every step later\u003C\u002Fli>\u003C\u002Ful>\u003Cp>This is why a polished agent demo can be misleading. The demo usually hides the messy parts: tool failures, context loss, and the need to stop an agent before it goes off-script. The harness is where those problems get handled.\u003C\u002Fp>\u003Cp>Martin Fowler’s framing matters because he has spent decades describing how software systems fail in the real world. When someone like that coins a term, it usually means the industry has moved from experimentation to engineering discipline.\u003C\u002Fp>\u003Ch2>Why the model is only half the story\u003C\u002Fh2>\u003Cp>People still talk about AI as if better models automatically mean better products. That is true in a narrow sense, but it misses the operational reality. A model can score higher on benchmarks and still perform badly in a long-running task if it lacks the right controls.\u003C\u002Fp>\u003Cp>Anthropic’s work on long-running agents makes this point clearly. Long tasks create more opportunities for drift, forgetting, and accidental side effects. 
A harness has to keep the agent oriented, especially when the task spans many tool calls or depends on external systems.\u003C\u002Fp>\u003Cp>OpenAI’s Codex example is useful because it shows scale. More than 1 million lines of production code is not a toy benchmark; it is evidence that the surrounding workflow can absorb a lot of real engineering work if the execution layer is disciplined enough.\u003C\u002Fp>\u003Cblockquote>“The most important thing is to be able to understand what the model is doing.” — Dario Amodei, Anthropic co-founder and CEO, in a 2024 interview with Lex Fridman\u003C\u002Fblockquote>\u003Cp>That quote gets to the heart of harness design. If you cannot inspect, constrain, and explain agent behavior, you do not have an engineering system. You have a probabilistic black box with a UI.\u003C\u002Fp>\u003Cp>The companies building serious agent products are converging on the same lesson: reliability comes from observability, tool discipline, and recovery paths, not from hoping the model behaves itself.\u003C\u002Fp>\u003Ch2>What the best harnesses include today\u003C\u002Fh2>\u003Cp>There is no single standard implementation yet, but the strongest agent harnesses share a familiar structure. 
They are less about one clever prompt and more about a stack of small controls that make the agent predictable enough to trust.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775057917887-xvvn.png\" alt=\"Agent Harness Is Quietly Defining AI Engineering\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Here is the practical comparison:\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Basic chat wrapper:\u003C\u002Fstrong> one prompt, one response, little state, little control, and high variance\u003C\u002Fli>\u003Cli>\u003Cstrong>Task agent:\u003C\u002Fstrong> tool access, short-term memory, and some retry logic, good for bounded workflows\u003C\u002Fli>\u003Cli>\u003Cstrong>Production harness:\u003C\u002Fstrong> validation gates, audit logs, policy checks, sandboxed execution, and rollback paths\u003C\u002Fli>\u003Cli>\u003Cstrong>Long-running agent system:\u003C\u002Fstrong> persistent state, evaluation loops, human approval steps, and recovery from partial failure\u003C\u002Fli>\u003C\u002Ful>\u003Cp>The jump from the first line to the last is huge. A chat wrapper can be built in an afternoon. A production harness takes real engineering work because every tool call creates a new failure mode.\u003C\u002Fp>\u003Cp>That is also why teams are starting to measure agent systems in operational terms instead of model terms alone. They track task completion rate, tool error rate, time to recovery, number of unsafe actions blocked, and how often a human had to step in.\u003C\u002Fp>\u003Cp>Those metrics matter more than flashy benchmark scores when the agent is touching codebases, support systems, or customer data.\u003C\u002Fp>\u003Cp>There is also a cultural shift here. In the early wave of AI products, the model was the product. 
In the harness era, the product is the workflow: what the agent can do, what it is forbidden to do, and how quickly it can recover when the world gets messy.\u003C\u002Fp>\u003Ch2>What this means for builders in 2026\u003C\u002Fh2>\u003Cp>If you are building with agents this year, the right question is not “Which model should I use?” It is “What harness do I need around this model to make the task safe, inspectable, and repeatable?”\u003C\u002Fp>\u003Cp>That question changes architecture decisions. You may choose a smaller model with a stronger harness over a larger model with weak controls. You may add a sandbox, a planner-executor split, a verifier, or a human approval step before shipping anything that can change state.\u003C\u002Fp>\u003Cp>For teams already experimenting with \u003Ca href=\"\u002Fnews\u002Fclaude-code-vs-cursor-agent-workflows\" target=\"_blank\" rel=\"noopener\">agentic coding workflows\u003C\u002Fa>, the next step is to stop treating the agent as a clever assistant and start treating it like an unreliable junior engineer that needs process, tests, and supervision.\u003C\u002Fp>\u003Cp>The companies that win here will probably not be the ones with the fanciest prompts. They will be the ones that build clean execution loops, strong observability, and tight permissions around their agents. That is the real shape of AI engineering in 2026.\u003C\u002Fp>\u003Cp>My bet is that within a year, “agent harness” will be a normal line item in architecture reviews, right next to auth, logging, and testing. 
The interesting question is which teams will treat it as optional until the first expensive failure forces the lesson home.\u003C\u002Fp>","Martin Fowler, Anthropic, and OpenAI are all pointing to the same idea: agent reliability depends on the system around the model.","zhuanlan.zhihu.com","https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2022027288405976801",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775057900119-b2du.png",[13,14,15,16,17],"agent harness","AI engineering","Anthropic","OpenAI","Martin Fowler","en",0,false,"2026-04-01T10:15:35.076717+00:00","2026-04-01T10:15:35.049+00:00","done","0f1c5784-ada0-4ee1-8293-a836bf5bfa26","agent-harness-ai-engineering-2026-en","industry","bc3cc36d-ee23-4731-8583-3517df995e09","published","2026-04-09T09:00:53.953+00:00",[31,33,35,37,39],{"name":14,"slug":32},"ai-engineering",{"name":16,"slug":34},"openai",{"name":13,"slug":36},"agent-harness",{"name":15,"slug":38},"anthropic",{"name":17,"slug":40},"martin-fowler",{"id":27,"slug":42,"title":43,"language":44},"agent-harness-ai-engineering-2026-zh","Agent Harness 正在定義 AI 工程","zh",[46,52,58,64,70,76],{"id":47,"slug":48,"title":49,"cover_image":50,"image_url":50,"created_at":51,"category":26},"6ff3920d-c8ea-4cf3-8543-9cf9efc3fe36","circles-agent-stack-targets-machine-speed-payments-en","Circle’s Agent Stack targets machine-speed payments","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871659638-hur1.png","2026-05-15T19:00:44.756112+00:00",{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":26},"1270e2f4-6f3b-4772-9075-87c54b07a8d1","iren-signs-nvidia-ai-infrastructure-pact-en","IREN signs Nvidia AI infrastructure 
pact","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871059665-3vhi.png","2026-05-15T18:50:38.162691+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":26},"b308c85e-ee9c-4de6-b702-dfad6d8da36f","circle-agent-stack-ai-payments-en","Circle launches Agent Stack for AI payments","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778870450891-zv1j.png","2026-05-15T18:40:31.462625+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":26},"f7028083-46ba-493b-a3db-dd6616a8c21f","why-nebius-ai-pivot-is-more-real-than-hype-en","Why Nebius’s AI Pivot Is More Real Than Hype","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778823055711-tbfv.png","2026-05-15T05:30:26.829489+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":26},"b63692ed-db6a-4dbd-b771-e1babdc94af7","nvidia-backs-corning-factories-with-billions-en","Nvidia backs Corning factories with billions","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778822444685-tvx6.png","2026-05-15T05:20:28.914908+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":26},"26ab4480-2476-4ec7-b43a-5d46def6487e","why-anthropic-gates-foundation-ai-public-goods-en","Why Anthropic and the Gates Foundation should fund AI public goods","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778796645685-wbw0.png","2026-05-14T22:10:22.60302+00:00",[83,88,93,98,103,108,113,118,123,128],{"id":84,"slug":85,"title":86,"created_at":87},"d35a1bd9-e709-412e-a2df-392df1dc572a","ai-impact-2026-developments-market-en","AI's 
Impact in 2026: Key Developments and Market Shifts","2026-03-25T16:20:33.205823+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"5ed27921-5fd6-492e-8c59-78393bf37710","trumps-ai-legislative-framework-en","Trump's AI Legislative Framework: What's Inside?","2026-03-25T16:22:20.005325+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"e454a642-f03c-4794-b185-5f651aebbaca","nvidia-gtc-2026-key-highlights-innovations-en","NVIDIA GTC 2026: Key Highlights and Innovations","2026-03-25T16:22:47.882615+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"0ebb5b16-774a-4922-945d-5f2ce1df5a6d","claude-usage-diversifies-learning-curves-en","Claude Usage Diversifies, Learning Curves Emerge","2026-03-25T16:25:50.770376+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"69934e86-2fc5-4280-8223-7b917a48ace8","openclaw-ai-commoditization-concerns-en","OpenClaw's Rise Raises Concerns of AI Model Commoditization","2026-03-25T16:26:30.582047+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"b4b2575b-2ac8-46b2-b90e-ab1d7c060797","google-gemini-ai-rollout-2026-en","Google's Gemini AI Rollout Extended to 2026","2026-03-25T16:28:14.808842+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"6e18bc65-42ae-4ad0-b564-67d7f66b979e","meta-llama4-fabricated-results-scandal-en","Meta's Llama 4 Scandal: Fabricated AI Test Results Unveiled","2026-03-25T16:29:15.482836+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"bf888e9d-08be-4f47-996c-7b24b5ab3500","accenture-mistral-ai-deployment-en","Accenture and Mistral AI Team Up for AI Deployment","2026-03-25T16:31:01.894655+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"5382b536-fad2-49c6-ac85-9eb2bae49f35","mistral-ai-high-stakes-2026-en","Mistral AI: Facing High Stakes in 2026","2026-03-25T16:31:39.941974+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"9da3d2d6-b669-4971-ba1d-17fdb3548ed5","cursors-meteoric-rise-pressures-en","Cursor's Meteoric Rise Faces Industry 
Pressures","2026-03-25T16:32:21.899217+00:00"]