[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-harness-engineering-ai-agent-reliability-2026":3,"article-related-harness-engineering-ai-agent-reliability-2026":24,"series-ai-agent-fe91bce0-b85d-4efa-a207-24ae9939c29f":77},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":10,"image_url":10,"cover_image":11,"category":12,"language":13,"translated_content":10,"related_article_id":14,"keywords":15,"key_takeaways":10,"views":21,"created_at":22,"published_at":23,"topic_cluster_id":10},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","\u003Ch2>The Problem: Why GPT-5 Still Fails at Simple Tasks\u003C\u002Fh2>\n\u003Cp>You&#39;ve probably noticed something strange: the most powerful AI models sometimes fail spectacularly at tasks they should ace.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774940036776-4v7i.png\" alt=\"Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\n\u003Cp>In August 2025, OpenAI&#39;s internal team started an ambitious experiment: let a Codex Agent build a production application from scratch, on a blank repository. The constraint was radical: \u003Cstrong>zero lines of manually written code\u003C\u002Fstrong>. The result? \u003Cstrong>Over 1 million lines of code in five months with a team of seven engineers\u003C\u002Fstrong>—averaging 3.5 merged pull requests per engineer per day. Productivity increased as the team grew (opposite of what usually happens).\u003C\u002Fp>\n\u003Cp>But this success wasn&#39;t built on a smarter model. 
It was built on something invisible: \u003Cstrong>the infrastructure surrounding the Agent\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>This is the story of Harness Engineering.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>What Is Harness Engineering?\u003C\u002Fh2>\n\u003Cp>Harness Engineering is the discipline of designing the external control and execution framework for AI Agents. If an AI model is a horse, a Harness is the reins, saddle, and entire system of horsemanship—it determines where the horse goes, what it can touch, and how it recovers from panic.\u003C\u002Fp>\n\u003Ch3>The Term&#39;s Origin\u003C\u002Fh3>\n\u003Cp>The concept was formally named by Mitchell Hashimoto, co-founder of HashiCorp, in February 2026. In his article &quot;My AI Adoption Journey,&quot; Hashimoto crystallized a key insight under the section &quot;Engineer the Harness&quot;:\u003C\u002Fp>\n\u003Cblockquote>\n\u003Cp>&quot;Every time the agent makes a mistake, don&#39;t hope it does better next time. Engineer the environment so it can&#39;t make that specific mistake the same way again.&quot;\u003C\u002Fp>\n\u003C\u002Fblockquote>\n\u003Cp>This simple principle ignited a field. Weeks later, OpenAI released detailed research on Harness Engineering. Anthropic built it into \u003Ca href=\"\u002Ftag\u002Fclaude-code\">Claude Code\u003C\u002Fa>&#39;s architecture. Google DeepMind applied it to AlphaCode 2.\u003C\u002Fp>\n\u003Ch3>Why &quot;Harness&quot;?\u003C\u002Fh3>\n\u003Cp>The term comes from horsemanship—a harness is the equipment connecting rider to horse. 
The metaphor is surprisingly precise:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Horse = Large Language Model\u003C\u002Fstrong> — Raw power, unpredictable behavior\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Rider = Developer or User\u003C\u002Fstrong> — Wants to direct and control\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Harness = Harness Engineering\u003C\u002Fstrong> — Makes control possible\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Without a harness, no cart moves, no matter how strong the horse. Without Harness Engineering, no Agent stays reliable in production, no matter how intelligent.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Three Ages of AI Engineering: From Prompt to Harness\u003C\u002Fh2>\n\u003Cp>The past three years saw AI engineering evolve through three distinct eras. Understanding this progression is essential to understanding why Harness Engineering dominates 2026.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774940054035-r1ta.png\" alt=\"Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\n\u003Ch3>Age One: Prompt Engineering (2023–2024)\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Defining characteristic\u003C\u002Fstrong>: Magic incantations\u003C\u002Fp>\n\u003Cp>In the early ChatGPT days, developers obsessed over prompting. The logic: \u003Cstrong>write smarter instructions, extract more intelligence from the model\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>Classic techniques:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>&quot;Let&#39;s think step by step…&quot;\u003C\u002Fli>\n\u003Cli>&quot;You are a senior software engineer…&quot;\u003C\u002Fli>\n\u003Cli>&quot;Output JSON format…&quot;\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>These worked, but hit a ceiling. 
For complex, multi-step tasks, Prompt Engineering&#39;s limitations surfaced:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cstrong>Context Window Curse\u003C\u002Fstrong> — Your detailed prompt competes with the actual work for token space\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Magic Numbers\u003C\u002Fstrong> — A prompt that works for you fails for someone else\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Zero Learning\u003C\u002Fstrong> — Each failure resets; the agent learns nothing\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>Age Two: Context Engineering (2024–2025)\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Defining characteristic\u003C\u002Fstrong>: Dynamic knowledge management\u003C\u002Fp>\n\u003Cp>In 2024, Hugging Face&#39;s Philipp Schmid published &quot;The New Skill in AI is Not Prompting, It&#39;s Context Engineering.&quot; It changed the game.\u003C\u002Fp>\n\u003Cp>Core insight: \u003Cstrong>Most agent failures aren&#39;t model failures, they&#39;re context failures.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>Context Engineering meant:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Dynamic context assembly\u003C\u002Fstrong> — Assemble relevant information on-demand, not static prompts\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Knowledge base optimization\u003C\u002Fstrong> — Build searchable documentation, code structure, API references the agent can query\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Tool discovery\u003C\u002Fstrong> — Agents don&#39;t just know tools exist; they know when and why to use them\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>By mid-2025, Context Engineering was standard at LangChain, OpenAI, and Anthropic. But teams hit a new bottleneck: \u003Cstrong>Good context wasn&#39;t enough.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>Agents could know what to do but still lose control in complex workflows. 
Why?\u003C\u002Fp>\n\u003Ch3>Age Three: Harness Engineering (2026+)\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Defining characteristic\u003C\u002Fstrong>: External control infrastructure\u003C\u002Fp>\n\u003Cp>Harness Engineering answers: \u003Cstrong>We don&#39;t just give the agent more information; we give it a bounded, predictable, recoverable execution environment.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>This isn&#39;t better prompting. This isn&#39;t smarter context. This is \u003Cstrong>rearchitecting the entire system\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>The progression:\u003C\u002Fp>\n\u003Cpre>\u003Ccode>Prompt Engineering\n    ↓\n  &quot;Write better magic incantations&quot;\n    ↓\n  Fails: Limited context window\n    ↓\nContext Engineering\n    ↓\n  &quot;Dynamically assemble more relevant information&quot;\n    ↓\n  Fails: Agent still loses control in complex workflows\n    ↓\nHarness Engineering\n    ↓\n  &quot;Design the environment so the agent can&#39;t fail that way&quot;\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Chr>\n\u003Ch2>The Operating System Metaphor: More Precise Than Bridles\u003C\u002Fh2>\n\u003Cp>Though the &quot;harness&quot; metaphor is vivid, Schmid&#39;s &quot;operating system&quot; analogy captures the essence better.\u003C\u002Fp>\n\u003Ch3>Four-Layer Compute Stack\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Layer\u003C\u002Fth>\n\u003Cth>Traditional Computing\u003C\u002Fth>\n\u003Cth>AI Agent System\u003C\u002Fth>\n\u003Cth>Role\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>\u003Cstrong>Application\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Word processors, games, browsers\u003C\u002Ftd>\n\u003Ctd>Concrete agent tasks (e.g., &quot;write tests&quot;)\u003C\u002Ftd>\n\u003Ctd>End user directly uses\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Operating System\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Windows, Linux, 
macOS\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>Harness Engineering\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Manages resources, enforces control\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>RAM\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>8GB, 16GB physical memory\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>Context Window\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Limited working space\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>CPU\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Intel, AMD processors\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>Large Language Model\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Raw computational power\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch3>Why the OS Metaphor Is More Accurate\u003C\u002Fh3>\n\u003Cp>A modern OS isn&#39;t just &quot;make CPU faster.&quot; It:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cp>\u003Cstrong>Manages Memory\u003C\u002Fstrong> — Runs huge applications in limited RAM\u003C\u002Fp>\n\u003Cul>\n\u003Cli>AI analogy: Handle complex tasks in limited context windows\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Schedules Processes\u003C\u002Fstrong> — Decides which task runs when\u003C\u002Fp>\n\u003Cul>\n\u003Cli>AI analogy: Decompose work into sub-tasks, sequence execution\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Provides Drivers\u003C\u002Fstrong> — Standardizes software-hardware interaction\u003C\u002Fp>\n\u003Cul>\n\u003Cli>AI analogy: Standardizes agent-to-tool, agent-to-API communication\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Enforces Permissions\u003C\u002Fstrong> — Prevents apps from causing damage\u003C\u002Fp>\n\u003Cul>\n\u003Cli>AI analogy: Restrict agent actions to safe operating bounds\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Recovers from Crashes\u003C\u002Fstrong> — Returns to consistent 
state on failure\u003C\u002Fp>\n\u003Cul>\n\u003Cli>AI analogy: Detect when agent loops or makes bad decisions, recover\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Cp>The harness metaphor tells you &quot;control.&quot; The OS metaphor tells you &quot;control, manage, optimize, recover&quot;—the complete picture.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Three Cornerstone Implementations\u003C\u002Fh2>\n\u003Cp>Theory matters, but how does Harness Engineering work in practice? Three case studies show different approaches.\u003C\u002Fp>\n\u003Ch3>OpenAI: Seven Engineers × One Million Lines of Code\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Timeline\u003C\u002Fstrong>: August 2025 – January 2026\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Goal\u003C\u002Fstrong>: Build a production application using only Codex Agents on a blank repository\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Outcome\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Over \u003Cstrong>1 million lines of code\u003C\u002Fstrong>\u003C\u002Fli>\n\u003Cli>\u003Cstrong>1,500+ pull requests\u003C\u002Fstrong> merged\u003C\u002Fli>\n\u003Cli>\u003Cstrong>7 engineers\u003C\u002Fstrong> (scaled from 3)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>3.5 PR\u002Fengineer\u002Fday\u003C\u002Fstrong> average throughput\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Throughput increased\u003C\u002Fstrong> as team grew (unusual)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>One-tenth the time\u003C\u002Fstrong> compared to manual coding\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>The radical constraint: \u003Cstrong>zero manually written code\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Ch4>OpenAI&#39;s Four-Pillar Harness\u003C\u002Fh4>\n\u003Cp>Based on OpenAI&#39;s published report, their harness consists of:\u003C\u002Fp>\n\u003Ch5>1. 
Context Engineering: Continuously Enhanced Knowledge Base\u003C\u002Fh5>\n\u003Cp>OpenAI built a &quot;continuously enhanced knowledge base in the codebase, plus agent access to dynamic context like observability data and browser navigation.&quot;\u003C\u002Fp>\n\u003Cp>Not static documentation. Rather:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Architecture documentation\u003C\u002Fstrong> — When new modules are created, the Harness enforces documentation updates\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Searchable tool index\u003C\u002Fstrong> — Tools with usage examples, not just names\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Observability integration\u003C\u002Fstrong> — Agents query logs from previous agent runs, learning from failures\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch5>2. Architectural Constraints: LLM + Deterministic Dual Verification\u003C\u002Fh5>\n\u003Cp>The most innovative part: OpenAI uses \u003Cstrong>both LLMs and traditional linters\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>LLM layer\u003C\u002Fstrong> — Agent reviews its own code for logical correctness\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Deterministic layer\u003C\u002Fstrong> — Custom linters and structural tests enforce style, module boundaries, and naming conventions\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Why dual? Because LLM review is probabilistic and sometimes misses things. Deterministic checks apply the same rules on every run.\u003C\u002Fp>\n\u003Ch5>3. Garbage Collection: The Entropy War\u003C\u002Fh5>\n\u003Cp>Even with a good Harness, agent-generated code accumulates debt:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Dead code\u003C\u002Fli>\n\u003Cli>Unnecessary files\u003C\u002Fli>\n\u003Cli>Stale comments\u003C\u002Fli>\n\u003Cli>Architectural violations\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>OpenAI&#39;s solution: \u003Cstrong>Run cleanup agents periodically\u003C\u002Fstrong>, whose sole job is finding inconsistencies and fixing them. This is garbage collection.\u003C\u002Fp>\n\u003Ch5>4. 
Feedback Loop: Failure → Signal → Improvement\u003C\u002Fh5>\n\u003Cp>OpenAI&#39;s most important philosophy:\u003C\u002Fp>\n\u003Cblockquote>\n\u003Cp>&quot;When the agent struggles, we treat it as a signal: identify what is missing—tools, guardrails, documentation—and feed it back into the repository.&quot;\u003C\u002Fp>\n\u003C\u002Fblockquote>\n\u003Cp>Not &quot;hope the agent does better.&quot; But &quot;identify system defects and repair the system.&quot;\u003C\u002Fp>\n\u003Ch3>Anthropic: Generator-Evaluator Separation Architecture\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Approach\u003C\u002Fstrong>: Multi-agent collaboration, not single superhuman agent\u003C\u002Fp>\n\u003Cp>Anthropic&#39;s Harness in Claude Code uses a different pattern: \u003Cstrong>specialized agent teams\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Ch4>Three-Tier Architecture\u003C\u002Fh4>\n\u003Col>\n\u003Cli>\u003Cp>\u003Cstrong>Orchestrator Agent\u003C\u002Fstrong> (Leadership tier)\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Runs the smartest model (Claude Opus 4.5)\u003C\u002Fli>\n\u003Cli>Analyzes user request\u003C\u002Fli>\n\u003Cli>Decomposes into sub-tasks\u003C\u002Fli>\n\u003Cli>Coordinates execution order\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Specialist Sub-agents\u003C\u002Fstrong> (Execution tier)\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Run faster, cheaper models (Claude Sonnet 4, Haiku 4.5)\u003C\u002Fli>\n\u003Cli>Execute tasks in parallel\u003C\u002Fli>\n\u003Cli>Example: one agent writes code, another writes tests, another writes documentation\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Verification Agent\u003C\u002Fstrong> (Validation tier)\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Reviews all outputs\u003C\u002Fli>\n\u003Cli>Checks code correctness, documentation completeness, test coverage\u003C\u002Fli>\n\u003Cli>Elevates quality before returning to 
user\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch4>Why Separation Works\u003C\u002Fh4>\n\u003Cp>\u003Cstrong>Performance Improvement\u003C\u002Fstrong>: Internal evaluations show this architecture outperforms a single Claude Opus 4 by \u003Cstrong>90.2%\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Why\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Parallelism\u003C\u002Fstrong> — Multiple agents work simultaneously without blocking each other\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Specialization\u003C\u002Fstrong> — Each agent optimizes for specific tasks vs. being a generalist\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Recoverability\u003C\u002Fstrong> — One sub-agent&#39;s failure doesn&#39;t cascade; Orchestrator reroutes\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Claude Code is the public implementation of this Harness. When you code in Claude Code, you&#39;re not interacting with one agent—you&#39;re orchestrating a team.\u003C\u002Fp>\n\u003Ch3>Google DeepMind: Iterative Verification Loop (AlphaCode 2)\u003C\u002Fh3>\n\u003Cp>Google DeepMind emphasizes \u003Cstrong>iterative refinement\u003C\u002Fstrong>, not single-pass generation.\u003C\u002Fp>\n\u003Cp>While Google hasn&#39;t published a detailed &quot;Generator-Verifier-Reviser&quot; paper, their AlphaCode 2 practice embodies Harness Engineering&#39;s core:\u003C\u002Fp>\n\u003Ch4>Three-Stage Loop\u003C\u002Fh4>\n\u003Col>\n\u003Cli>\u003Cstrong>Generator\u003C\u002Fstrong> — Generate multiple code candidates (typically &gt;100)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Verifier\u003C\u002Fstrong> — Test candidates on test cases, eliminate failures\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Reviser\u003C\u002Fstrong> — Refine verified candidates\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Cp>Not linear &quot;write once, submit.&quot; Rather \u003Cstrong>cyclical\u003C\u002Fstrong>: Generator sees Verifier feedback and regenerates better 
candidates.\u003C\u002Fp>\n\u003Ch4>CodeContests Performance\u003C\u002Fh4>\n\u003Cp>Using AlphaCode 2, Google DeepMind ranked in the \u003Cstrong>top 15% of human programmers\u003C\u002Fstrong> on CodeContests. This exceeded GPT-4 and Claude Opus single-generation performance.\u003C\u002Fp>\n\u003Cp>Where&#39;s the difference? \u003Cstrong>The Harness\u003C\u002Fstrong>: the verification and revision system surrounding the generator.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>The Counterintuitive Lesson: Why Vercel Deleted 80% of Tools\u003C\u002Fh2>\n\u003Cp>In February 2026, Vercel published a surprising article: &quot;We Removed 80% of Our Agent&#39;s Tools.&quot; Result? Performance improved.\u003C\u002Fp>\n\u003Ch3>The Setup\u003C\u002Fh3>\n\u003Cp>Vercel built a text-to-SQL Agent for Vercel Data Platform. The initial version had many carefully designed tools:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>SQL query executor\u003C\u002Fli>\n\u003Cli>Database schema checker\u003C\u002Fli>\n\u003Cli>Table statistics tool\u003C\u002Fli>\n\u003Cli>Custom Vercel API wrappers\u003C\u002Fli>\n\u003Cli>Error handling utilities\u003C\u002Fli>\n\u003Cli>Plus many more\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Initial performance: \u003Cstrong>80% success rate\u003C\u002Fstrong>. But the process was costly:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Average \u003Cstrong>100 steps\u003C\u002Fstrong> to complete a query\u003C\u002Fli>\n\u003Cli>\u003Cstrong>145,000 tokens\u003C\u002Fstrong> (expensive)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>724 seconds\u003C\u002Fstrong> worst-case latency\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>The Bold Move\u003C\u002Fh3>\n\u003Cp>Vercel did the counterintuitive thing: \u003Cstrong>Delete all custom tools. 
Keep one: execute arbitrary bash.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>New Harness:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Give Claude file system access\u003C\u002Fli>\n\u003Cli>Give Claude standard Unix tools: \u003Ccode>cat\u003C\u002Fcode>, \u003Ccode>grep\u003C\u002Fcode>, \u003Ccode>ls\u003C\u002Fcode>\u003C\u002Fli>\n\u003Cli>Trust Claude to figure out navigation\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Shocking Results\u003C\u002Fh3>\n\u003Cp>New version:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>100% success rate\u003C\u002Fstrong> (vs. 80%)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>19 steps\u003C\u002Fstrong> (vs. 100)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>67,000 tokens\u003C\u002Fstrong> (vs. 145,000, about 54% fewer)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>141 seconds\u003C\u002Fstrong> (vs. 724, about 5x faster)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Why Fewer Tools = Better Performance?\u003C\u002Fh3>\n\u003Cp>Vercel&#39;s hypothesis: \u003Cstrong>Models got smarter, context windows grew larger, so maybe the best agent architecture is almost no architecture.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>Deeper reasons:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cp>\u003Cstrong>Cognitive Overload\u003C\u002Fstrong> — Too many tools confuse the agent. It spends time deciding which tool to use instead of solving the problem.\u003C\u002Fp>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Trust and Freedom\u003C\u002Fstrong> — Given basic but powerful primitives, agents perform better.\u003C\u002Fp>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Universality Beats Specialization\u003C\u002Fstrong> — Custom tools can miss edge cases. 
Universal tools are more robust.\u003C\u002Fp>\n\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Cp>This reveals a deep truth about Harness Engineering: \u003Cstrong>The best harness isn&#39;t restrictive, it&#39;s enabling.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>LangChain&#39;s Evidence: Harness-Only Improvement from Rank 30 to 5\u003C\u002Fh2>\n\u003Cp>LangChain&#39;s case is the clearest proof of Harness Engineering&#39;s power.\u003C\u002Fp>\n\u003Ch3>Baseline\u003C\u002Fh3>\n\u003Cp>LangChain&#39;s deep Agent on \u003Cstrong>Terminal Bench 2.0\u003C\u002Fstrong> ranked \u003Cstrong>#30\u003C\u002Fstrong> with a score of \u003Cstrong>52.8%\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cp>Terminal Bench is a code generation benchmark testing agents in real software development scenarios. Rank 30 means 29 systems beat it.\u003C\u002Fp>\n\u003Ch3>Experimental Design\u003C\u002Fh3>\n\u003Cp>LangChain&#39;s critical decision: \u003Cstrong>Keep the model fixed, change only the Harness.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>Model used: \u003Cstrong>GPT-5.2-Codex\u003C\u002Fstrong> (fixed throughout)\u003C\u002Fp>\n\u003Cp>Variables changed:\u003C\u002Fp>\n\u003Col>\n\u003Cli>System prompt\u003C\u002Fli>\n\u003Cli>Tool set and tool design\u003C\u002Fli>\n\u003Cli>Middleware hooks and control flow\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>Key Findings\u003C\u002Fh3>\n\u003Ch4>1. Verification Loop is a Game-Changer\u003C\u002Fh4>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Agent writes code, re-reads it, thinks &quot;looks good,&quot; stops. No actual testing.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Solution\u003C\u002Fstrong>: \u003Ccode>PreCompletionChecklistMiddleware\u003C\u002Fcode> forces verification pass before exit.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Impact\u003C\u002Fstrong>: This single hook contributed \u003Cstrong>13.7 percentage points\u003C\u002Fstrong> improvement.\u003C\u002Fp>\n\u003Ch4>2. 
Context Injection Beats Lecture\u003C\u002Fh4>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Agent drowned in documentation, missed critical details.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Solution\u003C\u002Fstrong>: \u003Ccode>LocalContextMiddleware\u003C\u002Fcode> scans local structure upfront, proactively injects relevant information (file tree, key file contents, test commands).\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Impact\u003C\u002Fstrong>: Context injection alone contributed \u003Cstrong>7.2 percentage points\u003C\u002Fstrong> improvement.\u003C\u002Fp>\n\u003Ch4>3. The Counterintuitive Compute Budget Discovery\u003C\u002Fh4>\n\u003Cp>\u003Cstrong>Finding\u003C\u002Fstrong>: Setting reasoning budget to maximum (\u003Ccode>xhigh\u003C\u002Fcode>) actually \u003Cstrong>decreased performance\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Ccode>xhigh\u003C\u002Fcode>: 53.9% (due to timeouts)\u003C\u002Fli>\n\u003Cli>\u003Ccode>high\u003C\u002Fcode>: 63.6% (optimal)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Lesson\u003C\u002Fstrong>: More thinking time isn&#39;t always better. Agents can suffer analysis paralysis or timeout. Sometimes \u003Cstrong>constraints improve performance\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Ch3>Final Results\u003C\u002Fh3>\n\u003Cp>After these changes, LangChain&#39;s Agent:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Ranked #5\u003C\u002Fstrong> (up from #30)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Score 66.5%\u003C\u002Fstrong> (from 52.8%)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Model unchanged, only Harness improved\u003C\u002Fstrong>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>This is the strongest evidence for Harness Engineering&#39;s power: \u003Cstrong>the problem isn&#39;t the model, it&#39;s how you use it.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Martin Fowler&#39;s Three-Component Framework\u003C\u002Fh2>\n\u003Cp>Let&#39;s examine Harness structure through a more formal lens. 
The framework articulated by Martin Fowler and Birgitta Böckeler has become the industry standard.\u003C\u002Fp>\n\u003Ch3>1. Context Engineering\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Definition\u003C\u002Fstrong>: Continuously enhanced knowledge base + agent access to dynamic data\u003C\u002Fp>\n\u003Cp>Context Engineering isn&#39;t writing longer prompts. It&#39;s:\u003C\u002Fp>\n\u003Ch4>Core Elements\u003C\u002Fh4>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Element\u003C\u002Fth>\n\u003Cth>Description\u003C\u002Fth>\n\u003Cth>Examples\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>\u003Cstrong>Static Knowledge Base\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Code structure, API docs, architecture decisions\u003C\u002Ftd>\n\u003Ctd>README.md, API index\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Dynamic Context\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Real-time data, varies by task\u003C\u002Ftd>\n\u003Ctd>Current file tree, relevant code snippets\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Tool Discovery\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Agent knows what tools exist and why\u003C\u002Ftd>\n\u003Ctd>Curated tool list with usage examples\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Observability Integration\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Agent queries logs from previous runs\u003C\u002Ftd>\n\u003Ctd>Error logs, performance data\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch4>Static vs. Dynamic\u003C\u002Fh4>\n\u003Cp>Static docs go stale. Dynamically generated context balloons. 
Best practice: \u003Cstrong>hybrid\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Core architecture and API docs stay static, regularly updated\u003C\u002Fli>\n\u003Cli>Runtime context generated dynamically (file tree, recently edited files)\u003C\u002Fli>\n\u003Cli>Combine both when sending to agent\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>2. Architectural Constraints\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Definition\u003C\u002Fstrong>: Enforce code structure and patterns using both LLMs and deterministic tools\u003C\u002Fp>\n\u003Cp>This is the Harness&#39;s &quot;rule enforcer.&quot;\u003C\u002Fp>\n\u003Ch4>Dual-Layer Verification\u003C\u002Fh4>\n\u003Cp>\u003Cstrong>Layer One: LLM Verification\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Agent reviews its own code\u003C\u002Fli>\n\u003Cli>Checks logical correctness, naming, structure\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Weakness: LLMs sometimes miss things or aren&#39;t strict.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Layer Two: Deterministic Checks\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Custom linters\u003C\u002Fli>\n\u003Cli>Structural tests (e.g., all \u003Ccode>user_\u003C\u002Fcode> functions must live in \u003Ccode>user.ts\u003C\u002Fcode>)\u003C\u002Fli>\n\u003Cli>Module boundary checks (e.g., \u003Ccode>data\u002F\u003C\u002Fcode> layer can&#39;t import from \u003Ccode>ui\u002F\u003C\u002Fcode>)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch4>Example\u003C\u002Fh4>\n\u003Cp>Suppose you enforce Clean Architecture. Harness can mandate:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-typescript\">\u002F\u002F Violation ❌ — data layer importing ui layer\nimport { Button } from &#39;..\u002Fui\u002Fbutton&#39;;  \u002F\u002F Linter rejects\n\n\u002F\u002F Correct ✅\nimport { UserRepository } from &#39;.\u002Fuser.repository&#39;;  \u002F\u002F Linter allows\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>Not a suggestion. \u003Cstrong>Enforced\u003C\u002Fstrong>. 
Every commit must pass.\u003C\u002Fp>\n\u003Ch3>3. Garbage Collection\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Definition\u003C\u002Fstrong>: Regularly run cleanup agents to find and fix inconsistencies\u003C\u002Fp>\n\u003Cp>Code entropy is real. Agent-generated code especially accumulates debt:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Dead code (functions from removed features)\u003C\u002Fli>\n\u003Cli>Stale comments\u003C\u002Fli>\n\u003Cli>Missing unit tests\u003C\u002Fli>\n\u003Cli>Naming violations\u003C\u002Fli>\n\u003Cli>Documentation-implementation drift\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch4>How GC Agents Work\u003C\u002Fh4>\n\u003Col>\n\u003Cli>\u003Cstrong>Scan\u003C\u002Fstrong> — Periodically scan entire codebase\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Detect\u003C\u002Fstrong> — Identify inconsistencies using rules and LLMs\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Report\u003C\u002Fstrong> — Generate fix proposals\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Fix\u003C\u002Fstrong> — Auto-fix or flag for review\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch4>Example\u003C\u002Fh4>\n\u003Cpre>\u003Ccode class=\"language-bash\">$ npm run gc\n\nResults:\n- Found 12 dead code blocks from removed APIs\n- Detected 3 stale documentation files\n- Identified 5 naming convention violations\n- Suggested repairs (auto-apply or review)\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Chr>\n\u003Ch2>Six Core Modules of Harness Engineering\u003C\u002Fh2>\n\u003Cp>Synthesizing the practices above, a complete Harness Engineering framework includes six core modules.\u003C\u002Fp>\n\u003Ch3>1. 
Context Management Engine\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Responsibility\u003C\u002Fstrong>: Place the most relevant information in the limited context window\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Implementation\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Declarative context rules (&quot;When running Python scripts, include .env template&quot;)\u003C\u002Fli>\n\u003Cli>Vector similarity search (find most relevant code snippets)\u003C\u002Fli>\n\u003Cli>Priority queues (critical information first)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Tools\u003C\u002Fstrong>: Supabase Vector DB, Pinecone, LangChain&#39;s RecursiveCharacterTextSplitter\u003C\u002Fp>\n\u003Ch3>2. Tool and Capability Layer\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Responsibility\u003C\u002Fstrong>: Define what agents can do and how to do it\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Key Decision\u003C\u002Fstrong>: High-level abstractions (\u003Ccode>run_command\u003C\u002Fcode>) vs. fine-grained tools?\n   → Vercel&#39;s lesson: High-level abstractions win. Fewer tools, more power.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Typical Tool Set\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>File system access (read, write, delete)\u003C\u002Fli>\n\u003Cli>Code execution (Python, bash)\u003C\u002Fli>\n\u003Cli>Search and browsing (Google, Brave, web)\u003C\u002Fli>\n\u003Cli>External APIs (Stripe, AWS, custom)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>3. 
Control Flow Orchestrator\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Responsibility\u003C\u002Fstrong>: Decide task execution order and branching\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Three Common Patterns\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cp>a) \u003Cstrong>Linear\u003C\u002Fstrong> — One step after another\u003C\u002Fp>\n\u003Cpre>\u003Ccode>Plan → Code → Test → Deploy\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>b) \u003Cstrong>Parallel\u003C\u002Fstrong> — Multiple agents run simultaneously\u003C\u002Fp>\n\u003Cpre>\u003Ccode>Code Agent ──┐\nTest Agent ──┼→ Verify\nDoc Agent  ──┘\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>c) \u003Cstrong>Cyclic\u003C\u002Fstrong> — Revise and re-verify in a loop\u003C\u002Fp>\n\u003Cpre>\u003Ccode>Generate → Verify → Revise → Verify (repeat)\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Ch3>4. Verification and Feedback Layer\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Responsibility\u003C\u002Fstrong>: Check output quality, provide actionable feedback\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Verification Types\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Type\u003C\u002Fth>\n\u003Cth>Method\u003C\u002Fth>\n\u003Cth>Example\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>\u003Cstrong>Syntax\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Deterministic (compiler or linter)\u003C\u002Ftd>\n\u003Ctd>TypeScript \u003Ccode>tsc --noEmit\u003C\u002Fcode>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Logic\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Automated tests\u003C\u002Ftd>\n\u003Ctd>Unit tests, integration tests\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Style\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Rule engine\u003C\u002Ftd>\n\u003Ctd>Prettier, ESLint\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Semantic\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>LLM review\u003C\u002Ftd>\n\u003Ctd>&quot;Is this function name 
meaningful?&quot;\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Business\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>Humans or rules\u003C\u002Ftd>\n\u003Ctd>&quot;Does this match product requirements?&quot;\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch3>5. Recovery and Retry Mechanism\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Responsibility\u003C\u002Fstrong>: Gracefully recover when agents fail\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Failure Modes and Strategies\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Failure\u003C\u002Fth>\n\u003Cth>Symptom\u003C\u002Fth>\n\u003Cth>Recovery\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Tool Timeout\u003C\u002Ftd>\n\u003Ctd>API unresponsive &gt;30s\u003C\u002Ftd>\n\u003Ctd>Exponential backoff (1s, 2s, 4s)\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Context Overflow\u003C\u002Ftd>\n\u003Ctd>Exceeds token limit\u003C\u002Ftd>\n\u003Ctd>Dynamic truncation or sub-tasks\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Infinite Loop\u003C\u002Ftd>\n\u003Ctd>Same step repeated &gt;5 times\u003C\u002Ftd>\n\u003Ctd>Mark failed, rollback to checkpoint\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Permission Error\u003C\u002Ftd>\n\u003Ctd>&quot;Access Denied&quot;\u003C\u002Ftd>\n\u003Ctd>Alert user, don&#39;t auto-retry\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Model Refusal\u003C\u002Ftd>\n\u003Ctd>&quot;I can&#39;t do this&quot;\u003C\u002Ftd>\n\u003Ctd>Restructure context or upgrade model\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch3>6. 
Observability and Learning Layer\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Responsibility\u003C\u002Fstrong>: Record execution traces for debugging and improvement\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Critical Data\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Execution logs\u003C\u002Fstrong> — What happened at each step and why\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Decision points\u003C\u002Fstrong> — Where agent chose, based on what\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Performance metrics\u003C\u002Fstrong> — Tokens spent, execution time, success\u002Ffailure\u003C\u002Fli>\n\u003Cli>\u003Cstrong>User feedback\u003C\u002Fstrong> — &quot;Was this helpful?&quot;\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Uses\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cstrong>Real-time debugging\u003C\u002Fstrong> — When agent fails, see the trace\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Continuous improvement\u003C\u002Fstrong> — Identify patterns, improve Harness\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Training data\u003C\u002Fstrong> — Seed fine-tuning or reinforcement learning\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Chr>\n\u003Ch2>Risks, Controversies, and Engineering Challenges\u003C\u002Fh2>\n\u003Cp>Harness Engineering isn&#39;t a silver bullet. It introduces new complexity and new risks.\u003C\u002Fp>\n\u003Ch3>Challenge 1: Documentation Decay and Entropy\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Even with good Harness, knowledge in the codebase goes stale.\u003C\u002Fp>\n\u003Cp>A simple markdown file decays. Too many rules overwhelm the task.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Example\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-markdown\"># Our Architecture Rules (written June 2025)\n\n1. All API responses should return { data, error }\n2. Use PostgreSQL JSONB for nested structures\n3. Service layer should use dependency injection\n... 
(50 more rules)\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>Six months later, #1 and #3 changed, but docs didn&#39;t. Agent follows outdated rules.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Partial Solutions\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Write architecture rules as executable tests, not comments\u003C\u002Fli>\n\u003Cli>Use LLM verification to complement deterministic checks\u003C\u002Fli>\n\u003Cli>Run periodic garbage collection to audit documentation-implementation alignment\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Challenge 2: Model Iteration Speed vs. Harness Stability\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Harness is designed for a specific model. What happens when new models launch?\u003C\u002Fp>\n\u003Cp>Each model has different optimal prompting strategies, tool usage patterns, reasoning styles. A perfect Harness for GPT-5 may fail on Claude Opus.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Example\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cpre>\u003Ccode class=\"language-python\"># Harness optimized for GPT-5\nsystem_prompt = &quot;Think step by step...&quot;  # GPT-5 loves this\ntools = [file_read, bash_execute]  # Minimal tool set\n\n# Claude Opus might prefer\nsystem_prompt = &quot;Analyze carefully, consider alternatives...&quot;\ntools = [file_read, bash_execute, web_search, ...]  
# More tools\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>\u003Cstrong>Schmid&#39;s Recommendation\u003C\u002Fstrong>: &quot;Build to Delete&quot;—design the Harness assuming it will be replaced with each new model release.\u003C\u002Fp>\n\u003Ch3>Challenge 3: Over-Engineering Risk\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Teams may over-invest in Harness optimization, creating needless complexity.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Red Flags\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Harness code exceeds application code\u003C\u002Fli>\n\u003Cli>10+ middleware layers, each &quot;optimizing&quot;\u003C\u002Fli>\n\u003Cli>Documentation-implementation sync becomes after-hours work\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Balance Point\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Start simple (maybe just a prompt + verification layer)\u003C\u002Fli>\n\u003Cli>Optimize when you see specific bottlenecks\u003C\u002Fli>\n\u003Cli>Regular audits: Is the Harness helping or hurting?\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Challenge 4: Deliverability and Explainability\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Complex harnesses are hard to explain to non-technical users.\u003C\u002Fp>\n\u003Cp>The user asks: &quot;Why did the agent reject my request?&quot;\u003C\u002Fp>\n\u003Cp>The answer is: &quot;Because architectural constraint layer 3 detected…&quot; That is too technical.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Solutions\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>User-readable rejection messages\u003C\u002Fli>\n\u003Cli>Provide repair suggestions, not just &quot;no&quot;\u003C\u002Fli>\n\u003Cli>Escalation paths (&quot;This needs human review&quot;)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Challenge 5: Governance: How Much Human-in-the-Loop?\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Where should humans step in? Too much, and the agent&#39;s value disappears. 
Too little, risk is high.\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Typical Governance Levels\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Operation\u003C\u002Fth>\n\u003Cth>Human Intervention\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Modify non-critical file\u003C\u002Ftd>\n\u003Ctd>Auto, post-review\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Delete code\u003C\u002Ftd>\n\u003Ctd>Auto, post-review\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Deploy to production\u003C\u002Ftd>\n\u003Ctd>Required approval\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Modify schema\u002FAPI\u003C\u002Ftd>\n\u003Ctd>Required approval\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Create new database table\u003C\u002Ftd>\n\u003Ctd>Required approval\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Cp>No perfect answer. Depends on risk tolerance and trust.\u003C\u002Fp>\n\u003Ch3>Challenge 6: Learning Curve and Knowledge Transfer\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Problem\u003C\u002Fstrong>: Building and maintaining Harness requires specialized skills.\u003C\u002Fp>\n\u003Cp>Not every team has them. When the Harness expert leaves, what happens?\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Long-term Solutions\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Open-source Harness best practices (LangChain, Anthropic doing this)\u003C\u002Fli>\n\u003Cli>Develop Harness engineering as a career path\u003C\u002Fli>\n\u003Cli>Provide tools and frameworks to lower entry barriers\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Chr>\n\u003Ch2>The Great Shift: Competition Moves from Models to Harnesses\u003C\u002Fh2>\n\u003Cp>In 2025, everyone competed on model quality. 
In 2026, everyone competes on Harness quality.\u003C\u002Fp>\n\u003Ch3>Why the Shift?\u003C\u002Fh3>\n\u003Cp>Three reasons:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cp>\u003Cstrong>Model Convergence\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>GPT-5, Claude Opus 4.5, Gemini 2.0 capabilities are converging\u003C\u002Fli>\n\u003Cli>Incremental improvements are expensive and hard\u003C\u002Fli>\n\u003Cli>Model-based competitive advantage is eroding\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Harness Multiplier Effect\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>A good Harness can improve existing model performance by 20-30%\u003C\u002Fli>\n\u003Cli>LangChain case: up 25 rank positions (30 to 5), a 13.7-point score gain\u003C\u002Fli>\n\u003Cli>Cost: improving a Harness is far cheaper than training a new model\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003Cli>\u003Cp>\u003Cstrong>Production Reality\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Reliability matters more than raw capability\u003C\u002Fli>\n\u003Cli>Agent not losing control &gt; Agent&#39;s raw IQ\u003C\u002Fli>\n\u003Cli>Vercel case: Removing complexity improved performance\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>New Division of Labor\u003C\u002Fh3>\n\u003Cp>\u003Cstrong>Old\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cpre>\u003Ccode>AI Researcher → Build better model → Engineer → Integrate\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>\u003Cstrong>New\u003C\u002Fstrong>:\u003C\u002Fp>\n\u003Cpre>\u003Ccode>Model Provider (OpenAI, Anthropic, Google)\n        ↓\n     Model\n        ↓\nHarness Engineer → Design framework → App Engineer → Build product\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>Harness Engineer becomes a distinct role. 
Not a model expert, not an app developer, but a \u003Cstrong>systems designer\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Ch3>Business Implications\u003C\u002Fh3>\n\u003Cp>If competition shifts from models to harnesses:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cstrong>Smaller teams can compete\u003C\u002Fstrong> — Harness development is lighter weight than model training\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Open-source tools matter more\u003C\u002Fstrong> — LangChain, LlamaIndex, and the Claude Agent SDK become critical\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Consulting and implementation services boom\u003C\u002Fstrong> — Many teams need help building harnesses\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Chr>\n\u003Ch2>Conclusion: The System Wins\u003C\u002Fh2>\n\u003Cp>In 2026, Harness Engineering has evolved from a new idea to a core production requirement. Mitchell Hashimoto&#39;s simple observation—&quot;Engineer the environment so agents can&#39;t fail that way&quot;—has crystallized into an engineering discipline.\u003C\u002Fp>\n\u003Cp>Seven engineers built a million-line product with a Harness. Vercel won by deletion. Anthropic won through orchestration. LangChain jumped 25 ranks by improving system design.\u003C\u002Fp>\n\u003Cp>Models still matter. But they&#39;re no longer the whole story. \u003Cstrong>Real competition happens in the invisible places: system boundaries, constraints, verification loops, and recovery mechanisms.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp>For engineers building reliable AI systems, Harness Engineering is no longer optional. It&#39;s essential. 
Not because it&#39;s trendy, but because \u003Cstrong>it works\u003C\u002Fstrong>.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>References\u003C\u002Fh2>\n\u003Ch3>Primary Sources\u003C\u002Fh3>\n\u003Col>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fmitchellh.com\u002Fwriting\u002Fmy-ai-adoption-journey\" target=\"_blank\" rel=\"noopener\">Mitchell Hashimoto - My AI Adoption Journey\u003C\u002Fa> — Origin of Harness Engineering naming\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fmartinfowler.com\u002Farticles\u002Fexploring-gen-ai\u002Fharness-engineering.html\" target=\"_blank\" rel=\"noopener\">Martin Fowler - Harness Engineering\u003C\u002Fa> — Classic articulation of three components\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fharness-engineering\u002F\" target=\"_blank\" rel=\"noopener\">OpenAI - Harness Engineering: Leveraging Codex in an Agent-First World\u003C\u002Fa> — One million lines of code case study\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.philschmid.de\u002Fagent-harness-2026\" target=\"_blank\" rel=\"noopener\">Philipp Schmid - The Importance of Agent Harness in 2026\u003C\u002Fa> — OS metaphor and context engineering\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fvercel.com\u002Fblog\u002Fwe-removed-80-percent-of-our-agents-tools\" target=\"_blank\" rel=\"noopener\">Vercel - We Removed 80% of Our Agent&#39;s Tools\u003C\u002Fa> — Simplicity &gt; Complexity evidence\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fblog.langchain.com\u002Fimproving-deep-agents-with-harness-engineering\u002F\" target=\"_blank\" rel=\"noopener\">LangChain - Improving Deep Agents with Harness Engineering\u003C\u002Fa> — Terminal Bench 2.0 case study (rank 30 to 5)\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>Secondary Analysis\u003C\u002Fh3>\n\u003Col start=\"7\">\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fmulti-agent-research-system\" 
target=\"_blank\" rel=\"noopener\">Anthropic - How We Built Our Multi-Agent Research System\u003C\u002Fa> — Agent orchestration patterns\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.epsilla.com\u002Fblogs\u002Fharness-engineering-evolution-prompt-context-autonomous-agents\" target=\"_blank\" rel=\"noopener\">Epsilla - Harness Engineering: The Evolution of AI Development\u003C\u002Fa> — Prompt → Context → Harness trajectory\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fwww.nxcode.io\u002Fresources\u002Fnews\u002Fharness-engineering-complete-guide-2026\" target=\"_blank\" rel=\"noopener\">NxCode - Harness Engineering Complete Guide for 2026\u003C\u002Fa> — Practical patterns synthesis\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fsmartscope.blog\u002Fen\u002Fblog\u002Fharness-engineering-overview\u002F\" target=\"_blank\" rel=\"noopener\">SmartScope - Harness Engineering Overview\u003C\u002Fa> — Concept clarification\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>Tools and SDKs\u003C\u002Fh3>\n\u003Col start=\"11\">\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fplatform.claude.com\u002Fdocs\u002Fen\u002Fagent-sdk\u002F\" target=\"_blank\" rel=\"noopener\">Claude Agent SDK Documentation\u003C\u002Fa> — Permissions and hooks implementation\u003C\u002Fli>\n\u003Cli>\u003Ca href=\"https:\u002F\u002Fblog.langchain.com\u002Fthe-anatomy-of-an-agent-harness\u002F\" target=\"_blank\" rel=\"noopener\">LangChain - The Anatomy of an Agent Harness\u003C\u002Fa> — Open-source design patterns\u003C\u002Fli>\n\u003C\u002Fol>\n","Harness Engineering is the discipline of designing external control frameworks for AI Agents. 
By integrating context engineering, architectural constraints, and garbage collection, it transforms unreliable large models into dependable production systems.","oracore-original",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774940070981-vjgu.png","ai-agent","en","48c9889e-86df-450b-a356-e4a4b7c83c5b",[16,17,18,19,20],"Harness Engineering","AI Agent","agent orchestration","context engineering","LLM reliability",12,"2026-03-31T06:36:55.648751+00:00","2026-03-31T06:54:14.479+00:00",{"tags":25,"relatedLang":36,"relatedPosts":40},[26,28,30,32,34],{"name":16,"slug":27},"harness-engineering",{"name":20,"slug":29},"llm-reliability",{"name":19,"slug":31},"context-engineering",{"name":33,"slug":12},"AI agent",{"name":18,"slug":35},"agent-orchestration",{"id":14,"slug":37,"title":38,"language":39},"harness-engineering-ai-agent-reliability-2026-zh","駕馭工程：從「馬具」到「作業系統」，AI Agent 可靠性的終極密碼","zh",[41,47,53,59,65,71],{"id":42,"slug":43,"title":44,"cover_image":45,"image_url":45,"created_at":46,"category":12},"5efa67dd-b9f7-4a2f-8c68-3a4bc6a6b7d9","claude-code-dynamic-workflow-ai-harness-en","Claude Code 动态工作流：AI 自写 Harness","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781035372495-9czj.png","2026-06-09T20:02:22.33375+00:00",{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":12},"2bd28e0e-0f4b-4987-a961-28763c1e1926","agent-orchestration-enterprise-ai-layer-en","Agent orchestration is the missing layer for enterprise AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780984981174-08mj.png","2026-06-09T06:02:31.384174+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":12},"95684312-23dc-4a78-a917-df14d132c5fa","ai-agents-use-blockchain-trust-layer-en","AI agents use 
blockchain as a trust layer","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780980506080-ki4s.png","2026-06-09T04:48:01.710214+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":12},"0208e47f-7d4c-4473-a0f9-4cd193b5c139","8-rag-patterns-demos-into-prod-en","8 RAG patterns that turn demos into prod","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780971552707-qpl7.png","2026-06-09T02:18:36.760049+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":12},"b413d484-6786-4c32-abdc-77f010ac7eba","fine-tuning-beats-rag-style-not-facts-en","Fine-tuning beats RAG when the goal is style, not facts","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780924681800-5xji.png","2026-06-08T13:17:25.701649+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":12},"57beb8b4-c233-400f-b95b-a97be1cf9d02","openclaw-small-business-ai-staff-en","OpenClaw shows how small businesses use AI staff","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780904882032-yp13.png","2026-06-08T07:47:27.730921+00:00",[78,83,88,93,98,103,108,113,114,119],{"id":79,"slug":80,"title":81,"created_at":82},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":84,"slug":85,"title":86,"created_at":87},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, 
Decoded","2026-03-26T11:15:23.046616+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm bet","2026-03-28T03:15:27.849766+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":4,"slug":5,"title":6,"created_at":22},{"id":115,"slug":116,"title":117,"created_at":118},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"116d5ee9-a4f1-4b5a-aac5-5d035dd22bbe","amazon-bedrock-agents-multi-agent-workflows-en","Amazon Bedrock Agents Gets Multi-Agent Workflows","2026-04-01T09:30:30.197685+00:00"]