[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-skillopt-agent-skills-text-space-optimizer-en":3,"article-related-skillopt-agent-skills-text-space-optimizer-en":30,"series-research-7f605257-ebd0-4301-b48a-08e7550c9fa6":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"7f605257-ebd0-4301-b48a-08e7550c9fa6","skillopt-agent-skills-text-space-optimizer-en","SkillOpt trains agent skills like model weights","\u003Cp data-speakable=\"summary\">SkillOpt treats agent \u003Ca href=\"\u002Ftag\u002Fskills\">skills\u003C\u002Fa> as editable text and optimizes them with validation-gated updates.\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>Research org\u003C\u002Fstrong>: Unspecified in arXiv abstract\u003C\u002Fli>\u003Cli>\u003Cstrong>Core data\u003C\u002Fstrong>: +23.5 points on GPT-5.5 in direct chat\u003C\u002Fli>\u003Cli>\u003Cstrong>Breakthrough\u003C\u002Fstrong>: Separate optimizer model makes bounded add\u002Fdelete\u002Freplace edits to a skill document\u003C\u002Fli>\u003C\u002Ful>\u003Cp>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23904\">SkillOpt: Executive Strategy for Self-Evolving Agent Skills\u003C\u002Fa> argues that agent skills should be trained more like model parameters and less like ad hoc prompt tweaks. For developers building \u003Ca href=\"\u002Ftag\u002Fagents\">agents\u003C\u002Fa>, the practical question is simple: if a skill can be improved by feedback, can that improvement be made reproducible, controllable, and cheap to deploy?\u003C\u002Fp>\u003Cp>The paper’s answer is yes, at least in the setting it studies. SkillOpt keeps the base agent frozen and uses a separate optimizer model to revise a single skill document from scored rollouts, while only accepting edits that strictly improve a held-out validation score. That means the skill itself becomes the thing that evolves, but the deployed agent does not need extra inference-time model calls.\u003C\u002Fp>\u003Ch2>Why this problem matters\u003C\u002Fh2>\u003Cp>Most agent skills today are either hand-written, generated in one shot, or revised through loose self-editing loops. The authors’ critique is that these approaches do not behave like a real optimizer: they are hard to control, hard to reproduce, and they do not reliably improve beyond their starting point when feedback is available.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779689162977-gdtb.png\" alt=\"SkillOpt trains agent skills like model weights\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That matters because agent skills are increasingly the part of the system that determines whether the model can actually use tools, follow workflows, or stay on task. If skill updates are noisy or unbounded, teams end up with brittle \u003Ca href=\"\u002Ftag\u002Fprompt-engineering\">prompt engineering\u003C\u002Fa> instead of something closer to a training loop.\u003C\u002Fp>\u003Cp>The paper frames skill text as an external state of a frozen agent. That is a useful mental model for engineers: instead of treating a prompt as static configuration, SkillOpt treats it as an artifact that can be optimized with discipline similar to weight-space training.\u003C\u002Fp>\u003Ch2>How SkillOpt works in plain English\u003C\u002Fh2>\u003Cp>SkillOpt is described as a controllable text-space optimizer for agent skills. A separate optimizer model looks at scored rollouts and proposes bounded edits to one skill document. Those edits are limited to add, delete, and replace operations, which keeps the search space from becoming an unconstrained rewrite of the whole skill.\u003C\u002Fp>\u003Cp>The key safeguard is validation gating. An edit only gets accepted if it strictly improves a held-out validation score. In other words, the system does not trust the optimizer’s intuition alone; it requires measurable improvement before the skill changes.\u003C\u002Fp>\u003Cp>The paper also adds three stability mechanisms: a textual learning-rate budget, a rejected-edit buffer, and an epoch-wise slow\u002Fmeta update. The abstract does not spell out implementation details beyond those names, but the intent is clear: control how much the skill can change, remember failed edits, and update more conservatively over time.\u003C\u002Fp>\u003Cp>For practitioners, this is the most interesting part of the design. It looks like prompt optimization, but with the kinds of constraints you would expect from a serious training pipeline. That makes it more plausible as an engineering workflow than a free-form self-reflection loop.\u003C\u002Fp>\u003Ch2>What the paper actually shows\u003C\u002Fh2>\u003Cp>The authors evaluate SkillOpt across six benchmarks, seven target models, and three execution harnesses: direct chat, Codex, and \u003Ca href=\"\u002Fnews\u002Fwhy-claude-code-and-qoder-beat-chatty-ai-coding-tools-en\">Claude Code\u003C\u002Fa>. That is a broad test matrix for a paper in this space, and it matters because agent behavior often changes a lot depending on the runtime environment.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779689156415-uvs7.png\" alt=\"SkillOpt trains agent skills like model weights\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>According to the abstract, SkillOpt is best or tied on all 52 evaluated model-benchmark-harness cells. It also beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. Those are strong comparative claims, though the abstract does not provide the per-benchmark breakdown in the text we have here.\u003C\u002Fp>\u003Cp>The clearest number in the abstract is on GPT-5.5. SkillOpt lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside \u003Ca href=\"\u002Ftag\u002Fclaude-code\">Claude Code\u003C\u002Fa>. The abstract does not provide the underlying baseline percentages, so those gains should be read as deltas rather than absolute scores.\u003C\u002Fp>\u003Cp>There is also a transfer story. The paper says optimized skill artifacts retain value when moved across model scales, between Codex and \u003Ca href=\"\u002Ftag\u002Fclaude\">Claude\u003C\u002Fa> Code execution environments, and even to a nearby math benchmark without further optimization. That suggests the learned skill is not just overfit to one exact setup.\u003C\u002Fp>\u003Ch2>What developers should take away\u003C\u002Fh2>\u003Cp>If you build agents, the practical implication is that skills may be something you can train, version, and transfer instead of hand-tuning forever. SkillOpt’s core promise is that skill improvement can be made more systematic without adding runtime cost at deployment, because the optimization happens offline and the deployed agent stays frozen.\u003C\u002Fp>\u003Cp>That zero inference-time model-call claim is important. Many agent-improvement loops add complexity to every request. SkillOpt’s design tries to shift that cost into the training phase, which is usually where you want it if the goal is stable production behavior.\u003C\u002Fp>\u003Cp>Still, the abstract leaves some open questions. We do not get the benchmark names, the exact validation protocol, the size of the skill documents, or the failure modes of the optimizer. We also do not see whether the method is sensitive to the choice of held-out validation set, or how much manual oversight is required when the optimizer proposes edits.\u003C\u002Fp>\u003Cp>There is also a broader limitation that comes with any text-space optimizer: the skill is only as good as the search process that edits it. If the optimizer model is weak, or the reward signal is misaligned, the system can still drift in the wrong direction. The paper’s discipline helps, but it does not remove the need for careful evaluation.\u003C\u002Fp>\u003Ch2>Why this is worth watching\u003C\u002Fh2>\u003Cp>SkillOpt is interesting because it moves agent improvement closer to something engineers already understand: iterative optimization with constraints, validation, and transfer testing. That is a more production-friendly story than “let the agent rewrite itself and hope for the best.”\u003C\u002Fp>\u003Cp>Even if you do not adopt the exact method, the paper points toward a useful design pattern for agent systems: keep the base model fixed, make skills explicit artifacts, and improve those artifacts with measurable gates. That is a cleaner mental model for debugging, rollback, and reuse.\u003C\u002Fp>\u003Cp>For teams shipping agentic workflows, the biggest takeaway is not just the accuracy gains. It is the possibility of treating skills as durable assets that can evolve across models and execution environments without paying extra at inference time.\u003C\u002Fp>\u003Cul>\u003Cli>SkillOpt optimizes a single skill document with bounded text edits and validation gating.\u003C\u002Fli>\u003Cli>It reports gains across six benchmarks, seven models, and three execution harnesses.\u003C\u002Fli>\u003Cli>The abstract claims transfer across model scales and agent runtimes, but leaves some details unspecified.\u003C\u002Fli>\u003C\u002Ful>","SkillOpt treats agent skills as editable text and optimizes them with validation-gated updates.","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.23904",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779689162977-gdtb.png","research","en","628d9cc5-f7d9-46c8-90be-a0475f7a2ddb",[17,18,19,20,21],"agent skills","prompt optimization","text-space optimization","self-evolving agents","validation gating",[23,24,25],"Treats agent skills as editable state rather than static prompts","Uses bounded text edits plus held-out validation to control updates","Reports broad gains and transfer without extra inference-time calls",6,"2026-05-25T06:05:32.454633+00:00","2026-05-25T06:05:32.445+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,34,36,38,40],{"name":18,"slug":33},"prompt-optimization",{"name":21,"slug":35},"validation-gating",{"name":20,"slug":37},"self-evolving-agents",{"name":17,"slug":39},"agent-skills",{"name":19,"slug":41},"text-space-optimization",{"id":15,"slug":43,"title":44,"language":45},"skillopt-agent-skills-text-space-optimizer-zh","SkillOpt 把技能當權重訓練","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"850449f2-e75b-4dbf-97c0-3590c6cbf097","crdts-keep-replicas-in-sync-without-locks-en","CRDTs keep replicas in sync without locks","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781011086602-cokl.png","2026-06-09T13:17:35.890527+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"7c6b6428-ba8d-4c59-840b-cf96a95139e5","post-deterministic-systems-autonomous-infra-en","Post-Deterministic Systems for Autonomous Infra","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781010190497-1grq.png","2026-06-09T13:02:33.235795+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"53ec2203-e127-4bf8-8b3d-2dce8d156a54","causal-learnability-formal-language-tasks-en","Causal methods for measuring task learnability","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780987698514-ky8m.png","2026-06-09T06:47:35.103221+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"55e7197e-f114-4b6c-b3e2-af1a3cd9dfa4","rl-training-hands-off-control-gradually-en","RL Training That Hands Off Control Gradually","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780986801034-gf8m.png","2026-06-09T06:32:33.516452+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"93fc6735-b524-4baf-989f-645c4c47d593","omnigamearena-vlm-game-agent-benchmark-en","OmniGameArena benchmarks VLM game agents better","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780985895695-ugcj.png","2026-06-09T06:17:32.668876+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"9f0c9505-6d75-411c-ba46-2382e8f295a5","turboquant-cuts-kv-cache-memory-6x-google-tests-en","TurboQuant cuts KV cache memory 6x in Google tests","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780906679116-fqdo.png","2026-06-08T08:17:22.276769+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving Styles","2026-03-28T14:54:26.148181+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]