[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-openai-plan-automated-ai-researcher-en":3,"article-related-openai-plan-automated-ai-researcher-en":24,"series-ai-agent-3b0bf479-e4ae-4703-9666-721a7e0cdb91":77},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":11,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":11,"views":22,"created_at":23,"published_at":23,"topic_cluster_id":11},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","\u003Cp>OpenAI says it wants an \u003Ca href=\"https:\u002F\u002Fopenai.com\" target=\"_blank\" rel=\"noopener\">AI researcher\u003C\u002Fa> that can work on hard problems with very little supervision. The first milestone is an autonomous research intern by September, and the longer-term target is a multi-agent system in 2028. That is a bold timeline for a company whose latest model, GPT-5, still makes plenty of mistakes on scientific tasks.\u003C\u002Fp>\u003Cp>The plan matters because OpenAI is shifting a lot of its research energy into one goal: building software that can spend hours, or even days, on a problem without a human micromanaging every step. If it works, the tool could help with math, physics, biology, chemistry, and some policy or business questions. 
If it falls short, it will be another reminder that agentic AI gets harder fast once you ask it to chain tasks together.\u003C\u002Fp>\u003Ch2>OpenAI’s new north star is an AI researcher\u003C\u002Fh2>\u003Cp>OpenAI chief scientist \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fjakub-pachocki\u002F\" target=\"_blank\" rel=\"noopener\">Jakub Pachocki\u003C\u002Fa> told \u003Ca href=\"https:\u002F\u002Fwww.technologyreview.com\u002F2026\u002F03\u002F20\u002F1134438\u002Fopenai-is-throwing-everything-into-building-a-fully-automated-researcher\u002F\" target=\"_blank\" rel=\"noopener\">MIT Technology Review\u003C\u002Fa> that the company is treating this as a long-term research goal. The idea is to connect the best parts of its reasoning models, coding agents, and interpretability work into one system that can tackle large problems on its own.\u003C\u002Fp>\u003Cp>OpenAI already has a useful testbed in \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Fcodex\u002F\" target=\"_blank\" rel=\"noopener\">Codex\u003C\u002Fa>, its coding agent. Pachocki described Codex as an early version of the researcher system, and said he expects it to improve a lot. The argument is simple: if an agent can write code, run experiments, and keep track of intermediate steps, then it can eventually help with broader scientific work too.\u003C\u002Fp>\u003Cp>That logic is attractive, but it also hides a lot of hard engineering. A coding task with a clear output is one thing. A research project with fuzzy goals, messy data, and a dozen possible dead ends is another. 
OpenAI is betting that better models plus longer run times will close that gap.\u003C\u002Fp>\u003Cul>\u003Cli>OpenAI wants an autonomous AI research intern by September\u003C\u002Fli>\u003Cli>The company is aiming for a fuller multi-agent research system in 2028\u003C\u002Fli>\u003Cli>Pachocki says the target includes math, physics, biology, chemistry, and policy work\u003C\u002Fli>\u003Cli>OpenAI says many of its technical staff now use Codex in daily work\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Why OpenAI thinks this can work\u003C\u002Fh2>\u003Cp>Pachocki’s case rests on a few visible trends. First, models have gotten better at reasoning through problems step by step. Second, they can now operate for longer stretches without human intervention. Third, OpenAI is training on harder tasks that force models to manage large context windows and split work into subtasks.\u003C\u002Fp>\u003Cp>He also points to the jump from GPT-3 to \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fgpt-4\u002F\" target=\"_blank\" rel=\"noopener\">GPT-4\u003C\u002Fa> as evidence that general capability gains can translate into longer, more coherent work sessions. That is a fair point. A model that can hold a problem in mind for longer is already more useful than one that loses the thread after a few turns.\u003C\u002Fp>\u003Cblockquote>“I think we are getting close to a point where we’ll have models capable of working indefinitely in a coherent way just like people do,” Pachocki said in the interview with MIT Technology Review.\u003C\u002Fblockquote>\u003Cp>That quote is doing a lot of work. It implies that the gap between a chatbot and a research assistant is mostly a matter of scale and training, not a fundamental barrier. Plenty of researchers are not convinced, but OpenAI clearly is acting as if the remaining distance is engineering, not philosophy.\u003C\u002Fp>\u003Cp>The company is also using task sets that are easier to verify than open-ended science. 
Math contests and coding challenges are useful because they produce clean signals. If a model can solve those well, OpenAI can say the machinery works before it is exposed to the messier world of lab work or product research.\u003C\u002Fp>\u003Ch2>The numbers that matter more than the hype\u003C\u002Fh2>\u003Cp>There is a real reason to keep the skepticism dial turned up. Doug Downey, a research scientist at the \u003Ca href=\"https:\u002F\u002Fallenai.org\" target=\"_blank\" rel=\"noopener\">Allen Institute for AI\u003C\u002Fa>, said his team tested several top-tier LLMs on scientific tasks last summer. \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fgpt-5\u002F\" target=\"_blank\" rel=\"noopener\">GPT-5\u003C\u002Fa> did best, but it still made lots of errors. That matters because research work is often a chain of steps, and one bad step can poison the whole result.\u003C\u002Fp>\u003Cp>Downey’s warning is practical, not philosophical: if you need an agent to complete several tasks in sequence, the chance of failure rises at each step. That is very different from asking a model to answer a single question or draft a short block of code. The more autonomy you give it, the more you need confidence in every link in the chain.\u003C\u002Fp>\u003Cul>\u003Cli>OpenAI says GPT-5 powers Codex\u003C\u002Fli>\u003Cli>OpenAI released GPT-5.4 two weeks before the interview was published\u003C\u002Fli>\u003Cli>Downey’s team tested multiple top-tier LLMs on scientific tasks last summer\u003C\u002Fli>\u003Cli>GPT-5 ranked first in those tests, but still produced many errors\u003C\u002Fli>\u003C\u002Ful>\u003Cp>There is also a workflow question. Pachocki says his own habits changed after seeing what recent models could do. He still likes typing code by hand in \u003Ca href=\"https:\u002F\u002Fwww.vim.org\" target=\"_blank\" rel=\"noopener\">Vim\u003C\u002Fa>, but he now uses models to run experiments over a weekend that would have taken him a week to write himself. 
That is the kind of productivity gain people actually feel.\u003C\u002Fp>\u003Cp>OpenAI is not alone here. Anthropic has \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa>, and Google DeepMind keeps pushing its own agentic research systems. The difference is that OpenAI is framing the entire company around one audacious objective, and it has attached dates to it. Dates are useful because they force accountability. They also make failure easier to measure.\u003C\u002Fp>\u003Ch2>What could go wrong if the system gets too capable\u003C\u002Fh2>\u003Cp>Pachocki did not dodge the safety issue. He said OpenAI talks about the risks constantly, especially if a system can run a full research program with limited oversight. The obvious problems are model mistakes, hacking, and simple misinterpretation of instructions. The darker worries are about systems that can generate harmful cyber ideas or help design biological threats.\u003C\u002Fp>\u003Cp>OpenAI’s current answer is \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fchain-of-thought-monitoring\u002F\" target=\"_blank\" rel=\"noopener\">chain-of-thought monitoring\u003C\u002Fa>, which uses the model’s scratchpad-style reasoning traces to watch what it is doing. The company wants to inspect those traces with other models and catch bad behavior early. That is a sensible start, but it is not the same as true control.\u003C\u002Fp>\u003Cp>Here is the uncomfortable part: the more autonomous these systems become, the less useful simple guardrails look. Sandboxing helps. Monitoring helps. Human review helps. Yet none of those remove the basic problem that a very capable system can still surprise you. 
OpenAI knows that, which is why Pachocki kept returning to restrictions and supervision.\u003C\u002Fp>\u003Cp>The company’s vision is easy to summarize and hard to build: let an AI do the kind of research work that would normally take a human several days, then stretch that to larger, messier projects. If OpenAI hits its September target, we will learn a lot about how far current agent systems can go before they break. If it misses, the gap between impressive demos and real research automation will look wider than the hype cycle suggests.\u003C\u002Fp>\u003Cp>My bet is simple: the first version of this researcher will be useful inside narrow, well-defined workflows, and disappointing anywhere the problem statement is fuzzy. The real question for 2028 is whether OpenAI can turn a fast coding agent into something scientists actually trust with open-ended work, or whether the errors pile up before the agent ever gets close to running a lab on its own.\u003C\u002Fp>","OpenAI wants an autonomous research intern by September and a multi-agent AI researcher by 2028, but errors still pile up fast.","www.technologyreview.com","https:\u002F\u002Fwww.technologyreview.com\u002F2026\u002F03\u002F20\u002F1134438\u002Fopenai-is-throwing-everything-into-building-a-fully-automated-researcher\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774499352284-qah4.png","ai-agent","en","7379b422-576e-45df-ad5a-d57a0d9dd467",[17,18,19,20,21],"OpenAI","AI researcher","Codex","GPT-5","agentic AI",3,"2026-03-28T03:17:42.312819+00:00",{"tags":25,"relatedLang":36,"relatedPosts":40},[26,28,30,32,34],{"name":18,"slug":27},"ai-researcher",{"name":17,"slug":29},"openai",{"name":20,"slug":31},"gpt-5",{"name":19,"slug":33},"codex",{"name":21,"slug":35},"agentic-ai",{"id":15,"slug":37,"title":38,"language":39},"openai-plan-automated-ai-researcher-zh","OpenAI 想做自動化 AI 
研究員","zh",[41,47,53,59,65,71],{"id":42,"slug":43,"title":44,"cover_image":45,"image_url":45,"created_at":46,"category":13},"5efa67dd-b9f7-4a2f-8c68-3a4bc6a6b7d9","claude-code-dynamic-workflow-ai-harness-en","Claude Code dynamic workflows: AI writes its own harness","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781035372495-9czj.png","2026-06-09T20:02:22.33375+00:00",{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"2bd28e0e-0f4b-4987-a961-28763c1e1926","agent-orchestration-enterprise-ai-layer-en","Agent orchestration is the missing layer for enterprise AI","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780984981174-08mj.png","2026-06-09T06:02:31.384174+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"95684312-23dc-4a78-a917-df14d132c5fa","ai-agents-use-blockchain-trust-layer-en","AI agents use blockchain as a trust layer","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780980506080-ki4s.png","2026-06-09T04:48:01.710214+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"0208e47f-7d4c-4473-a0f9-4cd193b5c139","8-rag-patterns-demos-into-prod-en","8 RAG patterns that turn demos into prod","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780971552707-qpl7.png","2026-06-09T02:18:36.760049+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"b413d484-6786-4c32-abdc-77f010ac7eba","fine-tuning-beats-rag-style-not-facts-en","Fine-tuning beats RAG when the goal is style, not 
facts","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780924681800-5xji.png","2026-06-08T13:17:25.701649+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"57beb8b4-c233-400f-b95b-a97be1cf9d02","openclaw-small-business-ai-staff-en","OpenClaw shows how small businesses use AI staff","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780904882032-yp13.png","2026-06-08T07:47:27.730921+00:00",[78,83,88,93,98,103,108,109,114,119],{"id":79,"slug":80,"title":81,"created_at":82},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":84,"slug":85,"title":86,"created_at":87},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, Decoded","2026-03-26T11:15:23.046616+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm 
bet","2026-03-28T03:15:27.849766+00:00",{"id":4,"slug":5,"title":6,"created_at":23},{"id":110,"slug":111,"title":112,"created_at":113},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","2026-03-31T06:36:55.648751+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"116d5ee9-a4f1-4b5a-aac5-5d035dd22bbe","amazon-bedrock-agents-multi-agent-workflows-en","Amazon Bedrock Agents Gets Multi-Agent Workflows","2026-04-01T09:30:30.197685+00:00"]