[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llm-reasoning":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"dd34aa2b-f099-4561-bf4f-b365609d3209","LLM reasoning","llm-reasoning",3,"LLM 推理指模型在數學、物理與多步驟任務中進行規劃、驗證與錯誤修正的能力。這個主題涵蓋強化學習、pre-train space 訓練、以及用物理模擬器產生合成資料，反映模型如何從答案生成走向可檢驗的推理。","LLM reasoning covers how models plan, verify, and correct multi-step solutions in math, physics, and other structured tasks. Recent work spans reinforcement learning in pre-train space, synthetic simulator data, and zero-shot gains on benchmark problems beyond web QA.",[12,21],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"d1bbd868-15d4-459c-9e2b-2626c779b4ef","prerl-training-llms-in-pre-train-space-en","PreRL: Training LLMs in pre-train space","PreRL shifts reinforcement learning from P(y|x) to P(y), using reward-driven updates in pre-train space to improve reasoning and exploration.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776319621187-aig1.png","en","2026-04-16T06:06:38.24406+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"8a95a2d8-eb3a-442c-b9c4-c835c79d75c5","physics-simulators-rl-llm-reasoning-en","Physics Simulators as RL Data for LLM Reasoning","Researchers train LLMs on synthetic physics from simulators and report zero-shot gains on IPhO problems, showing a new path beyond web QA data.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776146992039-q2sc.png","2026-04-14T06:09:33.23692+00:00"]