[
  {"data": 1, "prerenderedAt": -1},
  ["ShallowReactive", 2],
  {"tag-llm-reasoning": 3},
  {"tag": 4, "articles": 11},
  {"id": 5, "name": 6, "slug": 7, "article_count": 8, "description_zh": 9, "description_en": 10},
  "dd34aa2b-f099-4561-bf4f-b365609d3209",
  "LLM reasoning",
  "llm-reasoning",
  3,
  "LLM 推理指模型在數學、物理與多步驟任務中進行規劃、驗證與錯誤修正的能力。這個主題涵蓋強化學習、pre-train space 訓練、以及用物理模擬器產生合成資料，反映模型如何從答案生成走向可檢驗的推理。",
  "LLM reasoning covers how models plan, verify, and correct multi-step solutions in math, physics, and other structured tasks. Recent work spans reinforcement learning in pre-train space, synthetic simulator data, and zero-shot gains on benchmark problems beyond web QA.",
  [12, 21],
  {"id": 13, "slug": 14, "title": 15, "summary": 16, "category": 17, "image_url": 18, "cover_image": 18, "language": 19, "created_at": 20},
  "1ff5ab46-edd3-4ee3-b21e-a186f08ed550",
  "autotts-llms-discover-test-time-scaling-zh",
  "AutoTTS讓LLM自己找推理策略",
  "AutoTTS把 test-time scaling 變成環境搜尋問題，讓 LLM 在推理時自動找出更省算力的策略，而不是靠人手調 heuristics。",
  "research",
  "https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1778479857028-4w21.png",
  "zh",
  "2026-05-11T06:10:29.812426+00:00",
  {"id": 22, "slug": 23, "title": 24, "summary": 25, "category": 17, "image_url": 26, "cover_image": 26, "language": 19, "created_at": 27},
  "ff7d80fb-56b3-4d87-94cc-ad38b20f6e5d",
  "physics-simulators-rl-llm-reasoning-zh",
  "用物理模擬器訓練 LLM 推理",
  "研究者把物理模擬器變成強化學習資料來源，訓練 LLM 學會物理推理，並在 IPhO 題目上帶來 zero-shot 提升。",
  "https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1776146993167-rwzt.png",
  "2026-04-14T06:09:32.812614+00:00"
]