[
  { "data": 1, "prerenderedAt": -1 },
  ["ShallowReactive", 2],
  { "tag-llm-fine-tuning": 3 },
  { "tag": 4, "articles": 11 },
  {
    "id": 5,
    "name": 6,
    "slug": 7,
    "article_count": 8,
    "description_zh": 9,
    "description_en": 10
  },
  "93aa15ea-c3f0-4f7d-a7c2-b22a81051ec1",
  "LLM fine-tuning",
  "llm-fine-tuning",
  3,
  "LLM 微調指的是在既有基礎模型上，透過監督式資料或強化學習調整模型行為，讓它更貼近特定任務與領域。這個主題涵蓋資料準備、訓練穩定性、評估與部署，例如 PPO 的替代方法、BPO/GBPO，以及用 S3、SageMaker 和 MLflow 加速實作。",
  "LLM fine-tuning covers the methods used to adapt a base model to a specific task or domain, from supervised training to RL-based alignment. It matters because stability, data pipelines, and tooling shape real outcomes; examples include BPO/GBPO as PPO alternatives and AWS workflows with S3, SageMaker, and MLflow.",
  [12],
  {
    "id": 13,
    "slug": 14,
    "title": 15,
    "summary": 16,
    "category": 17,
    "image_url": 18,
    "cover_image": 18,
    "language": 19,
    "created_at": 20
  },
  "7a04d752-3f1a-4df7-b7c5-8bcb1e69c565",
  "bounded-ratio-reinforcement-learning-ppo-zh",
  "BRRL 取代 PPO 剪裁：BPO 與 GBPO 的穩定性升級",
  "BRRL 把 PPO 的剪裁目標改寫成有界比例框架，推出 BPO 與 GBPO，主打更穩定的更新與更清楚的理論基礎。",
  "research",
  "https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1776751794578-t5j7.png",
  "zh",
  "2026-04-21T06:09:39.661696+00:00"
]