Tag
LLM fine-tuning
LLM fine-tuning covers the methods used to adapt a base model to a specific task or domain, from supervised fine-tuning to RL-based alignment. It matters because training stability, data pipelines, and tooling shape real outcomes; examples include BPO and GBPO as PPO alternatives and AWS workflows built on S3, SageMaker, and MLflow.
3 articles

Research/May 5
Why Latent Agents Proves Multi-Agent Debate Should Be Internalized
Latent Agents shows that multi-agent debate works best when a single model internalizes it.

Research/Apr 21
Why Bounded Ratio RL Replaces PPO's Clipped Objective
BRRL puts PPO on cleaner theoretical footing, with BPO and GBPO aiming for more stable policy updates in control and LLM fine-tuning.

Model Releases/Apr 2
AWS Uses S3 to Speed LLM Fine-Tuning
AWS shows how SageMaker Unified Studio, S3, and MLflow can be combined to fine-tune Llama 3.2 11B Vision Instruct on DocVQA data.