[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-why-distribution-fine-tuning-beats-sft-writing-en":3,"article-related-why-distribution-fine-tuning-beats-sft-writing-en":31,"series-research-57c29f14-f339-40f7-94a1-d7c8b9ef48ae":84},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"57c29f14-f339-40f7-94a1-d7c8b9ef48ae","why-distribution-fine-tuning-beats-sft-writing-en","Why Distribution Fine Tuning beats SFT for LLM writing","\u003Cp data-speakable=\"summary\">Distribution Fine Tuning beats SFT because it matches human text distributions more closely.\u003C\u002Fp>\u003Cp>Distribution Fine Tuning is the right answer to slop-filled \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> writing, and SFT alone is not enough to produce text that reads like human prose.\u003C\u002Fp>\u003Cp>Rosmine’s case is simple: models trained with supervised fine-tuning still overuse phrases, drift into generic structure, and miss the texture of the training set even when they follow prompts well. The post backs that claim with three separate measures: \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> distribution distance, embedding-level distance, and a judge model preference score. On the reported benchmark, DFT beats an SFT “super baseline” on the metrics that matter for writing quality, and it does so without requiring a giant jump in compute or model size. That is not a small improvement. It is evidence that the standard post-training stack is optimizing the wrong target.\u003C\u002Fp>\u003Ch2>First argument: SFT optimizes samples, not distributions\u003C\u002Fh2>\u003Cp>SFT teaches a model to imitate individual examples, but writing quality is a distributional property. 
If a model learns the right answer format while missing the frequency of details, sentence shapes, and phrase variety in the source data, it can still look polished and feel wrong. That is exactly what the Rosmine post measures with MMD and token L2 distance. The point is not that the model is unhelpful. The point is that it is statistically off. In writing, being statistically off shows up as repetition, generic transitions, and the same tired rhetorical flourishes.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779321840640-f85c.png\" alt=\"Why Distribution Fine Tuning beats SFT for LLM writing\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The numbers make that gap hard to dismiss. In the post’s table, a 14B DFT model reaches MMD 0.018 and JMQ 0.80, while the 14B SFT super baseline sits at MMD 0.037 and JMQ 0.49. That is a 0.31-point gain in judge preference and roughly half the distribution mismatch. The author also reports that DFT improves creativity by 164%, coherence by 28%, clarity by 16%, and meaningful detail by 146% versus the SFT baseline. Whatever one thinks of the exact metric design, the direction is consistent: matching the training distribution matters more than simply scaling up instruction following.\u003C\u002Fp>\u003Ch2>Second argument: “slop signs” are a training problem, not a style problem\u003C\u002Fh2>\u003Cp>People often talk about slop as if it were just an aesthetic complaint about model tone. It is not. It is a symptom of a training pipeline that rewards the wrong behaviors. The article points to overused tokens and phrases such as em dashes, “it’s not X, it’s Y,” and generic abstractions as artifacts of post-training, especially RLHF-driven reward hacking. That framing is persuasive because it connects surface-level writing failures to the mechanics of optimization. 
If the model keeps learning that safe, high-agreement phrasing wins, it will keep producing safe, high-agreement phrasing. No amount of prompt polishing fixes that root cause.\u003C\u002Fp>\u003Cp>The sample outputs reinforce the point. At one temperature, the SFT model repeats the same subject over and over. At another, it veers into incoherent transitions and even non-English characters. DFT is presented as the fix because it pushes outputs back toward the training distribution rather than toward a generic “helpful” style. That matters for anyone building customer-facing systems. A chatbot that is technically compliant but stylistically brittle still fails in practice. Users notice when every paragraph sounds like a template, and they notice even more when the model’s confidence masks shallow content.\u003C\u002Fp>\u003Ch2>The counter-argument\u003C\u002Fh2>\u003Cp>The strongest objection is that DFT may simply be overfitting the appearance of human writing. A model can score well on judge preference, token frequency, and embedding similarity while still being less useful, less truthful, or less adaptable than a plain SFT model. There is also a real methodological concern: if the evaluation relies on a specific judge model, a specific dataset slice, and a specific notion of “human-like,” then the gains may not transfer cleanly across domains. For code, legal drafting, support replies, and creative fiction, the right distribution is not the same.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779321829407-iwiv.png\" alt=\"Why Distribution Fine Tuning beats SFT for LLM writing\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That objection is valid, but it does not rescue SFT. It only defines the boundary of the claim. The right conclusion is not that DFT solves every output problem. 
The right conclusion is that current post-training stacks are leaving writing quality on the table because they optimize for helpfulness and preference without enough pressure to preserve the actual distribution of good text. Rosmine’s results are strong enough to show that distribution matching is a missing layer. Even if DFT needs domain-specific tuning and broader validation, the burden has shifted. Anyone defending SFT as sufficient now has to explain why a method that better matches human text should not be preferred for writing tasks.\u003C\u002Fp>\u003Ch2>What to do with this\u003C\u002Fh2>\u003Cp>If you are an engineer, stop treating writing quality as a prompt-engineering issue and start measuring it as a distribution problem. Build evals that track repetition, content richness, and human-vs-model preference together, then test post-training methods against a fixed baseline instead of cherry-picking sampler settings. If you are a PM or founder, do not ship a “smart” writing product that merely sounds compliant. Demand outputs that vary naturally, carry details, and survive side-by-side comparison with human text. 
The practical lesson is blunt: if your model writes like a template, the fix is in training, not in wording.\u003C\u002Fp>","Distribution Fine Tuning beats SFT because it matches human text distributions more closely.","rosmine.ai","https:\u002F\u002Frosmine.ai\u002F2026\u002F05\u002F18\u002Ffixing-llm-writing-with-distribution-fine-tuning\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779321840640-f85c.png","research","en","63eabb4a-63f4-4ea4-b959-85470c2e5691",[17,18,19,20,21,22],"Distribution Fine Tuning","SFT","RLHF","MMD","Judge Model Quality","LLM writing",[24,25,26],"SFT alone does not match the training distribution well enough for high-quality writing.","DFT improves human-likeness by optimizing distribution-level metrics, not just sample imitation.","Slop signs are best treated as a training and evaluation problem, not a prompt problem.",1,"2026-05-21T00:03:25.641523+00:00","2026-05-21T00:03:25.619+00:00","3103988e-c4fe-45e3-98ab-846500c9d507",{"tags":32,"relatedLang":43,"relatedPosts":47},[33,35,37,39,41],{"name":21,"slug":34},"judge-model-quality",{"name":19,"slug":36},"rlhf",{"name":18,"slug":38},"sft",{"name":17,"slug":40},"distribution-fine-tuning",{"name":20,"slug":42},"mmd",{"id":15,"slug":44,"title":45,"language":46},"why-distribution-fine-tuning-beats-sft-writing-zh","為什麼 Distribution Fine Tuning 比 SFT 更適合 …","zh",[48,54,60,66,72,78],{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"59d28ae7-1e4e-42f0-ac84-3dde3f701419","phase-diagram-multimodal-learning-en","A phase diagram for multimodal 
learning","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781071379004-sk9p.png","2026-06-10T06:02:31.601939+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"850449f2-e75b-4dbf-97c0-3590c6cbf097","crdts-keep-replicas-in-sync-without-locks-en","CRDTs keep replicas in sync without locks","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781011086602-cokl.png","2026-06-09T13:17:35.890527+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"7c6b6428-ba8d-4c59-840b-cf96a95139e5","post-deterministic-systems-autonomous-infra-en","Post-Deterministic Systems for Autonomous Infra","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781010190497-1grq.png","2026-06-09T13:02:33.235795+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"53ec2203-e127-4bf8-8b3d-2dce8d156a54","causal-learnability-formal-language-tasks-en","Causal methods for measuring task learnability","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780987698514-ky8m.png","2026-06-09T06:47:35.103221+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"55e7197e-f114-4b6c-b3e2-af1a3cd9dfa4","rl-training-hands-off-control-gradually-en","RL Training That Hands Off Control 
Gradually","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780986801034-gf8m.png","2026-06-09T06:32:33.516452+00:00",{"id":79,"slug":80,"title":81,"cover_image":82,"image_url":82,"created_at":83,"category":13},"93fc6735-b524-4baf-989f-645c4c47d593","omnigamearena-vlm-game-agent-benchmark-en","OmniGameArena benchmarks VLM game agents better","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780985895695-ugcj.png","2026-06-09T06:17:32.668876+00:00",[85,90,95,100,105,110,115,120,125,130],{"id":86,"slug":87,"title":88,"created_at":89},"a2715e72-1fe8-41b3-abb1-d0cf1f710189","ai-predictions-2026-big-changes-en","AI Predictions for 2026: Brace for Big Changes","2026-03-26T01:25:07.788356+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"8404bd7b-4c2f-4109-9ec4-baf29d88af2b","ml-papers-of-the-week-github-research-desk-en","ML Papers of the Week Turns GitHub Into a Research Desk","2026-03-27T01:11:39.480259+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"87897a94-8065-4464-a016-1f23e89e17cc","ai-ml-conferences-to-watch-in-2026-en","AI\u002FML Conferences to Watch in 2026","2026-03-27T01:51:54.184108+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"6f1987cf-25f3-47a4-b3e6-db0997695be8","openclaw-agents-manipulated-self-sabotage-en","OpenClaw Agents Can Be Manipulated Into Failure","2026-03-28T03:03:18.899465+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"a53571ad-735a-4178-9f93-cb09b699d99c","vega-driving-language-instructions-en","Vega: Driving with Natural Language Instructions","2026-03-28T14:54:04.698882+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"a34581d6-f36e-46da-88bb-582fb3e7425c","personalizing-autonomous-driving-styles-en","Drive My Way: Personalizing Autonomous Driving 
Styles","2026-03-28T14:54:26.148181+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"2bc1ad7f-26ce-4f02-9885-803b35fd229d","training-knowledge-bases-writeback-rag-en","Training Knowledge Bases with WriteBack-RAG","2026-03-28T14:54:45.643433+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"71adc507-3c54-4605-bbe2-c966acd6187e","packforcing-long-video-generation-en","PackForcing: Efficient Long-Video Generation Method","2026-03-28T14:55:02.646943+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"675942ef-b9ec-4c5f-a997-381250b6eacb","pixelsmile-facial-expression-editing-en","PixelSmile Framework Enhances Facial Expression Editing","2026-03-28T14:55:20.633463+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","2026-03-31T06:00:36.65963+00:00"]