[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-aws-rft-llm-as-a-judge-nova-en":3,"tags-aws-rft-llm-as-a-judge-nova-en":34,"related-lang-aws-rft-llm-as-a-judge-nova-en":46,"related-posts-aws-rft-llm-as-a-judge-nova-en":50,"series-model-release-f4dd6aa0-0b9a-4963-a186-66764c4c7442":87},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":30,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"f4dd6aa0-0b9a-4963-a186-66764c4c7442","AWS details RFT with LLM-as-a-judge for Nova","\u003Cp data-speakable=\"summary\">\u003Ca href=\"\u002Ftag\u002Faws\">AWS\u003C\u002Fa> explains how reinforcement fine-tuning can use an \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> judge to score model outputs and improve alignment.\u003C\u002Fp>\u003Cp>On 30 Apr 2026, AWS published a guide to reinforcement fine-tuning (RFT) with LLM-as-a-judge for Amazon Nova models on \u003Ca href=\"https:\u002F\u002Faws.amazon.com\u002F\" target=\"_blank\" rel=\"noopener\">AWS\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Faws.amazon.com\u002Fsagemaker\u002F\" target=\"_blank\" rel=\"noopener\">Amazon SageMaker AI\u003C\u002Fa>. 
The post says the method can outperform base models and supervised fine-tuning in a legal contract review case study, where a GPT OSS 120B judge helped train a model to flag risks, assessments, and actions from contract text.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Item\u003C\u002Fth>\u003Cth>Value\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Publish date\u003C\u002Ftd>\u003Ctd>30 Apr 2026\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Judge model in case study\u003C\u002Ftd>\u003Ctd>GPT OSS 120B\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Production timeout recommendation\u003C\u002Ftd>\u003Ctd>15 minutes\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Provisioned concurrency guidance\u003C\u002Ftd>\u003Ctd>~100\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What changed\u003C\u002Fh2>\u003Cp>AWS frames LLM-as-a-judge as a more flexible reward signal than simple rule-based scoring. Instead of checking only for substring matches or fixed labels, the judge can score outputs on correctness, tone, safety, relevance, and domain nuance.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777944048445-refl.png\" alt=\"AWS details RFT with LLM-as-a-judge for Nova\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The post breaks the workflow into six steps: choose a judge type, define criteria, pick and configure the judge model, refine the prompt, align reward metrics with production evaluation, and build a reward Lambda that can handle scale and failures.\u003C\u002Fp>\u003Cul>\u003Cli>Rubric-based judging scores one response against predefined criteria.\u003C\u002Fli>\u003Cli>Preference-based judging compares two responses and picks the better one.\u003C\u002Fli>\u003Cli>Boolean pass\u002Ffail scoring is recommended for rubric 
judges.\u003C\u002Fli>\u003Cli>Reward functions should mix LLM judgments with deterministic checks for format, length, language, and safety.\u003C\u002Fli>\u003Cli>Lambda guidance includes exponential backoff, parallel calls, neutral rewards on error, and a 15-minute timeout.\u003C\u002Fli>\u003C\u002Ful>\u003Cp>For model choice, AWS says larger judges fit complex reasoning and multi-dimensional scoring, while smaller models can work for common tasks such as math, coding, or general chat if prompts are tight enough. The post also stresses structured outputs, clear scoring rules, and edge-case handling so reward signals stay parseable and stable.\u003C\u002Fp>\u003Ch2>Why it matters\u003C\u002Fh2>\u003Cp>For developers, the appeal is faster alignment without hand-labeling every sample. An LLM judge can surface why a response failed, which helps teams debug reward logic and spot hidden misalignment before deployment.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777944045237-sk68.png\" alt=\"AWS details RFT with LLM-as-a-judge for Nova\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The legal contract review example shows the practical angle: a small labeled dataset was enough to train a system that evaluates contract clauses against internal guidance, prior contracts, and local law. That matters for teams building domain tools where quality depends on nuanced judgment, not just exact text matches.\u003C\u002Fp>\u003Cp>AWS also ties reward design to production metrics, arguing that training signals should mirror the same accuracy, safety, and compliance checks used after launch. 
That reduces the risk of optimizing for the wrong target.\u003C\u002Fp>\u003Cp>The key question now is not whether RFT works, but which tasks are better served by an LLM judge than by cheaper rules or human review.\u003C\u002Fp>","AWS outlines reinforcement fine-tuning with LLM-as-a-judge, plus a legal contract review case study using Amazon Nova and SageMaker AI.","aws.amazon.com","https:\u002F\u002Faws.amazon.com\u002Fblogs\u002Fmachine-learning\u002Freinforcement-fine-tuning-with-llm-as-a-judge\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777944048445-refl.png",[13,14,15,16,17],"AWS","Reinforcement Fine-Tuning","LLM-as-a-judge","Amazon Nova","SageMaker AI","en",1,false,"2026-05-05T01:20:24.773176+00:00","2026-05-05T01:20:24.757+00:00","done","06724b01-e358-4f6d-b324-b8a82222333f","aws-rft-llm-as-a-judge-nova-en","model-release","c22cf822-ce57-495f-a4ab-643ad9a08200","published","2026-05-05T09:00:18.118+00:00",[31,32,33],"AWS says LLM-as-a-judge can improve RFT when rewards are hard to define by hand.","The post recommends mixing judge scores with deterministic checks and structured Lambda handling.","A legal contract review case study shows the approach working with a small labeled dataset.",[35,37,39,41,43],{"name":15,"slug":36},"llm-as-a-judge",{"name":13,"slug":38},"aws",{"name":17,"slug":40},"sagemaker-ai",{"name":16,"slug":42},"amazon-nova",{"name":44,"slug":45},"reinforcement fine-tuning","reinforcement-fine-tuning",{"id":27,"slug":47,"title":48,"language":49},"aws-rft-llm-as-a-judge-nova-zh","AWS 解析 Nova 的 RFT 評分法","zh",[51,57,63,69,75,81],{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":26},"ebd0ef7f-f14d-4e25-a54e-073b49f9d4b9","why-googles-hidden-gemini-live-models-matter-en","Why Google’s Hidden Gemini Live Models Matter More Than the 
Demo","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778869237748-4rqx.png","2026-05-15T18:20:23.999239+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":26},"6c57f6bf-1023-4a22-a6c0-013bd88ac3d1","minimax-m1-open-hybrid-attention-reasoning-model-en","MiniMax-M1 brings 1M-token open reasoning model","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778797872005-z8uk.png","2026-05-14T22:30:39.599473+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":26},"68a2ba2e-f07a-4f28-a69c-24bf66652d2e","gemini-omni-video-review-text-rendering-en","Gemini Omni Video Review: Text Rendering Beats Rivals","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778779286834-fy35.png","2026-05-14T17:20:44.524502+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":26},"1d5fc6b1-a87f-48ae-89ee-e5f0da86eb2d","why-xiaomi-mimo-v25-pro-changes-coding-agents-en","Why Xiaomi’s MiMo-V2.5-Pro Changes Coding Agents More Than Chatbots","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778689848027-ocpw.png","2026-05-13T16:30:29.661993+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":26},"cb3eac19-4b8d-4ee0-8f7e-d3c2f0b50af5","openai-realtime-audio-models-live-voice-en","OpenAI’s Realtime Audio Models Target Live 
Voice","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778451653257-dsnq.png","2026-05-10T22:20:33.31082+00:00",{"id":82,"slug":83,"title":84,"cover_image":85,"image_url":85,"created_at":86,"category":26},"84c630af-a060-4b6b-9af2-1b16de0c8f06","anthropic-10-finance-ai-agents-en","Anthropic Releases 10 Finance AI Agents","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778389841959-ktkf.png","2026-05-10T05:10:23.345141+00:00",[88,93,98,103,108,113,118,123,128,133],{"id":89,"slug":90,"title":91,"created_at":92},"d4cffde7-9b50-4cc7-bb68-8bc9e3b15477","nvidia-rubin-ai-supercomputer-en","NVIDIA Unveils Rubin: A Leap in AI Supercomputing","2026-03-25T16:24:35.155565+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"eab919b9-fbac-4048-89fc-afad6749ccef","google-gemini-ai-innovations-2026-en","Google's AI Leap with Gemini Innovations in 2026","2026-03-25T16:27:18.841838+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"5f5cfc67-3384-4816-a8f6-19e44d90113d","gap-google-gemini-ai-checkout-en","Gap Teams Up with Google Gemini for AI-Driven Checkout","2026-03-25T16:27:46.483272+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"f6d04567-47f6-49ec-804c-52e61ab91225","ai-model-release-wave-march-2026-en","Navigating the AI Model Release Wave of March 2026","2026-03-25T16:28:45.409716+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"895c150c-569e-4fdf-939d-dade785c990e","small-language-models-transform-ai-en","Small Language Models: Llama 3.2 and Phi-3 Transform AI","2026-03-25T16:30:26.688313+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"38eb1d26-d961-4fd3-ae12-9c4089680f5f","midjourney-v8-alpha-features-pricing-en","Midjourney V8 Alpha: A Deep Dive into Its Features and 
Pricing","2026-03-26T01:25:36.387587+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"bf36bb9e-3444-4fb8-ab19-0df6bc9d8271","rag-2026-indispensable-ai-bridge-en","RAG in 2026: The Indispensable AI Bridge","2026-03-26T01:28:34.472046+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"60881d6d-2310-44ef-b1fb-7f98e9dd2f0e","xiaomi-mimo-trio-agents-robots-voice-en","Xiaomi’s MiMo trio targets agents, robots, and voice","2026-03-28T03:05:08.899895+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"f063d8d1-41d1-4de4-8ebc-6c40511b9369","xiaomi-mimo-v2-pro-1t-moe-agents-en","Xiaomi MiMo-V2-Pro: 1T MoE Model for Agents","2026-03-28T03:06:19.238032+00:00",{"id":134,"slug":135,"title":136,"created_at":137},"a1379e9a-6785-4ff5-9b0a-8cff55f8264f","cursor-composer-2-started-from-kimi-en","Cursor’s Composer 2 started from Kimi","2026-03-28T03:11:59.132398+00:00"]