[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-why-routing-belongs-at-the-center-of-model-serving-en":3,"article-related-why-routing-belongs-at-the-center-of-model-serving-en":30,"series-industry-639da4b0-a393-409d-930c-2546ec8a63ab":83},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"639da4b0-a393-409d-930c-2546ec8a63ab","Why routing belongs at the center of model serving","\u003Cp data-speakable=\"summary\">Routing should be the single entry point for model serving because it speeds iteration and unlocks new ML products.\u003C\u002Fp>\u003Cp>Routing is not a thin implementation detail in model serving; it is the control plane that decides how fast teams can ship, test, and replace models without breaking the product.\u003C\u002Fp>\u003Ch2>Routing turns model serving into a product surface\u003C\u002Fh2>\u003Cp>Netflix’s own framing is the clearest evidence: a singular \u003Ca href=\"\u002Ftag\u002Fapi\">API\u003C\u002Fa> into the ML serving platform increased the speed of innovation for newer versions of existing experiences and for entirely new ML products. That is not a minor operational win. It means the routing layer became the place where product changes were absorbed, not the place where engineers paid a tax every time a model changed.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777882263285-mkuu.png\" alt=\"Why routing belongs at the center of model serving\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That matters because model serving is usually treated as a back-end plumbing problem until the organization is forced to support many models, many experiments, and many consumers. Once that happens, every extra endpoint becomes a coordination problem. A single routing entry point removes that sprawl and gives teams one contract to evolve instead of a dozen fragile ones.\u003C\u002Fp>\u003Ch2>A single entry point is the fastest path to iteration\u003C\u002Fh2>\u003Cp>The strongest argument for centralized routing is simple: it cuts the work required to launch a new model version. If a team can swap traffic through one API, then rollout, rollback, canarying, and A\u002FB assignment all happen in one place. That reduces the number of code paths that need to change when the model changes, which is exactly why iteration speeds up.\u003C\u002Fp>\u003Cp>At scale, the alternative is painful. A company with separate serving paths for search, recommendations, personalization, and experimentation ends up duplicating traffic logic across teams. The result is slower releases, inconsistent behavior, and more production risk. Routing centralizes the rules and lets model authors focus on model quality instead of re-implementing distribution logic.\u003C\u002Fp>\u003Ch2>Routing also unlocks new experiences, not just safer deployments\u003C\u002Fh2>\u003Cp>The real payoff is not only operational. A routing layer can decide which model, policy, or ensemble serves each request based on context, user segment, device, locale, or experiment state. That is how a platform stops being a delivery mechanism and starts becoming a capability multiplier. New product ideas become feasible because the serving layer can adapt in real time.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777882250044-0alx.png\" alt=\"Why routing belongs at the center of model serving\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Netflix’s note that the API enabled completely new product experiences is the key signal here. A routing system that only moves requests is too small a vision. The moment routing can express business logic, experimentation logic, and model selection logic together, product teams gain a new design space. That is where model serving stops being reactive and starts shaping the product itself.\u003C\u002Fp>\u003Ch2>The counter-argument\u003C\u002Fh2>\u003Cp>The best objection is that central routing creates a bottleneck. One API can become one team’s backlog, one set of policies, and one failure domain. Critics are right to worry that a single entry point can slow teams down if it is over-governed, over-generalized, or treated like a sacred abstraction that nobody can extend.\u003C\u002Fp>\u003Cp>That concern is real, but it does not defeat the case for routing. It defines the terms of success. A routing layer must be opinionated about traffic control and boring about everything else. It should standardize common serving concerns while exposing enough escape hatches for specialized use cases. The failure mode is not central routing itself; the failure mode is building a rigid monolith instead of a well-scoped platform boundary.\u003C\u002Fp>\u003Ch2>What to do with this\u003C\u002Fh2>\u003Cp>If you are an engineer, stop adding new model endpoints when a routing rule will do. If you are a PM, treat routing as part of the product architecture, not just infrastructure. If you are a founder, invest in a single serving entry point early, before every team invents its own path and your model platform turns into a maze. The winning pattern is clear: centralize request handling, standardize the common cases, and keep model iteration fast enough that product teams can move without waiting on plumbing.\u003C\u002Fp>","Routing should be the single entry point for model serving because it speeds iteration and unlocks new ML products.","netflixtechblog.com","https:\u002F\u002Fnetflixtechblog.com\u002Fstate-of-routing-in-model-serving-16e22fe18741?gi=a87006c83174",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777882263285-mkuu.png",[13,14,15,16,17],"Netflix Technology Blog","model serving","routing","ML platform","API","en",2,false,"2026-05-04T08:10:34.743542+00:00","2026-05-04T08:10:34.727+00:00","done","22d0f1bd-aa75-4e61-bdb6-a4df685018ed","why-routing-belongs-at-the-center-of-model-serving-en","industry","54b3fd97-c8e6-4b92-b87b-40913f024775","published","2026-05-04T09:00:13.097+00:00",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,35,36,38,40],{"name":33,"slug":34},"Model Serving","model-serving",{"name":15,"slug":15},{"name":13,"slug":37},"netflix-technology-blog",{"name":16,"slug":39},"ml-platform",{"name":17,"slug":41},"api",{"id":27,"slug":43,"title":44,"language":45},"why-routing-belongs-at-the-center-of-model-serving-zh","為什麼 routing 應該放在 model serving 的中心","zh",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":26},"f4a9dc33-65ae-41fc-9c17-9ac05935c47a","how-to-follow-gemini-and-apple-watch-12-rumors-en","How to Follow Gemini and Apple Watch 12 Rumors","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778933021686-8pvk.png","2026-05-16T12:03:24.772997+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":26},"e2ee68a8-0565-4931-9714-4d87a8899b40","jensen-huang-trump-china-trip-en","Jensen Huang Joins Trump on China Trip","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778930023714-sprb.png","2026-05-16T11:13:28.944681+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":26},"f08de46f-92a7-4390-a143-adb9f53e352e","chatgpt-vs-gemini-9-tests-1-clear-winner-2026-en","ChatGPT vs Gemini: 9 Tests, 1 Clear Winner","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778925832253-m4vv.png","2026-05-16T10:03:30.331792+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":26},"a75384ff-223f-4a34-9f86-ae5c2772a2d6","how-to-reduce-ai-model-serving-friction-en","How to Reduce AI Model Serving Friction","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778922838163-oi8d.png","2026-05-16T09:13:32.742904+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":26},"aec8ac9b-8df2-4403-bf57-53f34783e3a0","lora-vs-qlora-vs-full-fine-tuning-en","LoRA vs QLoRA vs Full Fine-Tuning","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778915640692-lzwf.png","2026-05-16T07:13:34.373862+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":26},"d26f7a03-6d4a-4e8b-8173-550c830a7098","why-global-ai-regulation-2026-rewards-modular-compliance-en","Why Global AI Regulation in 2026 Rewards Modular Compliance","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778913228246-86gy.png","2026-05-16T06:33:21.841262+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"d35a1bd9-e709-412e-a2df-392df1dc572a","ai-impact-2026-developments-market-en","AI's Impact in 2026: Key Developments and Market Shifts","2026-03-25T16:20:33.205823+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"5ed27921-5fd6-492e-8c59-78393bf37710","trumps-ai-legislative-framework-en","Trump's AI Legislative Framework: What's Inside?","2026-03-25T16:22:20.005325+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"e454a642-f03c-4794-b185-5f651aebbaca","nvidia-gtc-2026-key-highlights-innovations-en","NVIDIA GTC 2026: Key Highlights and Innovations","2026-03-25T16:22:47.882615+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"0ebb5b16-774a-4922-945d-5f2ce1df5a6d","claude-usage-diversifies-learning-curves-en","Claude Usage Diversifies, Learning Curves Emerge","2026-03-25T16:25:50.770376+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"69934e86-2fc5-4280-8223-7b917a48ace8","openclaw-ai-commoditization-concerns-en","OpenClaw's Rise Raises Concerns of AI Model Commoditization","2026-03-25T16:26:30.582047+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"b4b2575b-2ac8-46b2-b90e-ab1d7c060797","google-gemini-ai-rollout-2026-en","Google's Gemini AI Rollout Extended to 2026","2026-03-25T16:28:14.808842+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"6e18bc65-42ae-4ad0-b564-67d7f66b979e","meta-llama4-fabricated-results-scandal-en","Meta's Llama 4 Scandal: Fabricated AI Test Results Unveiled","2026-03-25T16:29:15.482836+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"bf888e9d-08be-4f47-996c-7b24b5ab3500","accenture-mistral-ai-deployment-en","Accenture and Mistral AI Team Up for AI Deployment","2026-03-25T16:31:01.894655+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"5382b536-fad2-49c6-ac85-9eb2bae49f35","mistral-ai-high-stakes-2026-en","Mistral AI: Facing High Stakes in 2026","2026-03-25T16:31:39.941974+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"9da3d2d6-b669-4971-ba1d-17fdb3548ed5","cursors-meteoric-rise-pressures-en","Cursor's Meteoric Rise Faces Industry Pressures","2026-03-25T16:32:21.899217+00:00"]