[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-databricks-custom-models-aws-overview-en":3,"article-related-databricks-custom-models-aws-overview-en":30,"series-tools-6455d3ca-2f71-42e9-aec6-91db98028f01":84},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"6455d3ca-2f71-42e9-aec6-91db98028f01","databricks-custom-models-aws-overview-en","Databricks custom models on AWS: what to know","\u003Cp data-speakable=\"summary\">Databricks custom models on \u003Ca href=\"\u002Ftag\u002Faws\">AWS\u003C\u002Fa> can be logged in MLflow and served as APIs with CPU or \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa> compute.\u003C\u002Fp>\u003Cp>Databricks updated its \u003Ca href=\"https:\u002F\u002Fdocs.databricks.com\u002Faws\u002Fen\u002Fmachine-learning\u002Fmodel-serving\u002Fcustom-models\" target=\"_blank\" rel=\"noopener\">custom models\u003C\u002Fa> guide on May 28, 2026, and the document is packed with the kind of details teams usually learn the hard way. The big themes are simple: package your model correctly, include its dependencies, and expect serving endpoints to scale and reload on Databricks’ schedule, not yours.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Topic\u003C\u002Fth>\u003Cth>What Databricks says\u003C\u002Fth>\u003Cth>Why it matters\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Endpoint creation\u003C\u002Ftd>\u003Ctd>About 10 minutes\u003C\u002Ftd>\u003Ctd>New versions take time to package and provision\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Request timeout\u003C\u002Ftd>\u003Ctd>597 seconds\u003C\u002Ftd>\u003Ctd>Long inference jobs can fail if they run too long\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Scale from zero\u003C\u002Ftd>\u003Ctd>10–20 seconds, sometimes minutes\u003C\u002Ftd>\u003Ctd>Cold starts can hurt latency-sensitive apps\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Scale-down window\u003C\u002Ftd>\u003Ctd>Every 5 minutes\u003C\u002Ftd>\u003Ctd>Endpoints shrink after traffic drops\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Provisioned concurrency formula\u003C\u002Ftd>\u003Ctd>QPS × execution time\u003C\u002Ftd>\u003Ctd>Capacity planning depends on real traffic and model latency\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>What Databricks means by a custom model\u003C\u002Fh2>\u003Cp>In Databricks’ terminology, a custom model is any Python model or custom code that you deploy through \u003Ca href=\"https:\u002F\u002Fwww.databricks.com\u002Fproduct\u002Fmachine-learning\u002Fmodel-serving\" target=\"_blank\" rel=\"noopener\">Model Serving\u003C\u002Fa>. That includes models built with \u003Ca href=\"https:\u002F\u002Fscikit-learn.org\u002F\" target=\"_blank\" rel=\"noopener\">scikit-learn\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fxgboost.readthedocs.io\u002F\" target=\"_blank\" rel=\"noopener\">XGBoost\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fpytorch.org\u002F\" target=\"_blank\" rel=\"noopener\">PyTorch\u003C\u002Fa>, and \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Ftransformers\" target=\"_blank\" rel=\"noopener\">HuggingFace Transformers\u003C\u002Fa>, plus arbitrary Python logic.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780378381029-mcao.png\" alt=\"Databricks custom models on AWS: what to know\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The deployment path is straightforward on paper. You log the model in \u003Ca href=\"https:\u002F\u002Fmlflow.org\u002F\" target=\"_blank\" rel=\"noopener\">MLflow\u003C\u002Fa>, register it in \u003Ca href=\"https:\u002F\u002Fwww.databricks.com\u002Fproduct\u002Funity-catalog\" target=\"_blank\" rel=\"noopener\">Unity Catalog\u003C\u002Fa> or the workspace registry, then create a serving endpoint. Databricks also points readers to its \u003Ca href=\"\u002Fnews\u002Fdatabricks-model-serving-tutorial\" target=\"_blank\" rel=\"noopener\">model serving tutorial\u003C\u002Fa> for a full walkthrough.\u003C\u002Fp>\u003Cul>\u003Cli>Native MLflow flavors work for standard libraries and common training workflows.\u003C\u002Fli>\u003Cli>\u003Ccode>pyfunc\u003C\u002Fcode> works when you want to wrap custom Python behavior.\u003C\u002Fli>\u003Cli>Unity Catalog registration is the recommended path for managed model governance.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Logging choices decide how painful deployment gets\u003C\u002Fh2>\u003Cp>The article spends a lot of time on logging because that is where serving problems usually begin. Databricks supports autologging in Databricks Runtime for ML, manual logging with MLflow built-in flavors, and custom logging with \u003Ccode>pyfunc\u003C\u002Fcode> for arbitrary Python code.\u003C\u002Fp>\u003Cp>The practical difference is control. Autologging is easy, built-in flavors are cleaner when your model fits the library, and \u003Ccode>pyfunc\u003C\u002Fcode> gives you room for extra code paths, helper functions, or custom preprocessing. If you are mixing model code with application code, \u003Ccode>pyfunc\u003C\u002Fcode> is often the least awkward option.\u003C\u002Fp>\u003Cblockquote>\u003Cp>“Databricks refers to such models as custom models.”\u003C\u002Fp>\u003C\u002Fblockquote>\u003Cp>Databricks also recommends adding a model signature and input example. That advice matters more than it sounds. Signatures are required for Unity Catalog logging, and input examples make it easier to catch shape and type mistakes before the model hits production traffic.\u003C\u002Fp>\u003Cp>Here is the rule of thumb I would use: if your model needs special preprocessing, package that logic with the model instead of hoping the serving container guesses right. The docs are blunt about dependency errors, and that is usually code for “something was missing from the model artifact.”\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ccode>signature\u003C\u002Fcode> helps define the expected inputs and outputs.\u003C\u002Fli>\u003Cli>\u003Ccode>input_example\u003C\u002Fcode> gives Databricks a sample payload for validation.\u003C\u002Fli>\u003Cli>\u003Ccode>code_path\u003C\u002Fcode>, \u003Ccode>pip_requirements\u003C\u002Fcode>, and \u003Ccode>extra_pip_requirements\u003C\u002Fcode> help package nonstandard code and libraries.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>CPU and GPU serving are not interchangeable\u003C\u002Fh2>\u003Cp>Databricks gives you several compute types, and the differences are more than marketing labels. CPU options include \u003Cstrong>CPU_MEDIUM\u003C\u002Fstrong> and \u003Cstrong>CPU_LARGE\u003C\u002Fstrong>, which trade concurrency for more memory per worker. GPU options include \u003Cstrong>GPU_SMALL\u003C\u002Fstrong> with 1xT4 and 16GB per concurrency, \u003Cstrong>GPU_MEDIUM\u003C\u002Fstrong> with 1xA10G and 24GB, \u003Cstrong>MULTIGPU_MEDIUM\u003C\u002Fstrong> with 4xA10G and 96GB, and \u003Cstrong>GPU_MEDIUM_8\u003C\u002Fstrong> with 8xA10G and 192GB.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780378380169-kk9t.png\" alt=\"Databricks custom models on AWS: what to know\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That memory-per-concurrency detail is the part teams often miss. If your model is memory-hungry but still CPU-friendly, moving up to CPU_MEDIUM or CPU_LARGE may be enough. If you are serving transformer-style workloads, Databricks says PyTorch and Transformers flavors handle GPU prediction automatically, which removes some of the plumbing work.\u003C\u002Fp>\u003Cp>There is also a deployment-time tradeoff. GPU container builds take longer because of model size and installation overhead, and very large models can hit a 60-minute timeout or fail with a “No space left on device” error. For very large language models, Databricks tells users to use \u003Ca href=\"https:\u002F\u002Fdocs.databricks.com\u002Faws\u002Fen\u002Fmachine-learning\u002Ffoundation-model-apis\" target=\"_blank\" rel=\"noopener\">Foundation Model APIs\u003C\u002Fa> instead.\u003C\u002Fp>\u003Cul>\u003Cli>CPU_MEDIUM gives 8GB per concurrency.\u003C\u002Fli>\u003Cli>CPU_LARGE gives 16GB per concurrency.\u003C\u002Fli>\u003Cli>GPU_MEDIUM_8 gives 192GB per concurrency across 8 A10G GPUs.\u003C\u002Fli>\u003Cli>GPU autoscaling takes longer than CPU autoscaling.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Scaling rules matter more than people expect\u003C\u002Fh2>\u003Cp>The docs are very clear that endpoints scale based on traffic and provisioned concurrency units. Databricks defines provisioned concurrency as the maximum number of parallel requests the system can handle, and gives a simple planning formula: \u003Cstrong>provisioned concurrency = QPS × model execution time\u003C\u002Fstrong>.\u003C\u002Fp>\u003Cp>That formula is useful because it ties capacity planning to real behavior instead of guesswork. If your model handles 20 QPS and each request takes 0.2 seconds, you are already at 4 units of provisioned concurrency before you account for spikes, retries, or background load.\u003C\u002Fp>\u003Cp>Scaling behavior is also specific. Endpoints scale up almost immediately when traffic rises, then scale down every five minutes when traffic drops. Scale to zero is optional, and Databricks warns that the first request after inactivity will hit a cold start. The first request after scale-to-zero usually takes 10–20 seconds to wake up, but it can take minutes, and there is no SLA for that latency.\u003C\u002Fp>\u003Cp>That is why Databricks says scale to zero should not be used for production workloads that need consistent uptime or guaranteed response times. For high-QPS, low-latency use cases, the docs recommend route optimization and express deployments.\u003C\u002Fp>\u003Cblockquote>\u003Cp>“Scale to zero should not be used for production workloads that require consistent uptime or guaranteed response times.”\u003C\u002Fp>\u003C\u002Fblockquote>\u003Cp>There is a second operational detail that matters just as much: Databricks performs zero-downtime updates by keeping the old endpoint configuration alive until the new one is ready. That protects live traffic, but it also means you are billed for both configurations during the transition.\u003C\u002Fp>\u003Ch2>What teams should actually do with this doc\u003C\u002Fh2>\u003Cp>If you are deploying custom models on Databricks, the checklist is pretty clear. Package the model in MLflow, include a signature and input example, make sure the dependencies are declared, and test the model locally before you push it into serving. Databricks explicitly warns that missing dependencies can break deployment, which is exactly the kind of failure that wastes a deployment window.\u003C\u002Fp>\u003Cp>The older Anaconda notice is also worth a quick look if you are running legacy models logged with MLflow v1.17 or earlier. Databricks says models logged before MLflow v1.18 may have used the \u003Ccode>defaults\u003C\u002Fcode> channel from Anaconda, while newer logs use \u003Ccode>conda-forge\u003C\u002Fcode>. If you have old models in production, check the packaged \u003Ccode>conda.yaml\u003C\u002Fcode> before assuming the environment is still compliant.\u003C\u002Fp>\u003Cp>The most important operational takeaway is that serving custom models is less about training accuracy and more about packaging discipline. If a model cannot reload during maintenance, Databricks will fail the update and keep the old configuration serving traffic. That is a safe fallback, but it also means your deployment hygiene decides whether updates are boring or painful.\u003C\u002Fp>\u003Cp>For teams running production \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa>, the next question is simple: can your model reload cleanly after a maintenance event, or only on the machine where you trained it?\u003C\u002Fp>","Databricks explains how to package, deploy, and scale custom ML models on AWS Model Serving, including CPU, GPU, and reload rules.","docs.databricks.com","https:\u002F\u002Fdocs.databricks.com\u002Faws\u002Fen\u002Fmachine-learning\u002Fmodel-serving\u002Fcustom-models",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780378381029-mcao.png","tools","en","47ce5058-3c10-4d7c-ad89-053b8f8d953e",[17,18,19,20,21],"Databricks","MLflow","model serving","custom models","AWS",[23,24,25],"Custom models on Databricks are logged in MLflow and deployed through Model Serving.","Deployment success depends heavily on packaging dependencies, signatures, and input examples.","Scaling, cold starts, and reload behavior can affect latency and uptime more than model code does.",1,"2026-06-02T05:32:35.343023+00:00","2026-06-02T05:32:35.334+00:00","92db0173-053e-4cb5-96c1-633ac3197050",{"tags":31,"relatedLang":43,"relatedPosts":47},[32,35,37,39,41],{"name":33,"slug":34},"Model Serving","model-serving",{"name":21,"slug":36},"aws",{"name":17,"slug":38},"databricks",{"name":20,"slug":40},"custom-models",{"name":18,"slug":42},"mlflow",{"id":15,"slug":44,"title":45,"language":46},"databricks-custom-models-aws-overview-zh","Databricks AWS 自訂模型重點","zh",[48,54,60,66,72,78],{"id":49,"slug":50,"title":51,"cover_image":52,"image_url":52,"created_at":53,"category":13},"aa96e422-2b01-4480-b4ce-a646be8e0993","magenta-realtime-2-score-inside-daw-en","Magenta RealTime 2 lets you score in the DAW","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781046208039-ksdz.png","2026-06-09T23:02:56.428086+00:00",{"id":55,"slug":56,"title":57,"cover_image":58,"image_url":58,"created_at":59,"category":13},"c79bca38-50b2-4d80-9a48-7f4d1afd051a","open-source-ai-tools-beat-claude-paid-tiers-en","Open-source AI tools beat Claude’s paid tiers on value","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781045269190-a1ow.png","2026-06-09T22:47:20.7972+00:00",{"id":61,"slug":62,"title":63,"cover_image":64,"image_url":64,"created_at":65,"category":13},"fbd166b2-30ad-451c-bfa5-8f190d0c4252","500-ai-agent-projects-show-where-agents-work-now-en","500 AI agent projects show where agents work now","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781033595427-zvq5.png","2026-06-09T19:32:37.573706+00:00",{"id":67,"slug":68,"title":69,"cover_image":70,"image_url":70,"created_at":71,"category":13},"8f987f8b-1e3b-409d-9ca9-3f0884d5e1d9","chocolatey-go-package-policy-installs-en","Chocolatey’s Go package turns installs into policy","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781029112225-4nik.png","2026-06-09T18:18:05.601854+00:00",{"id":73,"slug":74,"title":75,"cover_image":76,"image_url":76,"created_at":77,"category":13},"c1c49550-3032-4381-bad9-a7ef29973b4d","go-support-policy-turns-releases-into-a-checklist-en","Go support policy turns releases into a checklist","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781028203465-bas6.png","2026-06-09T18:02:50.061065+00:00",{"id":79,"slug":80,"title":81,"cover_image":82,"image_url":82,"created_at":83,"category":13},"75f55dc1-b87b-4a8a-812f-bc31ab4ae4dc","rustdesk-self-hosting-secure-remote-access-en","RustDesk self-hosting setup for secure remote access","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781017372462-mgyj.png","2026-06-09T15:02:24.622252+00:00",[85,90,95,100,105,110,115,120,125,130],{"id":86,"slug":87,"title":88,"created_at":89},"8008f1a9-7a00-4bad-88c9-3eedc9c6b4b1","surepath-ai-mcp-policy-controls-en","SurePath AI's New MCP Policy Controls Enhance AI Security","2026-03-26T01:26:52.222015+00:00",{"id":91,"slug":92,"title":93,"created_at":94},"27e39a8f-b65d-4f7b-a875-859e2b210156","mcp-standard-ai-tools-2026-en","MCP Standard in 2026: Integrating AI Tools","2026-03-26T01:27:43.127519+00:00",{"id":96,"slug":97,"title":98,"created_at":99},"165f9a19-c92d-46ba-b3f0-7125f662921d","rag-2026-transforming-enterprise-ai-en","How RAG in 2026 is Transforming Enterprise AI","2026-03-26T01:28:11.485236+00:00",{"id":101,"slug":102,"title":103,"created_at":104},"6a2a8e6e-b956-49d8-be12-cc47bdc132b2","mastering-ai-prompts-2026-guide-en","Mastering AI Prompts: A 2026 Guide for Developers","2026-03-26T01:29:07.835148+00:00",{"id":106,"slug":107,"title":108,"created_at":109},"3ab2c67e-4664-4c67-a013-687a2f605814","garry-tan-open-sources-claude-code-toolkit-en","Garry Tan Open-Sources a Claude Code Toolkit","2026-03-26T08:26:20.245934+00:00",{"id":111,"slug":112,"title":113,"created_at":114},"66a7cbf8-7e76-41d4-9bbf-eaca9761bf69","github-ai-projects-to-watch-in-2026-en","20 GitHub AI Projects to Watch in 2026","2026-03-26T08:28:09.752027+00:00",{"id":116,"slug":117,"title":118,"created_at":119},"9f332fda-eace-448a-a292-2283951eee71","practical-github-guide-learning-ml-2026-en","A Practical GitHub Guide to Learning ML in 2026","2026-03-27T01:16:50.125678+00:00",{"id":121,"slug":122,"title":123,"created_at":124},"1b1f637d-0f4d-42bd-974b-07b53829144d","aiml-2026-student-ai-ml-lab-repo-review-en","AIML-2026 Is a Bare-Bones Student Lab Repo","2026-03-27T01:21:51.661231+00:00",{"id":126,"slug":127,"title":128,"created_at":129},"6d1bf3f6-e191-4d30-b55b-8a0722fa6afe","ai-trending-github-repos-and-research-feeds-en","AI Trending Tracks Repos and Research Feeds","2026-03-27T01:31:35.709532+00:00",{"id":131,"slug":132,"title":133,"created_at":134},"010539a1-4c3a-4bd3-937a-26616422ee0d","awesome-ai-for-science-research-tools-map-en","Awesome AI for Science Is Becoming a Real Research Map","2026-03-27T01:46:50.89513+00:00"]