Kubernetes v1.36 turns release notes into a playbook

OraCore Editors

[TOOLS] May 19, 2026 · 16 min read


I break down Kubernetes v1.36 into the practical moves that matter, then give you a copy-ready template for tracking upgrades.

release-management Kubernetes cluster-operations policy scheduling



Kubernetes v1.36 can be read as a practical upgrade checklist for cluster teams.

I've been reading Kubernetes release posts for years, and honestly, most of them blur together. New stable stuff, a few betas, a pile of alpha flags, and the same polite optimism. Useful, sure. Memorable? Not really. But v1.36 felt different to me because it reads less like a brag sheet and more like a pressure map. The release is telling me where the project wants cluster operators to spend attention: scheduling, policy, storage, observability, auth, and the annoying corners where upgrades usually bite.

That’s the part I keep coming back to. When I’m helping a team plan a Kubernetes upgrade, I don’t care about the headline number first. I care about what changes the day after rollout. What gets easier to observe? What becomes less brittle? What might force me to rewrite a controller, a policy, or a Helm chart at 2 a.m.? v1.36 has enough of those answers that it’s worth slowing down and actually decomposing it instead of skimming the release post and moving on.

The source that kicked this off is the official Kubernetes blog post, Kubernetes v1.36: ハル (Haru). I’m also pulling from the linked feature posts in the Kubernetes blog, because the release post itself is a map, not the whole territory. That distinction matters, because the details you need for an upgrade live in the feature posts.

Stop reading releases like they’re marketing copy


Similar to previous releases, the release of Kubernetes v1.36 introduces new stable, beta, and alpha features.

What this actually means is that Kubernetes is still shipping in layers, and the release post is trying to tell you how much trust to place in each layer. Stable features are the stuff you can wire into production without pretending you’re being “experimental.” Beta is where I start planning migration work. Alpha is where I treat the feature like a lab instrument: useful, but not something I’d build my whole platform around unless I enjoy pain.

I’ve made the mistake of treating release notes as a shopping list. That always backfires. The real job is classification: what can I adopt now, what needs a test cluster, and what should I ignore until the next cycle? If I don’t do that sorting, I end up with a half-finished upgrade plan and a team asking why the rollout is blocked on a feature gate nobody owns.

In v1.36, the sheer spread of changes tells me the project is still balancing maturity and exploration. That’s normal. What’s useful is the pattern: a lot of the work lands in platform plumbing, not flashy user-facing UX. If you run clusters for real workloads, that’s the stuff that matters anyway.

Stable means: bake it into standards and docs.
Beta means: test it, measure it, and plan for it.
Alpha means: isolate it, gate it, and keep your expectations low.

How I apply this in practice: I make three columns in my upgrade notes. One for “safe to enable,” one for “pilot in staging,” and one for “ignore for now.” Then I map every v1.36 item into one of those buckets before I let anyone on the team get excited. It’s boring, but boring is how upgrades stop being heroic.
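If the list is long, even the sorting can be scripted. Here’s a minimal Python sketch of that three-column triage, assuming each item has already been tagged with the stage the release post gives it. The `ReleaseItem` type and the sample list are my own illustration, not anything the Kubernetes project publishes.

```python
from dataclasses import dataclass

# Maturity stage -> upgrade-notes column (the three buckets described above).
BUCKETS = {
    "stable": "safe to enable",
    "beta": "pilot in staging",
    "alpha": "ignore for now",
}


@dataclass(frozen=True)
class ReleaseItem:
    name: str
    stage: str  # "stable", "beta", or "alpha", as listed in the release post


def triage(items):
    """Group release items into the three upgrade-notes columns."""
    columns = {bucket: [] for bucket in BUCKETS.values()}
    for item in items:
        stage = item.stage.lower()
        if stage not in BUCKETS:
            # An unknown stage means the item was mis-tagged; fail loudly.
            raise ValueError(f"{item.name}: unknown stage {item.stage!r}")
        columns[BUCKETS[stage]].append(item.name)
    return columns


if __name__ == "__main__":
    v136 = [
        ReleaseItem("PSI metrics", "stable"),
        ReleaseItem("Volume group snapshots", "stable"),
        ReleaseItem("Pod-level resource managers", "alpha"),
    ]
    for column, names in triage(v136).items():
        print(f"{column}: {', '.join(names) or '-'}")
```

Failing on an unknown stage is deliberate: an item nobody classified is exactly the kind of thing that blocks a rollout later.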

Scheduling is where Kubernetes keeps showing its hand

Advancing Workload-Aware Scheduling

What this actually means is that Kubernetes keeps pushing toward scheduling decisions that understand more than just CPU and memory. The scheduler is no longer just a bin-packer. It’s becoming a policy engine with opinions about workload shape, resource pressure, and placement tradeoffs.

I ran into this kind of problem when a team kept asking why certain pods were “randomly” landing on nodes that looked fine on paper but were terrible in practice. They weren’t random. The scheduler was doing exactly what we told it to do, which is usually the issue. The problem was that our mental model was too simple. We were asking for placement, but what we needed was placement with context.

That’s why workload-aware scheduling matters. It’s not just a feature; it’s a signal that Kubernetes wants operators to express intent more precisely. If I’m running latency-sensitive services, batch jobs, or mixed workloads on shared nodes, I want the scheduler to know that some pods are not interchangeable. v1.36 keeps moving in that direction.

The practical takeaway is to revisit your scheduling assumptions. Do you rely on labels and affinities that have grown into a pile of exceptions? Do you have taints that nobody can explain anymore? Are your topology rules still aligned with how your apps fail in real life? If the answer is “kind of,” then this release is a prompt to clean house.

Audit your node labels and affinities.
Check whether your current placement rules match actual SLOs.
Write down which workloads are allowed to be “flexible” and which are not.
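For the first item on that list, a rough first pass is to compare the label keys your pods require with the label keys your nodes actually carry. This is a sketch, not a linter: it reads the JSON from `kubectl get nodes -o json` and `kubectl get pods -A -o json`, only looks at `nodeSelector` and required node affinity, and ignores taints, tolerations, and preferred affinity.

```python
import json
import sys

# Operators that can only match a node that actually has the key.
# NotIn and DoesNotExist are satisfied by nodes lacking the key, so they're skipped.
KEY_REQUIRED_OPERATORS = {"In", "Exists", "Gt", "Lt"}


def required_label_keys(pod):
    """Label keys a pod requires on its node (nodeSelector + required nodeAffinity)."""
    spec = pod.get("spec") or {}
    keys = set(spec.get("nodeSelector") or {})
    affinity = spec.get("affinity") or {}
    node_affinity = affinity.get("nodeAffinity") or {}
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution") or {}
    for term in required.get("nodeSelectorTerms") or []:
        for expr in term.get("matchExpressions") or []:
            if expr.get("operator") in KEY_REQUIRED_OPERATORS:
                keys.add(expr["key"])
    return keys


def audit(nodes_json, pods_json):
    """Compare label keys that pods depend on with label keys that nodes carry."""
    node_keys = set()
    for node in nodes_json.get("items", []):
        node_keys.update((node.get("metadata") or {}).get("labels") or {})
    used = set()
    for pod in pods_json.get("items", []):
        used |= required_label_keys(pod)
    return {
        # Required by some pod's rule but present on no node: that rule never matches.
        "missing_on_nodes": sorted(used - node_keys),
        # Present on nodes but required by no pod: candidates to document or drop.
        "unused_by_pods": sorted(node_keys - used),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_nodes, open(sys.argv[2]) as f_pods:
        print(json.dumps(audit(json.load(f_nodes), json.load(f_pods)), indent=2))
```

Run it with the two JSON files as arguments. Expect `unused_by_pods` to include well-known labels such as `kubernetes.io/hostname`; the interesting entries are the custom ones nobody can explain.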

If you want to go deeper, the Kubernetes scheduler docs are still the best place to start: kube-scheduler. For the broader mechanics of resource placement, I also keep the scheduling and eviction docs handy. The point isn’t to memorize flags. It’s to stop pretending scheduling is magic.

Policy is getting harder to wiggle around

Declarative Validation Graduates to GA

What this actually means is that Kubernetes is continuing to move policy and validation away from ad hoc scripts and toward built-in, declarative control. That’s good news if you’ve ever had to maintain a mess of admission webhooks just to keep bad objects out of the cluster.

I’m opinionated about this because I’ve lived through the webhook era. It starts with one clean validation rule. Then another. Then a third team wants one too. Before long, nobody knows which webhook rejected the object, or why, or whether the failure was the policy or the implementation. You don’t have governance. You have folklore.

GA validation features are a big deal because they let teams enforce rules without turning every policy into a custom service. That reduces operational drag. It also makes the cluster easier to reason about, which is the part people forget until they’re debugging a broken deploy and a policy exception buried three layers deep.

For me, the action item here is simple: identify every validation rule that’s still living in code when it could live in policy. If you’ve got admission logic in a controller, a webhook, and a CI pipeline all checking the same thing, that’s not defense in depth. That’s duplication.

There’s a related Kubernetes concept worth keeping nearby: admission controllers. And if you’re already using policy tooling like Kyverno or OPA Gatekeeper, v1.36 is a good moment to ask whether you can simplify, not just add more rules.

How I’d apply it: list the top five object-level mistakes your platform team keeps seeing, then decide which ones should be blocked by native Kubernetes mechanisms, which ones belong in external policy, and which ones are just documentation failures dressed up as governance.
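For mistakes you decide to block natively, one in-tree option is ValidatingAdmissionPolicy, which evaluates CEL expressions in the API server and has been GA since v1.30. It’s related to, but separate from, the v1.36 declarative-validation work, which as I read it is about how Kubernetes validates its own built-in types. This sketch emits a policy and binding as a JSON `List`; the policy name and the replica cap are hypothetical stand-ins for one of your own top five mistakes.

```python
import json


def replica_cap_policy(max_replicas):
    """Build a ValidatingAdmissionPolicy + binding that caps Deployment replicas."""
    name = "cap-deployment-replicas"  # hypothetical policy name
    cap = int(max_replicas)
    policy = {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingAdmissionPolicy",
        "metadata": {"name": name},
        "spec": {
            "failurePolicy": "Fail",
            "matchConstraints": {
                "resourceRules": [{
                    "apiGroups": ["apps"],
                    "apiVersions": ["v1"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["deployments"],
                }]
            },
            # CEL expression evaluated by the API server; no webhook service to run.
            "validations": [{
                "expression": f"object.spec.replicas <= {cap}",
                "message": f"replicas must be <= {cap}",
            }],
        },
    }
    binding = {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingAdmissionPolicyBinding",
        "metadata": {"name": f"{name}-binding"},
        # No matchResources: the binding applies to everything the policy matches.
        "spec": {"policyName": name, "validationActions": ["Deny"]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [policy, binding]}


if __name__ == "__main__":
    print(json.dumps(replica_cap_policy(5), indent=2))
```

Piping the output into `kubectl apply -f -` would create both objects. Starting the binding with `validationActions: ["Warn"]` or `["Audit"]` instead of `["Deny"]` is a gentler way to replace an existing webhook.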

Storage keeps drifting toward fewer surprises

Moving Volume Group Snapshots to GA

What this actually means is that storage teams can trust more of the snapshot story without treating it like a side quest. Volume group snapshots graduating to GA means Kubernetes can take a crash-consistent snapshot of several related volumes at the same point in time, which is exactly what people need when apps are no longer neatly single-volume.

I’ve seen enough backup plans go sideways to know why this matters. The classic mistake is assuming a database, its sidecar, and its supporting data all fail or recover independently. They don’t. If you snapshot one piece and not the others, congratulations, you have a technically successful backup of a broken application.

GA here is less about novelty and more about confidence. It tells me the API and behavior have stabilized enough that I can start designing around them instead of treating them as a test feature for one storage vendor. That’s a big difference for operators who need repeatable recovery stories.

The practical move is to review your disaster recovery workflow. If you use CSI storage, check whether your backup tooling can actually take advantage of group snapshot semantics. If you don’t have an app-level restore test, you don’t have a backup strategy. You have a storage bill.

Useful references here are the Kubernetes storage docs: Kubernetes storage and the CSI documentation at kubernetes-csi.github.io. I’d also keep an eye on your storage provider’s own support matrix, because GA in Kubernetes doesn’t magically make every backend behave the same way.

Test snapshot restore, not just snapshot creation.
Verify multi-volume consistency for stateful apps.
Document which workloads can recover from a point-in-time snapshot.
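For the multi-volume check, the failure I described above is often a labeling problem: the group snapshot picks its PVCs through a label selector, and one of the app’s volumes is missing the label. A minimal sketch, assuming the GA API keeps the beta’s `spec.source.selector` with `matchLabels`. The PVC shape here (name plus labels) is simplified, and the label keys are hypothetical.

```python
def selector_matches(match_labels, labels):
    """Equality-based matchLabels semantics: every key/value pair must match."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def uncovered_volumes(pvcs, app_label, app, group_selector):
    """PVCs that belong to `app` but would be left out of the group snapshot."""
    return sorted(
        pvc["name"]
        for pvc in pvcs
        if pvc["labels"].get(app_label) == app
        and not selector_matches(group_selector, pvc["labels"])
    )


if __name__ == "__main__":
    pvcs = [
        {"name": "orders-data", "labels": {"app": "orders-db", "snapshot-group": "orders-db"}},
        {"name": "orders-wal", "labels": {"app": "orders-db"}},  # forgot the group label
    ]
    print(uncovered_volumes(pvcs, "app", "orders-db", {"snapshot-group": "orders-db"}))
```

The reverse check, volumes the selector grabs that don’t belong to the app, is worth running too, since it bloats snapshots without adding consistency.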

Observability is finally being treated like a first-class cost

PSI Metrics for Kubernetes Graduates to GA

What this actually means is that Kubernetes is making pressure signals a normal part of platform visibility instead of a niche tuning tool. PSI, or pressure stall information, helps operators understand when nodes are under stress in ways that raw CPU utilization doesn’t capture.

I like this because “CPU is at 40%” has lied to me more times than I can count. A node can look underused and still be awful for latency because memory pressure, I/O contention, or scheduling churn is chewing it up. Pressure metrics help close that gap. They don’t solve the problem for you, but they at least tell you where the body is buried.

When metrics graduate to GA, I start thinking about dashboards, alerts, and runbooks. Not because I want more charts, but because stable signals are worth wiring into incident response. If PSI is part of the platform now, then it should show up in capacity planning and node-health debugging, not sit in a corner nobody checks.

I’d use this release as a prompt to ask: do our alerts reflect actual pressure, or do they just reflect utilization? Those are not the same thing. If you’ve ever had a node look “fine” until pods started getting throttled, you already know why this matters.

For the mechanics, Kubernetes docs on resource management for pods and containers are still the place I’d start. If you’re on Linux, the kernel PSI documentation also helps explain what the metric is actually measuring. The point is to stop using a single number as a proxy for cluster health.
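To see what the kernel is actually reporting, you can read the PSI files directly on a Linux node. Each file under `/proc/pressure/` (`cpu`, `memory`, `io`) has a `some` line, and usually a `full` line, with `avg10`, `avg60`, `avg300` and `total` fields. This Python sketch parses that format and flags resources whose `some avg10` crosses a threshold. The 10% default is an arbitrary placeholder, not a recommended alert value.

```python
import os


def parse_psi(text):
    """Parse /proc/pressure/<resource> contents into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # avg* values are percentages of wall time stalled; total is microseconds.
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def stalled_resources(psi_by_resource, avg10_threshold=10.0):
    """Resources where some task was stalled at least avg10_threshold % of the last 10s."""
    return sorted(
        resource
        for resource, psi in psi_by_resource.items()
        if psi["some"]["avg10"] >= avg10_threshold
    )


if __name__ == "__main__":
    readings = {}
    for resource in ("cpu", "memory", "io"):
        path = os.path.join("/proc/pressure", resource)
        try:
            with open(path) as f:
                readings[resource] = parse_psi(f.read())
        except OSError:
            continue  # absent on non-Linux hosts or kernels without PSI
    print("stalled:", stalled_resources(readings) or "none")
```

Putting this next to your utilization graph is the point: a node can show low CPU use and still have memory or I/O stalls that explain the latency.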

Identity and authorization keep tightening up

Fine-Grained Kubelet API Authorization Graduates to GA

What this actually means is that Kubernetes is continuing to reduce the blast radius of access. The kubelet is one of those components people trust too casually because it’s “internal.” Internal is not the same thing as harmless. Fine-grained authorization means you can grant access to specific kubelet API endpoints instead of handing out the broad `nodes/proxy` permission for everything.

This is one of those areas where security work looks boring until it saves you from a very expensive mistake. I’ve seen clusters where broad kubelet access was inherited by default because nobody wanted to untangle the permissions. That’s how accidental privilege becomes policy. Then one day someone asks why a debugging tool can see more than it should.

GA is the signal that this is no longer a niche hardening option. It’s part of the mainstream operating model. If you’re responsible for multi-tenant clusters or regulated workloads, this should be on your checklist.

How I’d apply it: review kubelet access paths, identify where broad permissions are still in play, and tighten them before you need to explain them to security. If your RBAC story is “we’ll clean that up later,” later is now.
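A starting point for that review is finding who can already reach the kubelet through the broad `nodes/proxy` permission, which fine-grained authorization is meant to let you replace with narrower grants (check the feature post for the exact subresource names). A minimal sketch over the JSON from `kubectl get clusterroles -o json`. It only inspects ClusterRole rules and does not follow bindings to tell you who actually holds them.

```python
import json
import sys

# Resource strings in an RBAC rule that cover nodes/proxy:
# the exact subresource, any resource's proxy subresource, or everything.
BROAD = {"nodes/proxy", "*/proxy", "*"}


def broad_kubelet_grants(roles_json):
    """Return (role name, resource, verbs) for rules that reach the kubelet broadly."""
    findings = []
    for role in roles_json.get("items", []):
        for rule in role.get("rules") or []:  # aggregated roles may have null rules
            groups = rule.get("apiGroups") or []
            if "" not in groups and "*" not in groups:
                continue  # nodes live in the core ("") API group
            for resource in rule.get("resources") or []:
                if resource in BROAD:
                    findings.append(
                        (role["metadata"]["name"], resource, sorted(rule.get("verbs") or []))
                    )
    return findings


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for name, resource, verbs in broad_kubelet_grants(json.load(f)):
            print(f"{name}: {resource} {verbs}")
```

Expect built-in roles such as `cluster-admin` to show up via `*`; the findings worth chasing are the monitoring and debugging roles that only needed read access to a single endpoint.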

For background, I’d keep the Kubernetes auth docs close: authentication and authorization. And if you’re trying to understand the kubelet itself, the kubelet reference is the right anchor.

The weird alpha features are where the roadmap peeks through

Pod-Level Resource Managers (Alpha)

What this actually means is that Kubernetes is still experimenting with more granular ways to manage resources at the pod level, not just the container level. That’s interesting because it hints at a future where resource control is more aligned with how applications actually behave.

I’m always cautious with alpha features. Not because they’re bad, but because they’re honest. They admit they’re not ready. That honesty is useful. It lets me see where the project is heading without pretending I should build a production dependency around it today.

Pod-level resource management matters because a lot of modern workloads don’t fit neatly into one-container mental models. Sidecars, init containers, helper processes, and mixed resource profiles make the old assumptions feel clumsy. If Kubernetes can manage resources more naturally at the pod boundary, that could reduce a lot of operator friction.

But I wouldn’t rush to adopt it just because it exists. I’d use alpha features to learn, not to stabilize production. The right question is not “Can I turn this on?” It’s “What problem is Kubernetes trying to solve here, and does my workload actually have that problem?”
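If you do pilot an alpha feature, keep it visible. Alpha gates are off by default and get switched on through each component’s `--feature-gates` flag (`Name=true,Other=false`). This hypothetical helper parses that flag value and reports which gates from a hand-maintained alpha list are enabled, so a production config can be checked in CI. The gate names in the example are placeholders; take the real ones from the feature posts. It also only accepts `true`/`false`, which may be stricter than Kubernetes itself.

```python
def parse_feature_gates(flag_value):
    """Parse a --feature-gates value like 'GateA=true,GateB=false' into a dict."""
    gates = {}
    for pair in filter(None, (p.strip() for p in flag_value.split(","))):
        name, _, value = pair.partition("=")
        if value.strip().lower() not in ("true", "false"):
            raise ValueError(f"bad feature gate entry: {pair!r}")
        gates[name.strip()] = value.strip().lower() == "true"
    return gates


def alpha_gates_enabled(flag_value, alpha_gates):
    """Alpha gates that are switched on; alpha_gates is a list you maintain by hand."""
    enabled = parse_feature_gates(flag_value)
    return sorted(gate for gate in alpha_gates if enabled.get(gate))


if __name__ == "__main__":
    # Placeholder gate names, not real v1.36 gates.
    prod_flags = "ExampleAlphaGate=true,ExampleBetaGate=true"
    print(alpha_gates_enabled(prod_flags, ["ExampleAlphaGate"]))
```

A CI job that fails when this list is non-empty for a production cluster turns “we’re just trying it” into an explicit decision.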

If you want the broader context, the Kubernetes docs on pods and resource requests and limits are still the basics. I’d also keep an eye on the related v1.36 feature posts in the official blog, because that’s where the implementation details usually show up.

The template you can copy

# Kubernetes release review template

## Release summary
- Release version:
- Release date:
- Source URL:
- Owner:
- Cluster(s) affected:

## What changed
### Stable
- [ ] Item:
- [ ] Item:

### Beta
- [ ] Item:
- [ ] Item:

### Alpha
- [ ] Item:
- [ ] Item:

## What matters to us
For each item, answer:
- Does this affect scheduling?
- Does this affect policy or security?
- Does this affect storage or backup?
- Does this affect observability?
- Does this affect upgrade risk?

## Adoption decision
| Item | Status | Why | Owner | Deadline |
|------|--------|-----|-------|----------|
|      | Adopt  |     |       |          |
|      | Test   |     |       |          |
|      | Ignore |     |       |          |

## Upgrade checklist
- [ ] Read the upstream feature post
- [ ] Check feature gate status
- [ ] Verify compatibility with existing controllers/operators
- [ ] Test in staging
- [ ] Update runbooks
- [ ] Update alerts/dashboards
- [ ] Confirm rollback path

## Questions I need answered before rollout
- What breaks if this is enabled?
- What behavior changes for existing workloads?
- What metrics should I watch?
- What is the rollback plan?
- Who owns the follow-up?

## Final decision
- Approve / Defer / Reject
- Notes:
- Follow-up date:

That’s the version I’d actually use in a team meeting. It turns a noisy release post into a decision document, which is what most of us need anyway.
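If the template lives in a repo, the “Adoption decision” table can be generated instead of hand-edited. A small sketch, assuming each decision is a dict keyed by the template’s columns (the dict shape is my own choice). It refuses rows with an unknown status or no owner, since an unowned decision is how a rollout ends up blocked on a gate nobody owns.

```python
VALID_STATUS = {"Adopt", "Test", "Ignore"}
HEADER = "| Item | Status | Why | Owner | Deadline |"
RULE = "|------|--------|-----|-------|----------|"


def _cell(value):
    # Escape pipes so a note like "needs CSI | vendor check" doesn't break the table.
    return str(value).replace("|", "\\|")


def adoption_table(rows):
    """Render the template's 'Adoption decision' table, refusing unowned rows."""
    lines = [HEADER, RULE]
    for row in rows:
        if row["status"] not in VALID_STATUS:
            raise ValueError(f"{row['item']}: status must be one of {sorted(VALID_STATUS)}")
        if not row.get("owner"):
            raise ValueError(f"{row['item']}: every decision needs an owner")
        cells = [row["item"], row["status"], row.get("why", ""), row["owner"], row.get("deadline", "")]
        lines.append("| " + " | ".join(_cell(c) for c in cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(adoption_table([
        {"item": "PSI metrics", "status": "Adopt", "why": "GA; add to alerts", "owner": "sre"},
        {"item": "Pod-level resource managers", "status": "Ignore", "why": "alpha", "owner": "platform"},
    ]))
```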

If I were applying v1.36 tomorrow, I’d start by sorting features into stable, beta, and alpha, and then focus on scheduling, policy, storage, observability, and auth. That’s where the real operational impact lives, not in the celebratory language around the release.

And yes, the official release post is still the source of truth. My breakdown is just the operator’s translation layer.

Source attribution: original material comes from the Kubernetes blog post at https://kubernetes.io/blog/2026/04/22/kubernetes-v1-36-release/. The template and practical framing here are my own synthesis of that post and the linked Kubernetes feature writeups.
