With faster node startup for GKE, say goodbye to cold-start latency

We’ve rolled out a significant update to Google Kubernetes Engine (GKE) that addresses one of the most persistent problems in cloud infrastructure: cold-start latency. Qualifying GKE nodes now start up to 4x faster than in previous versions, letting you provision capacity quickly and efficiently. This isn’t a setting you have to toggle or a config file you need to patch; it’s an architectural upgrade to how we provision infrastructure, so your nodes simply start faster, out of the box. The result is greater agility and cost-efficiency across a wide range of use cases, from rapidly deploying models for AI inference to dynamically scaling accelerated and general-purpose nodes.

The problem we set out to tackle: the “cold start” tax

If you run workloads with fluctuating demand, especially AI inference or batch processing, you know the pain of waiting for a new node to spin up. When demand spikes, your autoscaler requests a node. Then you wait. To avoid that wait, and the resulting latency for your users, many teams resort to over-provisioning, keeping expensive nodes running “just in case.” You end up paying for idle compute just to buy yourself insurance against startup lag. That insurance is especially expensive when it comes to scarce accelerators.

The solution: a complete rework of node provisioning

To address this, we rebuilt the provisioning logic for VMs and GKE nodes. At a high level, we are using a combination of intelligent compute buffers, specially designed fast-starting virtual machines, and a new control plane architecture that allows VMs to resize instantly without rebooting. While the technical details are complex, the benefit to you is simple: your GKE clusters now scale inherently faster and are more efficient, allowing you to shift precious resources to where they are needed.

What this means for you

  • Less over-provisioning: Because nodes come online faster, you can trust your autoscaler to react in real time rather than keeping a buffer of idle nodes.

  • Better AI inference: For models running on GPUs, faster node provisioning reduces the time between a request spike and the model serving traffic.

  • No “Ops” overhead: This works automatically. You don’t need to change your Terraform or YAML files to take advantage of it.
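To make the first point concrete, here is a minimal sketch of what trusting the autoscaler looks like in practice: a standard Kubernetes HorizontalPodAutoscaler with a low replica floor instead of a standing buffer of idle capacity. The Deployment name `inference-server` and the target numbers are hypothetical placeholders, not values from this announcement.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-server-hpa     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server       # hypothetical Deployment to scale
  minReplicas: 1                 # low floor: no idle "just in case" capacity
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70%
```

With slow node startup, a `minReplicas` this low would risk user-facing latency on every demand spike; faster provisioning is what makes the aggressive floor affordable.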


Availability

The accelerated provisioning is live right now for workloads running in GKE Autopilot — including Autopilot workloads running inside Standard clusters — using the following hardware:

We’ll continue rolling this out to more machine types soon, including the following, so stay tuned:

How to try it

If you already use GKE Autopilot on the supported instance types, you’ve probably already noticed the improvement.

And if you’re running a GKE Standard cluster, you can now use Autopilot specifically for these workloads without migrating your whole cluster. Just point your Pods to the Autopilot ComputeClass, and they will inherit these startup speeds while living alongside your standard nodes.
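A minimal sketch of what that Pod selection might look like, assuming the built-in Autopilot ComputeClass is selected via the `cloud.google.com/compute-class` node selector — the class name `autopilot` shown here is an assumption, so check the GKE documentation for the exact name available in your cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fast-start-demo          # hypothetical Pod name
spec:
  nodeSelector:
    # Selecting a ComputeClass by label; "autopilot" is an assumed
    # class name -- verify against the GKE docs for your cluster.
    cloud.google.com/compute-class: autopilot
  containers:
  - name: server
    image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
```

Only Pods carrying the selector run on Autopilot-managed nodes; everything else in the Standard cluster is unaffected.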

You can read the full technical documentation on fast-starting nodes here.

What’s next

Learn how to use these improvements to make your workloads more responsive with these resources.