Scale-to-zero & cold starts

How Knative drops idle knext services to zero, and how knext keeps the wake fast.

When a knext service is idle, Knative scales it down to zero replicas, so you pay nothing for traffic you're not serving. The first request after that wakes the service through the activator. This page explains how that works and what knext does to keep the wake fast.

How it scales to zero

Each pod runs a queue-proxy sidecar that reports request concurrency to the Knative Pod Autoscaler (KPA). Once the stable window has passed with no requests, the KPA scales the deployment to 0. New requests are then routed to the shared activator, which buffers each request, triggers a scale-up, and forwards the request once a pod is ready.

Step	What happens
arrive	A request reaches the ingress; the service is at 0 replicas, so it is routed to the activator.
buffer	The activator holds the request and signals the autoscaler to scale 0→1.
start	A pod is scheduled and the Node standalone server boots, with V8 reading its bytecode cache when one is available.
serve	The activator forwards the buffered request; later requests hit the pod directly.
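
The stable window and scale-to-zero itself are ordinary Knative autoscaler settings. For orientation, this is roughly what the relevant keys look like in Knative's config-autoscaler ConfigMap, shown with upstream default values. Whether a knext installation overrides any of them is not stated here, so treat this as a sketch of the Knative side only.

```yaml
# Cluster-wide Knative autoscaler settings (upstream defaults shown).
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  enable-scale-to-zero: "true"        # allow the KPA to scale a revision to 0 replicas
  stable-window: "60s"                # window over which request metrics are averaged
  scale-to-zero-grace-period: "30s"   # max wait for the activator to be in the request path before the last pod is removed
```

With these values, an idle service drops to zero roughly a minute after its last request, and the activator handles whatever arrives next.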

Set it per app

Setting spec.scaling.minScale to 0 enables scale-to-zero; raise it to keep that many pods warm for latency-critical services. spec.scaling.maxScale caps the number of replicas. The operator translates these settings into KPA annotations on the Knative Service.

spec:
  scaling:
    minScale: 0     # idle → zero pods
    maxScale: 20
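
As a rough sketch of that translation, the fragment below shows the kind of Knative Service the spec above could map to. The annotation keys are Knative's standard autoscaling annotations; the service name and image are hypothetical, and the exact manifest the knext operator emits is an assumption.

```yaml
# Hypothetical operator output; name and image are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/min-scale: "0"     # from spec.scaling.minScale
        autoscaling.knative.dev/max-scale: "20"    # from spec.scaling.maxScale
    spec:
      containers:
        - image: registry.example.com/my-app:latest
```

Annotation values are strings, so the numbers are quoted. Setting min-scale to "1" instead would keep one pod running at all times and remove the cold start at the cost of an always-on replica.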

The cold-start budget

A cold start is the sum of pod scheduling, container start, the Node/Next boot, and first-request work. The boot is where a Next.js app spends real CPU compiling JavaScript, so that is the part knext targets with bytecode caching: with NODE_COMPILE_CACHE pointing at a persistent volume, each cold pod reads pre-compiled V8 bytecode instead of recompiling from source.
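
To make the mechanism concrete, here is a minimal sketch of the pod settings involved. NODE_COMPILE_CACHE is a real Node.js environment variable (available from Node.js 22.1) that names the directory for the on-disk compile cache. The cache path, volume name, claim name, and image are hypothetical, and the manifest knext actually generates may differ.

```yaml
# Pod spec fragment (sketch). Paths and names are placeholders.
containers:
  - name: app
    image: registry.example.com/my-app:latest   # hypothetical image
    env:
      - name: NODE_COMPILE_CACHE                 # directory where Node stores compiled V8 bytecode
        value: /var/cache/node-compile
    volumeMounts:
      - name: bytecode-cache
        mountPath: /var/cache/node-compile       # must match the env value above
volumes:
  - name: bytecode-cache
    persistentVolumeClaim:
      claimName: my-app-bytecode-cache           # hypothetical PVC name
```

The first pod to load a module writes its bytecode into the directory; pods that start later and see the same directory can skip that compile step.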

Cross-cold-start caching needs persistence. The bytecode cache only helps across pods if it survives pod death, so when spec.cache.enableBytecodeCache is set, knext mounts the cache from a PVC rather than from pod-local disk.
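
A short sketch of what opting in could look like. The spec.cache.enableBytecodeCache field comes from the text above; the PersistentVolumeClaim is a hypothetical illustration of the storage behind it, and its name, size, and access mode are assumptions rather than knext defaults. Whether knext creates the claim itself is not covered here.

```yaml
# knext app spec: opt in to the persistent bytecode cache.
spec:
  cache:
    enableBytecodeCache: true
---
# Hypothetical PVC backing the cache; name, size, and access mode are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-bytecode-cache
spec:
  accessModes:
    - ReadWriteMany        # lets pods on different nodes share one cache
  resources:
    requests:
      storage: 1Gi
```

ReadWriteMany is chosen here because a freshly scheduled pod may land on a different node than the one that wrote the cache; a storage class that only offers ReadWriteOnce would limit sharing to pods on a single node.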

A project cold-start benchmark on a real cluster (OKE) found Bun and Node cold starts roughly comparable (~1.3s): the total is dominated by pod scheduling, not by the choice of runtime. knext therefore stays on Node with NODE_COMPILE_CACHE. Treat any specific millisecond figure as environment-dependent, not a guaranteed number.
