Scale-to-zero & cold starts
How Knative drops idle knext services to zero, and how knext keeps the wake fast.
When a knext service is idle, Knative scales it to zero replicas, so you pay nothing for capacity you are not using. The first request after that wakes the service through the activator. This page explains how that works and how knext keeps the wake fast.
How it scales to zero
Each service runs behind a queue-proxy that reports concurrency to the Knative Pod Autoscaler
(KPA). After the stable window with no requests, the KPA scales the deployment to 0. Incoming
requests then route to the shared activator, which buffers the request, triggers a scale-up, and
forwards once a pod is ready.
| Step | What happens |
|---|---|
| t+0ms | Request arrives at the ingress; the service is at 0 replicas → routed to the activator. |
| buffer | The activator holds the request and signals the autoscaler to scale 0→1. |
| start | A pod is scheduled; the Node standalone server boots — V8 reads its bytecode cache. |
| serve | The activator forwards the buffered request; later requests hit the pod directly. |
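The steps in the table can be modelled with a small in-memory toy, which may help when reasoning about what the first caller experiences. This is a minimal sketch, not Knative's implementation: `ToyService`, `request`, `podReady`, and `scaleDownIfIdle` are hypothetical names invented for illustration, and the real activator and KPA work over the network with metrics rather than direct method calls.

```typescript
// Toy model of the scale-from-zero request path.
// All names here are illustrative; none are Knative or knext APIs.
type Handler = (path: string) => string;

class ToyService {
  replicas = 0;
  log: string[] = [];
  private buffered: string[] = [];
  private handler: Handler;

  constructor(handler: Handler) {
    this.handler = handler;
  }

  // Ingress entry point. Returns the response when a pod is up,
  // or undefined when the request is held by the (toy) activator.
  request(path: string): string | undefined {
    if (this.replicas > 0) {
      this.log.push(`direct ${path}`);
      return this.handler(path);
    }
    // Zero replicas: the first held request triggers the 0 -> 1 scale-up.
    if (this.buffered.length === 0) {
      this.log.push("scale-up requested");
    }
    this.buffered.push(path);
    this.log.push(`buffered ${path}`);
    return undefined;
  }

  // Called once the new pod is ready: forward held requests in arrival order.
  podReady(): string[] {
    this.replicas = 1;
    const responses: string[] = [];
    for (const path of this.buffered) {
      this.log.push(`forwarded ${path}`);
      responses.push(this.handler(path));
    }
    this.buffered = [];
    return responses;
  }

  // Idle for at least the stable window: drop to zero if minScale allows it.
  scaleDownIfIdle(idleMs: number, stableWindowMs: number, minScale: number): void {
    if (idleMs >= stableWindowMs && minScale === 0) {
      this.replicas = 0;
      this.log.push("scaled to zero");
    }
  }
}

const demo = new ToyService((path) => `200 ${path}`);
demo.request("/a"); // held; the first held request triggers the scale-up
demo.request("/b"); // held behind /a
demo.podReady();    // forwards /a, then /b
demo.request("/c"); // pod is up, served directly
```

The point of the model is that only requests arriving while the service is at zero pay the wake cost; once a pod is ready, requests skip the buffer entirely.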
Set it per app
spec.scaling.minScale: 0 enables scale-to-zero; raise it to keep a warm floor for
latency-critical services. The operator translates this to KPA annotations on the Knative Service.
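As a rough illustration of that translation, the sketch below maps a scaling block to Knative's per-revision scale-bound annotations. The annotation keys `autoscaling.knative.dev/min-scale` and `autoscaling.knative.dev/max-scale` are Knative's own; the `ScalingSpec` type, the `toKpaAnnotations` function, and the validation rule are assumptions made for this example and are not taken from the knext operator.

```typescript
// Hypothetical shape of the knext scaling block; only minScale and
// maxScale are documented on this page.
interface ScalingSpec {
  minScale?: number;
  maxScale?: number;
}

// Build Knative scale-bound annotations from the scaling block.
// How the knext operator actually emits them is an assumption.
function toKpaAnnotations(scaling: ScalingSpec): Record<string, string> {
  const annotations: Record<string, string> = {};

  // Annotation values must be strings, so numbers are validated and stringified.
  const render = (name: string, value: number): string => {
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError(`${name} must be a non-negative integer`);
    }
    return String(value);
  };

  if (scaling.minScale !== undefined) {
    annotations["autoscaling.knative.dev/min-scale"] = render("minScale", scaling.minScale);
  }
  if (scaling.maxScale !== undefined) {
    annotations["autoscaling.knative.dev/max-scale"] = render("maxScale", scaling.maxScale);
  }
  return annotations;
}
```

For the manifest that follows (`minScale: 0`, `maxScale: 20`), this sketch would produce `min-scale` set to `"0"` and `max-scale` set to `"20"`.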
spec:
  scaling:
    minScale: 0   # idle → zero pods
    maxScale: 20
The cold-start budget
A cold start is pod scheduling + container start + the Node/Next boot + first-request work. The
boot is where a Next.js app spends real CPU compiling JavaScript — so knext attacks exactly that
with bytecode caching: NODE_COMPILE_CACHE on a persistent volume means each cold
pod reads pre-compiled V8 bytecode instead of recompiling from source.
Cross-cold-start caching needs persistence. The bytecode cache only helps across pods if it
survives pod death — knext mounts it from a PVC, not pod-local disk, when
spec.cache.enableBytecodeCache is set.
A project cold-start benchmark on a real cluster (OKE) found Bun and Node cold starts roughly
comparable (~1.3s, scheduling-bound): the cold start as a whole is dominated by pod scheduling,
not by the choice of runtime. So knext stays on Node + NODE_COMPILE_CACHE. Treat any specific
millisecond figure as environment-dependent, not a guaranteed number.
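The budget described in this section is a sum of four phases, and it can be useful to check which one dominates in your own measurements. The sketch below shows one way to do that. The type, function names, and the example numbers are placeholders invented for illustration; they are not measurements from the benchmark above.

```typescript
// The four phases named in the cold-start budget, in milliseconds.
interface ColdStartPhases {
  schedulingMs: number;     // pod scheduling
  containerStartMs: number; // container start
  runtimeBootMs: number;    // Node/Next boot
  firstRequestMs: number;   // first-request work
}

const PHASES: (keyof ColdStartPhases)[] = [
  "schedulingMs",
  "containerStartMs",
  "runtimeBootMs",
  "firstRequestMs",
];

// Total cold start is the sum of the phases.
function totalColdStartMs(p: ColdStartPhases): number {
  let total = 0;
  for (const key of PHASES) {
    total += p[key];
  }
  return total;
}

// The phase that contributes the most time (first one wins on ties).
function dominantPhase(p: ColdStartPhases): keyof ColdStartPhases {
  let best: keyof ColdStartPhases = "schedulingMs";
  for (const key of PHASES) {
    if (p[key] > p[best]) {
      best = key;
    }
  }
  return best;
}

// Placeholder values, NOT measurements: substitute your own timings.
const example: ColdStartPhases = {
  schedulingMs: 1000,
  containerStartMs: 200,
  runtimeBootMs: 300,
  firstRequestMs: 100,
};
const exampleTotal = totalColdStartMs(example);  // 1600
const exampleDominant = dominantPhase(example);  // "schedulingMs"
```

If the dominant phase in your environment is scheduling, runtime-level tuning has limited headroom; if it is the runtime boot, the bytecode cache is where the gain would come from.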