# Scaling
Nexlayer runs on production-grade Kubernetes infrastructure with automatic scaling. Your apps scale based on demand — no configuration required.
## Automatic Scaling
By default, Nexlayer automatically scales your pods based on CPU and memory usage. When traffic increases, more replicas spin up. When traffic decreases, replicas scale down. No cold starts, no manual intervention.
### How Auto-Scaling Works
#### Scale Up
When CPU or memory usage exceeds thresholds, Nexlayer automatically spins up additional pod replicas to handle the load. New replicas are ready in seconds.
#### Scale Down
When demand decreases, excess replicas are gracefully terminated. Your app scales down to match actual usage, optimizing costs.
#### Load Balancing
Traffic is automatically distributed across all healthy replicas. If a replica becomes unhealthy, it's removed from the load balancer and replaced.
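Nexlayer manages all of this for you, but conceptually the behavior is similar to a Kubernetes HorizontalPodAutoscaler. As a rough illustration only (the resource name, replica bounds, and 70% CPU target below are assumptions for the sketch, not Nexlayer's documented thresholds):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With Nexlayer you never write this object yourself; the platform applies equivalent scaling policy automatically.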
## Manual Scaling
Need more control? You can specify replica counts in your nexlayer.yaml:
```yaml
application:
  name: my-app
  pods:
    - name: api
      image: myuser/api:latest
      path: /api
      servicePorts: [3000]
      replicas: 3  # Always run 3 replicas
```

Note: Fixed replicas disable auto-scaling for that pod.
## Resource Allocation
Nexlayer provides sensible defaults for CPU and memory, but you can customize resource allocation for demanding workloads:
```yaml
pods:
  - name: ml-worker
    image: myuser/ml-worker:latest
    servicePorts: [8080]
    resources:
      cpu: "2"       # 2 CPU cores
      memory: "4Gi"  # 4 GB RAM
```

## Scaling Databases
Databases require special consideration. Use persistent volumes to ensure data survives restarts and scaling events:
```yaml
pods:
  - name: postgres
    image: postgres:15
    servicePorts: [5432]
    vars:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secretpassword
      POSTGRES_DB: myapp
    volumes:
      - name: pg-data
        size: 10Gi
        mountPath: /var/lib/postgresql/data
```

Note: Database pods with persistent volumes don't auto-scale by default. For high-availability databases, consider using managed database services or telling your AI assistant about your HA requirements.
## Scaling Best Practices
### Keep Pods Stateless
Design your API and web pods to be stateless. Store sessions in Redis and files in object storage. Stateless pods scale seamlessly.
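A minimal Python sketch of the difference. The `store` argument stands in for any shared key-value client with `get`/`set` methods (redis-py's `Redis` has this shape); the function and key names are illustrative:

```python
# Anti-pattern: per-process state. Each replica has its own copy of this
# dict, so a user routed to a different replica appears logged out, and the
# data is lost whenever a replica is replaced during scaling.
local_sessions = {}

def login_local(session_id: str, user: str) -> None:
    local_sessions[session_id] = user  # trapped inside one replica

# Stateless pattern: push session state to a shared store (e.g. Redis).
# Any replica can serve any request, so scaling up or down is seamless.
def login_shared(store, session_id: str, user: str) -> None:
    store.set(f"session:{session_id}", user)

def current_user(store, session_id: str):
    # Returns None when no session exists, mirroring redis-py's GET.
    return store.get(f"session:{session_id}")
```

Because the handlers take the store as a parameter, they carry no state of their own, which is exactly what lets the load balancer spread requests across replicas freely.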
### Health Checks
Nexlayer automatically monitors pod health. If your app has specific health requirements, expose a /health endpoint that returns 200 when healthy.
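For example, a `/health` endpoint can be exposed with nothing but Python's standard library. The endpoint path matches the text above; the handler name and port are illustrative:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Answers liveness/readiness probes with a plain 200."""

    def do_GET(self):
        if self.path == "/health":
            # Return 200 only when the app can actually serve traffic; a real
            # app would also verify its dependencies (DB, cache, queue) here.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the request log

def run(port=3000):
    """Serve forever on one of the pod's servicePorts."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Keep the check cheap: it may be called every few seconds, and a slow or flaky health endpoint can cause healthy replicas to be pulled from the load balancer.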
### Graceful Shutdown
Handle SIGTERM in your app to finish in-progress requests before shutting down. This ensures zero dropped requests during scale-down events.
## Need Help Scaling?
Just tell your AI assistant what you're trying to achieve:
- "This app needs to handle 10,000 concurrent users"
- "Scale the API to always have at least 2 replicas"
- "The ML worker needs 4GB of RAM"