Scaling

Nexlayer runs on production-grade Kubernetes infrastructure with automatic scaling. Your apps scale based on demand — no configuration required.

Automatic Scaling

By default, Nexlayer automatically scales your pods based on CPU and memory usage. When traffic increases, more replicas spin up. When traffic decreases, replicas scale down. No cold starts, no manual intervention.
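Because scaling is on by default, a pod definition needs no scaling fields at all. A minimal sketch in the same nexlayer.yaml shape used below (names and image are placeholders):

```yaml
application:
  name: my-app

pods:
  - name: web
    image: myuser/web:latest
    path: /
    servicePorts: [3000]
    # no replicas or resources set: Nexlayer auto-scales this pod
```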

How Auto-Scaling Works

Scale Up

When CPU or memory usage exceeds thresholds, Nexlayer automatically spins up additional pod replicas to handle the load. New replicas are ready in seconds.

Scale Down

When demand decreases, excess replicas are gracefully terminated. Your app scales down to match actual usage, optimizing costs.

Load Balancing

Traffic is automatically distributed across all healthy replicas. If a replica becomes unhealthy, it's removed from the load balancer and replaced.

Manual Scaling

Need more control? You can specify replica counts in your nexlayer.yaml:

Fixed Replicas
application:
  name: my-app

pods:
  - name: api
    image: myuser/api:latest
    path: /api
    servicePorts: [3000]
    replicas: 3  # Always run 3 replicas

Note: Setting a fixed replica count disables auto-scaling for that pod.

Resource Allocation

Nexlayer provides sensible defaults for CPU and memory, but you can customize resource allocation for demanding workloads:

Custom Resources
pods:
  - name: ml-worker
    image: myuser/ml-worker:latest
    servicePorts: [8080]
    resources:
      cpu: "2"        # 2 CPU cores
      memory: "4Gi"   # 4 GB RAM

Scaling Databases

Databases require special consideration. Use persistent volumes to ensure data survives restarts and scaling events:

Persistent Database
pods:
  - name: postgres
    image: postgres:15
    servicePorts: [5432]
    vars:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secretpassword  # example only; keep real credentials out of version control
      POSTGRES_DB: myapp
    volumes:
      - name: pg-data
        size: 10Gi
        mountPath: /var/lib/postgresql/data

Note: Database pods with persistent volumes don't auto-scale by default. For high-availability databases, consider using managed database services or telling your AI assistant about your HA requirements.

Scaling Best Practices

Keep Pods Stateless

Design your API and web pods to be stateless. Store sessions in Redis and files in object storage. Stateless pods scale seamlessly.
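To make this concrete, here is a sketch of externalized session state. The `DictStore` class below is a stand-in for a shared store such as Redis (a real deployment would use a Redis client with equivalent get/set calls); the point is that no session data lives in process memory, so any replica can serve any request:

```python
import json
import uuid


class DictStore:
    """In-memory stand-in for a shared store such as Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def create_session(store, user_id):
    """Persist session state in the external store, not in process memory."""
    session_id = str(uuid.uuid4())
    store.set(f"session:{session_id}", json.dumps({"user_id": user_id}))
    return session_id


def load_session(store, session_id):
    """Any replica sharing the store can load the session."""
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None


store = DictStore()
sid = create_session(store, "user-42")
print(load_session(store, sid)["user_id"])  # prints "user-42"
```

Because replicas share the store rather than local memory, scale-up and scale-down events never lose user sessions.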

Health Checks

Nexlayer automatically monitors pod health. If your app has specific health requirements, expose a /health endpoint that returns 200 when healthy.
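A /health endpoint can be as small as a handler that returns 200 when the app can serve traffic. A stdlib-only sketch (your framework of choice will have an equivalent; the port is assigned dynamically here for the example):

```python
import http.server
import threading
import urllib.request


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)   # healthy: keep this replica in rotation
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):     # keep the example output quiet
        pass


# Bind to port 0 so the OS picks a free port for this demo.
server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    print(resp.status)  # prints 200
server.shutdown()
```

A good health handler should also check critical dependencies (database connectivity, required caches) and return a non-200 status when the pod cannot actually serve requests.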

Graceful Shutdown

Handle SIGTERM in your app to finish in-progress requests before shutting down. This ensures zero dropped requests during scale-down events.
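The pattern is the same in any language: on SIGTERM, stop accepting new work, drain in-flight requests, then exit. A minimal Python sketch (the os.kill call simulates the signal a pod receives during scale-down):

```python
import os
import signal
import time

shutting_down = False


def handle_sigterm(signum, frame):
    global shutting_down
    shutting_down = True        # stop accepting new work


signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the SIGTERM Kubernetes sends before terminating a pod.
os.kill(os.getpid(), signal.SIGTERM)
time.sleep(0.1)                 # let the handler run

if shutting_down:
    # In a real server: stop accepting connections, wait for in-flight
    # requests to complete, close database pools, then exit 0.
    print("draining and exiting cleanly")
```

Most web frameworks and servers expose a hook for this (for example, a shutdown or close method) so you rarely need to manage the drain loop yourself.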

Need Help Scaling?

Just tell your AI assistant what you're trying to achieve:

"This app needs to handle 10,000 concurrent users"

"Scale the API to always have at least 2 replicas"

"The ML worker needs 4GB of RAM"

Next Steps