Having too many concurrent users is a good problem to have, but it can be tricky to ensure you have enough resources available when an unexpected spike in traffic occurs. StackPath lets you auto-scale your workloads granularly across PoPs, so you can serve users worldwide and absorb a localized traffic spike without any degradation of service.
How Does EdgeCompute Auto-Scaling Work?
When enabled, auto-scaling samples the CPU utilization of your instances every 15 seconds. If utilization rises above the configured threshold, more instances are started, up to the configured maximum. This ensures your instances are not overloaded and that all requests are served promptly, whilst keeping your infrastructure cost-efficient: extra instances are only added when required and are removed afterwards. Auto-scaling is configured per deployment target but acts per PoP. For example, suppose a 'North America' deployment target with the PoP locations Ashburn, Dallas and New York has auto-scaling enabled with a 50% CPU utilization threshold. If the Dallas instance reaches 70% CPU utilization while Ashburn and New York sit at only 30%, a new instance is created in Dallas only.
Instances are automatically scaled down when CPU utilization is at least 10% below the configured threshold, and only once 5 minutes have passed since the last auto-scale action.
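The scale-up and scale-down rules above can be sketched as a simple per-PoP decision function. This is an illustrative model only, not StackPath's actual controller: the function name, parameters, and one-instance-at-a-time stepping are assumptions.

```python
SCALE_DOWN_MARGIN = 10.0   # scale down only when CPU is 10% below the threshold
COOLDOWN_S = 5 * 60        # wait 5 minutes after the last auto-scale action

def desired_instances(cpu_util: float, current: int, threshold: float,
                      min_per_pop: int, max_per_pop: int,
                      seconds_since_last_action: float) -> int:
    """Return the instance count a single PoP should run (hypothetical model)."""
    if cpu_util > threshold:
        # Above threshold: scale up, capped at the configured maximum.
        return min(current + 1, max_per_pop)
    if (cpu_util <= threshold - SCALE_DOWN_MARGIN
            and seconds_since_last_action >= COOLDOWN_S):
        # Well below threshold and past the cooldown: scale down,
        # but never below the configured minimum.
        return max(current - 1, min_per_pop)
    return current
```

With a 50% threshold, a PoP at 70% CPU scales up immediately, while a PoP at 30% only scales down after the 5-minute cooldown; a PoP at 45% (within the 10% margin) is left alone.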
If your workload is configured with an Anycast IP address, requests are automatically routed to the new instances as soon as they are ready, as determined by your readiness probes.
Container workloads will auto-scale using the currently configured image.
Virtual Machine Behaviour
Virtual machine workloads auto-scale with the configured base image. Because a freshly booted image still requires configuration, you should provide cloud-init user data, along with readiness probes, so that your virtual machines are fully configured before they are added to the Anycast IP routing.
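As a minimal sketch of what such user data might look like, the following cloud-config installs a web server and exposes a simple health endpoint for a readiness probe to poll. The package name and health-check path are illustrative assumptions, not StackPath requirements.

```yaml
#cloud-config
# Hypothetical cloud-init user data for an auto-scaled VM instance.
packages:
  - nginx
runcmd:
  - systemctl enable --now nginx
  # Serve a simple health endpoint; configure the readiness probe to
  # poll it so the instance only joins Anycast routing once it succeeds.
  - echo "ok" > /var/www/html/healthz
```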
Configuring Auto-Scaling
You can configure auto-scaling when creating your workload or at any point afterwards. When creating or updating your workload, navigate to your deployment targets and tick the `Enable Auto Scaling` checkbox. This reveals three new inputs: 'Instances Per PoP Min', 'Instances Per PoP Max', and 'CPU Utilization'. The first two define the minimum and maximum number of instances that can run per PoP when auto-scaling; the CPU Utilization input defines the utilization threshold at which scaling occurs.
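To make the three inputs concrete, they correspond to a per-deployment-target configuration along these lines. The field names below are invented for illustration and are not StackPath's actual API schema.

```json
{
  "deploymentTarget": "north-america",
  "autoScaling": {
    "minInstancesPerPop": 1,
    "maxInstancesPerPop": 4,
    "cpuUtilizationThreshold": 50
  }
}
```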
It's possible to have a different auto-scaling configuration for each deployment target in your workload, allowing you more granular control, as seen in the example below:
Once you've added auto-scaling to your configuration, simply save the workload for it to take effect.
The use of auto-scaling does not incur any unique fees; you only pay, at standard rates, for the additional instances it creates.