Autoscaling adds (or removes) servers to (or from) your application in response to changes in their utilization levels or responsiveness. So, for example, if your application servers are starting to run out of RAM, Autoscaling could fire up a new server in the background and add it to your load balancer automatically.
Autoscaling is entirely determined by rules that you specify, so you have complete control over whether servers are added and removed (and when that happens).
Autoscalers are sets of rules that are configured and managed via your Dashboard. To add or change a rule:
- Open the application from your Dashboard
- Click on Web/Application in the left-hand nav, and then Servers in the sub-nav
- Click on the Autoscaling tab
- Click Add Autoscaler and select the type of rule (CPU, Memory, Response Time)
- A drawer will open from the left that allows you to configure the rule (see below for more detail on each type of rule)
- Configure the parameters as required
- Click Save
Your new Autoscaler will be active immediately - no need to deploy. You can check its logs (see below) to see how it is operating.
An application can have one of each type of Autoscaler (CPU, Memory, Response Time).
You can edit an existing Autoscaler by clicking on its name in the Autoscaling page of the Dashboard.
Existing Autoscalers can be enabled or disabled by clicking the small down arrow on the right, and then the respective button. You can delete Autoscalers via the same route.
You can view the logs for an active Autoscaler by clicking its name and then clicking the Logs tab. The logs show the Autoscaler's recent activity. Note that we only log a small percentage of “no action needed” messages (to keep the logs compact and readable).
Understanding Autoscaling rules
All Autoscalers have a common set of parameters:
- A range (or tolerance) in which they operate
- A lookback period (the period used when calculating the average of the appropriate metric)
- Upper and lower limits for the number of servers. We will not scale servers below the lower limit, or above the upper limit. The range for both limits is 1 - 20.
- Optionally, a cooldown period between scaling events and a batch size that allows multiple servers to scale simultaneously
An Autoscaler continuously calculates the average of the metric (CPU, Memory, Response Time) that it monitors over the lookback period and, if that average breaches the threshold(s) you have specified, it either adds servers to your application or removes them.
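The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's actual implementation: the names `MetricWindow` and `decide`, the sample values, and the thresholds are all hypothetical, and the lookback period is modeled as a fixed number of samples rather than a length of time.

```python
from collections import deque
from statistics import mean


class MetricWindow:
    """Keeps the most recent samples of one metric (hypothetical helper)."""

    def __init__(self, lookback_samples):
        # Older samples fall out automatically once the window is full.
        self.samples = deque(maxlen=lookback_samples)

    def add(self, value):
        self.samples.append(value)

    def average(self):
        # No samples yet means there is nothing to act on.
        return mean(self.samples) if self.samples else None


def decide(average, lower, upper):
    """Return 'up', 'down', or 'none' for an average and its thresholds."""
    if average is None:
        return "none"
    if average > upper:
        return "up"
    if average < lower:
        return "down"
    return "none"


window = MetricWindow(lookback_samples=5)
for cpu_percent in [40, 55, 90, 95, 98]:
    window.add(cpu_percent)

# The average is 75.6, which is above the assumed upper threshold of 70.
print(decide(window.average(), lower=20, upper=70))  # up
```

Note that the comparison is made against the average, not against any single sample, so one high reading does not trigger scaling on its own.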
Server deletion rules
When servers are scaled down, they will not be automatically deleted unless you have changed the related setting for your application.
Autoscalers will create servers based on the size of the last server added to a group. So, for example, if your current application runs on a t3.large instance from AWS with 2 cores and 8GB of RAM, your Autoscalers will create more servers of this exact size and type.
Autoscalers work using thresholds. You set an explicit range within which a server may operate. The Autoscaler will automatically scale up if the average of that metric exceeds the upper bound of that range, and will scale down if the average falls below the lower bound.
Be aware that if you make this range very narrow, you are likely to breach it more frequently and therefore trigger scale ups (and scale downs) more often. If you set the lower threshold to 0% or the upper threshold to 100%, your application will never scale down or up (respectively) because you are explicitly indicating that the server can run at its limit with no consequences.
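To make the effect of range width concrete, here is a small sketch that counts how often a series of averages falls outside a range. The function name `count_breaches` and the sample figures are invented for illustration and are not taken from real utilization data.

```python
def count_breaches(averages, lower, upper):
    """Count the averages that fall outside the range lower..upper."""
    return sum(1 for value in averages if value < lower or value > upper)


# Hypothetical CPU averages (percent), one per evaluation.
averages = [48, 52, 61, 45, 55, 39, 58, 50]

print(count_breaches(averages, lower=45, upper=55))  # 3
print(count_breaches(averages, lower=20, upper=80))  # 0
print(count_breaches(averages, lower=0, upper=100))  # 0
```

The narrow range is breached three times by the same data that never leaves the wide range. With bounds of 0 and 100, a percentage can never fall outside the range, so no scaling would ever be triggered.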
You can set both upper and lower limits on the number of servers for an Autoscaler. It will not create (or remove) servers outside of those limits. The minimum must always be greater than or equal to 1, and the maximum is 20. So if you already have more servers than the upper limit (for example, 25 servers), we will not scale up any further.
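The way these limits bound a scaling action can be sketched as follows. This is a minimal illustration under stated assumptions: `target_server_count` and its parameters are hypothetical names, and the handling of a server count that already sits outside the limits (leave it unchanged) is an assumption based on the 25-server example above.

```python
MIN_SERVERS = 1   # lowest value allowed for either limit
MAX_SERVERS = 20  # highest value allowed for either limit


def target_server_count(current, change, lower_limit, upper_limit):
    """Apply a scaling change without crossing the configured limits."""
    if not MIN_SERVERS <= lower_limit <= upper_limit <= MAX_SERVERS:
        raise ValueError("limits must satisfy 1 <= lower <= upper <= 20")
    if change > 0:
        # Scale up, but stop at the upper limit. If the count is already
        # above the limit, leave it where it is.
        return max(current, min(current + change, upper_limit))
    if change < 0:
        # Scale down, but stop at the lower limit.
        return min(current, max(current + change, lower_limit))
    return current


print(target_server_count(3, +1, lower_limit=1, upper_limit=20))   # 4
print(target_server_count(25, +1, lower_limit=1, upper_limit=20))  # 25
print(target_server_count(1, -1, lower_limit=1, upper_limit=20))   # 1
```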
The advanced options allow you to specify batches of servers for simultaneous creation and deletion (minimum of 1 and maximum of 10). You can also set a cooldown period (in minutes) between scaling events (the minimum is 20 minutes, with no maximum).
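As a rough sketch of how these advanced options constrain scaling, the snippet below validates a batch size and cooldown against the bounds given above and checks whether a cooldown is still in effect. The function names are hypothetical, and treating the cooldown as finished once exactly the configured number of minutes has passed is an assumption.

```python
from datetime import datetime, timedelta

MIN_BATCH_SIZE = 1
MAX_BATCH_SIZE = 10
MIN_COOLDOWN_MINUTES = 20  # the cooldown itself is optional


def validate_advanced_options(batch_size, cooldown_minutes=None):
    """Raise ValueError if the options fall outside the documented bounds."""
    if not MIN_BATCH_SIZE <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch size must be between 1 and 10")
    if cooldown_minutes is not None and cooldown_minutes < MIN_COOLDOWN_MINUTES:
        raise ValueError("cooldown must be at least 20 minutes")


def in_cooldown(last_event, now, cooldown_minutes=None):
    """Return True if a new scaling event should still be held back."""
    if cooldown_minutes is None or last_event is None:
        return False
    return now - last_event < timedelta(minutes=cooldown_minutes)


validate_advanced_options(batch_size=3, cooldown_minutes=30)

last_scale = datetime(2024, 1, 1, 12, 0)
print(in_cooldown(last_scale, datetime(2024, 1, 1, 12, 10), 30))  # True
print(in_cooldown(last_scale, datetime(2024, 1, 1, 12, 45), 30))  # False
```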
The period of time used to calculate the average of a metric - its lookback period - can have a dramatic effect on scaling. A very short lookback period might result in servers being scaled up prematurely due to a short-lived blip in utilization, whereas an overlong period might result in servers not scaling up despite load being unacceptably high for a sustained period.
You should choose your lookback period carefully, ideally based on trends you have observed in your server utilization. Always bear in mind that the average of the chosen metric must breach one of your limits before the Autoscaler will be triggered. So, for example, setting your upper CPU limit to 95% gives you very little headroom if utilization is climbing steeply.
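The effect of the lookback period can be illustrated with a short sketch. The samples, the one-sample-per-minute spacing, and the 80% upper threshold are all invented for the example, and `lookback_average` is a hypothetical helper rather than part of any API.

```python
from statistics import mean


def lookback_average(samples, lookback):
    """Average the most recent `lookback` samples."""
    return mean(samples[-lookback:])


UPPER_THRESHOLD = 80  # assumed upper CPU threshold, in percent

# Eight quiet minutes followed by a two-minute blip at 95% CPU.
cpu_samples = [30, 30, 30, 30, 30, 30, 30, 30, 95, 95]

short_avg = lookback_average(cpu_samples, lookback=2)
long_avg = lookback_average(cpu_samples, lookback=10)

print(short_avg, short_avg > UPPER_THRESHOLD)  # 95 True
print(long_avg, long_avg > UPPER_THRESHOLD)    # 43 False
```

With the two-sample lookback, the brief blip alone is enough to breach the threshold. With the ten-sample lookback, the same blip is averaged away, which also means a genuine surge would take longer to register.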