Running a large-scale anycast deployment across multiple edge locations increases the administrative burden on your team and the complexity of monitoring your instances: you must ensure not only that they are available, but also that they are responding to requests correctly. StackPath provides Liveness and Readiness Probes to ease this monitoring burden and let you automate which instances in your workload respond to anycast requests.
This means unresponsive or incorrectly responding instances can be removed from your anycast address's routing automatically, ensuring that your workload remains responsive to your users.
Probe Basics
Both Liveness and Readiness probes check either an HTTP endpoint or a TCP port, and you can configure the probe frequency, timeout, initial delay, and success/failure thresholds. However, the two probe types respond differently when a failure is seen. Let's look at each in more detail.
Liveness Probes
A liveness probe will restart the instance if the probe reports it as down. Before the restart occurs, the instance is also removed from the routing for your anycast address, so requests to your anycast IP will not be routed to instances marked as down. This is useful when your application has stopped responding to network requests entirely.
Readiness Probes
A readiness probe will remove the instance from the routing for your anycast address if the probe reports it as down, but will not attempt to restart the instance. Your instance remains online, but anycast requests are not routed to it. This is useful if your application is returning stale or incorrect data and needs time to query and process fresh data before being re-added to the routing for your anycast IP address.
HTTP & TCP
Both Readiness and Liveness probes can be configured to check via HTTP or TCP.
For HTTP checks (HTTPS is also possible), a request is made to the specified path, and any HTTP status code greater than or equal to 200 but less than 400 is considered a success. Any other status code is considered a failure. Because the check only tests the HTTP status code, you will need to implement an endpoint in your application that tests your business logic and returns the appropriate response code, if required. For HTTP checks you can also configure the scheme, the port, and HTTP headers.
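As an illustration of that status-code rule, the sketch below pairs the success test a probe applies with a minimal /health handler built on Python's standard library. The path and port are illustrative, and a real endpoint would replace the placeholder comment with your own business-logic checks.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def probe_considers_success(status_code: int) -> bool:
    """Mirror the probe's rule: any 2xx or 3xx status is a success."""
    return 200 <= status_code < 400

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Real business-logic checks would go here; return 500 on failure.
            self.send_response(200)
        else:
            self.send_response(404)
        self.end_headers()

# To serve: HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Note that a 3xx redirect counts as a success, so an endpoint that redirects to a login page can pass the probe even when the application behind it is unhealthy.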
For TCP checks, the probe attempts to open a socket to your instance on the specified port. If the connection succeeds, the probe is considered successful; otherwise it is considered a failure.
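The behaviour of a TCP check can be sketched in a few lines of Python; the host, port, and timeout here are illustrative:

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened
    within the timeout, mirroring a TCP probe's success condition."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Because only the TCP handshake is tested, a process that accepts connections but serves garbage still passes; pair a TCP liveness check with an HTTP readiness check when response correctness matters.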
Common Settings
Readiness and Liveness probes share some common settings:
| Name | Default Value | Min Value | Description |
| --- | --- | --- | --- |
| initialDelaySeconds | 0 | 0 | The initial delay before the probe starts |
| timeoutSeconds | 1 | 1 | The number of seconds before the probe times out and is considered a failure |
| periodSeconds | 10 | 1 | The frequency of the probe in seconds |
| successThreshold | 1 | 1 | Minimum consecutive successes required before a probe is considered successful after a failure. This must be 1 for liveness probes |
| failureThreshold | 3 | 1 | The number of consecutive failures before the probe is considered failed |
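As a worked example of these settings: with the defaults (periodSeconds of 10, failureThreshold of 3), an instance that stops responding is marked down after roughly 30 seconds of consecutive failures. A rough sketch of that arithmetic, ignoring per-check timeout overlap:

```python
def worst_case_detection_seconds(period_seconds: int = 10,
                                 failure_threshold: int = 3) -> int:
    """Rough time to mark an instance down: failureThreshold consecutive
    failed checks, one every periodSeconds (per-check timeoutSeconds and
    any initialDelaySeconds are not included in this estimate)."""
    return failure_threshold * period_seconds
```

Lowering periodSeconds or failureThreshold detects failures faster, at the cost of more probe traffic and a higher chance of flapping on transient errors.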
A Real-World Example
If you're tasked with running a website that returns up-to-date information via HTTPS, such as a stock ticker pulling data from a third party, you want to be sure that:
a) The web server is responding to requests on port 443, and
b) The data returned is relevant and up to date.
This can be achieved by configuring both a liveness and readiness probe on your workload.
You would configure the liveness probe with a TCP check against port 443; if the probe does not get a response, the instance is removed from anycast routing and restarted.
You would also configure an HTTP readiness probe to check an HTTP endpoint of your application and ensure that the response is healthy. To do this, you would implement a health endpoint in your application that executes the check logic and returns a 200 response code if there are no issues, or a 500 if there is an issue and the instance needs to be removed from anycast routing. In the stock ticker example, because the application relies on a third party for its data, you could implement logic that returns a 500 if the data is more than 2 minutes old while the markets are open and the third-party service is available, and a 200 otherwise. When the readiness probe sees the 500, the instance is taken out of anycast routing while the application fetches and processes newer data.
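That staleness check could be sketched as follows. The function names, the freshness window, and the decision inputs are hypothetical; the point is that the endpoint encodes your business rules into a status code the probe understands.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=2)  # freshness window from the example above

def health_status(last_update, markets_open, upstream_available, now=None):
    """Return the status code a hypothetical /health endpoint might serve.

    Report 500 only when the data is stale AND the markets are open AND
    the third-party feed is reachable; in every other case report 200,
    since stale data is expected when markets are closed or the feed is
    down (restarting or draining the instance would not help then).
    """
    now = now or datetime.now(timezone.utc)
    if markets_open and upstream_available and (now - last_update) > MAX_AGE:
        return 500
    return 200
```

Guarding on the upstream feed's availability avoids pulling every instance out of rotation at once when the shared third-party dependency fails.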
With both the readiness and liveness probes configured, we can be satisfied that our website will be available and serving correct, up-to-date data.
Configuring Probes
API
Probes can be configured in the initial creation of a workload via the API, or in a workload update.
If you want to enable liveness or readiness probes on an already created workload, you will need to send a PATCH request that updates your workload configuration and includes the probe configuration within your Containers or Virtual Machines spec objects. Note that you must send the current metadata version with any PATCH request, or the request will be rejected. For example, to add a TCP liveness probe on a virtual machine workload that checks port 22 every 10s with a 2s timeout and a failure threshold of 3, you would send the following in the API body:
"workload": {
"spec": {
"virtualMachines": {
"12": {
"livenessProbe": {
"tcpSocket": {
"port": "22"
},
"initialDelaySeconds": "30",
"timeoutSeconds": "2",
"periodSeconds": "10",
"successThreshold": "1",
"failureThreshold": "3"
},
}
},
},
"metadata": {
"version": "7",
}
}
Or, to add an HTTP readiness probe that checks the /health endpoint on port 8080 with a basic auth header every 10s, you would send the following:
"workload": {
"spec": {
"virtualMachines": {
"12": {
"readinessProbe": {
"httpGet": {
"path": "/health",
"port": 8080,
"scheme": "http",
"httpHeaders": {
"Authorization": "Basic QWxhZGRpbjpPcGVuU2VzYW1l",
}
},
"initialDelaySeconds": "30",
"timeoutSeconds": "2",
"periodSeconds": "10",
"successThreshold": "1",
"failureThreshold": "3"
},
}
},
},
"metadata": {
"version": "7",
}
}
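As a sketch of sending such an update, the following uses Python's standard library to wrap the TCP liveness probe body from the first example in a PATCH request. The gateway URL path, the stack and workload IDs, and the bearer token are all placeholders; consult the API Documentation for the exact endpoint and authentication details.

```python
import json
import urllib.request

# Placeholder endpoint and credentials -- substitute your own values.
URL = "https://gateway.stackpath.com/workload/v1/stacks/STACK_ID/workloads/WORKLOAD_ID"
API_TOKEN = "YOUR_API_TOKEN"

payload = {
    "workload": {
        "spec": {
            "virtualMachines": {
                "12": {
                    "livenessProbe": {
                        "tcpSocket": {"port": "22"},
                        "initialDelaySeconds": "30",
                        "timeoutSeconds": "2",
                        "periodSeconds": "10",
                        "successThreshold": "1",
                        "failureThreshold": "3",
                    }
                }
            }
        },
        # The current metadata version must be sent, or the PATCH is rejected.
        "metadata": {"version": "7"},
    }
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    method="PATCH",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_TOKEN}",
    },
)
# urllib.request.urlopen(request)  # uncomment to actually send the request
```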
Further information and examples are available in our API Documentation.
Portal
Probes and Lifecycle checks can be similarly configured through the Portal.
- Select the workload instance you wish to configure.
- Navigate to the Probes or Lifecycle tab.
- Enable and define the desired feature(s) by clicking the slider button.