Kubernetes Health Check
An introduction to health checks with Kubernetes for WinCC OA.
Overview
A health check queries the status of the running managers. A health check command is used during startup of a container; afterwards the health check is repeated at regular intervals according to the probe settings you have configured. The frequency, timing and behavior of these checks depend on parameters such as periodSeconds, initialDelaySeconds and others.
If the container is "healthy", the health check returns 0. This is useful because once a Docker container is "healthy", you can interact with it immediately and do not need to wait any longer. A health check can be added to the end of a Dockerfile as follows:
HEALTHCHECK --interval=5s --timeout=5s --start-period=10s --retries=20 CMD "${BIN_DIR}WCCILpmon" -config ${OAPROJ}config/config -status || exit 1
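To show how the "healthy" state can be consumed from outside the container, here is a minimal sketch of a helper that polls Docker until the health check reports success. The function name wait_healthy, the retry count and the 2-second pause are assumptions made for illustration; the only Docker feature relied on is the .State.Health.Status field of docker inspect, which reports starting, healthy or unhealthy for containers that define a HEALTHCHECK.

```shell
# wait_healthy NAME [MAX_TRIES]
# Polls the Docker health status of container NAME until it is "healthy".
# Returns 0 when healthy, 1 when unhealthy or when MAX_TRIES is exhausted.
# (Hypothetical helper, not part of WinCC OA or Docker.)
wait_healthy() {
  name="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # Health status is one of: starting, healthy, unhealthy
    status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)"
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) echo "container $name is unhealthy" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep 2
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

A call such as wait_healthy my_container 60 could then precede any step that needs the WinCC OA runtime to be up; my_container is a placeholder for your container name.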
A health probe example for WinCC OA running in Docker (or Kubernetes) is available. The files are located in wincc_oa_path/data/containerization/examples/kubernetes/healthProbes.
Unhealthy Container
To make an unhealthy container in Kubernetes healthy again, the problems that cause the health checks to fail must be diagnosed and fixed. Sometimes a simple restart of the container or pod is enough to fix the problem. In Kubernetes, restarts are automatically managed when a container fails the health check.
Kubernetes provides three main types of health check mechanisms called probes. These probes ensure that the containers in a pod are running as expected and are ready to perform their service.
Liveness probe
Liveness probes determine when to restart a container. For example, liveness probes could catch a deadlock when an application is running but unable to make progress.
Readiness probe
Readiness probes determine when a container is ready to start accepting traffic. This is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming caches.
Startup probe
A startup probe verifies whether the application within a container has started. This can be used to adopt liveness checks on slow-starting containers, avoiding them getting killed by the kubelet before they are up and running.
Fixing the unhealthy state of a container in Kubernetes depends on the nature of the problem and the specific reason why it is failing. Diagnosing the root cause is essential for applying the right solution.
For more information, see https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/.
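As an illustration of how the three probe types sit side by side in a pod specification, the following sketch writes a manifest with one exec probe of each type. The pod name, the image, the script paths under /opt/probes, the file name of the manifest and all timing values are placeholders chosen for this example. They are not taken from the WinCC OA example files, so adapt them to your own image and project.

```shell
# Write a hypothetical pod manifest with startup, liveness and readiness probes.
# All names, paths and numbers below are illustrative assumptions.
cat > winccoa-probes-sketch.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: winccoa-sketch
spec:
  containers:
    - name: winccoa
      image: example.invalid/winccoa:latest
      startupProbe:            # holds back the other probes until it succeeds
        exec:
          command: ["/bin/sh", "/opt/probes/startup.sh"]
        periodSeconds: 5
        failureThreshold: 30
      livenessProbe:           # container is restarted when this fails
        exec:
          command: ["/bin/sh", "/opt/probes/liveness.sh"]
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:          # pod receives traffic only while this succeeds
        exec:
          command: ["/bin/sh", "/opt/probes/readiness.sh"]
        periodSeconds: 10
        timeoutSeconds: 5
        successThreshold: 1
        failureThreshold: 3
EOF
```

The resulting file could then be applied with kubectl apply -f winccoa-probes-sketch.yaml once the placeholders have been replaced.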
What is the expected time?
The time needed to resolve a health issue in a container varies with the nature of the problem. If a simple restart of the container or pod is enough, the time required is roughly the time Kubernetes needs to terminate the old container or pod and start a new one, which is typically a few seconds to a few minutes.
How does the health check work with WinCC OA?
liveness.sh script
Kubernetes executes liveness.sh directly within the container to perform a liveness test.
WCCILpmon -config $OAPROJ/config/config -command MGRLIST:STATI -log +stdout
Please make sure that the environment variables $OAINST and $OAPROJ_NAME are set in your containers, or adapt the script to your specific needs. The following logic is then used to evaluate the health state of the WinCC OA runtime:
Standby and startup tests can be disabled (or implemented according to your own requirements). The startup tests are not required as we do not expect inconsistent startup times of the WinCC OA runtime. Below you can find a configuration example:
When adjusting the values, note that Kubernetes applies them in the following order:
- Wait for initialDelaySeconds.
- Perform the readiness check and wait up to timeoutSeconds for a result.
- If the number of consecutive successes reaches successThreshold, success is returned. If the number of consecutive failures reaches failureThreshold, failure is returned. Otherwise, wait periodSeconds and start a new readiness check.
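To get a feeling for how these values interact, the order above can be turned into a rough estimate of how long it takes until a continuously failing probe is reported as failed. The sketch below assumes that the first check starts after initialDelaySeconds, that checks start every periodSeconds, and that the last failing check takes the full timeoutSeconds. The function name and the sample values are hypothetical, and real timings can deviate slightly because of scheduling on the node.

```shell
# worst_case_detection INITIAL_DELAY PERIOD TIMEOUT FAILURE_THRESHOLD
# Prints a rough upper bound, in seconds, between container start and the
# moment a continuously failing probe is considered failed.
# (Approximation for illustration only.)
worst_case_detection() {
  initial_delay="$1"
  period="$2"
  timeout="$3"
  failure_threshold="$4"
  # initial wait + gaps between the failing checks + duration of the last check
  echo $(( initial_delay + (failure_threshold - 1) * period + timeout ))
}

worst_case_detection 10 10 5 3   # prints 35
```

With initialDelaySeconds=10, periodSeconds=10, timeoutSeconds=5 and failureThreshold=3, the estimate is 10 + 2 * 10 + 5 = 35 seconds.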