Kubernetes Health Check

An introduction to health checks with Kubernetes for WinCC OA.

Overview

A health check queries the status of the running managers. The health check command is run during the startup of a container and returns 0 if the container is "healthy". This is useful because a "healthy" Docker container is ready for interaction, so you do not need to wait any longer. A health check can be added to the end of a Dockerfile as follows:

HEALTHCHECK --interval=5s --timeout=5s --start-period=10s --retries=20 CMD "${BIN_DIR}WCCILpmon" -config ${OAPROJ}config/config -status || exit 1
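
Once a HEALTHCHECK is defined, Docker exposes the result through docker inspect. The following is a minimal sketch of how a deployment script could wait for the "healthy" state before interacting with the container. The function name wait_healthy and the variable POLL_INTERVAL are hypothetical and not part of WinCC OA.

```shell
#!/bin/sh
# Sketch: poll Docker's health status until the container reports "healthy".
# wait_healthy and POLL_INTERVAL are hypothetical names, not part of WinCC OA.
# Arguments: container name or ID, maximum number of polls.
wait_healthy() {
    container=$1
    max_tries=$2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # .State.Health.Status is "starting", "healthy" or "unhealthy".
        status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status=""
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        tries=$((tries + 1))
        # Pause between polls; defaults to 1 second.
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}
```

For example, wait_healthy my-container 30 && echo "ready" would poll a container with the placeholder name my-container up to 30 times.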

A health probe example for WinCC OA running in Docker or Kubernetes is located in wincc_oa_path/data/containerization/examples/kubernetes/healthProbes.

How the Health Check Works with WinCC OA

To enable Kubernetes to monitor the health status of WinCC OA components/containers, you can use the provided liveness.sh script.

Kubernetes executes liveness.sh directly within the container to perform a liveness test.

Internally, the script performs the test by executing the following command in the directory $OAINST/bin/:
WCCILpmon -config $OAPROJ/config/config -command MGRLIST:STATI -log +stdout  

Please make sure that the environment variables $OAINST and $OAPROJ_NAME are set in your containers, or adapt the script to your specific needs. The following logic is then used to evaluate the health state of the WinCC OA runtime:

Figure 1. Logic to evaluate the Health State
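
Because the script depends on the environment variables mentioned above, it can help to verify them before the probe runs. The following is a minimal sketch, assuming a hypothetical helper named require_env; it is not part of the provided liveness.sh script.

```shell
#!/bin/sh
# Sketch: fail early if a required environment variable is unset or empty.
# require_env is a hypothetical helper, not part of liveness.sh.
# Arguments: one or more variable names (without the leading "$").
require_env() {
    for name in "$@"; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
    return 0
}
```

A call such as require_env OAINST OAPROJ_NAME || exit 1 at the top of an adapted script would then report a missing variable instead of failing with a less obvious path error.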

Standby and startup tests can be disabled (or implemented according to your own requirements). The startup tests are not required as we do not expect inconsistent startup times of the WinCC OA runtime. Below you can find a configuration example:

Figure 2. Configuration Example
Note: For larger projects, the values may not be sufficient and should be adapted to your needs.

When adjusting the values, note that Kubernetes uses the values in the following order:

  • Wait for initialDelaySeconds.
  • Perform the readiness check and wait up to timeoutSeconds for a result.
  • If the number of consecutive successes reaches successThreshold, success is returned.

    Or

    If the number of consecutive failures reaches failureThreshold, failure is returned.

    Otherwise, wait periodSeconds and start a new readiness check.
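
The sequence above can be illustrated with a small simulation. This is only a sketch of the counting logic, not how Kubernetes is implemented: the function name run_probe is hypothetical, the per-check timeout (timeoutSeconds) is omitted for brevity, and the sketch returns once a threshold is reached, whereas Kubernetes keeps probing for the lifetime of the container.

```shell
#!/bin/sh
# Sketch: simulate how consecutive check results lead to a probe verdict.
# run_probe is a hypothetical name; timeoutSeconds is not modelled here.
# Arguments: initial_delay success_threshold failure_threshold period command...
run_probe() {
    initial_delay=$1
    success_threshold=$2
    failure_threshold=$3
    period=$4
    shift 4
    successes=0
    failures=0
    # Step 1: wait for the initial delay.
    sleep "$initial_delay"
    while :; do
        # Step 2: perform the check; a result resets the opposite counter.
        if "$@" >/dev/null 2>&1; then
            successes=$((successes + 1))
            failures=0
        else
            failures=$((failures + 1))
            successes=0
        fi
        # Step 3: compare the consecutive counts with the thresholds.
        if [ "$successes" -ge "$success_threshold" ]; then
            echo "success"
            return 0
        fi
        if [ "$failures" -ge "$failure_threshold" ]; then
            echo "failure"
            return 1
        fi
        # Otherwise wait for the period and check again.
        sleep "$period"
    done
}
```

With an initial delay of 10 seconds, a period of 5 seconds and a failure threshold of 3, a check that always fails would be reported as a failure after roughly 10 + 2 × 5 = 20 seconds plus the run time of the checks themselves.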