Module Health Verification Endpoints and K8S Probes
-
Health endpoint: Used by monitoring to determine the health state of a microservice. Its response also includes the liveness, readiness and ping states. It is not intended for the container management system but for privileged users (authorization required), such as operators or operator tooling. For that reason the endpoint is exposed externally and returns more information than a simple OK/NOK status.
-
Readiness endpoint: Polled by Kubernetes to check whether a microservice (pod) can accept traffic.
-
Liveness endpoint: Polled by Kubernetes to check whether the internal state of a microservice is valid. Kubernetes restarts the container if the liveness endpoint fails the configured number of times.
-
Ping endpoint: Always returns OK. It is intended to be called as a follow-up check by the health endpoint implementation of a service. It only confirms that a service client can connect to the service.
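The restart rule for the liveness endpoint can be illustrated with a small simulation. This is only a sketch of the counting logic, not Kubernetes code: Kubernetes counts consecutive probe failures against the probe's failureThreshold setting (default 3), and any successful probe resets the count. The function name should_restart is hypothetical.

```python
from typing import Iterable


def should_restart(probe_results: Iterable[bool], failure_threshold: int = 3) -> bool:
    """Return True once `failure_threshold` consecutive liveness probes have failed.

    `probe_results` is the ordered sequence of probe outcomes
    (True = endpoint answered 200, False = probe failed, e.g. 503 or timeout).
    """
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            # Any successful probe resets the failure counter.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
    return False
```

A single failed probe between successful ones therefore does not restart the container; only an uninterrupted run of failures does.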
Webapp
-
Health Endpoint
-
/health exposed externally as /health
-
Authorized on the resource with id health and type urn:com:bosch:bci:operation with privilege EXECUTE
-
unauthorized: 401
-
No authorization (token) was provided with the request
-
-
forbidden: 403
-
The provided authorization (token) does not have the required permission to access the endpoint
-
-
down: 200
-
Status 200 is also returned when the health state is down. The reason for being down and further details can be found in the response body.
-
The microservice is considered unhealthy when the connection to Core is lost.
-
-
up: 200
-
-
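Because the health endpoint answers 200 for both up and down, an operator tool has to inspect the response body and cannot rely on the status code alone. The sketch below shows one way to classify a response. The body layout is an assumption: this page does not specify it, so the example assumes a JSON object with a top-level "status" field holding "UP" or "DOWN". The function name classify_health is hypothetical.

```python
import json


def classify_health(status_code: int, body: str) -> str:
    """Map a health endpoint response to one of:
    'unauthorized', 'forbidden', 'up', 'down', 'unknown'."""
    if status_code == 401:
        return "unauthorized"  # no authorization (token) was sent
    if status_code == 403:
        return "forbidden"     # token lacks the required permission
    if status_code == 200:
        # ASSUMPTION: JSON body with a top-level "status" field ("UP"/"DOWN").
        # The real body format is not defined in this document.
        try:
            state = str(json.loads(body).get("status", "")).upper()
        except (ValueError, AttributeError):
            return "unknown"   # not JSON, or not a JSON object
        return {"UP": "up", "DOWN": "down"}.get(state, "unknown")
    return "unknown"
```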
Liveness Endpoint
-
/health/liveness exposed externally as /health/liveness
-
down: 503
-
Down only during startup and shutdown.
-
-
up: 200
-
-
Readiness Endpoint
-
/health/readiness exposed externally as /health/readiness
-
down: 503
-
Down only during startup and shutdown.
-
-
up: 200
-
-
Ping Endpoint
-
/ping exposed externally as /webapp/ping
-
always up: 200
-
-
Dependencies
-
Core
-
Use cases:
Lost connection to Core.
Reasons:
-
Core instance crashed, restarted, etc.
-
Network issue between microservice and Core
-
-
General Behavior
-
The lost connection to Core is logged.
-
The microservice keeps trying to reconnect indefinitely.
-
The service stays alive and ready to accept requests.
-
-
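The reconnect behavior can be sketched as a loop that logs the loss once and retries until the connection succeeds. This is an illustration only: the function name, the logger name, the fixed retry delay and the injected connect and sleep callables are all assumptions, and the actual retry strategy of the Webapp is not described on this page.

```python
import logging
from typing import Callable

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("webapp.core_connection")


def reconnect_until_success(
    connect: Callable[[], bool],
    sleep: Callable[[float], None],
    delay_seconds: float = 5.0,
) -> int:
    """Call connect() until it returns True and return the number of attempts.

    There is deliberately no upper bound on attempts: the loop only ends
    when the connection is back. `sleep` is injected so that callers can
    pass time.sleep in production and a stub in tests.
    """
    logger.warning("Lost connection to Core, reconnecting")
    attempts = 0
    while True:
        attempts += 1
        if connect():
            return attempts
        # ASSUMPTION: fixed delay between attempts; the real interval is unknown.
        sleep(delay_seconds)
```

In a running service this loop would run in the background so that the service itself stays alive and keeps answering requests, as described above.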
Impact
-
The UI still works if Core is down and shows an appropriate error message.
-
-
-
Keycloak
-
Use cases
-
N/A
-
-
General Behavior
-
The UI still works if Keycloak or Core is down and shows an appropriate error message.
-
-
Impact
-
N/A
-
Core
-
Health Endpoint
-
/health exposed externally as /core/health
-
Authorized on the resource with id health and type urn:com:bosch:bci:operation with privilege EXECUTE
-
unauthorized: 401
-
No authorization (token) was provided with the request
-
-
forbidden: 403
-
The provided authorization (token) does not have the required permission to access the endpoint
-
-
down: 200
-
Status 200 is also returned when the health state is down. The reason for being down and further details (a RabbitMQ, Database or Keycloak connection issue) can be found in the response body.
-
The microservice is considered unhealthy when the connection to RabbitMQ, the Database or Keycloak is lost.
-
-
up: 200
-
-
Liveness Endpoint
-
/health/liveness exposed externally as /core/health/liveness
-
down: 503
-
During startup and shutdown
-
If MACMA’s own bootstrapping (performed immediately after startup) fails and all retry attempts are exhausted
-
-
up: 200
-
-
Readiness Endpoint
-
/health/readiness exposed externally as /core/health/readiness
-
down: 503
-
If the Database or Keycloak is down, the microservice does not accept requests.
-
If MACMA’s own bootstrapping (performed immediately after startup) fails
-
-
up: 200
-
If RabbitMQ is down, the microservice can still accept requests.
-
-
-
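The Core rules above can be summarized as three small functions over the dependency state. This is a sketch of the documented rules, not the Core implementation; the CoreState class and all field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreState:
    """Snapshot of the conditions the Core endpoints depend on (hypothetical model)."""
    starting_or_stopping: bool = False
    bootstrap_failed: bool = False
    bootstrap_retries_exhausted: bool = False
    rabbitmq_up: bool = True
    database_up: bool = True
    keycloak_up: bool = True


def liveness_status(s: CoreState) -> int:
    # Down during startup/shutdown, or when bootstrapping failed for good.
    if s.starting_or_stopping or (s.bootstrap_failed and s.bootstrap_retries_exhausted):
        return 503
    return 200


def readiness_status(s: CoreState) -> int:
    # RabbitMQ is intentionally not part of the readiness decision.
    if not s.database_up or not s.keycloak_up or s.bootstrap_failed:
        return 503
    return 200


def health_state(s: CoreState) -> str:
    # The health endpoint answers 200 either way; only the reported state changes.
    if s.rabbitmq_up and s.database_up and s.keycloak_up:
        return "UP"
    return "DOWN"
```

For example, a RabbitMQ outage leaves Core ready (readiness 200) while the health endpoint reports down, whereas a Database outage makes Core not ready but does not trigger a restart through the liveness probe.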
Ping Endpoint
-
/ping exposed externally as /ping
-
always up: 200
-
-
Dependencies
-
RabbitMQ
-
Use cases:
Lost connection to RabbitMQ.
Reasons:
-
RabbitMQ instance crashed, restarted, etc.
-
Network issue between microservice and RabbitMQ
-
-
General Behavior
-
The lost connection to RabbitMQ is logged when an attempt is made to send an event.
-
The service stays alive and ready to accept requests.
-
The re-established connection to RabbitMQ is logged for the first event that is sent after RabbitMQ comes back online.
-
-
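The logging behavior described above is driven by send attempts, not by a background connection check. A minimal sketch of that pattern follows. The EventPublisher class, the injected send and log callables and the log texts are hypothetical; the sketch also assumes that every failed send is logged, which this page does not state explicitly.

```python
from typing import Callable


class EventPublisher:
    """Logs connection loss and recovery as a side effect of sending events."""

    def __init__(self, send: Callable[[str], None], log: Callable[[str], None]) -> None:
        self._send = send          # raises ConnectionError when the broker is unreachable
        self._log = log
        self._connected = True

    def publish(self, event: str) -> bool:
        """Try to send one event; return True if it was sent."""
        try:
            self._send(event)
        except ConnectionError:
            # ASSUMPTION: each failed attempt is logged; the service keeps running.
            self._log("Lost connection to RabbitMQ")
            self._connected = False
            return False
        if not self._connected:
            # First successful send after an outage.
            self._log("Connection to RabbitMQ re-established")
            self._connected = True
        return True
```

With this design an outage that happens while no events are sent produces no log entry at all, which matches the behavior described above.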
Impact
-
-
Database
-
Use cases:
Lost connection to the Database.
Reasons:
-
Database instance crashed, restarted, etc.
-
Network issue between microservice and Database
-
-
General Behavior
-
The lost connection to the Database is logged on error.
-
The microservice keeps trying to reconnect indefinitely.
-
The service stays alive, but it is not ready to accept any requests.
-
-
Impact
-
The API is no longer functional.
-
-
-
Keycloak
-
Use cases:
Lost connection to Keycloak.
Reasons:
-
Keycloak instance crashed, restarted, etc.
-
Network issue between microservice and Keycloak
-
Keycloak cannot accept traffic (readiness state is DOWN)
-
-
General Behavior
-
The lost connection to Keycloak is logged on error.
-
The microservice keeps trying to reconnect indefinitely.
-
The service stays alive, but it is not ready to accept any requests.
-
-
Impact
-
The API is no longer functional.
-
-
-
Keycloak
-
Health Endpoint
-
/auth/health exposed externally as /auth/health and /keycloak/health
-
Keycloak’s health endpoint does not follow the BCI specification for the health endpoint; it has its own implementation.
-
down: 503
-
The health endpoint reports down and responds with 503. Keycloak is considered unhealthy when the connection to the Database is lost.
-
-
up: 200
-
-
Liveness Endpoint
-
/auth/health/live exposed externally as /auth/health/live and /keycloak/health/live
-
down: 503
-
Before Keycloak 19.0.0, the liveness endpoint depends on the Database. Starting with Keycloak 19.0.0 (MACMA version 1.20), this endpoint is independent of the Database.
-
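Tooling that interprets Keycloak liveness results may need to account for this version difference. A minimal sketch, assuming the Keycloak version is available as a dotted string such as "18.0.2"; the function name is hypothetical.

```python
def liveness_depends_on_database(keycloak_version: str) -> bool:
    """Return True if the liveness endpoint of this Keycloak version
    can go down because of a Database outage (versions before 19.0.0)."""
    # The cut-over is exactly 19.0.0, so comparing the major version is enough.
    major = int(keycloak_version.split(".")[0])
    return major < 19
```

For versions where this returns True, a 503 from the liveness endpoint may indicate a Database problem and not a broken Keycloak process.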
-
up: 200
-
-
Readiness Endpoint
-
/auth/health/ready exposed externally as /auth/health/ready and /keycloak/health/ready
-
down: 503
-
The microservice does not accept requests.
-
-
up: 200
-
-
Ping Endpoint
-
Keycloak does not provide a ping endpoint; /auth/health/live can be used as a replacement.
-
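For a client that wants to check connectivity to all three modules, the external paths listed on this page can be collected in one lookup table, with the Keycloak liveness endpoint standing in for the missing ping endpoint. The names PING_PATHS and ping_url and the base URL in the test data are hypothetical.

```python
# External connectivity-check paths as listed on this page.
# Keycloak has no ping endpoint, so its liveness endpoint is used as a replacement.
PING_PATHS = {
    "webapp": "/webapp/ping",
    "core": "/ping",
    "keycloak": "/auth/health/live",
}


def ping_url(base_url: str, module: str) -> str:
    """Build the connectivity-check URL for a module; raises KeyError for unknown modules."""
    return base_url.rstrip("/") + PING_PATHS[module]
```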
-
Dependencies
-
Database
-
Use cases:
Lost connection to the Database.
Reasons:
-
Database instance crashed, restarted, etc.
-
Network issue between microservice and Database
-
-
General Behavior
-
The lost connection to the Database is logged on error.
-
The microservice keeps trying to reconnect indefinitely.
-
The service stays alive, but it is not ready to accept any requests.
-
-
Impact
-
The API is no longer functional.
-
-
-