Module Health Endpoints and Kubernetes Probes
Condition Monitoring provides Kubernetes probes and health endpoints. There is a central module health endpoint, and each service also exposes its own health endpoints. Unlike the probes, health endpoints are not intended for use by the container management system. Instead, they are designed for privileged users (authorization required), such as operators or operator tools. These endpoints are exposed to provide more detailed information than a simple OK/NOK status.
Module Health Endpoint
To check the overall health status of the Condition Monitoring module or its services, users must have one of the following roles assigned:
- Operator (condition-monitoring-operator)
- Condition Monitoring Administrator (condition-monitoring-admin)
- A custom role with the static resource below:
  ResourceType: urn:com:bosch:bci:cm:all:operation
  ResourceID: health
  ResourceName: Health endpoint - Provides information about the services and their dependencies
The request:
HTTP Method: GET
URL: https://<domain>/cm/health
Possible responses:
- HTTP 200 OK - returned when the given token is valid and authorized for the resource
- HTTP 403 FORBIDDEN - returned when the given token is not authorized (e.g., the resource or permission is missing)
- HTTP 401 UNAUTHORIZED - returned when the given token is invalid
An authorized request returns a detailed response:
{
"name": "Condition Monitoring Module",
"description": "Condition Monitoring Module - Custom Health Endpoint",
"instanceId": "",
"startupTime": "2025-12-18T11:34:25.883348480Z",
"version": "master-dev-SNAPSHOT",
"ready": true,
"health": "healthy",
"onStateSince": "2025-12-18T12:56:05.899540031Z",
"dependencies": [
{
"name": "cmCore",
"available": true,
"details": {
"service": "Condition Monitoring"
}
},
{
"name": "kafka",
"available": true,
"details": {
"clusterId": "XzLspcMFRvmShsc3Pk8o-w"
}
},
{
"name": "valueAggregator",
"available": true,
"details": {
"service": "Value Aggregator"
}
},
{
"name": "mmpd",
"available": true
},
{
"name": "functionExecutor",
"available": true,
"details": {
"service": "Function Executor"
}
},
{
"name": "resultAggregator",
"available": true,
"details": {
"service": "Result Aggregator"
}
},
{
"name": "valueProvider",
"available": true,
"details": {
"service": "Value Provider"
}
},
{
"name": "macma",
"available": true
},
{
"name": "rabbit",
"available": true,
"details": {
"version": "4.1.2"
}
},
{
"name": "db",
"available": true
}
]
}
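As an illustration, a minimal sketch of how an operator tool might query this endpoint and flag unavailable dependencies is shown below. The domain, the token, and the bearer authorization scheme are assumptions for the example; substitute the values of your deployment.

```python
import requests  # third-party HTTP client (pip install requests)

# Placeholders: replace with your deployment's domain and a token of a
# user holding one of the roles listed above.
BASE_URL = "https://<domain>"
TOKEN = "<access-token>"

def check_module_health() -> None:
    resp = requests.get(
        f"{BASE_URL}/cm/health",
        headers={"Authorization": f"Bearer {TOKEN}"},  # bearer scheme assumed
        timeout=10,
    )
    # 401 means the token is invalid; 403 means it lacks the health resource.
    resp.raise_for_status()
    body = resp.json()
    print(f"ready: {body['ready']}, health: {body['health']}")
    # Report every dependency that declares itself unavailable.
    for dep in body.get("dependencies", []):
        if not dep["available"]:
            print(f"dependency down: {dep['name']}")

if __name__ == "__main__":
    check_module_health()
```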
Kubernetes Probes
- Liveness probe. A liveness probe can return:
  - "Yes, I’m alive!" - evaluated by the runtime environment; no further action is executed.
  - "No, I’m not alive!" - evaluated by the runtime environment; the container will be killed and restarted.
  - Or it simply fails because the container is unable to respond to the probe for some reason. The container will be killed and restarted by the runtime environment.
- Readiness probe. A readiness probe can return:
  - "Yes, I’m ready and can serve functionality to my clients!" - evaluated by the runtime environment; no further action is executed.
  - "No, I’m not ready!" - evaluated by the runtime environment; no traffic will be routed to the container. The runtime environment will ask again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod will be marked as unready.
  - Or it simply fails because the container is unable to respond to the probe for some reason. The runtime environment will stop routing traffic to the container and ask again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod will be marked as unready.
- Startup probe. A startup probe suppresses the other probes during a container’s startup or initialization phase to prevent premature restarts. Condition Monitoring uses the liveness probe as its startup probe. It can return:
  - "I’m successfully started." From now on, the runtime environment will start executing the liveness probes.
  - "I’m in the startup phase." The runtime environment will ask again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod might be restarted (depending on the pod’s restart policy).
  - Or it simply fails because the container is unable to respond to the probe for some reason.

A sketch of how these probes might be declared follows this list.
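The sketch below uses the official Kubernetes Python client; the probe paths, port, image reference, and timing values are illustrative assumptions, not the product's actual configuration.

```python
from kubernetes import client  # official Kubernetes Python client

PORT = 8080  # assumed container port, for illustration only

# Liveness: consecutive failures cause a container restart.
liveness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/health/liveness", port=PORT),
    period_seconds=10,
    failure_threshold=3,
)

# Readiness: failures stop traffic routing and mark the pod unready.
readiness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/health/readiness", port=PORT),
    period_seconds=10,
    failure_threshold=3,
)

# Startup: reuses the liveness endpoint (as described above) and
# suppresses the other probes until it has succeeded once.
startup = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/health/liveness", port=PORT),
    period_seconds=5,
    failure_threshold=30,  # allows up to 150 s of startup time
)

container = client.V1Container(
    name="cm-core",
    image="<registry>/cm-core:<tag>",  # placeholder image reference
    liveness_probe=liveness,
    readiness_probe=readiness,
    startup_probe=startup,
)
```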
Service Health Endpoints
Below you will find details about each service's health endpoints and the impact of outages of its dependencies.
Condition Monitoring Core
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available. For authenticated users, the health endpoint also shows details about RabbitMQ, InfluxDB, database, Deviation Processor (SMDP), MACMA, MDM, and Portal connection issues. |
| Liveness endpoint | 200 - the microservice is alive, its internal liveness state is valid, and it does not need to be restarted. |
| Readiness endpoint | RabbitMQ, Deviation Processor, Portal, MDM: 200. Database, MACMA, InfluxDB: 503. |
| Dependency | Use cases | Impact |
|---|---|---|
| RABBITMQ | Lost connection to RabbitMQ. Reasons: | |
| INFLUXDB | Lost connection to InfluxDB. Reasons: | |
| DATABASE | Lost connection to the database. Reasons: | |
| MACMA | Lost connection to MACMA. Reasons: | |
| PORTAL | Lost connection to Portal. Reasons: | |
| MDM | Lost connection to MDM. Reasons: | |
| SMDP | Lost connection to SMDP. Reasons: | |
Rule Service App
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available |
| Liveness endpoint | 200 |
| Readiness endpoint | RabbitMQ, Kafka, MDM: 200. Database, MACMA: 503. |
| Dependency | Use cases | Impact |
|---|---|---|
| RABBITMQ | Lost connection to RabbitMQ. Reasons: | |
| KAFKA | Lost connection to Kafka. Reasons: | |
| DATABASE | Lost connection to the database. Reasons: | |
| MACMA | Lost connection to MACMA. Reasons: | |
| MDM | Lost connection to MDM. Reasons: | |
Stateful Function Executor
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available. |
| Liveness endpoint | 200 |
| Readiness endpoint | 200 / 503 |
| Dependency | Use cases | Impact |
|---|---|---|
| RABBITMQ | Lost connection to RabbitMQ. Reasons: | |
| INFLUXDB | Lost connection to InfluxDB. Reasons: | |
Rule Value Aggregator
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available. |
| Liveness endpoint | 503 / 200 |
| Readiness endpoint | 503 while the connection to Kafka is lost and Kafka Streams is not in the RUNNING state. 200 while the service is connected to Kafka and Kafka Streams is in the RUNNING state. |

| Dependency | Use cases | Impact |
|---|---|---|
| Kafka | Lost connection to Kafka. Reasons: | |
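Because this service (like the other Kafka Streams-based services below) reports 503 until its streams topology reaches the RUNNING state, rollout or monitoring tooling may have to wait for readiness. Below is a minimal sketch of such a wait loop; the in-cluster URL, path, and port are hypothetical, as the document does not specify them.

```python
import time
import requests

def wait_until_ready(url: str, timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll a readiness endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # 503 here means Kafka is unreachable or Kafka Streams
            # has not (yet) reached the RUNNING state.
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # service not answering yet; keep polling
        time.sleep(interval_s)
    return False

# Hypothetical in-cluster URL; the real host, path, and port are
# deployment-specific and not given in this document.
if wait_until_ready("http://value-aggregator:8080/health/readiness"):
    print("Kafka Streams is RUNNING; service is ready.")
else:
    print("Service did not become ready in time.")
```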
Rule Value Provider
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available. |
| Liveness endpoint | 503 / 200 |
| Readiness endpoint | 503 while the connection to Kafka is lost and Kafka Streams is not in the RUNNING state. 200 while the service is connected to Kafka and Kafka Streams is in the RUNNING state. |

| Dependency | Use cases | Impact |
|---|---|---|
| Kafka | Lost connection to Kafka. Reasons: | |
Rule Result Aggregator
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available. |
| Liveness endpoint | 503 / 200 |
| Readiness endpoint | 503 while the connection to Kafka is lost and Kafka Streams is not in the RUNNING state. 200 while the service is connected to Kafka and Kafka Streams is in the RUNNING state. |

| Dependency | Use cases | Impact |
|---|---|---|
| Kafka | Lost connection to Kafka. Reasons: | |
Rule Function Executor
| Endpoint | Description |
|---|---|
| Health endpoint | 200 when the service is available. |
| Liveness endpoint | 503 / 200 |
| Readiness endpoint | 503 while the connection to Kafka is lost and Kafka Streams is not in the RUNNING state. 200 while the service is connected to Kafka and Kafka Streams is in the RUNNING state. |

| Dependency | Use cases | Impact |
|---|---|---|
| Kafka | Lost connection to Kafka. Reasons: | |