preview 4.10.0

Module health Endpoints and K8s probes

Condition Monitoring offers three endpoints for each of its microservices:

[Figure: Module health verification endpoints and K8s probes]

  • Health endpoint: Used by monitoring to determine the health state of a microservice. The health endpoint also includes the states of the liveness and readiness probes.

  • Readiness endpoint: Polled by Kubernetes to check whether a microservice (pod) can accept traffic.

  • Liveness endpoint: Polled by Kubernetes to check whether a microservice's internal state is valid.

If the modules are installed behind a reverse proxy (e.g. /cm/*), the prefix has to be prepended to the given paths (e.g. /health becomes /cm/health).
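The prefix rule can be illustrated with a small helper. This is a minimal sketch only; the function name `with_proxy_prefix` is hypothetical and not part of Condition Monitoring.

```python
def with_proxy_prefix(prefix: str, path: str) -> str:
    """Prepend a reverse-proxy prefix (e.g. "/cm") to an endpoint path."""
    # Normalise slashes so "/cm/", "cm" and "/cm" all behave the same way.
    prefix = prefix.strip("/")
    path = path.lstrip("/")
    if not prefix:
        # No proxy prefix configured: return the plain absolute path.
        return "/" + path
    return f"/{prefix}/{path}"


print(with_proxy_prefix("/cm", "/health"))  # prints /cm/health
```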

Overall status

  • Health Endpoint

  • Endpoint Address: /health

  • Expected Status Code: HTTP Status 200

  • Failure Status Code: HTTP Status 503
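A monitoring script could evaluate the overall status roughly as follows. This is a sketch under assumptions: the base URL is supplied by the caller, the names `classify_health` and `check_health` are hypothetical, and any status code other than 200 or 503 is reported as "unknown" because this page does not describe it.

```python
import urllib.error
import urllib.request


def classify_health(status_code: int) -> str:
    """Map the HTTP status of the health endpoint to a coarse state."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "unhealthy"
    return "unknown"  # not covered by the documented status codes


def check_health(base_url: str, timeout: float = 5.0) -> str:
    """Call <base_url>/health and classify the response."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_health(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return classify_health(err.code)
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return "unreachable"
```

Keeping the classification separate from the HTTP call makes the status mapping easy to test without a running service.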

Services

Condition Monitoring Core

Endpoint Description

Health endpoint
cm/core/health

200 when service available

The health endpoint shows details about connection issues with RabbitMQ, the database, Deviation Processor (SMDP), MACMA, MDM and Portal (only for authenticated users).

Liveness endpoint
cm/core/internal/health/liveness

200

The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/core/internal/health/readiness

RabbitMQ, Deviation Processor, Portal, MDM

200

  • REST API requests are still accepted by the microservice.

Database, MACMA, InfluxDB

503

  • Requests are not accepted by the microservice.
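The readiness behaviour listed above can be summarised as a small decision function. This is an illustrative sketch, not the service's actual implementation; the names `BLOCKING`, `NON_BLOCKING` and `core_readiness_status` are hypothetical.

```python
from typing import Iterable

# Dependencies whose loss makes CM Core report "not ready" (503).
BLOCKING = {"Database", "MACMA", "InfluxDB"}
# Dependencies whose loss still leaves the readiness endpoint at 200.
NON_BLOCKING = {"RabbitMQ", "Deviation Processor", "Portal", "MDM"}


def core_readiness_status(unavailable: Iterable[str]) -> int:
    """Return the expected readiness status code for CM Core."""
    down = set(unavailable)
    unknown = down - BLOCKING - NON_BLOCKING
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    # A single unavailable blocking dependency is enough for 503.
    return 503 if down & BLOCKING else 200


print(core_readiness_status(["RabbitMQ", "MDM"]))  # prints 200
print(core_readiness_status(["RabbitMQ", "MACMA"]))  # prints 503
```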

Dependency Use cases Impact

RABBITMQ

Lost connection to RabbitMQ.

Reasons:

  • RabbitMQ instance crashed, restarted, etc.

  • Network issue between microservice and RabbitMQ

  • The lost connection to RabbitMQ is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • The service is kept alive to keep the UI running.

  • RabbitMQ objects (e.g. queues and bindings) are recreated once the microservice has reconnected.

  • The API remains functional and entities should still be saved in the database.

  • Publishing of messages is retried until RabbitMQ is up and running.

  • No messages are lost since durable queues are used.
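The "retry until the broker is back" behaviour described above follows a common pattern, sketched below. How the service actually implements its retries is not documented here, so the function `publish_with_retry`, its parameters and the fixed delay are assumptions for illustration only.

```python
import time
from typing import Callable


def publish_with_retry(
    publish: Callable[[], None],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call publish() until it succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            publish()
            return attempts
        except ConnectionError:
            # Broker not reachable yet: wait, then try again indefinitely.
            sleep(delay_seconds)
```

The `sleep` parameter is injectable so that the loop can be exercised in tests without real waiting.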

INFLUXDB

Lost connection to InfluxDB.

Reasons:

  • InfluxDB instance crashed, restarted, etc.

  • Network issue between microservice and InfluxDB

  • The lost connection to InfluxDB is logged as an error.

  • The service is kept alive, but it is not ready to accept any requests.

  • After InfluxDB is restarted, the service accepts requests again.

  • Several functionalities no longer work, e.g.

  • no OPP/PPMP data will be shown

  • no rules are triggered

  • no deviations are created

  • no sequence detection will work

  • …​

DATABASE

Lost connection to the Database.

Reasons:

  • Database instance crashed, restarted etc.

  • Network issue between microservice and database

  • The lost connection to the database is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • The service is kept alive, but it is not ready to accept any requests.

  • The API is no longer functional.

MACMA

Lost connection to MACMA.

Reasons:

  • MACMA instance crashed, restarted etc.

  • Network issue between microservice and MACMA

  • The lost connection to MACMA is logged as an error in CM Core.

  • The microservice tries to reconnect for 6 hours by default (configurable via MACMA_RETRYER_MAXATTEMTPS).

  • The service is kept alive, but it is not ready to accept any requests.

  • The API is no longer functional.

  • After MACMA is restarted, everything works again (getting tokens, sending requests) without restarting the rule service.

PORTAL

Lost connection to Portal.

Reasons:

  • Portal instance crashed, restarted etc.

  • Network issue between microservice and Portal

  • The lost connection and failed requests to Portal are logged as errors.

  • The microservice tries to reconnect indefinitely by default (configurable via PORTAL_RETRYER_RETRYINTERVALINSECONDS).

  • After Portal is restarted, Portal registration works again without restarting Rule Service App.

  • The Nexeed IAS UI cannot be accessed.

  • API calls to Rule Service App via Postman still work.

MDM

Lost connection to MDM.

Reasons:

  • MDM instance crashed, restarted etc.

  • Network issue between microservice and MDM.

  • The lost connection and failed requests to MDM are logged as errors.

  • The microservice tries to reconnect for 6 hours by default (configurable via MDM_RETRYER_MAXATTEMTPS).

  • The API is still functional with the existing master data in the CM database.

  • After MDM is restarted, everything works again without restarting the rule service.

  • Master data changes (integration events) made after MDM stopped are not reflected in CM Core; MDM needs to retry publishing. After a master data reload is triggered via the API, the master data appears.

SMDP

Lost connection to SMDP.

Reasons:

  • SMDP instance crashed, restarted etc.

  • Network issue between microservice and SMDP.

  • The lost connection and failed requests to SMDP are logged as errors.

  • The microservice tries to reconnect for 6 hours by default (configurable via SMDP_RETRYER_MAXATTEMTPS).

  • Deviations cannot be sent to Deviation Processor.

  • After SMDP is restarted, deviations are sent again without restarting CM Core.

Rule Service App

Endpoint Description

Health endpoint
cm/rm/rule-manager/health

200 when service available

  • The health endpoint shows details about connection issues with RabbitMQ, the database, Kafka, MACMA and MDM (only for authenticated users).

Liveness endpoint
cm/rm/rule-manager/internal/health/liveness

200

  • The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/rm/rule-manager/internal/health/readiness

RabbitMQ, Kafka, MDM

200

  • REST API requests are still accepted by the microservice.

Database, MACMA

503

  • The requests will not be accepted by the microservice.

Dependency Use cases Impact

RABBITMQ

Lost connection to RabbitMQ.

Reasons:

  • RabbitMQ instance crashed, restarted, etc.

  • Network issue between microservice and RabbitMQ

  • The lost connection to RabbitMQ is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • The service is kept alive to keep the UI running.

  • RabbitMQ objects (e.g. queues and bindings) are recreated once the microservice has reconnected.

  • The API remains functional and entities should still be saved in the database.

  • Publishing of messages is retried until RabbitMQ is up and running.

  • No messages are lost since durable queues are used.

  • Rules are executed and events are created again when RabbitMQ is available.

Kafka

Lost connection to Kafka.

Reasons:

  • Kafka instance crashed, restarted etc.

  • Network issue between microservice and Kafka.

  • The lost connection to Kafka is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • The service is kept alive to keep the UI running.

  • Kafka topics are reconnected or, when auto create is enabled, recreated once the microservice has reconnected.

  • The API remains functional and entities should still be saved in the database.

  • Publishing of messages is retried until Kafka is up and running.

  • No messages are lost.

  • Rules are executed and events are created again when Kafka is available.

DATABASE

Lost connection to Database.

Reasons:

  • Database instance crashed, restarted etc.

  • Network issue between microservice and database.

  • The lost connection to the database is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • The service is kept alive, but it is not ready to accept any requests.

  • The API is no longer functional.

MACMA

Lost connection to MACMA.

Reasons:

  • MACMA instance crashed, restarted etc.

  • Network issue between microservice and MACMA.

  • The lost connection to MACMA is logged as an error in Rule Service App.

  • The microservice tries to reconnect for 6 hours by default (configurable via MACMA_RETRYER_MAXATTEMTPS).

  • The service is kept alive, but it is not ready to accept any requests.

  • The API is no longer functional.

  • After MACMA is restarted, everything works again (getting tokens, sending requests) without restarting Rule Service App.

MDM

Lost connection to MDM.

Reasons:

  • MDM instance crashed, restarted etc.

  • Network issue between microservice and MDM.

  • The lost connection and failed requests to MDM are logged as errors.

  • The microservice tries to reconnect for 6 hours by default (configurable via MDM_RETRYER_MAXATTEMTPS).

  • The API is still functional with the existing master data in the RM database.

  • After MDM is restarted, everything works again without restarting Rule Service App.

  • Master data changes (integration events) made after MDM stopped are not reflected in Rule Service App; MDM needs to retry publishing (example scenario: stop Rule Service App, create a new device in MDM and link it to a facility, stop MDM, then start Rule Service App). After a master data reload is triggered via the API, the master data appears.

Stateful Function Executor

Endpoint Description

Health endpoint
cm/rm/stateful-function-executor/health

200 when service available.

  • The health endpoint shows details about connection issues with RabbitMQ and InfluxDB (only for authenticated users).

Liveness endpoint
cm/rm/stateful-function-executor/internal/health/liveness

200

  • The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/rm/stateful-function-executor/internal/health/readiness

200

  • When the service is able to connect to RabbitMQ.

  • When the service is able to connect to InfluxDB.

503

  • When the service is not able to connect to RabbitMQ.

  • When the service is not able to connect to InfluxDB.

Dependency Use cases Impact

RABBITMQ

Lost connection to RabbitMQ.

Reasons:

  • RabbitMQ instance crashed, restarted, etc.

  • Network issue between microservice and RabbitMQ

  • The lost connection to RabbitMQ is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • No rules are executed during the outage since messages cannot be consumed.

  • Rule execution resumes after the RabbitMQ connection is re-established.

INFLUXDB

Lost connection to InfluxDB.

Reasons:

  • InfluxDB instance crashed, restarted, etc.

  • Network issue between microservice and InfluxDB

  • The lost connection to InfluxDB is logged as an error.

  • The service is kept alive, but it is not ready to accept any requests.

  • After InfluxDB is restarted, the service accepts requests again.

  • Several functionalities no longer work, e.g.

    • no rules with previous are triggered

Rule Value Aggregator

Endpoint Description

Health endpoint
cm/rm/value-aggregator/health

200 when service available.

  • The health endpoint shows details about the Kafka connection and the Kafka Streams state (only for authenticated users).

Liveness endpoint
cm/rm/value-aggregator/internal/health/liveness

503

  • When Kafka Streams is not running or re-balancing, the service is not alive either and should be restarted.

200

  • The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/rm/value-aggregator/internal/health/readiness

503

While the connection to Kafka is lost and Kafka Streams is not in the RUNNING state.

  • Messages are not consumed by the service instance.

200

While the service is connected to Kafka and Kafka Streams is in the RUNNING state.

  • Messages are consumed by the service instance.
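For the Kafka Streams based services, the probe results depend on the Kafka Streams state. The sketch below shows one possible mapping. It rests on two assumptions that this page does not confirm: the liveness rule is read as "alive while the state is RUNNING or REBALANCING", and readiness is treated as requiring both a Kafka connection and the RUNNING state. The enum mirrors the state names of the Kafka Streams client; the function names are hypothetical.

```python
from enum import Enum


class StreamsState(Enum):
    """State names as used by the Kafka Streams client."""
    CREATED = "CREATED"
    REBALANCING = "REBALANCING"
    RUNNING = "RUNNING"
    PENDING_SHUTDOWN = "PENDING_SHUTDOWN"
    NOT_RUNNING = "NOT_RUNNING"
    PENDING_ERROR = "PENDING_ERROR"
    ERROR = "ERROR"


def liveness_status(state: StreamsState) -> int:
    """Assumed rule: alive while the stream is running or rebalancing."""
    alive = state in (StreamsState.RUNNING, StreamsState.REBALANCING)
    return 200 if alive else 503


def readiness_status(kafka_connected: bool, state: StreamsState) -> int:
    """Assumed rule: ready only when connected and in the RUNNING state."""
    ready = kafka_connected and state is StreamsState.RUNNING
    return 200 if ready else 503


print(liveness_status(StreamsState.REBALANCING))  # prints 200
print(readiness_status(True, StreamsState.REBALANCING))  # prints 503
```

Under these assumptions a rebalancing instance stays alive but receives no traffic until it returns to RUNNING.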

Dependency Use cases Impact

Kafka

Lost connection to Kafka.

Reasons:

  • Kafka instance crashed, restarted etc.

  • Network issue between microservice and Kafka.

  • The lost connection to Kafka is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • Kafka topics are reconnected once the microservice has reconnected.

  • No rules are executed during the outage since messages cannot be consumed.

  • Rule execution resumes after the Kafka connection is re-established.

Rule Value Provider

Endpoint Description

Health endpoint
cm/rm/value-provider/health

200 when service available.

  • The health endpoint shows details about the Kafka connection and the Kafka Streams state (only for authenticated users).

Liveness endpoint
cm/rm/value-provider/internal/health/liveness

503

  • When Kafka Streams is not running or re-balancing, the service is not alive either and should be restarted.

200

  • The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/rm/value-provider/internal/health/readiness

503

While the connection to Kafka is lost and Kafka Streams is not in the RUNNING state.

  • Messages are not consumed by the service instance.

200

While the service is connected to Kafka and Kafka Streams is in the RUNNING state.

  • Messages are consumed by the service instance.

Dependency Use cases Impact

Kafka

Lost connection to Kafka.

Reasons:

  • Kafka instance crashed, restarted etc.

  • Network issue between microservice and Kafka.

  • The lost connection to Kafka is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • Kafka topics are reconnected once the microservice has reconnected.

  • No rules are executed during the outage since messages cannot be consumed.

  • Rule execution resumes after the Kafka connection is re-established.

Rule Result Aggregator

Endpoint Description

Health endpoint
cm/rm/result-aggregator/health

200 when service available.

  • The health endpoint shows details about the Kafka connection and the Kafka Streams state (only for authenticated users).

Liveness endpoint
cm/rm/result-aggregator/internal/health/liveness

503

  • When Kafka Streams is not running or re-balancing, the service is not alive either and should be restarted.

200

  • The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/rm/result-aggregator/internal/health/readiness

503

While the connection to Kafka is lost and Kafka Streams is not in the RUNNING state.

  • Messages are not consumed by the service instance.

200

While the service is connected to Kafka and Kafka Streams is in the RUNNING state.

  • Messages are consumed by the service instance.

Dependency Use cases Impact

Kafka

Lost connection to Kafka.

Reasons:

  • Kafka instance crashed, restarted etc.

  • Network issue between microservice and Kafka.

  • The lost connection to Kafka is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • Kafka topics are reconnected once the microservice has reconnected.

  • No rules are executed during the outage since messages cannot be consumed.

  • Rule execution resumes after the Kafka connection is re-established.

Rule Function Executor

Endpoint Description

Health endpoint
cm/rm/function-executor/health

200 when service available.

  • The health endpoint shows details about the Kafka connection and the Kafka Streams state (only for authenticated users).

Liveness endpoint
cm/rm/function-executor/internal/health/liveness

503

  • When Kafka Streams is not running or re-balancing, the service is not alive either and should be restarted.

200

  • The microservice is alive; its internal liveness state is valid and it does not need to be restarted.

Readiness endpoint
cm/rm/function-executor/internal/health/readiness

503

While the connection to Kafka is lost and Kafka Streams is not in the RUNNING state.

  • Messages are not consumed by the service instance.

200

While the service is connected to Kafka and Kafka Streams is in the RUNNING state.

  • Messages are consumed by the service instance.

Dependency Use cases Impact

Kafka

Lost connection to Kafka.

Reasons:

  • Kafka instance crashed, restarted etc.

  • Network issue between microservice and Kafka.

  • The lost connection to Kafka is logged as a warning.

  • The microservice tries to reconnect indefinitely.

  • Kafka topics are reconnected once the microservice has reconnected.

  • No rules are executed during the outage since messages cannot be consumed.

  • Rule execution resumes after the Kafka connection is re-established.


© Robert Bosch Manufacturing Solutions GmbH 2023-2025, all rights reserved
