
Module health verification Endpoints and K8s probes

  • Health endpoint: Used by monitoring to determine the health state of a microservice. The health endpoint also includes the liveness, readiness and ping states. It is not intended to be used by the container management system, but by privileged (authorization required) users such as operators or operator tools. For that reason the endpoint is exposed and provides more information than a simple healthy/unhealthy status.

  • Readiness endpoint: Polled by Kubernetes to check whether a microservice (pod) can accept traffic.

  • Liveness endpoint: Polled by Kubernetes to check whether the internal state of a microservice is valid. Kubernetes restarts the container if the liveness endpoint fails the configured number of times.

  • Ping endpoint: Always returns "Healthy". It is intended to be called by the health endpoint implementation of another service and only verifies that a service client is able to connect to the service (a polling sketch follows this list).
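
For illustration only, the sketch below polls these endpoints over HTTP the way an operator tool or monitoring system might. The base URL and the bearer token are placeholders rather than values defined by this page, and the exact authentication mechanism for /health is an assumption.

    # Hypothetical operator-side polling sketch; BASE_URL and TOKEN are placeholders.
    import urllib.error
    import urllib.request

    BASE_URL = "https://server.com/mdm"      # placeholder service base URL
    TOKEN = "<operator-access-token>"        # placeholder; /health requires authorization

    def probe(path: str, authenticated: bool = False) -> int:
        """Return the HTTP status of one health-related endpoint (200 = healthy, 503 = down)."""
        request = urllib.request.Request(BASE_URL + path)
        if authenticated:
            # Assumption: a bearer token is accepted; the real authentication flow may differ.
            request.add_header("Authorization", f"Bearer {TOKEN}")
        try:
            with urllib.request.urlopen(request, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as error:
            return error.code

    for path, needs_auth in [("/health", True), ("/health/live", False),
                             ("/health/readiness", False), ("/ping", False)]:
        print(path, "->", probe(path, needs_auth))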

Services

Both the process and the equipment service expose the same health and k8s probe endpoints at their respective base URL (for example https://server.com/mdm/process/health and https://server.com/mdm/health). The expected status codes per endpoint are listed below and illustrated by a short sketch after the list.

Health and probes

  • Health Endpoint

    • /health

    • down: 503

      • The health endpoint reports down and answers 503. The microservice is considered unhealthy when the connection to RabbitMQ, MACMA or the database is lost.

      • The health endpoint shows details about connection issues with RabbitMQ, MACMA, the database, Portal and the Equipment API service (only for authenticated users).

    • up: 200

  • Liveness Endpoint

    • /health/live

    • unhealthy: 503

      • The endpoint is down only during startup and shutdown.

    • healthy: 200

  • Readiness Endpoint

    • /health/readiness

    • unhealthy: 503

      • If the database, RabbitMQ or MACMA is down, requests to the microservice are not accepted.

    • healthy: 200

      • If the Portal or Equipment service is down, requests to the microservice are still accepted.

  • Ping Endpoint

    • /ping

    • always healthy: 200
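
Read together, the rules above amount to a simple decision: /ping is always healthy, /health/live is unhealthy only during startup and shutdown, and /health and /health/readiness answer 503 when a required dependency (database, RabbitMQ or MACMA) is lost, while Portal and the Equipment API only affect the reported details. The sketch below merely illustrates this rule and is not the service's actual implementation.

    # Illustration of the status-code rules listed above; not the actual service code.
    REQUIRED = {"database", "rabbitmq", "macma"}    # lost connection -> /health and readiness 503
    OPTIONAL = {"portal", "equipment_api"}          # lost connection -> reported in /health details only

    def health_status(dependency_up: dict[str, bool]) -> int:
        """/health: 503 when any required dependency is down, otherwise 200."""
        return 200 if all(dependency_up[name] for name in REQUIRED) else 503

    def readiness_status(dependency_up: dict[str, bool]) -> int:
        """/health/readiness: requests are accepted only while all required dependencies are up."""
        return 200 if all(dependency_up[name] for name in REQUIRED) else 503

    def liveness_status(starting_or_stopping: bool) -> int:
        """/health/live: unhealthy only during startup and shutdown."""
        return 503 if starting_or_stopping else 200

    def ping_status() -> int:
        """/ping: always healthy."""
        return 200

    # Example: a Portal outage keeps the service ready, a database outage does not.
    state = {"database": True, "rabbitmq": True, "macma": True, "portal": False, "equipment_api": True}
    assert readiness_status(state) == 200
    assert health_status(state) == 200
    state["database"] = False
    assert readiness_status(state) == 503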

Dependencies

The status of the probes (healthy/unhealthy) is based on health checks for the following service dependencies; the reconnect behavior they share is sketched after the list.

  • RabbitMQ

    • Use cases:
      Lost connection to RabbitMQ.
      Reasons:

      • RabbitMQ instance crashed, restarted, etc.

      • Network issue between microservice and RabbitMQ

    • General Behavior

      • The lost connection to RabbitMQ is logged when an attempt is made to send an event.

      • The service is kept alive and remains ready to accept requests.

      • The re-established connection to RabbitMQ is logged with the first event that is sent after RabbitMQ comes back online.

    • Impact

      • The MDM service remains functional if the messaging 'enabled' flag is set to false in the configuration.

  • Database

    • Use cases:
      Lost connection to the Database.
      Reasons:

      • Database instance crashed, restarted, etc.

      • Network issue between microservice and Database

    • General Behavior

      • The lost connection to the database is logged as an error.

      • The microservice retries the connection indefinitely.

      • The service is kept alive, but it is not ready to accept requests.

    • Impact

      • The service is no longer functional.

  • MACMA

    • Use cases:
      Lost connection to MACMA.
      Reasons:

      • MACMA instance crashed, restarted, etc.

      • Network issue between microservice and MACMA

      • MACMA cannot accept traffic (readiness state is "Unhealthy")

    • General Behavior

      • The lost connection to MACMA is logged as an error.

      • The microservice retries the connection indefinitely.

      • The service is kept alive, but it is not ready to accept requests.

    • Impact

      • The service is no longer functional.

  • Portal

    • Use cases:
      Lost connection to Portal.
      Reasons:

      • Portal instance crashed, restarted, etc.

      • Network issue between microservice and Portal

      • Portal cannot accept traffic (readiness state is "Unhealthy")

    • General Behavior

      • The lost connection to Portal is logged as an error.

      • The microservice retries the connection indefinitely.

      • The service is kept alive and remains ready to accept requests.

    • Impact

      • The service is still functional, as Portal is an optional dependency.

  • Equipment API

    • Use cases:
      Lost connection to Equipment API service.
      Reasons:

      • Equipment API service instance crashed, restarted, etc.

      • Network issue between microservice and Equipment API service

      • Equipment API service cannot accept traffic (readiness state is "Unhealthy")

    • General Behavior

      • The lost connection to the Equipment service is logged as an error.

      • The microservice retries the connection indefinitely.

      • The service is kept alive and remains ready to accept requests.

    • Impact

      • The process service is still functional, as the Equipment API is an optional dependency.
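
The "General Behavior" entries above all follow the same pattern: the outage is logged, the service process stays alive but stops reporting itself as ready, reconnection is retried until it succeeds, and recovery is logged again. The sketch below is a minimal illustration of that pattern, assuming a hypothetical connect() callable and retry delay.

    # Sketch of the shared reconnect behavior; connect() and the retry delay are assumptions.
    import logging
    import time

    log = logging.getLogger("mdm.dependency")
    ready = True    # state reflected by /health/readiness

    def keep_connected(connect, retry_delay_seconds: float = 10.0):
        """Log the outage, mark the service as not ready, and retry until the dependency is back."""
        global ready
        while True:
            try:
                connection = connect()       # raises while the dependency is down
            except ConnectionError as error:
                if ready:
                    log.error("Lost connection to dependency: %s", error)
                ready = False                # keep the process alive, stop accepting requests
                time.sleep(retry_delay_seconds)
                continue
            if not ready:
                log.info("Connection to dependency re-established")
            ready = True
            return connection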

Nginx health endpoints

The Nginx gateway provides no built-in health endpoints, and no extra configuration is added.

  • Liveness: not needed, as nginx starts quickly and the container is restarted anyway if the process exits.

  • Readiness: not implemented. In theory there is a short period during startup when this would be needed, but because MDM uses a simple nginx configuration this startup time is very short and can safely be ignored.

Infrastructure outages

If a required infrastructure component or module becomes unavailable, MDM immediately reflects the status in the health response. As a general rule, when a failed dependency becomes available again, MDM automatically restores the connection without any user intervention. The per-dependency recovery behavior is listed below; a short sketch of the two recovery strategies follows the list.

  • MACMA: MDM tries to connect on every incoming authenticated request. There is no continuous background connection to MACMA.

  • RabbitMQ: MDM reconnects its listener for incoming messages as soon as RabbitMQ becomes available again, with a delay in the range of 10-30 seconds.

  • Database: MDM attempts a DB connection at every incoming service request and thus restores connectivity as soon as the database becomes available. The /health state has no influence on the DB connection attempts.

  • Portal: Once registered successfully, MDM does not re-attempt registration when the Portal transitions between unhealthy and healthy states. The Portal registration flow is not related in any way to the health monitoring endpoints.

  • Internal MDM services: MDM services attempt to connect to each other whenever an incoming API request requires it.
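
Two recovery strategies appear in the list above: MACMA and the database are contacted on demand for every incoming request, whereas the RabbitMQ listener reconnects in the background with a delay in the 10-30 second range. The sketch below only contrasts these two strategies; the callables it takes are hypothetical.

    # Sketch of the two recovery strategies; all callables here are hypothetical.
    import random
    import time

    def handle_with_on_demand_connection(connect, handle_request):
        """MACMA / database style: a connection attempt is made for every incoming request,
        so connectivity is restored as soon as the dependency is reachable again."""
        connection = connect()               # fails only this request if the dependency is still down
        return handle_request(connection)

    def rabbitmq_listener_loop(connect, consume):
        """RabbitMQ style: a background loop re-subscribes for incoming messages,
        retrying with a delay in the 10-30 second range until the broker is back."""
        while True:
            try:
                consume(connect())
            except ConnectionError:
                time.sleep(random.uniform(10, 30))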
