Health and availability APIs

The health and availability status over a container's lifetime is published via REST (Representational State Transfer) APIs.

Health Endpoint

Reports the current health status of the service. An unauthenticated request returns a minimal response that contains only the health state.

{
    "health": "degraded"
}

Possible responses

HTTP 200 OK - Returned when the given token is valid and authorized for the resource, or when no token is sent
HTTP 403 FORBIDDEN - Returned when the given token isn’t authorized (e.g. missing resource and permission)
HTTP 401 UNAUTHORIZED - Returned when the given token is invalid
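
A client could handle the status codes listed above roughly as follows. This is a minimal sketch in Python; the function name describe_health_status and the returned texts are illustrative assumptions, not part of the service.

```python
from http import HTTPStatus


def describe_health_status(status_code: int) -> str:
    """Return a short explanation for a health endpoint status code."""
    if status_code == HTTPStatus.OK:
        return "valid and authorized token, or no token sent"
    if status_code == HTTPStatus.UNAUTHORIZED:
        return "token is invalid"
    if status_code == HTTPStatus.FORBIDDEN:
        return "token is not authorized for the resource"
    # Any other code is outside the list above; report it unchanged.
    return f"status {status_code} is not covered by the list above"


if __name__ == "__main__":
    for code in (200, 401, 403):
        print(code, describe_health_status(code))
```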

An authenticated request returns a detailed response:

{
    "name": "Bosch.Nexeed.WebPortal.CoreService",
    "description": "Health check for Bosch.Nexeed.WebPortal.CoreService",
    "instanceId": "bd280485-369c-48fc-91e4-a23ff5ba0738",
    "startupTime": "2024-05-10T09:14:42.0856869+00:00",
    "version": "5.16.0-dev1294071",
    "ready": true,
    "health": "healthy",
    "onStateSince": "2024-05-10T09:14:52.2530809+00:00",
    "dependencies": [
        {
            "name": "portal_health",
            "description": "Portal service health checks",
            "available": true,
            "isRequired": true,
            "duration": "00:00:00.0001254",
            "details": {
                "targetFrameworkName": ".NETCoreApp,Version=v8.0",
                "companyInfo": "Robert Bosch Manufacturing Solutions GmbH",
                "appTitle": "Bosch.Nexeed.WebPortal.CoreService",
                "targetFramework": ".NET 8.0",
                "description": "WebPortal_Backend_5.16.24131.13",
                "copyrightInfo": "Copyright © Robert Bosch Manufacturing Solutions GmbH",
                "informationalVersionString": "0.0.0+9304053cad3e8c76e3f73fb384474e633da3c36d",
                "versionString": "5.16.24131.13",
                "productInfo": "'undefined'"
            }
        },
        {
            "name": "access_provider_health",
            "url": "https://domain.bosch.com/iam/ping",
            "available": true,
            "isRequired": true,
            "duration": "00:00:00.0226407",
            "details": {
                "healthResponseMessage": "{\"status\":\"UP\"}"
            }
        },
        {
            "name": "rabbitmq_health",
            "available": true,
            "isRequired": true,
            "duration": "00:00:00.0000086",
            "details": {
                "007c1c2f-0f55-4173-a16c-6d569bee039b": {
                    "connectorName": "PortalRabbitMqConnector",
                    "connectorType": "RabbitMqConnector",
                    "status": "healthy"
                }
            }
        },
        {
            "name": "mdm_service",
            "url": "https://domain.bosch.com/mdm/equipment-management/ping",
            "available": true,
            "isRequired": false,
            "duration": "00:00:00.0116536",
            "details": {}
        },
        {
            "name": "database_health",
            "available": true,
            "isRequired": true,
            "duration": "00:00:00.0003362",
            "details": {}
        }
    ]
}
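
The detailed response can be evaluated by a script, for example to list the dependencies that are reported as unavailable. The following Python sketch assumes the response shape shown above. The function name unavailable_dependencies and the shortened sample values are made up for illustration.

```python
import json


def unavailable_dependencies(health: dict, required_only: bool = True) -> list[str]:
    """List the names of dependencies that are reported as unavailable."""
    names = []
    for dependency in health.get("dependencies", []):
        if dependency.get("available", False):
            continue  # dependency is reachable, nothing to report
        if required_only and not dependency.get("isRequired", False):
            continue  # optional dependencies are skipped on request
        names.append(dependency["name"])
    return names


# Shortened, hypothetical response in the shape shown above.
SAMPLE = json.loads("""
{
    "health": "degraded",
    "ready": true,
    "dependencies": [
        {"name": "rabbitmq_health", "available": false, "isRequired": true},
        {"name": "mdm_service", "available": false, "isRequired": false},
        {"name": "database_health", "available": true, "isRequired": true}
    ]
}
""")

if __name__ == "__main__":
    print(unavailable_dependencies(SAMPLE))
    print(unavailable_dependencies(SAMPLE, required_only=False))
```

Filtering on isRequired separates dependencies that affect the service status from optional ones that are only listed in the response.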

Liveness Endpoint

This endpoint can be used by a container orchestration system such as Kubernetes to take automatic actions.

A liveness probe can return

"Yes, I’m alive!" The runtime environment evaluates the response and takes no further action.
"No, I’m not alive!" The runtime environment evaluates the response, then kills and restarts the container.
Or the probe simply fails because the container is unable to respond for some reason. The runtime environment kills and restarts the container.
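
The three outcomes above can be modeled in a few lines of Python. This is a sketch, not the orchestrator's actual implementation. The function names are assumptions. The success range of 200 to 399 follows the rule Kubernetes applies to HTTP probes.

```python
import urllib.error
import urllib.request
from typing import Optional


def classify_probe(status_code: Optional[int]) -> str:
    """Classify a probe result; None means the container did not respond."""
    if status_code is None:
        return "no response"
    # Kubernetes treats HTTP status codes from 200 to 399 as success.
    if 200 <= status_code < 400:
        return "alive"
    return "not alive"


def probe(url: str, timeout_seconds: float = 1.0) -> Optional[int]:
    """Send one GET request and return the status code, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the container answered with a 4xx or 5xx status
    except OSError:
        return None  # connection refused, timeout, DNS failure and similar


def liveness_action(status_code: Optional[int]) -> str:
    """Action taken by the runtime environment, as described above."""
    if classify_probe(status_code) == "alive":
        return "none"
    return "restart container"
```

For example, liveness_action(probe(url)) yields "restart container" both for a 503 response and for a container that does not answer at all.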

Readiness Endpoint

This endpoint can be used by a container orchestration system such as Kubernetes to take automatic actions.

A readiness probe can return

"Yes, I’m ready and can serve functionality to my clients!" The runtime environment evaluates the response and takes no further action.
"No, I’m not ready!" The runtime environment evaluates the response and routes no traffic to the container. It asks again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod is marked as unready.
Or the probe simply fails because the container is unable to respond for some reason. The runtime environment stops routing traffic to the container and asks again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod is marked as unready.
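
The retry behavior described above can be sketched as a simple loop. This is a simplified illustration in Python, not how the orchestrator is implemented. The parameter names failure_threshold and period_seconds are borrowed from the Kubernetes probe settings failureThreshold and periodSeconds, and the function name is an assumption.

```python
import time
from typing import Callable


def is_ready(
    probe: Callable[[], bool],
    failure_threshold: int = 3,
    period_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as one probe succeeds, False if every attempt fails."""
    for attempt in range(failure_threshold):
        if probe():
            return True
        if attempt < failure_threshold - 1:
            sleep(period_seconds)  # wait before asking again
    return False  # the pod would be marked as unready
```

The sleep function is passed in as a parameter so that the loop can be exercised without real waiting.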

Startup Endpoint

This endpoint can be used by a container orchestration system such as Kubernetes to take automatic actions. A startup probe is intended to suppress the other probes during a container's startup or initialization phase to prevent restarts.

A startup probe can return

"I’m successfully started." From now on, the runtime environment executes the liveness probes.
"I’m in the startup phase." The runtime environment asks again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod might be restarted (depending on the pod’s restart policy).
Or the probe simply fails because the container is unable to respond for some reason.
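
The way a startup probe gates the other probes can be illustrated as follows. This Python sketch is a simplified model under stated assumptions: the function names and the outcome labels "started", "starting" and "failed" are invented for illustration.

```python
from typing import Iterable


def startup_outcome(results: Iterable[bool], failure_threshold: int) -> str:
    """Evaluate a sequence of startup probe results."""
    failures = 0
    for succeeded in results:
        if succeeded:
            return "started"
        failures += 1
        if failures >= failure_threshold:
            # The pod might be restarted, depending on its restart policy.
            return "failed"
    return "starting"  # still within the allowed number of retries


def active_probes(outcome: str) -> list[str]:
    """Return the probes that are executed for a given startup outcome."""
    if outcome == "started":
        # In Kubernetes, liveness and readiness probes begin after startup succeeds.
        return ["liveness", "readiness"]
    if outcome == "starting":
        return ["startup"]
    return []  # startup failed, no further probes for this container instance
```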

portal/coreservice:

Health Endpoint: https://<domain>/api/core/health
Startup Endpoint: https://<domain>/api/core/health/startup
Liveness Endpoint: https://<domain>/api/core/health/live
Readiness Endpoint: https://<domain>/api/core/health/ready
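
For scripting, the four URLs can be derived from the domain. This is a small Python sketch. The names HEALTH_PATHS and endpoint_urls are illustrative, and portal.example.com in the usage note is a placeholder domain.

```python
# Paths of the portal/coreservice endpoints listed above.
HEALTH_PATHS = {
    "health": "/api/core/health",
    "startup": "/api/core/health/startup",
    "liveness": "/api/core/health/live",
    "readiness": "/api/core/health/ready",
}


def endpoint_urls(domain: str) -> dict[str, str]:
    """Build the full endpoint URLs for the given domain."""
    return {name: f"https://{domain}{path}" for name, path in HEALTH_PATHS.items()}
```

Calling endpoint_urls("portal.example.com")["liveness"] gives https://portal.example.com/api/core/health/live.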

Scenario: all dependencies are resolved

Description: Service is working as expected
Behavior: Service is set up and running. It can respond to API requests
Impact: None

Health	Startup	Liveness	Readiness
200	200	200	200

Scenario: lost connection to MACMA

Description

MACMA health status is down

Behavior

The health of the service will be down

Impact

The authorized APIs will no longer be functional

Health	Startup	Liveness	Readiness
503	503	200	200

Scenario: lost connection to Master Data Management equipment service

Description

Behavior

Issues in dashboard

Impact

The facility selector in the dashboard will not work and will keep throwing console errors
Widgets that work with facility details will not work as expected
Information on newly created facilities/devices is lost
Facility APIs will not be functional
The Master Data Management service health check remains part of the Core service health response, but it does not affect the Core service status

Health	Startup	Liveness	Readiness
200	200	200	200

Scenario: lost connection to RabbitMQ

Description

RabbitMQ instance crashed, restarted, etc.
Network issue between microservice and RabbitMQ

Behavior

The lost connection to RabbitMQ is logged when the service tries to send an event
The health of the service will be degraded
The service retries connecting to RabbitMQ indefinitely (also when RabbitMQ is down at startup)

Impact

API will still be functional
MACMA and MDM events will not be processed immediately, but will be processed as soon as the connection is restored
User/tenant data deletion will be delayed until the connection is restored
MDM data sync will be delayed until the connection is restored

Health	Startup	Liveness	Readiness
200	200	200	200

Scenario: cannot fetch discovery document from MACMA

Description

MACMA application is not registered to portal

Behavior

Impacts authentication/authorization of APIs

Impact

Blocks communication with MACMA and UI crashes with error banner
APIs will not be functional

Health	Startup	Liveness	Readiness
503	200	200	503

Scenario: lost connection to DB

Description

Behavior

The service is unable to persist data, which leads to data loss

Impact

Data cannot be persisted in Core service:
    Manual application and views
    Web Portal footer and skinning configuration
    Dashboards

Health	Startup	Liveness	Readiness
503	503	200	200

Scenario: unable to complete a database migration

Description

Behavior

Features of the failed migration no longer work
Existing functionality might be affected due to a partial migration

Impact

The impact depends on the failed migration script
The check does not affect the Kubernetes probes, but the service health is affected

Health	Startup	Liveness	Readiness
503	200	200	200
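
The status code tables of all scenarios can be collected into one lookup, which helps to narrow down the cause from the four observed codes. This Python sketch copies the values from the tables above. The names SCENARIOS and matching_scenarios are illustrative assumptions.

```python
# (health, startup, liveness, readiness) status codes per scenario.
SCENARIOS = {
    "all dependencies are resolved": (200, 200, 200, 200),
    "lost connection to MACMA": (503, 503, 200, 200),
    "lost connection to Master Data Management equipment service": (200, 200, 200, 200),
    "lost connection to RabbitMQ": (200, 200, 200, 200),
    "cannot fetch discovery document from MACMA": (503, 200, 200, 503),
    "lost connection to DB": (503, 503, 200, 200),
    "unable to complete a database migration": (503, 200, 200, 200),
}


def matching_scenarios(health: int, startup: int, liveness: int, readiness: int) -> list[str]:
    """Return all scenarios whose status codes match the observed ones."""
    observed = (health, startup, liveness, readiness)
    return [name for name, codes in SCENARIOS.items() if codes == observed]
```

Several scenarios share the same codes, so the result can contain more than one entry. In that case the dependencies list of the detailed health response is needed to tell them apart.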
