Health and availability APIs
The health and availability status over a container's lifetime is published via REST (Representational State Transfer) APIs.
Health Endpoint
Reports the current health status of the service. An unauthenticated request returns a minimal response that contains only the health state.
{
"health": "degraded"
}
- Possible responses
  - HTTP 200 OK - Returned when the given token is valid and authorized for the resource, or when no token is sent
  - HTTP 403 FORBIDDEN - Returned when the given token isn’t authorized (e.g. missing resource and permission)
  - HTTP 401 UNAUTHORIZED - Returned when the given token is invalid
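The status codes above could be handled on the client side roughly as follows. This is only a sketch: the `Authorization: Bearer` header scheme and the helper names `build_health_request` and `describe_health_status` are assumptions made for illustration, since the text only says that a token may or may not be supplied.

```python
import urllib.request
from typing import Optional


def build_health_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the health endpoint.

    Assumption: the token is sent as a bearer token in the Authorization header.
    Without a token, the request is unauthenticated.
    """
    request = urllib.request.Request(url, method="GET")
    if token is not None:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def describe_health_status(status_code: int) -> str:
    """Map the documented status codes to a short explanation."""
    explanations = {
        200: "token valid and authorized, or no token sent",
        401: "token invalid",
        403: "token not authorized for the resource",
    }
    return explanations.get(status_code, "status code not covered by the list above")
```

The request object is only constructed here; sending it (for example with `urllib.request.urlopen`) is left to the caller.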
An authenticated request returns a detailed response:
{
"name": "Bosch.Nexeed.WebPortal.CoreService",
"description": "Health check for Bosch.Nexeed.WebPortal.CoreService",
"instanceId": "bd280485-369c-48fc-91e4-a23ff5ba0738",
"startupTime": "2024-05-10T09:14:42.0856869+00:00",
"version": "5.16.0-dev1294071",
"ready": true,
"health": "healthy",
"onStateSince": "2024-05-10T09:14:52.2530809+00:00",
"dependencies": [
{
"name": "portal_health",
"description": "Portal service health checks",
"available": true,
"isRequired": true,
"duration": "00:00:00.0001254",
"details": {
"targetFrameworkName": ".NETCoreApp,Version=v8.0",
"companyInfo": "Robert Bosch Manufacturing Solutions GmbH",
"appTitle": "Bosch.Nexeed.WebPortal.CoreService",
"targetFramework": ".NET 8.0",
"description": "WebPortal_Backend_5.16.24131.13",
"copyrightInfo": "Copyright © Robert Bosch Manufacturing Solutions GmbH",
"informationalVersionString": "0.0.0+9304053cad3e8c76e3f73fb384474e633da3c36d",
"versionString": "5.16.24131.13",
"productInfo": "'undefined'"
}
},
{
"name": "access_provider_health",
"url": "https://domain.bosch.com/iam/ping",
"available": true,
"isRequired": true,
"duration": "00:00:00.0226407",
"details": {
"healthResponseMessage": "{\"status\":\"UP\"}"
}
},
{
"name": "rabbitmq_health",
"available": true,
"isRequired": true,
"duration": "00:00:00.0000086",
"details": {
"007c1c2f-0f55-4173-a16c-6d569bee039b": {
"connectorName": "PortalRabbitMqConnector",
"connectorType": "RabbitMqConnector",
"status": "healthy"
}
}
},
{
"name": "mdm_service",
"url": "https://domain.bosch.com/mdm/equipment-management/ping",
"available": true,
"isRequired": false,
"duration": "00:00:00.0116536",
"details": {}
},
{
"name": "database_health",
"available": true,
"isRequired": true,
"duration": "00:00:00.0003362",
"details": {}
}
]
}
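A monitoring script might reduce the detailed response to the dependencies that need attention. The sketch below assumes the field names shown in the example above (`dependencies`, `name`, `available`, `isRequired`, `duration`) and that `duration` always has the `hh:mm:ss.fraction` form seen there. The function names are hypothetical.

```python
from datetime import timedelta
from typing import Any, Dict, List


def parse_duration(value: str) -> timedelta:
    """Parse a duration such as "00:00:00.0226407" (assumed hh:mm:ss.fraction)."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def failing_required_dependencies(report: Dict[str, Any]) -> List[str]:
    """Return the names of required dependencies reported as unavailable."""
    return [
        dependency["name"]
        for dependency in report.get("dependencies", [])
        if dependency.get("isRequired") and not dependency.get("available")
    ]
```

For the example response above, `failing_required_dependencies` would return an empty list, because every dependency reports `"available": true`.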
Liveness Endpoint
Can be used by a container orchestration system like Kubernetes to take automatic actions.
- A liveness probe can return
  - "Yes, I’m alive!" The runtime environment evaluates the response and takes no further action.
  - "No, I’m not alive!" The runtime environment evaluates the response, then kills and restarts the container.
  - No response at all, because the container is unable to answer the probe for some reason. The runtime environment kills and restarts the container.
Readiness Endpoint
Can be used by a container orchestration system like Kubernetes to take automatic actions.
- A readiness probe can return
  - "Yes, I’m ready and can serve functionality to my clients!" The runtime environment evaluates the response and takes no further action.
  - "No, I’m not ready!" The runtime environment evaluates the response and routes no traffic to the container. It asks again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod is marked as unready.
  - No response at all, because the container is unable to answer the probe for some reason. The runtime environment stops routing traffic to the container and asks again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod is marked as unready.
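The "ask again with a configurable number of retries and a configurable delay" behavior can be illustrated with a small polling loop. This is a simplified model and not the orchestrator's actual implementation; `wait_until_ready`, `retries` and `delay_seconds` are names chosen for this sketch (in Kubernetes, the comparable probe settings are `failureThreshold` and `periodSeconds`).

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], retries: int, delay_seconds: float) -> bool:
    """Call `check` up to `retries` times, pausing `delay_seconds` between attempts.

    A check that raises is treated like a "not ready" answer, mirroring a probe
    that fails because the container cannot respond.
    """
    for attempt in range(retries):
        try:
            if check():
                return True
        except Exception:
            pass  # no usable response counts as a failed probe
        if attempt < retries - 1:
            time.sleep(delay_seconds)
    return False  # every attempt failed: the caller would mark the pod as unready
```

Passing the check as a callable keeps the loop independent of how the readiness endpoint is actually queried.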
Startup Endpoint
Can be used by a container orchestration system like Kubernetes to take automatic actions. A startup probe is intended to suppress the other probes during a container’s startup or initialization phase in order to prevent restarts.
- A startup probe can return
  - "I’m successfully started." From now on, the runtime environment starts executing the liveness probes.
  - "I’m in the startup phase." The runtime environment asks again with a configurable number of retries and a configurable delay. If the probe fails every time, the pod might be restarted (depending on the pod’s restart policy).
  - No response at all, because the container is unable to answer the probe for some reason.
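The reactions described for the three probe types can be summarized in a small lookup. The sketch below only restates the text above as a simplified model. The `Probe` enum and `runtime_reaction` function are hypothetical names, and a real orchestrator applies thresholds and policies that are not modeled here.

```python
from enum import Enum


class Probe(Enum):
    LIVENESS = "liveness"
    READINESS = "readiness"
    STARTUP = "startup"


def runtime_reaction(probe: Probe, succeeded: bool) -> str:
    """Return the reaction of the runtime environment as described above.

    `succeeded=False` covers both a negative answer and no answer at all.
    """
    reactions = {
        (Probe.LIVENESS, True): "no further action",
        (Probe.LIVENESS, False): "kill and restart the container",
        (Probe.READINESS, True): "no further action",
        (Probe.READINESS, False): "stop routing traffic and ask again",
        (Probe.STARTUP, True): "start executing liveness probes",
        (Probe.STARTUP, False): "ask again; may restart depending on restart policy",
    }
    return reactions[(probe, succeeded)]
```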
portal/coreservice:
- Health Endpoint
  - https://<domain>/api/core/health
- Startup Endpoint
  - https://<domain>/api/core/health/startup
- Liveness Endpoint
  - https://<domain>/api/core/health/live
- Readiness Endpoint
  - https://<domain>/api/core/health/ready
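For manual checks, the four endpoints could be queried with a short script such as the one below. It is a sketch under assumptions: the helper names are hypothetical, the domain is supplied by the caller, and the requests are sent without a token.

```python
import urllib.error
import urllib.request
from typing import Dict

# Paths taken from the endpoint list above.
ENDPOINT_PATHS = {
    "health": "/api/core/health",
    "startup": "/api/core/health/startup",
    "liveness": "/api/core/health/live",
    "readiness": "/api/core/health/ready",
}


def endpoint_urls(domain: str) -> Dict[str, str]:
    """Build the full URL of each endpoint for the given domain."""
    return {name: f"https://{domain}{path}" for name, path in ENDPOINT_PATHS.items()}


def fetch_status(url: str, timeout_seconds: float = 5.0) -> int:
    """Return the HTTP status code of an unauthenticated GET request.

    urllib raises HTTPError for 4xx/5xx answers, so the code is read from the
    exception in that case. Connection problems (URLError) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Reading the code from `HTTPError` matters here because a 503 from a health endpoint is an expected answer to report, not an error to abort on.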
Scenario: all dependencies are resolved
- Description
  - Service is working as expected
- Behavior
  - Service is set up and running. It can respond to API requests
- Impact
  - None
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 200 | 200 | 200 | 200 |
Scenario: lost connection to MACMA
- Description
  - MACMA health status is down
- Behavior
  - The health of the service will be down
- Impact
  - The authorized APIs will no longer be functional
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 503 | 503 | 200 | 200 |
Scenario: lost connection to Master Data Management equipment service
- Description
- Behavior
  - Issues in dashboard
- Impact
  - Facility selector in dashboard will not work and will keep throwing console errors
  - Widgets that work with facility details will fail to work as expected
  - Information on newly created facilities/devices is lost
  - Facility APIs will not be functional
  - The Master Data Management service health check is part of the Core service health response, but it does not affect the Core service status
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 200 | 200 | 200 | 200 |
Scenario: lost connection to RabbitMQ
- Description
  - RabbitMQ instance crashed, restarted, etc.
  - Network issue between microservice and RabbitMQ
- Behavior
  - The lost connection to RabbitMQ is logged when the service attempts to send an event
  - The health of the service will be degraded
  - The service retries connecting to RabbitMQ indefinitely (also when RabbitMQ is down at startup)
- Impact
  - API will still be functional
  - MACMA and MDM events will not be processed immediately, but will be processed as soon as the connection is restored
  - User/tenant data deletion will be delayed until the connection is restored
  - MDM data sync will be delayed until the connection is restored
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 200 | 200 | 200 | 200 |
Scenario: cannot fetch discovery document from MACMA
- Description
  - MACMA application is not registered to portal
- Behavior
  - Impacts authentication/authorization of APIs
- Impact
  - Blocks communication with MACMA and UI crashes with error banner
  - APIs will not be functional
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 503 | 200 | 200 | 503 |
Scenario: lost connection to DB
- Description
- Behavior
  - Service is unable to persist data, which leads to data loss
- Impact
  - Data cannot be persisted in Core service
  - Manual application and views
  - Web Portal footer and skinning configuration
  - Dashboards
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 503 | 503 | 200 | 200 |
Scenario: unable to complete a database migration
- Description
- Behavior
  - Features that depend on the failed migration no longer work
  - Might affect already existing functionalities due to partial migration
- Impact
  - Impact depends on the failed migration script
  - The check does not impact the Kubernetes probes, but the service health is impacted
| Health | Startup | Liveness | Readiness |
|---|---|---|---|
| 503 | 200 | 200 | 200 |
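The scenario tables can be turned into a lookup that suggests which scenarios fit a set of observed status codes. This is a sketch for troubleshooting, not part of the service. The names `EXPECTED_CODES` and `matching_scenarios` are hypothetical, and the codes are copied from the tables above.

```python
from typing import Dict, List, Tuple

# (health, startup, liveness, readiness) as listed in the scenario tables.
EXPECTED_CODES: Dict[str, Tuple[int, int, int, int]] = {
    "all dependencies are resolved": (200, 200, 200, 200),
    "lost connection to MACMA": (503, 503, 200, 200),
    "lost connection to Master Data Management equipment service": (200, 200, 200, 200),
    "lost connection to RabbitMQ": (200, 200, 200, 200),
    "cannot fetch discovery document from MACMA": (503, 200, 200, 503),
    "lost connection to DB": (503, 503, 200, 200),
    "unable to complete a database migration": (503, 200, 200, 200),
}


def matching_scenarios(observed: Tuple[int, int, int, int]) -> List[str]:
    """Return every documented scenario whose status codes equal `observed`."""
    return [name for name, codes in EXPECTED_CODES.items() if codes == observed]
```

Several scenarios share the same codes (for example, a lost MACMA connection and a lost DB connection both yield 503, 503, 200, 200), so the status codes alone narrow the cause down but do not always identify it. The detailed health response, with its `health` value and per-dependency `available` flags, is needed to tell such cases apart.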