SOT Learning Portal

The concept and conventions

SOT is a distributed system consisting of multiple self-contained modules that work together. The logging concept focuses on the centralized operation of many systems. One key to controlling and observing such a system is to treat specific logs as a kind of API that every operated system has to implement. Consequently, all SOT components log at least everything that enables support, operations, and monitoring, and they do so in a homogeneous and standardized way.

All components of the system send metrics, traces, and logs to a log aggregator (for example, the ELK stack) allowing us to:

  • provide information as one source for monitoring the behavior and health from an inner application perspective

  • record failures

  • debug the application if necessary, e.g., while reproducing reported failures

  • track usage statistics of APIs

  • record security-relevant events

General logging conventions

The following are the guidelines implemented by each application module regarding logging:

  • No request-based logging by default
    Request-based logging is avoided. It can be enabled only temporarily and only for specific components. Everything else is logged in aggregated form.

  • Technology-independent log pattern
    All applications log in the same manner, independent of used programming languages or frameworks.

  • Configurable log level
    The log level is treated as a configuration. The log level can be configured externally.

  • Environment independent logging
    Logging is independent of the runtime environment, so it does not matter whether the application runs on Azure, on-premise, or elsewhere.

  • Log analysis tool independent logging
    Supporting a new log aggregator does not trigger a release cascade of all applications. In other words, logging is independent of the log analysis tool.

  • Single line messages
    Logs are written as single-line formatted JSON - no line breaks.

  • Log to stdout
    Logs are written to stdout for collection.

  • Log in English
    Log messages presented to the end user are written in English.
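
The conventions above can be illustrated with a minimal Python sketch. It is not the SOT implementation: the environment variable LOG_LEVEL, the helper names, and the numeric value chosen for TRACE are assumptions made for this example. It shows single-line JSON output, a stream that defaults to stdout, and an externally configured log level.

```python
import json
import logging
import os
import sys
from datetime import datetime, timezone

# Level names as used in this document. Python names WARN "WARNING" and has
# no TRACE level, so the value 5 for TRACE is an assumption of this sketch.
LEVELS = {"TRACE": 5, "DEBUG": logging.DEBUG, "INFO": logging.INFO,
          "WARN": logging.WARNING, "ERROR": logging.ERROR}
NAMES = {number: name for name, number in LEVELS.items()}


class SingleLineJsonFormatter(logging.Formatter):
    """Render every record as one JSON object on a single line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "loglevel": NAMES.get(record.levelno, record.levelname),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # json.dumps escapes embedded line breaks, so the entry stays on one line.
        return json.dumps(entry)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    # LOG_LEVEL is a hypothetical variable name; WARN mirrors the documented default.
    configured = os.environ.get("LOG_LEVEL", "WARN").upper()
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SingleLineJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(LEVELS.get(configured, logging.WARNING))
    logger.propagate = False
    return logger
```

Calling build_logger("demo").warning("cache unavailable") would write one JSON line to stdout, while an info call would be suppressed under the default level.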

Log levels

Log levels are configurable and can be one of the following. The default log level for SOT modules is WARN unless it is defined differently.

  • TRACE - Anything at statement level, such as starting a calculation, running a query, or loop iteration x of y, is logged at TRACE level.

  • DEBUG - Anything at the method or request processing level, e.g.,

    • method entered or returned with … is logged at DEBUG level

    • HTTP GET /some resource took 5 seconds with status 200, but also 4xy statuses

  • INFO - In the following cases, the information is logged at INFO level, e.g.,

    • Process life cycle messages, such as

      • application started

      • application ready

      • shutdown application

    • Administrative events, such as

      • (re)load or flush caches

      • (re)load configuration

      • database migrations

    • handled exceptions, which do not require any additional investigations, such as

      • connection refused, try reconnect in X seconds

  • WARN - Uncommon behavior or situations, e.g.,

    • unavailability of services

    • security related events (potentially malicious)

    • handled failures that temporarily prevent the application from working, provided that the application can recover automatically

  • ERROR - Anything which potentially leads to unexpected behavior or crashes, e.g.,

    • unhandled failures

    • implementation failures
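
As an illustration of how these levels relate to each other, the following Python sketch orders them from most to least verbose and applies the usual threshold rule. The threshold semantics (an event is written when its level is at or above the configured level) is the common convention and an assumption here; the names LogLevel and is_logged are hypothetical.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Log levels of this document, ordered from most to least verbose."""
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4


# WARN is the documented default for SOT modules.
DEFAULT_LEVEL = LogLevel.WARN


def is_logged(event_level: LogLevel, configured: LogLevel = DEFAULT_LEVEL) -> bool:
    # Assumed threshold rule: write the event if it is at least as severe
    # as the configured level.
    return event_level >= configured
```

With the default level, is_logged(LogLevel.ERROR) is true and is_logged(LogLevel.INFO) is false.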

Configuration of default log levels during deployment

The log level for all SOT modules can be configured in a standardized way during the Helm deployment. The default log level is WARN. All modules have two logging parameters:

  • default - This parameter is used as a fallback for all logging configurations.

  • application - This parameter is used for the logs that are written by the SOT applications. These logs normally include only logs from the business logic and exclude logs from frameworks.

Some modules define further logging parameters (e.g. for different frameworks), which are documented in the operations manual of the respective module. The log levels are configured during the Helm deployment via a Helm override file. The log level parameters can be set per module or globally; global settings affect all SOT modules. Example:

global:
  logging:
    default: "ERROR"
    application: "INFO"
macma:
  local:
    logging:
      default: "INFO"
      application: "TRACE"
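
To reason about which level a module ends up with, the override file above can be read as nested dictionaries. The Python sketch below is only an illustration: it assumes that a module-specific value takes precedence over the global value and that WARN applies when neither is set. The function name and the module name other-module are hypothetical.

```python
DOCUMENTED_DEFAULT = "WARN"


def effective_log_level(values: dict, module: str, parameter: str) -> str:
    """Look up a logging parameter: module-local first, then global, then WARN."""
    local = values.get(module, {}).get("local", {}).get("logging", {})
    global_settings = values.get("global", {}).get("logging", {})
    # Assumption of this sketch: the module-local value wins over the global one.
    for source in (local, global_settings):
        if parameter in source:
            return source[parameter]
    return DOCUMENTED_DEFAULT


# The override file from the example above, expressed as a dictionary.
values = {
    "global": {"logging": {"default": "ERROR", "application": "INFO"}},
    "macma": {"local": {"logging": {"default": "INFO", "application": "TRACE"}}},
}
```

Under these assumptions, macma logs application messages at TRACE, while a module without local settings falls back to the global values.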

Information of interest

This section describes the information that the system logs. General information is logged with any log event. Depending on the case, additional information enriches the general information. For example, in the case of request-based logs, a correlationId is logged. The following sections describe other information in detail.

General information

| Attribute | Description | Information origin | Required |
| --- | --- | --- | --- |
| version | The version of the application, as specified during the build. | Build time configuration. | Yes. |
| timestamp | The time when the log was emitted, as an ISO 8601 formatted timestamp. | System time. | Yes. |
| thread | The name of the thread in which the log was generated. | Process. | No. |
| system | The name of the system, e.g. DEV, TEST, PROD. | Environment variable NEXEED_GLOBAL_SYSTEM_NAME. | Yes. |
| stackTraces | Stack traces of caught exceptions. | Source code. | No. |
| product | The name of the product that the application belongs to. | Build time / runtime configuration. | Yes. |
| loglevel | One of TRACE, DEBUG, INFO, WARN, ERROR. | Source code. | Yes. |
| logger | The name of the logger. | Source code. | Yes. |
| instance | The name or identifier of the instance. | Environment variable NEXEED_GLOBAL_APPLICATION_INSTANCE_ID. | Yes. |
| environment | The name of the environment, e.g. AZURE, ONPREM@<CUSTOMER>. | Environment variable NEXEED_GLOBAL_ENVIRONMENT_NAME. | Yes. |
| application | The name of the application, as specified during the build. | Build time configuration. | Yes. |
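
As a rough illustration of how these attributes could be assembled, the following Python sketch builds the general part of a log entry and checks the required attributes. The three environment variable names come from the table above; the function names, the shape of the build dictionary, and the sample values are assumptions for this example.

```python
import json
import os
from datetime import datetime, timezone
from typing import Mapping

# Attributes marked as required in the table above.
REQUIRED = ("version", "timestamp", "system", "product", "loglevel",
            "logger", "instance", "environment", "application")


def general_info(logger: str, loglevel: str, build: Mapping[str, str],
                 env: Mapping[str, str] = os.environ) -> dict:
    """Collect the general attributes; 'build' stands in for build time configuration."""
    return {
        "version": build["version"],
        "application": build["application"],
        "product": build["product"],
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "system": env.get("NEXEED_GLOBAL_SYSTEM_NAME"),
        "instance": env.get("NEXEED_GLOBAL_APPLICATION_INSTANCE_ID"),
        "environment": env.get("NEXEED_GLOBAL_ENVIRONMENT_NAME"),
        "loglevel": loglevel,
        "logger": logger,
    }


def missing_required(record: Mapping[str, object]) -> list:
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]
```

A record produced this way can be serialized with json.dumps to obtain the single-line JSON form; missing_required helps to spot an incomplete environment early.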

Request / event-based information

| Attribute | Description | Information origin | Required |
| --- | --- | --- | --- |
| correlationId | The request-related correlationId. | E.g. request header. | Yes, if available. |
| trace_id | trace_id as specified by the W3C Trace Context specification and recommended by the OpenTelemetry Logging specification for JSON logs. | E.g. request header. | Yes, if available. |
| span_id | span_id as specified by the W3C Trace Context specification and recommended by the OpenTelemetry Logging specification for JSON logs. | E.g. request header. | Yes, if available. |
| tenantId | The tenant in whose context the request was processed. | E.g. access token. | Yes, as soon as the information is available. |
| userId | The user in whose context the request was processed. | E.g. access token. | Yes, as soon as the information is available. |

The goal is to support tracing as specified by the W3C Trace Context, OpenTelemetry Tracing, and OpenTelemetry Logging specifications. Since not all components are migrated simultaneously, the correlationId header is still supported. As soon as all components support the standard tracing approach, support for the correlationId will be removed.
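
For orientation, the W3C Trace Context specification transports the identifiers in a traceparent request header of the form version-traceid-parentid-flags. The Python sketch below extracts trace_id and span_id from such a header. The function name and the lowercase header dictionary are assumptions for this example, and a real service would normally leave this parsing to its OpenTelemetry instrumentation.

```python
import re
from typing import Mapping

# traceparent: 2 hex chars version, 32 hex chars trace id,
# 16 hex chars parent (span) id, 2 hex chars flags.
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_fields(headers: Mapping[str, str]) -> dict:
    """Return trace_id and span_id from a traceparent header, or an empty dict."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if not match:
        return {}
    version, trace_id, span_id, _flags = match.groups()
    # Version ff and all-zero identifiers are invalid according to the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return {}
    # Note: the header carries the caller's span id; instrumentation usually
    # creates a new span for the current request.
    return {"trace_id": trace_id, "span_id": span_id}
```

The returned dictionary can be merged into a log entry, so that the fields are present only when the information is available.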

LIFE-CYCLE log

The following information is logged to know when an application started, stopped, and is ready.

| Attribute | Description | Required | Default |
| --- | --- | --- | --- |
| logger | LIFE-CYCLE | Yes. | N/A |
| loglevel | INFO | Yes. | N/A |
| status | One of: STARTED, INITIALIZATION, READY, UNHEALTHY, STOPPING, CRASHING, RESTARTING | Yes. | N/A |
| message | An optional reason phrase. | No. | N/A |
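
A LIFE-CYCLE entry following this table could be produced as in the Python sketch below. The function name is hypothetical, and the general attributes (timestamp, version, and so on) are omitted to keep the example short.

```python
import json
from typing import Optional

# Status values listed in the table above.
LIFE_CYCLE_STATUSES = {"STARTED", "INITIALIZATION", "READY", "UNHEALTHY",
                       "STOPPING", "CRASHING", "RESTARTING"}


def life_cycle_entry(status: str, message: Optional[str] = None) -> str:
    """Build a single-line JSON LIFE-CYCLE entry (general attributes omitted)."""
    if status not in LIFE_CYCLE_STATUSES:
        raise ValueError(f"unknown life-cycle status: {status}")
    entry = {"logger": "LIFE-CYCLE", "loglevel": "INFO", "status": status}
    if message:
        # The reason phrase is optional.
        entry["message"] = message
    return json.dumps(entry)
```

For example, life_cycle_entry("READY") yields an entry with the logger LIFE-CYCLE, the log level INFO, and the status READY.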

SECURITY logs

Security-related events which are logged are, for example:

  • calls to non-existing endpoints

  • rejected calls, e.g., caused by failed input validation

  • unauthorized calls

  • signature verification failures

  • usage of expired tokens

| Attribute | Description | Required | Default |
| --- | --- | --- | --- |
| logger | SECURITY | Yes. | N/A |
| loglevel | One of: INFO, WARN | Yes. | N/A |
| correlationId | | Yes, as soon as the information is available. | N/A |
| trace_id | | Yes, as soon as the information is available. | N/A |
| span_id | | Yes, as soon as the information is available. | N/A |
| tenantId | | Yes, as soon as the information is available. | N/A |
| userId | | Yes, as soon as the information is available. | N/A |
| sourceIp | | Yes, as soon as the information is available. | N/A |
| event | What happened, e.g. LOGIN_FAILED, LOGIN_SUCCESSFUL, TOKEN_EXPIRED, TOKEN_SIGNATURE_INVALID, TOKEN_OUTDATED, TOKEN_VALIDATION_FAILED, SENSITIVE_DATA_ACCESSED, INPUT_VALIDATION_FAILED, TLS_DISABLED | Yes. | N/A |
| message | An optional reason phrase. | No. | N/A |

Specific SECURITY events

| Log level | Event | Information origin | Condition |
| --- | --- | --- | --- |
| WARN | TLS_DISABLED | Environment variable NEXEED_GLOBAL_DISABLE_TLS. | If this variable is set to true, this must be logged. |
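
The SECURITY conventions, including the TLS_DISABLED case, can be sketched in Python as follows. The function names and the message text are hypothetical, the case-insensitive comparison with "true" is an assumption, and the general attributes are again omitted.

```python
import json
from typing import Mapping, Optional

# SECURITY entries use one of these two log levels.
SECURITY_LEVELS = {"INFO", "WARN"}


def security_entry(event: str, loglevel: str = "WARN", **context: str) -> str:
    """Build a single-line JSON SECURITY entry (general attributes omitted)."""
    if loglevel not in SECURITY_LEVELS:
        raise ValueError(f"unsupported log level for SECURITY: {loglevel}")
    entry = {"logger": "SECURITY", "loglevel": loglevel, "event": event}
    # Context such as tenantId, userId, or sourceIp is added only when it is known.
    entry.update({key: value for key, value in context.items() if value})
    return json.dumps(entry)


def tls_disabled_entry(env: Mapping[str, str]) -> Optional[str]:
    """Return a TLS_DISABLED entry if NEXEED_GLOBAL_DISABLE_TLS is set to true."""
    # Assumption: the value is compared case-insensitively.
    if env.get("NEXEED_GLOBAL_DISABLE_TLS", "").strip().lower() != "true":
        return None
    return security_entry("TLS_DISABLED", "WARN",
                          message="TLS is disabled by configuration")
```

An application could call tls_disabled_entry(os.environ) once during startup and write the result to stdout if it is not None.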

OpenTelemetry integration

The following section describes the configuration and usage of SOT in combination with OpenTelemetry (OTEL).

Configuration of default OpenTelemetry during deployment

The OpenTelemetry agents of all SOT modules that support OpenTelemetry can be configured in a standardized way during the Helm deployment.

The environment variables required by the OTEL agent are injected automatically into all pods via a configmap (otel-observability-configmap), a secret (otel-observability-secret), and a pod environment variable (OTEL_SERVICE_NAME).

The automatic injection can be disabled by setting the following variable in the local section of the module:

<module>:
  local:
    observability:
      otelAutoInjectEnvParams: false

In the observability node of the global section you can define defaults for all modules supporting OpenTelemetry. You can enable or disable the exporter or the feature flags for tracing, metrics, or logging. The default for all settings is disabled. You can overwrite the global configuration by module-specific configuration in the local section of the module.

global:
  observability:
    otlpEnabled: true
    otlpTracingEnabled: true
    otlpUrl: "<yourUrl>"
    otlpHeaders: "<headerContainingYourAuthenticationSecrets>"
macma:
  local:
    observability:
      otlpTracingEnabled: false
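
The interplay of global and local settings can be pictured as a layered merge. The Python sketch below applies the documented order (everything disabled by default, then the global section, then the local section of the module) to the example above. The function name and the module name other-module are hypothetical, and the actual merging is done by the Helm charts, not by code like this.

```python
# All feature flags are disabled unless configured otherwise.
FLAG_DEFAULTS = {
    "otlpEnabled": False,
    "otlpTracingEnabled": False,
    "otlpMetricEnabled": False,
    "otlpLoggingEnabled": False,
}


def effective_observability(values: dict, module: str) -> dict:
    """Merge defaults, global settings, and module-local settings, in that order."""
    settings = dict(FLAG_DEFAULTS)
    settings.update(values.get("global", {}).get("observability", {}))
    # Module-local settings take precedence over the global ones.
    settings.update(values.get(module, {}).get("local", {}).get("observability", {}))
    return settings


# The override file from the example above, expressed as a dictionary.
values = {
    "global": {"observability": {
        "otlpEnabled": True,
        "otlpTracingEnabled": True,
        "otlpUrl": "<yourUrl>",
        "otlpHeaders": "<headerContainingYourAuthenticationSecrets>",
    }},
    "macma": {"local": {"observability": {"otlpTracingEnabled": False}}},
}
```

In this example, macma keeps the exporter enabled but switches tracing off, while every other module inherits the global values.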

The following settings can be configured under the observability key both in the global section of the umbrella chart and in the local section of a module. The local section takes precedence over the global one:

| Parameter | Required | Description |
| --- | --- | --- |
| observability.otlpUrl | yes | The URL endpoint of the OpenTelemetry Protocol (OTLP) collector. |
| observability.otlpEnabled | no | Flag to enable or disable the OpenTelemetry Protocol (OTLP) exporter. |
| observability.otlpTracingEnabled | no | Flag to enable or disable tracing for OpenTelemetry Protocol (OTLP). |
| observability.otlpMetricEnabled | no | Flag to enable or disable metrics for OpenTelemetry Protocol (OTLP). |
| observability.otlpLoggingEnabled | no | Flag to enable or disable logging for OpenTelemetry Protocol (OTLP). |
| observability.otlpHeader | yes, if the OTEL collector or endpoint requires authentication | The header of the messages for OpenTelemetry Protocol (OTLP). It is mapped to a secret because it might contain authorization secrets. |
| observability.otelResourceAttributes | no | A comma-separated list of key-value pairs that define resource attributes, which describe the entity producing telemetry data. Example: 'deployment.environment=<myEnvironmentKey>' |
| observability.otlpProtocol | no | Protocol used for OTEL communication: one of grpc, http/protobuf, or http/json. |
| observability.tracesSampler | no | Sampler to be used for traces. |
| observability.tracesSamplerArg | no | String value to be used as the sampler argument. |

For more information about the values that can be passed in these variables, consult the official OTEL agent documentation.
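
To check a value for observability.otelResourceAttributes before deploying, the comma-separated key-value format can be parsed as in the following Python sketch. The function name and the sample attribute values are hypothetical. Percent-encoding of special characters, which the OpenTelemetry specification allows, is not handled here.

```python
def parse_resource_attributes(raw: str) -> dict:
    """Parse 'key1=value1,key2=value2' into a dictionary."""
    attributes = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            # Tolerate empty segments such as a trailing comma.
            continue
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"not a key=value pair: {pair!r}")
        attributes[key.strip()] = value.strip()
    return attributes
```

For example, the string "deployment.environment=staging,service.namespace=sot" results in two attributes, whereas a segment without an equals sign raises a ValueError.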

© Robert Bosch Manufacturing Solutions GmbH 2023-2026, all rights reserved
