preview 2025.03.00

Failure handling guidelines

Common deployment errors

Connection refused, 401: unauthorized, 404 not found when downloading the images during the deployment

This error usually happens when the proxy blocks the request to the Docker registry. Either the proxy is not configured correctly on the VM, or the Docker registry needs to be whitelisted on the proxy.
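To see what the VM actually receives when it contacts the registry, it can help to probe the registry with the same proxy settings the deployment uses. The following is a minimal sketch, not part of Nexeed IAS: `registry_status` is a hypothetical helper name, the registry host is passed in as an argument, and the `/v2/` path assumes the registry implements the Docker Registry HTTP API v2.

```shell
# Hypothetical helper: probe a container registry through the configured proxy.
# $1 = registry host (assumption: it serves the Docker Registry HTTP API v2).
registry_status() {
  # Show the proxy settings that curl will pick up from the environment.
  echo "HTTPS_PROXY=${HTTPS_PROXY:-${https_proxy:-<unset>}}"
  echo "NO_PROXY=${NO_PROXY:-${no_proxy:-<unset>}}"
  # Print only the HTTP status code; curl reports 000 when no HTTP response arrived.
  status=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "https://$1/v2/")
  echo "status=$status"
}
```

Call it as `registry_status <registry_host>`. A status of `000` suggests that no HTTP response came back at all, which points to the proxy or network configuration rather than to the registry itself.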

401: unauthorized when trying to get the Helm charts

This error usually happens when the proxy blocks the request to the Helm registry. Either the proxy is not configured correctly on the VM, or the Helm registry needs to be whitelisted on the proxy.

Ingress is not deploying / the node is not ingress ready

This error usually happens when the node doesn’t have the correct label assigned. This issue is solved by running the following command:

kubectl label nodes $HOSTNAME ingress=ready
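Before or after labeling, you may want to confirm which nodes already carry the label. The sketch below wraps both steps; `ensure_ingress_ready` is a hypothetical helper name introduced here for illustration, and it assumes `kubectl` is already configured for the cluster.

```shell
# Hypothetical helper: label a node for ingress only if the label is missing.
# $1 = node name (defaults to $HOSTNAME, as in the command above).
ensure_ingress_ready() {
  node="${1:-$HOSTNAME}"
  # "kubectl get nodes -o name" prints entries of the form node/<name>.
  if kubectl get nodes -l ingress=ready -o name | grep -qxF "node/$node"; then
    echo "$node already labeled"
  else
    kubectl label nodes "$node" ingress=ready
  fi
}
```

Running `ensure_ingress_ready` without arguments checks the current host. Running it a second time should report that the node is already labeled.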

Forbidden! Configured service account doesn’t have access / Service account may have been revoked / User "system:serviceaccount:default:default" cannot get services in the namespace "[namespace_name]"

This error usually means that the cluster is missing a service account role. The issue can be resolved by running:

kubectl create clusterrolebinding default-serviceaccount-rb --clusterrole=cluster-admin --serviceaccount=default:default
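To check whether the binding took effect, you can ask the API server directly by impersonating the service account named in the error message. This is a sketch under the assumption that your own user is allowed to impersonate service accounts; `default_sa_can_get_services` is a hypothetical helper name.

```shell
# Hypothetical helper: ask the API server whether the default service account
# may read services in a namespace. $1 = namespace to check.
default_sa_can_get_services() {
  # "kubectl auth can-i" prints yes or no and sets the exit status accordingly.
  kubectl auth can-i get services \
    --as=system:serviceaccount:default:default -n "$1"
}
```

For example, `default_sa_can_get_services <namespace_name>` should print `yes` once the role binding is in place.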

Any error that indicates the disk is full

These errors can be resolved by increasing the size of your /var partition or by moving the K3s files from /var/lib/rancher to a location that has enough space.
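To confirm that disk space is the cause, check how full the filesystem behind the K3s directory is. The following is a small sketch; `disk_used_percent` is a hypothetical helper name, and it relies only on the POSIX output format of `df -P`.

```shell
# Hypothetical helper: print the used percentage of the filesystem holding a path.
# $1 = path to inspect (for example /var or /var/lib/rancher).
disk_used_percent() {
  # df -P prints one POSIX-formatted line per filesystem; column 5 is the capacity in percent.
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}
```

For example, `disk_used_percent /var/lib/rancher` prints a number between 0 and 100. If you decide to relocate the K3s files, K3s documents a `--data-dir` option for this purpose; check the K3s documentation for your version before changing it.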

Post-deployment errors

Connection refused

When encountering this error, you need to check two things:

  • Firewall

    • To narrow the problem down quickly, temporarily disable the firewall. If the connection is restored, the firewall is the cause.

    • Check whether the firewall is active and whether all necessary rules exist. On a Red Hat system, you can run the following commands:

      systemctl status firewalld
      firewall-cmd --list-all
  • Ingress

    • Check if the ingress pod is up and has no error by running the following commands:

      kubectl get pods -n ingress-nginx
      kubectl logs -n ingress-nginx {pod_name}
    • If the problem lies with the ingress, a simple restart resolves it in most cases. Run the following command:

      kubectl rollout restart deployment -n ingress-nginx

NGINX: 503 service unavailable

This error indicates that the NGINX ingress controller is unable to connect to the backend services.

Check the following Kubernetes objects:

  • ingress definition corresponding to the module path raising the 503 error

  • service state for the service specified in the ingress definition: check if there are any endpoints connected to the service

  • if the endpoints are missing, check the pods specified in the service definition for not-ready state

  • if the pods are not ready, check the pod logs to understand why the readiness probe is failing
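The endpoint check from the list above can be sketched as a small function. `service_has_endpoints` is a hypothetical helper name; the namespace and service name are arguments that you take from the ingress definition.

```shell
# Hypothetical helper: report whether a service has ready endpoints.
# $1 = namespace, $2 = service name taken from the ingress definition.
service_has_endpoints() {
  # Collect the IPs of ready endpoint addresses; the result is empty when none exist.
  ips=$(kubectl get endpoints -n "$1" "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "ready endpoints: $ips"
  else
    echo "no ready endpoints for $2; inspect the pods behind the service"
    return 1
  fi
}
```

If the function reports no ready endpoints, continue with `kubectl describe pod` and `kubectl logs` on the pods selected by the service to see why the readiness probe fails.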

If the configuration looks fine but the error persists, ask the Kubernetes admin team for support, since the error might be related to an internal Kubernetes networking issue.

Sometimes this error happens because Portal or MACMA didn’t load correctly.

Use pod restart only as a last resort.

Usually MACMA is not the problem, so we start with Portal:

kubectl rollout restart deployment -n portal

If the Portal restart didn’t work, you can try restarting MACMA by running:

kubectl rollout restart deployment -n iam

Deployed modules don’t appear in the menu / In the integration status of the main tenant, the modules appear as registered but have no views

This error happens because either Portal didn’t load the views or the module didn’t start correctly.

Check if you can access the URL specified in the Views menu error.

We can restart Portal and then the module (only if the URL is not reachable) by running the following commands:

kubectl rollout restart deployment -n portal core-service-deployment
# restart other modules
kubectl rollout restart deployment -n <module_namespace> <module_ui_deployment>

Deployed modules don’t register in Portal

This error can happen for two reasons:

  • The module is missing the Portal registration permissions. We can check the permissions from the primary tenant by accessing access management > modules > faulty_module > roles > portal. Make sure that either the Portal User or Portal Registration role is assigned.

  • The module doesn’t start correctly. In this case the module’s status and logs should be checked by following the steps:

    1. Check the module’s status by getting all the pods from the namespace

      kubectl get pods -n <module_namespace>
    2. If all the services are up and running, a restart of the module web UI may solve the issue

      kubectl rollout restart deployment -n <module_namespace> <module_web_ui>
    3. If a service is down or the restart did not solve the issue, you can look at the logs.

      kubectl logs -n <module_namespace> <faulty_pod>

      Four types of errors are common: database errors, SSL errors, permission errors, and RabbitMQ errors.

      1. Database errors: Wrong usernames/passwords or wrong connection strings usually cause database errors. Try connecting with the same connection strings you have in the config file from another environment with a database client. If the connection works, the issue is strictly between the cluster and the database server.

      2. SSL errors: If an SSL error is presented then the certificate, the key or the CA is not valid. The certificate can be checked with the following command:

        openssl x509 -in server.crt -text

        The validity of the key against the certificate can be checked with:

        openssl rsa -modulus -noout -in server.key | openssl sha256
        openssl x509 -modulus -noout -in server.crt | openssl sha256

        Both digests should be identical. If they are not, the key and the certificate don’t match.

      3. Permission errors: If the logs indicate an unauthorized error, the module is missing some required roles. Assign the essential roles that the module needs according to the application assignment roles matrix. The roles can be assigned from the main tenant by accessing access management > modules > faulty_module > roles > missing_role.

      4. RabbitMQ errors: If one of the application module logs indicates a RabbitMQ error, check the RabbitMQ Admin interface for the cluster and queue status. If you are using the embedded RabbitMQ server instance (deployed via the Nexeed IAS umbrella Helm chart), you can start a shell in one of the RabbitMQ pods and check the status using the rabbitmqctl command:

rabbitmqctl cluster_status

or:

rabbitmqctl status

or you can check the rabbitmq pod logs for errors.
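For the SSL errors step above, the key and certificate comparison can be wrapped into a single function so that unreadable files are not mistaken for a match. This is a sketch, assuming an unencrypted, PEM-encoded RSA key; `key_matches_cert` is a hypothetical helper name, and it compares the modulus values directly instead of their SHA-256 digests.

```shell
# Hypothetical helper: check that an RSA private key belongs to a certificate.
# $1 = private key file, $2 = certificate file (assumption: PEM, RSA, unencrypted key).
key_matches_cert() {
  # Both commands print a line of the form "Modulus=<hex>".
  key_mod=$(openssl rsa -modulus -noout -in "$1" 2>/dev/null) || key_mod=""
  crt_mod=$(openssl x509 -modulus -noout -in "$2" 2>/dev/null) || crt_mod=""
  # Two empty results would compare equal, so treat unreadable input as an error.
  if [ -z "$key_mod" ] || [ -z "$crt_mod" ]; then
    echo "unreadable key or certificate"
    return 2
  fi
  if [ "$key_mod" = "$crt_mod" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}
```

For example, `key_matches_cert server.key server.crt` prints `match` when the two files belong together.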

Common errors

Portal doesn’t load after the login

This issue can be solved by running:

kubectl rollout restart deployment -n portal

Module specific errors

Please check the troubleshooting section of the corresponding Nexeed IAS application module operations manual.

© Robert Bosch Manufacturing Solutions GmbH 2023-2025, all rights reserved