Resiliency against failures in RabbitMQ
Problem: The module used to run into errors when RabbitMQ was temporarily unavailable.
Solution: The module is now resilient against RabbitMQ failures. It handles temporary RabbitMQ downtimes (due to updates or failures) and returns to a healthy state once RabbitMQ is available again.
Equipment Management
- MDM Equipment reconnects to listen for incoming messages as soon as RabbitMQ becomes available, with a delay of 10 to 30 seconds.
- The lost connection to RabbitMQ is logged when trying to send an event.
- The service stays alive and ready to accept requests: all read requests are successful, and all write requests fail with a 500 return code.
- Message sending is retried 3 times. If RabbitMQ is still not available, the messages are not sent and are lost. The data changes are already committed and remain saved in the database.
- The re-established connection to RabbitMQ is logged for the first event that is sent after RabbitMQ comes back online.
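The sending and logging behaviour listed above can be sketched roughly as follows. This is only an illustration, not the module's actual code: the `EventSender` class, the `publish` callable, and the log messages are hypothetical, and "retried 3 times" is assumed here to mean three attempts in total.

```python
import logging
from typing import Callable

logger = logging.getLogger("equipment.messaging")  # hypothetical logger name

MAX_ATTEMPTS = 3  # assumption: "retried 3 times" means three attempts in total


class EventSender:
    """Hypothetical wrapper around a publish function; not the module's real API."""

    def __init__(self, publish: Callable[[str], None]) -> None:
        # publish is assumed to raise ConnectionError while the broker is down
        self._publish = publish
        # last known broker state, used only to log each state change once
        self._connected = True

    def send(self, event: str) -> bool:
        """Try to publish an event; return True if sent, False if dropped."""
        for _ in range(MAX_ATTEMPTS):
            try:
                self._publish(event)
            except ConnectionError:
                if self._connected:
                    logger.warning("Lost connection to RabbitMQ")
                    self._connected = False
                continue
            if not self._connected:
                # first event that goes through after the outage
                logger.info("Connection to RabbitMQ re-established")
                self._connected = True
            return True
        # The data change was committed before sending, so only the event is lost.
        logger.error("Event dropped after %d attempts: %s", MAX_ATTEMPTS, event)
        return False
```

Because the sender never raises, a caller that has already committed its data change keeps running when the broker is down; only the event is lost.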
Process
Any connection error or messaging failure while writing data (create/update/delete of an entity) results in an exception, which in turn rolls back the transaction. No data is modified in this case. No automatic retry happens for the failed request; the user must send another request to retry. Subsequent requests succeed as soon as messaging works again (e.g., RabbitMQ is back online).
Material
Any connection error or messaging failure while writing data (create/update/delete of an entity) results in an exception, which in turn rolls back the transaction. No data is modified in this case. No automatic retry happens for the failed request; the user must send another request to retry. Subsequent requests succeed as soon as messaging works again (e.g., RabbitMQ is back online).
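The rollback behaviour described for Process and Material can be illustrated with a minimal sketch. It uses an in-memory style SQLite connection as a stand-in for the real database; the `create_entity` function, the `MessagingError` class, the `entity` table, and the event name are all hypothetical and chosen only for this example.

```python
import sqlite3
from typing import Callable


class MessagingError(Exception):
    """Hypothetical error reported to the caller when the event cannot be published."""


def create_entity(conn: sqlite3.Connection, name: str,
                  publish: Callable[[str], None]) -> None:
    """Insert a row and publish an event in one transaction.

    publish is assumed to raise ConnectionError while the broker is down.
    """
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("INSERT INTO entity (name) VALUES (?)", (name,))
            publish(f"entity.created:{name}")
    except ConnectionError as exc:
        # No automatic retry: the caller has to send the request again.
        raise MessagingError("messaging unavailable, request rolled back") from exc
```

In this sketch the write and the event succeed or fail together, which is why a failed request leaves no data behind and a later request can simply be sent again.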