User credentials rotation - database and messaging secrets

It is an industry-wide best practice to rotate user credentials regularly.

To support operators with this secret rotation task, the system offers an automated solution based on Ansible. To minimize the impact of a rotation, the solution relies on a tool called Reloader. Reloader is not provided as part of SOT and can be installed manually.

This method only works for secrets managed by the database-specific or messaging-specific Ansible operators, not for database or messaging instances that are managed externally.

As a result, the jobs will NOT impact secrets for externally managed instances.

Other methods can also be used to restore the service, including manually triggering a rollout restart of the affected Deployments or StatefulSets.
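A manual rollout restart could be scripted roughly as follows. This is a minimal sketch, not part of the product: the function name restart_workload and the 300-second timeout are assumptions, and the workload kind, name and namespace have to be supplied by the operator.

```shell
# Restart one workload and wait for the rollout to finish.
# Arguments: <kind> <name> <namespace>
# (hypothetical helper; all three values are supplied by the caller)
restart_workload() {
    kind=$1
    name=$2
    namespace=$3
    # kubectl rollout restart supports these workload kinds.
    case "$kind" in
        deployment|statefulset|daemonset) ;;
        *) echo "unsupported kind: $kind" >&2; return 2 ;;
    esac
    # Trigger the restart, then block until the new pods are rolled out.
    kubectl rollout restart "$kind/$name" --namespace "$namespace" &&
        kubectl rollout status "$kind/$name" --namespace "$namespace" --timeout=300s
}
```

For example, restart_workload statefulset my-db my-namespace would restart a StatefulSet named my-db (a placeholder name) and wait for it to become ready.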

Reloader

Prerequisite: Reloader is installed on the system.

Reloader is a solution offered by Stakater that watches for changes in ConfigMaps and Secrets and performs rolling upgrades on the Pods of the associated DeploymentConfigs, Deployments, DaemonSets, StatefulSets and Rollouts.
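Reloader only acts on workloads that opt in through annotations. The sketch below shows one way to opt a workload in with the reloader.stakater.com/auto annotation. It assumes the workload is not already annotated by its chart; the function name enable_reloader is hypothetical, and kind, name and namespace are placeholders supplied by the operator.

```shell
# Ask Reloader to roll a workload whenever a ConfigMap or Secret
# it references changes.
# Arguments: <kind> <name> <namespace>  (hypothetical helper)
enable_reloader() {
    kubectl annotate "$1" "$2" --namespace "$3" --overwrite \
        reloader.stakater.com/auto=true
}
```

For example, enable_reloader deployment my-app my-namespace annotates a Deployment named my-app (a placeholder name).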

Reloader installation

# Create the values file for Reloader (Helm values files are YAML)
cat > values.yaml <<EOF
reloader:
  reloadOnCreate: true
  syncAfterRestart: true
  enableHA: true
  deployment:
    replicas: 2
EOF

# Install Reloader with these values (release name first, then chart).
# Assumes the stakater chart repository has already been added to Helm.
helm install reloader stakater/reloader -f values.yaml

Ansible operator Helm chart

The Ansible operator contains two jobs for password rotation and secrets restoration (prefixed with password-rotation-job and secret-restoration-job, respectively).

To run either job, create it manually from its CronJob:

kubectl create job --from=cronjob/<our-cronjob-name> <a-specific-name>

This starts the job, which performs the password rotation or the secret restoration, depending on the CronJob it was created from.
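The manual step can be wrapped so that each run gets a unique job name and the caller waits for the result. This is a sketch under assumptions: the function name run_cronjob_now, the timestamp suffix and the default 600-second timeout are illustrative, and the CronJob name and namespace must be taken from the actual installation.

```shell
# Create a one-off Job from a CronJob and wait until it completes.
# Arguments: <cronjob-name> <namespace> [timeout]
# (hypothetical helper; names are supplied by the caller)
run_cronjob_now() {
    cronjob=$1
    namespace=$2
    timeout=${3:-600s}
    # Job names must be unique, so append the current Unix timestamp.
    job="${cronjob}-$(date +%s)"
    kubectl create job "$job" --from="cronjob/$cronjob" --namespace "$namespace" &&
        # Waits for the Complete condition; a failed Job shows up here
        # as a timeout, so check the Job status if this returns non-zero.
        kubectl wait --for=condition=complete "job/$job" \
            --namespace "$namespace" --timeout="$timeout"
}
```

The same helper works for both CronJobs: pass the name of the CronJob prefixed with password-rotation-job to rotate, or the one prefixed with secret-restoration-job to restore.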

Secret restoration

The secret restoration job makes it possible to quickly revert the changes if the password rotation fails (e.g., due to missing admin credentials). In that case, run the command above against the secret restoration CronJob to trigger the restore.

MACMA and Keycloak password and secrets rotation

For MACMA and Keycloak, passwords and secrets are rotated by updating the corresponding values in the custom Helm values and running a helm upgrade.

Helm creates an additional job named change-macma-secrets-job-*, which allows the following credentials to be updated:

MACMA admin user and/or password
Keycloak admin-cli client secret (used by MACMA to connect to Keycloak)
Keycloak user and/or password
MACMA client (macma) secret

Limitations of the job:

The automation relies on the Helm lookup function, which only works during helm install and helm upgrade. If rendered Helm template output is used to integrate with GitOps tools such as Argo CD, the job succeeds but does not change anything.
Changing the secrets restarts the MACMA pods, which introduces an unavailability window. The window is longer when the admin-cli secret is changed, because pods started with the new secret cannot talk to Keycloak until the job has changed the secret.

The reason for this limitation is that changing the admin-cli secret is a breaking change: the MACMA pods have to be recreated to pick up the new secret containing the updated admin-cli value.

Since it is difficult to know exactly when the new MACMA pods are up, the job might fail to change all secrets on the first run and may need one or two retries to converge. Several retries are not a problem.
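Because a few retries are expected, the re-run can be automated with a small bounded retry loop. The helper below is a generic sketch, not something shipped with the product: the name retry, the attempt limit and the delay are assumptions, and the command to retry (for example the helm upgrade for the release) is supplied by the caller.

```shell
# Run a command until it succeeds, at most <max_attempts> times,
# sleeping <delay_seconds> between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
}
```

A call such as retry 3 60 followed by the usual helm upgrade command would attempt the upgrade up to three times, one minute apart. The limit keeps a persistent failure, such as a rejected password, from looping forever.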

Troubleshooting failed runs

If the password and secret rotation job fails for any reason, inspect the logs of the job’s pod to see which step is failing.
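The pod logs of a job can be collected with kubectl by selecting on the job-name label that Kubernetes sets on pods created by a Job. The following is a minimal sketch: the function name job_diagnostics and the 200-line tail are assumptions, and the full job name (including the suffix after change-macma-secrets-job-) has to be looked up in the cluster.

```shell
# Show the pods created by a Job and print their recent logs.
# Arguments: <job-name> <namespace>  (hypothetical helper)
job_diagnostics() {
    job=$1
    namespace=$2
    # Pods created by a Job carry the label job-name=<job>.
    kubectl get pods --namespace "$namespace" --selector "job-name=$job"
    # --prefix marks each line with its pod and container name.
    kubectl logs --namespace "$namespace" --selector "job-name=$job" \
        --all-containers --tail=200 --prefix
}
```

If the job was retried, several pods may match the selector; the prefixes make it possible to tell the attempts apart.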

So far, the only identified failure is caused by an attempt to change the main tenant admin password to one that is still in the password history.

To recover, change the password to a string that was not used for any of the last five passwords and re-run the helm upgrade.