Avoiding Footguns When Deploying Platform Helmcharts

doomedCode · January 10, 2025, 5:22am

Foot-guns are the common pitfalls that can lead to unintended consequences or issues when deploying Helm charts.

Here are some common foot-guns to avoid:

Beware of setting `enabled: false` in the values yaml file

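You might think that setting `enabled` to false only excludes the service from the upgrade, but that is NOT the case.

If you set `enabled: false` in the values file, the service will be DELETED from the cluster on the next upgrade, because Helm removes resources that were in the previous release but are no longer rendered by the chart.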
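One way to see what an upgrade would delete before running it is to compare the resources in the live release with the resources the chart renders from the edited values. The following is a rough sketch, not a definitive tool: `list_resources` and `removed_resources` are hypothetical helper names, and the awk parsing assumes plainly formatted manifests (a top-level `kind:` and an unquoted `metadata.name`).

```shell
# Print "Kind/name" for every resource in a rendered manifest file, sorted.
list_resources() {
  awk '
    function emit() {
      if (kind != "" && name != "") print kind "/" name
      kind = ""; name = ""; in_meta = 0
    }
    /^---/       { emit(); next }           # document separator
    /^kind:/     { kind = $2; next }
    /^metadata:/ { in_meta = 1; next }
    /^[^ #]/     { in_meta = 0 }            # any other top-level key ends metadata
    in_meta && name == "" && /^  name:/ { name = $2 }
    END          { emit() }
  ' "$1" | sort
}

# Print resources present in the first manifest but missing from the second.
removed_resources() {
  rr_cur=$(mktemp) && rr_new=$(mktemp) || return 1
  list_resources "$1" > "$rr_cur"
  list_resources "$2" > "$rr_new"
  comm -23 "$rr_cur" "$rr_new"
  rm -f "$rr_cur" "$rr_new"
}

# Usage (release, chart, and file names are placeholders):
#   helm get manifest my-release --namespace <namespace> > current.yaml
#   helm template my-release my-chart --namespace <namespace> -f values.yaml > proposed.yaml
#   removed_resources current.yaml proposed.yaml
```

Anything this prints would disappear from the cluster if the upgrade were applied with those values, so an unexpected entry is a good reason to stop and re-check the `enabled` flags.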

Always upgrade the LOWER environment first before upgrading upper environments

Upgrade environments in order, from lower to upper. This way, problems are caught in a lower environment before they can reach the upper ones, so the upper environments stay stable and the applications keep working as expected.

For example, if you have dev, SIT, UAT, and PROD environments, you should ALWAYS:

Upgrade the services in the dev environment first
Test the applications in the dev environment
Upgrade the services in the SIT environment
Test the applications in the SIT environment
Upgrade the services in the UAT environment
Test the applications in the UAT environment
Upgrade the services in the PROD environment only after all of the above have passed
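The lower-to-upper ordering can be sketched as a small loop that stops at the first environment where the upgrade or the tests fail. This is only an illustration: `promote`, `upgrade_env`, and `test_env` are hypothetical names, and the two hooks are left for you to define for your own setup.

```shell
# Upgrade environments strictly in the order given, stopping at the first failure.
# upgrade_env and test_env are hypothetical hooks you define for your own setup.
promote() {
  for pr_env in "$@"; do
    echo "upgrading $pr_env"
    upgrade_env "$pr_env" || { echo "upgrade failed in $pr_env; stopping" >&2; return 1; }
    echo "testing $pr_env"
    test_env "$pr_env" || { echo "tests failed in $pr_env; stopping" >&2; return 1; }
  done
}

# Usage (the namespace-per-environment layout, values file names, and
# smoke-test script are assumptions, not part of the original workflow):
#   upgrade_env() { helm upgrade my-release my-chart --namespace "$1" -f "values-$1.yaml" --atomic; }
#   test_env()    { ./smoke-test.sh "$1"; }
#   promote dev sit uat prod
```

Because the loop returns as soon as a hook fails, a broken SIT upgrade never reaches UAT or PROD.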

Get the helm changes reviewed by a senior engineer - Follow the Checklist

Since the helm charts are managed in a git repository, it is ALWAYS a good practice to get the changes reviewed by a senior engineer.

Here is a checklist of the typical workflow. You can copy this checklist and follow it step by step when running a Helm upgrade.

START
Clone the repository
Check out the source branch for your environment. For example, art-master branch for the ART environment

git checkout art-master

If already checked out to the correct branch, make sure to clean the working directory by either stashing or discarding the changes
Pull the latest changes

git pull

Create and check out a new branch for the changes you are going to make

git checkout -b <branch-name>

Make the necessary changes
Add all the changes to the staging area

git add --all

Commit the changes

git commit -m "Your commit message"

Push the changes to the remote repository (origin)

# use `git push --set-upstream origin <branch-name>` for the first time
git push origin <branch-name>

Create a pull request to the source branch. For example, art-master branch for the ART environment
Get the pull request reviewed and merged by a senior engineer
Once the pull request is merged, switch back to the source branch

git checkout art-master

Pull the latest changes

git pull

Upgrade the helm chart in the environment
Test the applications in the environment
END
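The "correct branch, clean working directory" condition from the checklist can be checked mechanically right before the upgrade step. A minimal sketch, assuming it runs inside the chart repository; `preflight` is a hypothetical helper name.

```shell
# Succeed only if the current branch is the expected one and the working tree is clean.
preflight() {
  pf_branch=$(git symbolic-ref --short HEAD 2>/dev/null) || {
    echo "not on a branch (detached HEAD or not a git repository)" >&2
    return 1
  }
  if [ "$pf_branch" != "$1" ]; then
    echo "on branch '$pf_branch', expected '$1'" >&2
    return 1
  fi
  # --porcelain prints one line per modified, staged, or untracked file.
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree has uncommitted or untracked changes" >&2
    return 1
  fi
}

# Usage:
#   preflight art-master && echo "ok to upgrade"
```

This does not check that the local branch matches the remote, so the `git pull` step still has to be done first.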

Cancelling the upgrade process can lead to a broken state (`CTRL+C` in the middle of the upgrade process)

If you cancel the upgrade process in the middle, it can leave the release in a broken state. A broken state might look like:
- The deployment image is not updated, but the secrets are updated
- Backend services are not updated, but the frontend services are updated
Recovering from a broken state requires manual intervention, and it is always good practice to keep manual intervention to a minimum.
Make sure you use the appropriate flags while upgrading the helm chart (like --atomic, --debug, and --timeout), which leads to my next point.
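If an upgrade was interrupted, the release status reported by Helm is usually the first thing to check. The sketch below maps a status value to a suggested next step; `recovery_hint` is a hypothetical helper, and the hints are general suggestions rather than a fixed recovery procedure.

```shell
# Map a Helm release status to a suggested next step.
recovery_hint() {
  case "$1" in
    deployed)
      echo "ok: last operation completed" ;;
    failed)
      echo "failed: inspect 'helm history', then fix forward or 'helm rollback'" ;;
    pending-install|pending-upgrade|pending-rollback)
      echo "stuck: an operation was interrupted or is still running; check 'helm history' before 'helm rollback'" ;;
    *)
      echo "unknown status '$1': inspect 'helm status' manually" ;;
  esac
}

# Usage (release and namespace are placeholders):
#   helm history my-release --namespace <namespace>
#   recovery_hint pending-upgrade   # pass the STATUS value of the latest revision
```

A release left in a `pending-*` state typically blocks further upgrades until it is resolved, which is one more reason not to press `CTRL+C` mid-upgrade.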

Use `--atomic`, `--debug` and `--timeout` flags when applying the helm chart (upgrade or install)

`--atomic`

Ensures that if the upgrade fails, Helm will automatically roll back to the previous release to maintain cluster stability.
Implicitly sets the --wait flag, causing Helm to wait for all resources to reach a ready state before considering the upgrade successful.
If resources do not become ready within the specified timeout, the upgrade is deemed a failure, triggering a rollback.
Reference: Helm Documentation > --atomic

`--debug`

Provides detailed output during the upgrade process, invaluable for troubleshooting.
Displays the rendered templates, executed commands, and other internal operations.
Offers insights into the upgrade’s progression and any issues that arise.
Reference: Helm Documentation > --debug

`--timeout`

Specifies the maximum duration Helm will wait for Kubernetes operations (e.g., Jobs or Pods) to complete during the upgrade.
The default is 5 minutes (5m0s).
If operations exceed this duration without reaching a ready state and --atomic is set, Helm will initiate a rollback.
Adjusting the timeout is essential for deployments requiring more time to become ready.
Reference: Helm Documentation > --timeout

Example Usage

helm upgrade my-release my-chart --atomic --debug --timeout 3m

Use the `--namespace` flag when applying the helm chart (upgrade or install)

Without --namespace, Helm stores the release metadata in the namespace of the context in which the helm command was run (in most cases, the default namespace),
while the Kubernetes artefacts (Deployment, Service, etc.) are installed in the namespace specified in the templates. The release and its resources then end up in different namespaces.
Reference: Helm Documentation > --namespace
Reference: Github Issue

Example Usage

helm upgrade my-release my-chart --atomic --debug --timeout 3m --namespace <namespace>
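To make these flags hard to forget, the command can be wrapped in a small function that refuses to run without an explicit namespace. This is a sketch under assumptions: `safe_upgrade` and the overridable `HELM` variable are hypothetical conveniences, and the 5m fallback simply mirrors Helm's default timeout.

```shell
# HELM can be overridden (for example HELM=echo) to rehearse the command without a cluster.
HELM="${HELM:-helm}"

# safe_upgrade RELEASE CHART NAMESPACE [TIMEOUT]
# Always passes --namespace, --atomic, --debug, and --timeout.
safe_upgrade() {
  if [ "$#" -lt 3 ]; then
    echo "usage: safe_upgrade RELEASE CHART NAMESPACE [TIMEOUT]" >&2
    return 2
  fi
  "$HELM" upgrade "$1" "$2" --namespace "$3" --atomic --debug --timeout "${4:-5m}"
}

# Usage:
#   safe_upgrade my-release my-chart my-namespace 3m
```

Running it once with `HELM=echo` prints the exact command that would be executed, which is a cheap check before touching a real environment.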

What are some of these foot-guns you have faced and want to stay away from?

Resources:

Helmchart Git Repository - alpha-helm-charts

Helm Docs - Link

Lakshman_Kumar · January 10, 2025, 5:53am

Nice article. Can we introduce another flag, remove:, which actually removes the chart, and an install flag that controls whether or not to install?
