Deploying Kyverno: The DigitalOcean Kubernetes Challenge
Just before the weekend, I heard about the Kubernetes challenge that DigitalOcean has been running since mid-November and thought I’d give it a try. It’s somewhat open-ended, but there are a number of interesting suggestions for tasks you can perform. The one I completed is:
Deploy a solution for policy enforcement
Install and use Kyverno, a policy engine designed for Kubernetes. It can validate, mutate, and generate configurations using admission controls and background scans. Kyverno policies are Kubernetes resources and do not require learning a new language. Kyverno is designed to work well with tools you already use like kubectl, kustomize, and Git. Create policies for mandatory labels for every deployment, and image download only permitted from DOCR.
Kyverno is one of those things I’ve heard of and read about but never used. I looked forward to learning a bit about it. Since the challenge only mentions the Kyverno installation, I created a cluster by hand instead of with Terraform. The code is on GitHub at shivjm/digitalocean-kubernetes-challenge-2021.
Coming to grips with helmfile
I chose to use Helm to install Kyverno so I could try helmfile for the first time. Apparently, if this were a real project and I wanted to install certain applications only after Kyverno, I would have to add the Kyverno release to the needs of each of their releases (or else create a dummy release that depends on all the prerequisites, like Kyverno, and have everything else depend on the dummy release). I prefer ArgoCD’s Sync Waves, but I suppose this is intended for simpler scenarios. Alternatively, I could run helmfile apply -l somelabel=somevalue before helmfile apply -l somelabel=someothervalue in my GitHub workflows, which I’m relying on to deploy my applications instead of the helmfile operator, to keep things simple.
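As a concrete illustration of the needs mechanism, here is a minimal sketch of what such a helmfile.yaml could look like. The release names, labels, and the second chart are assumptions for illustration, not the exact contents of my repository:

```yaml
# Minimal helmfile.yaml sketch (illustrative only).
repositories:
  - name: kyverno
    url: https://kyverno.github.io/kyverno/

releases:
  - name: kyverno
    namespace: kyverno
    chart: kyverno/kyverno
  - name: some-application # hypothetical workload
    namespace: default
    chart: ./charts/some-application
    needs:
      # Every release that must be installed after Kyverno has to list it here.
      - kyverno/kyverno
```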
With the aid of the comprehensive documentation, I wrote a helmfile.yaml to specify the releases to install. For development, I created a local kind cluster. helmfile apply immediately failed because I needed the helm-diff plugin, but I couldn’t install it in the normal manner on Windows:
```
❯ helm plugin install https://github.com/databus23/helm-diff
Error: exec: "sh": executable file not found in %PATH%
```
I tried installing from source with MSYS2. That meant installing Go with MSYS2. I really don’t like pacman to begin with, and if I’d stuck with it, I would’ve had to install Helm after Go. Instead, I saved myself a great deal of aggravation by moving to WSL. I already had everything else I needed under Ubuntu, so helm plugin install was enough to put the dependency in the right place. This time, helmfile apply showed me a new error:
```
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "ClusterPolicy" in version "kyverno.io/v1"
Error: plugin "diff" exited with error
```
I had to disable validation for kyverno-policies, which needs to install custom resources that are defined by the kyverno release. That allowed helmfile apply to perform the installation without issues. I confirmed that it hadn’t impacted the normal functioning of my cluster using the hello-world image:
```
❯ k get pod
NAME          READY   STATUS      RESTARTS   AGE
hello-world   0/1     Completed   2          25s
❯ k logs hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.
```
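For reference, helmfile lets you disable validation per release. A sketch of the relevant fragment (the chart names match the official Kyverno charts, but treat the rest as an assumption about my setup):

```yaml
releases:
  - name: kyverno
    namespace: kyverno
    chart: kyverno/kyverno
  - name: kyverno-policies
    namespace: kyverno
    chart: kyverno/kyverno-policies
    needs:
      - kyverno/kyverno
    # The ClusterPolicy CRD doesn't exist until the kyverno release has been
    # installed, so manifest validation has to be skipped for this release.
    disableValidation: true
```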
Verifying the installation
I first attempted to write a workflow to test the installation itself using helm/kind-action and mamezou-tech/setup-helmfile. kind worked beautifully with no extra configuration, but the helmfile setup failed:
```
Run email@example.com
  with:
    install-kubectl: false
    kubectl-version: 1.21.2
    kubectl-release-date: 2021-07-05
    helm-version: v3.7.1
    helmfile-version: v0.142.0
    install-helm: yes
    install-helm-plugins: yes
    helm-diff-plugin-version: master
    helm-s3-plugin-version: master
Downloading from : https://get.helm.sh/helm-v3.7.1-linux-amd64.tar.gz
Downloading from : https://github.com/roboll/helmfile/releases/download/v0.142.0/helmfile_linux_amd64
/usr/local/bin/helm plugin install https://github.com/databus23/helm-diff --version master
Finish downloading. : /home/runner/work/_temp/e5d185b9-6d78-4bf7-99bb-8731b7922975
/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /home/runner/work/_temp/63546d44-2a13-44da-9801-63a14f0e7575 -f /home/runner/work/_temp/e5d185b9-6d78-4bf7-99bb-8731b7922975
/home/runner/work/_temp/63546d44-2a13-44da-9801-63a14f0e7575
/usr/bin/chmod +x /home/runner/bin/helm
Finish downloading. : /home/runner/work/_temp/aca55247-1056-460c-b234-0948fd3dced4
/usr/bin/chmod +x /home/runner/bin/helmfile
awk: cmd. line:1: warning: regexp escape sequence `\"' is not a known regexp operator
Downloading curl: (3) URL using bad/illegal format or missing URL
Failed to install helm-diff
For support, go to https://github.com/databus23/helm-diff.
Error: plugin install hook for "diff" exited with error
(node:8772) UnhandledPromiseRejectionWarning: Error: The process '/usr/local/bin/helm' failed with exit code 1
```
The same command worked locally, so I don’t know what the cause might be. At any rate, I solved the problem by replacing it with helm-helmfile-action. The test succeeded this time. I moved on to…
Writing the policies
The two mentioned in the challenge are:
- Requiring a specific label for each Deployment (digitalocean.com/challenge: "2021"). The documentation already had an example of requiring a label, so I changed it to expect this label instead and added exclusion rules for the kyverno and kube-system namespaces. I set background: false because I only want it to apply to new resources.
- Only permitting images from the DigitalOcean Container Registry (DOCR). Again, it was trivial to adapt the example in the documentation and exclude the two aforementioned namespaces.
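Reconstructed from the policy and rule names that appear in the test output later in the post, the two policies might look roughly like this. Treat it as a sketch: the exact fields are adapted from the documentation examples, and the image pattern is a placeholder for a specific DOCR registry path:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: enforce
  background: false # only apply to new resources
  rules:
    - name: check-for-labels
      match:
        resources:
          kinds:
            - Deployment
      exclude:
        resources:
          namespaces:
            - kyverno
            - kube-system
      validate:
        message: 'The label `digitalocean.com/challenge` must be equal to `"2021"`.'
        pattern:
          metadata:
            labels:
              digitalocean.com/challenge: "2021"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-registries
spec:
  validationFailureAction: enforce
  rules:
    # Matching Pods means Kyverno autogenerates an equivalent rule for
    # Deployments (hence "autogen-validate-registries" in the test output).
    - name: validate-registries
      match:
        resources:
          kinds:
            - Pod
      exclude:
        resources:
          namespaces:
            - kyverno
            - kube-system
      validate:
        message: Images must come from DOCR.
        pattern:
          spec:
            containers:
              # Placeholder: a real policy would pin the registry name too.
              - image: registry.digitalocean.com/*
```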
I created a private DOCR repository and ran doctl registry login locally to authenticate myself. (I also edited the repository settings later to allow the cluster to access it.) Then I pulled the official nginx image locally, re-tagged it, and pushed it to my private repository so I could use it for testing.
Testing the policies
I added tests for the policies themselves with a few YAML files and run scripts in the workflow, just running kubectl apply and expecting it to fail or succeed as appropriate. Once they were working, I immediately rewrote them using just as a command runner, which I’ve also been eyeing for a while.
After I got the tests working in my new justfile—which has the added bonus of being usable locally—I couldn’t understand how to redirect the output from kubectl apply, even in a local Bash shell:
```
$ (kubectl apply -f tests/invalid/deployment-without-labels.yaml 2>&1 > /dev/null) || echo YES
Error from server: error when creating "tests/invalid/deployment-without-labels.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/nginx-deployment was blocked due to the following policies

require-labels:
  check-for-labels: 'validation error: The label `digitalocean.com/challenge` must
    be equal to `"2021"`. Rule check-for-labels failed at path /metadata/labels/'
YES
```
As it happens, the insistence on producing output in the main shell was because I used set -x as suggested by just. (Now who could possibly have imagined that understanding the code I was copying was necessary?) I removed the flag and assigned the output of the command to a variable to silence it, then spent a bit more time playing around with various just features.
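The pattern I ended up with can be sketched like this, with a deliberately failing placeholder command (ls /nonexistent) standing in for the real kubectl apply call against an invalid manifest:

```shell
#!/usr/bin/env bash
# Capture both stdout and stderr in a variable instead of letting the command
# print to the terminal, then branch on the exit status. `ls /nonexistent` is
# a stand-in for `kubectl apply -f tests/invalid/...`, which should fail.
if output=$(ls /nonexistent 2>&1); then
  echo "unexpectedly succeeded"
else
  echo "rejected as expected"
fi
```

Because the output is only held in the variable, nothing leaks into the test run’s logs unless a test fails and you choose to print it.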
When I ran the tests in CI, they failed because the policies were ignored. After a few minutes of befuddlement, I guessed that this was because Kyverno wasn’t yet fully functional when they ran. Waiting for its webhooks wasn’t enough. Nor was waiting for the Pod to be in the Running state. What finally allowed the tests to succeed was waiting for a specific line in the logs. I believe this means it isn’t enough to install Kyverno and set up your policies before installing resources that should be validated or mutated: you need to wait for Kyverno to be ready to apply the policies before continuing.
Deploying to DigitalOcean
I intended to specify the kubeconfig as a secret—GitLab lets you specify that a secret should be available as a file, for instance—but I ended up using doctl. I guess I could have run echo myself to achieve the same effect, but that would create another opportunity to leak secrets.
These are the steps I followed to deploy Kyverno and the policies:
- Create a DigitalOcean Kubernetes cluster.
- Create a read-only DigitalOcean API token.
- Create a GitHub Environment. (Not a requirement, but it helps keep things organized.)
- Add the DigitalOcean API token and the cluster ID as environment secrets.
- Create a workflow to:
  - Set up doctl.
  - Save the kubeconfig for the cluster.
  - Check out the repository.
  - Run helmfile.
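The workflow steps above might translate into something along these lines. The action versions, secret names, and the assumption that helmfile is already available on the runner (e.g. via a setup action) are mine, not the exact contents of my workflow:

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production # the GitHub Environment holding the secrets
    steps:
      - name: Set up doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
      - name: Save the kubeconfig for the cluster
        run: doctl kubernetes cluster kubeconfig save ${{ secrets.CLUSTER_ID }}
      - name: Check out the repository
        uses: actions/checkout@v2
      - name: Run helmfile
        run: helmfile apply
```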
And then it was done. If I run my tests against the production cluster, I can see that everything works as it should:
```
❯ KUBECONFIG=kubeconfig just test
Waiting for Kyverno to be ready...
kubectl apply -f policies
clusterpolicy.kyverno.io/require-labels created
clusterpolicy.kyverno.io/restrict-registries created
Testing valid files...
✔ tests/valid/deployment.yaml: accepted
Testing invalid files...
✔ tests/invalid/deployment-from-public-registry.yaml: rejected
✔ tests/invalid/deployment-without-labels-from-public-registry.yaml: rejected
✔ tests/invalid/deployment-without-labels.yaml: rejected
```
And I can see the one valid Deployment being created before the tests deleted it:
```
❯ k get --watch-only deploy -A
NAMESPACE   NAME               READY   UP-TO-DATE   AVAILABLE   AGE
default     nginx-deployment   0/2     0            0           0s
default     nginx-deployment   0/2     0            0           0s
default     nginx-deployment   0/2     0            0           0s
default     nginx-deployment   0/2     2            0           0s
default     nginx-deployment   0/2     2            0           1s
```
I can also see the correct results if I create the resources by hand:
```
❯ k apply -f .\tests\invalid
Error from server: error when creating "tests\\invalid\\deployment-from-public-registry.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/nginx-deployment was blocked due to the following policies

restrict-registries:
  autogen-validate-registries: 'validation error: Images must come from DOCR. Rule
    autogen-validate-registries failed at path /spec/template/spec/containers/0/image/'
Error from server: error when creating "tests\\invalid\\deployment-without-labels-from-public-registry.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/nginx-deployment was blocked due to the following policies

require-labels:
  check-for-labels: 'validation error: The label `digitalocean.com/challenge` must
    be equal to `"2021"`. Rule check-for-labels failed at path /metadata/labels/'
restrict-registries:
  autogen-validate-registries: 'validation error: Images must come from DOCR. Rule
    autogen-validate-registries failed at path /spec/template/spec/containers/0/image/'
Error from server: error when creating "tests\\invalid\\deployment-without-labels.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Deployment/default/nginx-deployment was blocked due to the following policies

require-labels:
  check-for-labels: 'validation error: The label `digitalocean.com/challenge` must
    be equal to `"2021"`. Rule check-for-labels failed at path /metadata/labels/'
❯ k apply -f tests\valid
deployment.apps/nginx-deployment created
```
On the whole, this was an easy but fun challenge. I ended up spending a lot less time on Kyverno and a lot more on testing & tooling, since I like my workflows to be just (pun unintended) so, but learning new tools always needs a bit of time and is always worth it. I’m grateful to DigitalOcean for the opportunity to do it in such an unusual and enjoyable manner. I’d love to see more thoughtful challenges like this, which combine brevity with real utility.
As far as my impressions of the tools I tried go, Kyverno seems quite useful for large or complicated clusters, helmfile only seems useful for very small or simple clusters, and just is the star of the show. I really liked using it to make all the tedious parts of the workflows composable and reusable. I’m definitely considering adopting it elsewhere too.
Finally, while I’m mostly happy with how I implemented the tests, there are two caveats. One is that the tests for valid resources will not run if the tests for invalid resources fail. I could untangle the two sets by making my ad hoc scripts more complex, but I want to keep them simple. The other caveat is that although just itself is not a Linux-only tool and supports scripts written in different languages, I’ve restricted myself to Bash because using those other languages would require installing them every time in CI. I can’t run the tests without WSL in any case, given that I can’t use helmfile under Windows, so the requirement is not as onerous as it might seem.
- You can have one registry per account, which seems a bit limiting, since one DigitalOcean account can map to multiple projects. I can’t tell whether hierarchical image names are allowed. If so, using a hierarchy like group/project/specific/image:latest should make the limit irrelevant in practice. (It also occurred to me, as I was discussing it with someone, that people might, on the contrary, create an account for every new project, making this irrelevant in any case.) On the other hand, the limit of one repository per account under the free plan, with a maximum size of 500 MB, is much more difficult to deal with. Even the next tier, which costs $5 per month, only increases those to five repositories and 5 GB of storage, which I think is shared across those repositories. However, you can apparently get more storage at the usual rates, so I guess this also isn’t significant.↩
- In fact, there is an ongoing discussion on the subject.↩