One of the main goals of rebuilding my Kubernetes cluster was configuring my ad hoc observability stack in a declarative fashion. The lynchpin of the stack is Prometheus. I intended to install it via the kube-prometheus Jsonnet library, which includes Grafana. The recommended approach is using jsonnet-bundler, but, given how complicated it would get with Argo CD, I wanted to provide simple, instantly-usable Jsonnet.

I could do this either by using jsonnet-bundler locally (under WSL) and vendoring the dependencies (i.e. committing them to my own repository) or by using Git submodules. I preferred the latter approach, which wouldn’t require committing the dependencies themselves. I created a new repository containing just kube-prometheus at release-0.8 as a submodule. Unfortunately, this didn’t work: there were issues with absolute paths, relative paths, and transitive dependencies. I had no choice but to vendor the libraries, drastically and permanently inflating this new repository.

At any rate, once that was done, I encountered a chicken-and-egg situation: I had many ServiceMonitors to enable, but I could only do so once the Prometheus Operator, which defines the ServiceMonitor resource, was installed. I decided Prometheus would have to come immediately after Linkerd, which has instructions for scraping metrics without ServiceMonitor. (This is possible for any application, but ServiceMonitors provide a simpler, more widely used interface.) I’d write my own ServiceMonitor for cert-manager, which came before Linkerd; everything afterwards would have monitoring enabled.
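For illustration, a hand-written ServiceMonitor for cert-manager might look roughly like the sketch below. The namespace, label selector, port name, and scrape interval are assumptions about how cert-manager was installed, not values taken from my cluster; they need to match whatever the cert-manager Service actually exposes.

```jsonnet
// Minimal sketch of a ServiceMonitor for cert-manager.
// Assumptions (all hypothetical): cert-manager runs in the `cert-manager`
// namespace, its Service carries the `app.kubernetes.io/name` label, and the
// metrics port is named `tcp-prometheus-servicemonitor`.
{
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'ServiceMonitor',
  metadata: {
    name: 'cert-manager',
    namespace: 'cert-manager',
  },
  spec: {
    // Select the Service (not the Pods) by label.
    selector: {
      matchLabels: { 'app.kubernetes.io/name': 'cert-manager' },
    },
    // Scrape the named Service port at the default /metrics path.
    endpoints: [
      { port: 'tcp-prometheus-servicemonitor', interval: '60s' },
    ],
  },
}
```

Because the ServiceMonitor kind only exists once the Prometheus Operator's CRDs are installed, a manifest like this can only be applied after kube-prometheus has synced.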

I tried it out. The entire kube-prometheus Application refused to sync, with a Status of Unknown. I increased the log level in the Argo CD application controller and kept seeing request object is too large; above that, I could see an error about kind: being missing in the YAML that was generated from the Jsonnet. I realized I was returning an object instead of an array, so I copied the basic example, which was able to sync. I added the annotation afterwards.
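The difference between the two return shapes can be sketched as follows. The `kp` object here is a tiny stand-in for what kube-prometheus evaluates to, and `manifests` is a hypothetical helper name; the real basic example is written against the actual library.

```jsonnet
// Stand-in for the object that kube-prometheus evaluates to; the real one is
// far larger. Each component maps a resource name to a manifest.
local kp = {
  prometheusOperator: {
    serviceAccount: { apiVersion: 'v1', kind: 'ServiceAccount', metadata: { name: 'prometheus-operator' } },
  },
  grafana: {
    service: { apiVersion: 'v1', kind: 'Service', metadata: { name: 'grafana' } },
    serviceAccount: { apiVersion: 'v1', kind: 'ServiceAccount', metadata: { name: 'grafana' } },
  },
};

// Hypothetical helper: turn one component's { name: manifest } object into a
// plain list of manifests.
local manifests(component) = [component[name] for name in std.objectFields(component)];

// Returning `kp` itself would hand Argo CD a single top-level object that has
// no `kind`. Returning an array yields one Kubernetes resource per element.
manifests(kp.prometheusOperator) + manifests(kp.grafana)
```

Evaluating this produces a JSON array of three resources, each with its own `apiVersion` and `kind`.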

I put the suggested Linkerd scrape configurations in a Secret, which I pointed kube-prometheus’s additionalScrapeConfigs at. It didn’t seem to be processed: the Prometheus UI showed no Linkerd jobs in its configuration. I tried the main and release-0.9 branches of kube-prometheus, neither of which I was able to build:

couldn't open import "": no match locally or in the Jsonnet library path

Regardless, the real reason, as I discovered, was that I had prometheus+: under values+:: instead of as a sibling. Fixing that made the jobs appear.
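The sketch below shows why the placement matters. `base` is a heavily reduced stand-in for kube-prometheus's top-level object, and the Secret name and key are hypothetical. The point is only that `values` is hidden configuration, while the top-level `prometheus` field holds the rendered resources.

```jsonnet
// Stand-in for kube-prometheus's main.libsonnet, reduced to two fields:
// hidden configuration (`values`) and rendered resources (`prometheus`).
local base = {
  values:: { common: { namespace: 'default' } },
  prometheus: {
    prometheus: {
      kind: 'Prometheus',
      metadata: { namespace: $.values.common.namespace },
      spec: { replicas: 2 },
    },
  },
};

// The patch that points Prometheus at a Secret of extra scrape configs.
// Secret name and key are hypothetical.
local scrapeConfigPatch = {
  prometheus+: {
    prometheus+: {
      spec+: {
        additionalScrapeConfigs: { name: 'linkerd-scrape-configs', key: 'scrape-configs.yaml' },
      },
    },
  },
};

{
  // Nested under values+::, the patch only changes hidden configuration, so
  // the rendered spec is untouched.
  nestedUnderValues: (base + { values+:: scrapeConfigPatch }).prometheus.prometheus.spec,
  // As a sibling of values+::, the patch merges into the rendered resource.
  siblingOfValues: (base + { values+:: { common+: { namespace: 'monitoring' } } } + scrapeConfigPatch).prometheus.prometheus.spec,
}
```

In this reduced model, `nestedUnderValues` contains only `replicas`, whereas `siblingOfValues` also contains `additionalScrapeConfigs`.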

The next issue was one where the jsonnet CLI tool could generate manifests but Argo CD said it couldn’t unmarshal an array into an object. I debugged this by generating the manifests locally, running them through gojsontoyaml, turning the resultant YAML into an array instead of a stream, and running that through kubectl apply --dry-run=client. All this showed that the ultimate cause was the namespaces: key I was trying to specify. I thought it might be a bug in the library, since the YAML looked malformed to me, so I temporarily removed it.

At last, I could use the dashboards I had copied from the Linkerd repository! I added x509-certificate-exporter (both the application and the dashboard) to monitor the TLS certificates I had created and potentially remind me when I needed to rotate them.

I later moved all observability-related resources into one Application, which I defined using k8s-libsonnet. I initially had to keep both the original YAML files and the JSON equivalents (e.g. for Helm values), until std.parseYaml was released in version 0.18.0 of go-jsonnet.
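As a rough sketch of what `std.parseYaml` makes possible, the snippet below parses YAML text and then overrides a field with ordinary Jsonnet. The keys and values are made up for illustration. In a real setup the text would more likely come from `importstr 'values.yaml'` (a hypothetical file name) than from an inline block.

```jsonnet
// Inline stand-in for an existing Helm values file; the keys are hypothetical.
local valuesYaml = |||
  replicaCount: 1
  service:
    port: 9793
|||;

// std.parseYaml (go-jsonnet 0.18.0 and later) turns a YAML document into a
// regular Jsonnet value.
local values = std.parseYaml(valuesYaml);

// The parsed object can be patched like any other Jsonnet object.
values { replicaCount: 2 }
```

This removes the need to keep a hand-converted JSON copy next to each YAML file.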

I noticed at some point that Grafana was ignoring the environment variables I had set for the admin username and password. Inspecting the Pod showed me it was indeed missing that configuration. I had to merge it the hard way:

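// The Deployment is patched in place; every other Grafana resource passes through unchanged.
[kp.grafana.deployment {
  spec+: {
    template+: {
      spec+: {
        containers: [
          // Keep only the first container and append the credentials to its env.
          super.containers[0] {
            env+: grafanaCredentials,
          },
        ],
      },
    },
  },
}] +
[kp.grafana[name] for name in std.filter(function(name) name != 'deployment', std.objectFields(kp.grafana))]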
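The `grafanaCredentials` value is not shown above. One plausible shape for it, assuming the credentials live in a Secret, is sketched below together with a stand-in Deployment so the merge can be evaluated on its own. The Secret name, its keys, and the stand-in Deployment are all hypothetical; `GF_SECURITY_ADMIN_USER` and `GF_SECURITY_ADMIN_PASSWORD` are Grafana's environment variables for the admin account.

```jsonnet
// Hypothetical env entries that read the admin credentials from a Secret
// named `grafana-credentials`.
local grafanaCredentials = [
  {
    name: 'GF_SECURITY_ADMIN_USER',
    valueFrom: { secretKeyRef: { name: 'grafana-credentials', key: 'username' } },
  },
  {
    name: 'GF_SECURITY_ADMIN_PASSWORD',
    valueFrom: { secretKeyRef: { name: 'grafana-credentials', key: 'password' } },
  },
];

// Stand-in for kp.grafana.deployment, reduced to the fields the patch touches.
local deployment = {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'grafana' },
  spec: {
    template: {
      spec: {
        containers: [{ name: 'grafana', image: 'grafana/grafana', env: [] }],
      },
    },
  },
};

// Same merge as in the snippet above: rebuild the containers list from the
// first container, appending the credentials to its env.
deployment {
  spec+: {
    template+: {
      spec+: {
        containers: [super.containers[0] { env+: grafanaCredentials }],
      },
    },
  },
}
```

The list has to be rebuilt by hand because Jsonnet's `+:` concatenates arrays rather than merging their elements, so a single container cannot be patched by field name.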

Next in series: (#12 in The Death and Rebirth of a Cluster)