Getting Loki installed in my rebuilt Kubernetes cluster was easy with Argo CD and Jsonnet. Configuring Grafana was a bit harder. First, Grafana couldn’t connect to the service; I could only get it to work by adding the data source manually. I tried to do the same thing programmatically with kube-grafana, but I could neither add Loki to the existing array (which contained the Prometheus data source) nor create an array containing both sources that didn’t break every page in Grafana with “Datasource named X was not found” JavaScript errors, until I made sure the data source was named exactly Loki. Even then, I didn’t see any metrics (because the dashboards referenced a data source named Prometheus instead of prometheus) or any logs at all.
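For reference, this is a minimal sketch of the data-source list I eventually converged on, written as Grafana’s declarative provisioning YAML rather than the kube-grafana Jsonnet that actually generates it. The URLs and the prometheus service address are my assumptions (based on kube-prometheus defaults); the only load-bearing details are the names.

```yaml
apiVersion: 1
datasources:
  # Name must match what the dashboards reference ("prometheus", not "Prometheus").
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-k8s.monitoring.svc:9090   # assumed kube-prometheus service
    isDefault: true
  # Named exactly "Loki"; other names triggered the
  # "Datasource named X was not found" errors for me.
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.monitoring.svc:3100             # assumed Loki service and namespace
```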

I checked the Promtail DaemonSet’s logs via kubectl logs. It had the wrong address for Loki. I enabled the Loki headless service in the Helm config (with a single instance, there was no need for full service discovery) and refreshed the Application. I added the port and refreshed the Application. I added the HTTP path and refreshed the Application. I… it worked!
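What all that incremental poking converges on is a single client URL in Promtail’s config: the headless service name, Loki’s default HTTP port 3100, and the /loki/api/v1/push path. A sketch of the relevant section, where the service name and namespace are my assumptions and may differ per install:

```yaml
# Promtail client configuration (service name and namespace are assumptions).
clients:
  - url: http://loki-headless.monitoring.svc.cluster.local:3100/loki/api/v1/push
```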

As I explored what logs I could see, I noticed the Argo CD ApplicationSet controller was producing Go-style logs but sporadically emitting DEBUG lines that couldn’t be parsed the same way:

```
2021-09-17T20:05:38.342Z    DEBUG   Normal  {"object": {"kind":"ApplicationSet","namespace":"argocd","name":"redacted","uid":"redacted","apiVersion":"","resourceVersion":"1009"}, "reason": "created", "message": "created Application \"redacted\""}
```

I tracked these down to the controller-runtime library and submitted a fix on the Argo side.

One thing I realized as I came to grips with Loki is that, since Linkerd works by adding sidecar containers to most Pods, LogQL queries usually need an extra container != "linkerd-proxy" matcher. Those sidecar containers emit unstructured logs by default, too, so I enabled JSON logging to make them parseable, although it’s not something I expect to look at much.
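As a sketch of the two kinds of query that fall out of this (the argocd namespace is just an example, and the container label assumes Promtail’s default Kubernetes labels): the first pulls application logs while excluding the injected sidecar, and the second looks at the proxy’s own logs, which the | json parser can handle once JSON logging is on.

```logql
{namespace="argocd", container!="linkerd-proxy"}
{namespace="argocd", container="linkerd-proxy"} | json
```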

Next in series: (#13 in The Death and Rebirth of a Cluster)