Why Would My Load Generator Lie to Me?
I watched Gil Tene’s talk ‘Your Load Generator Is Probably Lying To You’. A lot of it was beyond me, but all of it sounded bad. This is the little I was able to comprehend:
- Median, mean, and other aggregate measures are misleading or outright useless.
- The more you aggregate your metrics before storing them, the less useful they are.
- Percentiles (or ‘percent-lies’, as he calls them) are irrelevant—always look at the maxima.
- ‘Under what load does this software crumble’ is not as useful a question as ‘given a load far below the point at which it crumbles’—not right under it, because then you’re only a few requests away from crumbling—‘what are its performance characteristics (latency, throughput, etc.)?’.
- Low-resolution histograms are worthless. I can’t tell whether Micrometer’s Timer is doing it right. It mentions HDR histograms in the section on memory usage, but those are under client-side aggregation, which I’m not using. Meanwhile, it translates each timer into a set of `count`, `sum`, and `max` metrics, so is it safe to simply chart `max`? (There’s a sketch of what I mean just after this list.)
- I think the discussion on percentiles was saying there’s no point looking at low percentiles because almost every user session eventually encounters higher-than-99th-percentile latencies, and rarely encounters lower percentiles, but, again, I’m not certain.
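Assuming the answer is yes, charting the max could be as simple as the expression below. I’ve written it as a Prometheus recording rule; the timer name `my_timer_seconds` is a made-up placeholder for whatever Micrometer actually exports.

```yaml
# A sketch, not a recommendation: keep an eye on the worst case by recording the
# maximum of a Micrometer timer's *_max series across instances. The registry
# publishes my_timer_seconds_count and my_timer_seconds_sum alongside it.
groups:
  - name: timer-maxima
    rules:
      - record: job:my_timer_seconds_max:max
        expr: max by (job) (my_timer_seconds_max)
```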
I’ll have to revisit this talk someday when I have a bit of experience.
Meanwhile, in the cluster
I’ve now reinstalled Prometheus and Grafana using
kube-prometheus-stack.
The Traefik dashboard’s ‘Service’ variable confused me. I assumed it was supposed to slice the
traefik traffic according to which Kubernetes service it was going to, but I had to fiddle with
the panels a lot before they showed any data for non-Traefik services, and much of what I saw didn’t
make sense even then.
What I was missing is that ‘Service’ is not a Kubernetes Service but rather a Traefik Service, so there will only ever be `traefik` (or maybe something like one per Traefik Pod, I don’t know). The dashboard was misleading me because it used `label_values(service)` and therefore returned every value of the `service` label anywhere in the system instead of only the ones on Traefik metrics. Once I changed the definition to `label_values(traefik_service_requests_total, service)`, it stopped showing irrelevant values and the dashboard became comprehensible.
I’ve also got Loki up and running. Its chart says:
```yaml
## If you set enabled as "True", you need :
## - create a pv which above 10Gi and has same namespace with loki
## - keep storageClassName same with below setting
```
I created a 10 GB volume. There was no ready-made way to see usage, so I tracked down the official dashboards (written in Jsonnet, which I haven’t used but seems reasonably self-explanatory) and found the metric I was looking for in `loki-writes-resources.libsonnet`: `kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*$Container.*"}`. I added `kubelet_volume_stats_available_bytes{persistentvolumeclaim=~".*$Container.*"}` and set them both to stack so I have an easy indicator of how much headroom I have…
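If I ever want a single number instead of eyeballing the stacked panel, something like the recording rule below would do. This is my own sketch rather than anything from the Loki dashboards, and the `.*loki.*` matcher is an assumption about how the PVC is named.

```yaml
# Record how full the Loki PVC is as a ratio of used bytes to total capacity,
# using the kubelet's volume-stats metrics. The PVC matcher is illustrative.
groups:
  - name: loki-pvc-headroom
    rules:
      - record: pvc:kubelet_volume_stats_used:ratio
        expr: |
          kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*loki.*"}
            /
          kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~".*loki.*"}
```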
Disk usage grew by 25.3 MB in 48 hours, which would mean roughly 379.5 MB a month, or about 9 GB over two years, so I assume this will suffice for the next two years.
Jaeger
I considered using the Jaeger Operator, but that chart doesn’t appear to be compatible with Helm 3, so I have to assume it’s outdated. I turned to the standalone Helm chart. That was a frustrating experience, to say the least. It required:
- Somewhere between two and three hours of changing heap sizes and fiddling with TLS certificates to get ElasticSearch working until I figured out that the Bash script that reported each Pod’s readiness couldn’t handle spaces in the password.
- Another hour of trying to make Jaeger connect to Kafka that ended only when I understood that the chart wasn’t configuring the Jaeger components to connect to the Kafka installation it created. It was better to install Kafka separately.
- Another couple of hours lost connecting Jaeger to the ElasticSearch cluster, including regenerating the certificates with the right host names. I even tried importing the generated root certificate into my local store, tried to connect locally, saw it was rejected, gave up, and turned off TLS verification. Or thought I did.
- Another hour spent searching for the right incantations to force all the different Jaeger components to disable TLS verification. This is what did the job:
```yaml
# irrelevant details elided
storage:
  type: elasticsearch
  elasticsearch:
    env:
      ES_TLS_ENABLED: "true"
      ES_TLS_SKIP_HOST_VERIFY: "true"
      ES_ARCHIVE_TLS_ENABLED: "true"
      ES_ARCHIVE_TLS_SKIP_HOST_VERIFY: "true"
```
It worked in the end, though. I started reflexively trying to enable tracing in Prometheus, ElasticSearch, et al until I remembered that… isn’t useful to me.
What did I gain?
It was all worth it to see those beautiful graphs. 99.9% of my cluster now consists of monitoring tools watching other monitoring tools, and even themselves, like so:
Who needs software to run when you’ve got an observability stack, eh?
Monitoring containers by Helm chart
I installed a dashboard to monitor individual containers and updated it to use the correct variables. Then I created a few panels for it to show the resource usage history:
It works, but all the underlying queries match the `container` label against the value selected for the `Container` variable, which breaks down when the pods aren’t named accordingly (for example, if `container` is `foo` while the pods are `f-master-0`) and, as is visible in the above graphs, when more than one matching container has existed in the selected period. Since I use Helm charts for everything, I thought I could use metric relabeling to coalesce and add a few Helm variables. Then I’d match against `app.kubernetes.io/name` and `app.kubernetes.io/component`, and maybe some label that marked a new deployment.
This seemed like something I might need to hack kube-state-metrics directly to do. I found where the labels are applied, and the code has access to the pod metadata at that point. Fortunately, though, I didn’t need to patch anything; instead, I set up some relabeling (or, more precisely, some metric relabeling) for kube-state-metrics:
```yaml
# values.yaml for kube-prometheus-stack
kube-state-metrics:
  extraArgs:
    - "--metric-labels-allowlist=pods=[app,name,app.kubernetes.io/name,helm.sh/chart,app.kubernetes.io/component]"
kubeStateMetrics:
  serviceMonitor:
    metricRelabelings:
      - sourceLabels: [label_app, label_app_kubernetes_io_name]
        separator: ":"
        targetLabel: "app"
      - sourceLabels: [app]
        regex: "(.*):(.*)"
        replacement: "$1$2"
        targetLabel: app
      - action: "labeldrop"
        regex: "^label_app_kubernetes_io_name|label_app$"
```
The extra arguments for kube-state-metrics tell it to transfer those specific labels from pods onto their metrics. The `metricRelabelings` section then normalizes the `app` and `app.kubernetes.io/name` labels in a roundabout fashion.[1] After that, the dashboard only needs to match against a single label. This worked perfectly:
The trouble now is, I would need to transfer the labels and then relabel the metrics for every component that collects them, which isn’t always possible. For example, the metrics for CPU and memory usage are collected by cAdvisor, which is running in the cluster as part of the kubelet. Because I’m using DigitalOcean’s Managed Kubernetes, I don’t have any control over cAdvisor and therefore can’t pass it the command-line flag to transfer labels, rendering this entire endeavour somewhat useless.
I restored the original dashboard in defeat. (On the bright side, I’ve noted for the future that all this extra cardinality I was worrying about amounted to fewer than 200 new samples appended per second in Prometheus.)
Assorted titbits
- I thought my last entry on observability would provide the first opportunity to watch the metrics from my GitLab CI pipelines, but there was a discrepancy between the information GitLab provides and the information the exporter expects. I filed issue #280; there have been some changes and discussions since, but I haven’t had a chance to test anything after the first of those.
- Installing the ElasticSearch exporter was a bit of a chore because it doesn’t support authentication details in environment variables, but I was able to adapt a snippet from a comment to pass the variables on the command line without exposing them: `uri: "https://$(ES_USERNAME):$(ES_PASSWORD)@elasticsearch:9200"` (there’s a sketch of the underlying mechanism just after this list).
- Installing the Kafka exporter was trivial.
- I no longer need to expose webmentiond’s metrics to the world, as the most obliging Horst Gutmann (the very same who recently inspired me to sign my commits at last) kindly added an option to expose them on a different port.
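The `$(…)` trick above is plain Kubernetes environment-variable expansion rather than anything exporter-specific. A rough sketch of the shape of the container spec it produces, not the chart’s actual rendered manifest; the Secret name and image tag are assumptions:

```yaml
# Credentials stay out of the values file: the env vars come from a Secret and are
# then referenced with $(NAME) inside an argument, expanded at container start.
containers:
  - name: elasticsearch-exporter
    image: quay.io/prometheuscommunity/elasticsearch-exporter:v1.3.0  # tag illustrative
    env:
      - name: ES_USERNAME
        valueFrom:
          secretKeyRef:
            name: elasticsearch-credentials   # assumed Secret name
            key: username
      - name: ES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: elasticsearch-credentials
            key: password
    args:
      - "--es.uri=https://$(ES_USERNAME):$(ES_PASSWORD)@elasticsearch:9200"
```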
1. Since a given pod will only have at most one of those labels, it puts them both in a new `app` label with a `:` in between, removes the colon (leaving us with at most one full name), and drops the two original labels.