The Road to Webmentions
If things have been quiet for a couple of days, it’s because I’ve been preoccupied with Webmentions. I’ve talked about implementing them before. They provide a decentralized protocol for discussion across websites through the use of regular links. There are three parts to this:
- Sending a Webmention when a page on your site links to another page.
- Receiving Webmentions from other pages.
- Displaying the received Webmentions.
I’ve been working on the second part, and it’s been quite the journey. If you’d prefer not to read about the technical details of servers and cloud infrastructure, I recommend skipping this instalment. I’ll write about the less technical improvements separately.
The setting: Kubernetes
A Place For My Head is hosted on Netlify, the ideal home for static pages. As it grows, some features naturally require more than serving static content. Netlify Functions are presented as a solution, but I don’t want to fit all my work into 10 seconds of JavaScript. Instead, I set up a Kubernetes cluster on DigitalOcean.[1] I much prefer this to working with VMs, even though Helm is notoriously verbose.
A home for Webmentions
In parallel with preparing the cluster, I searched for a solid server I could set up to receive the Webmentions. The most well-known is webmention.io. This is a fully-featured Webmention service. Granted, it’s open source, but it’s built in Ruby (which I can read reasonably well but find frustrating to tinker in), it’s designed as a multi-user service with authentication through IndieAuth,[2] and there’s no obvious path to running it in Docker, let alone Kubernetes. That’s not to say it would be impossible to repurpose, but it doesn’t seem like a good fit. The pingback-to-Webmention gateway is a unique feature, though.
Moving on, webpage-webmentions looked like a good candidate. It’s another service rather than a single-user tool, but self-hosting is a priority for the project and you can limit the number of users. It’s geared towards running on Heroku, so it would require adaptation. It also seems to be only sporadically maintained, with an apparently major bug left unattended. On the whole, I don’t know what to make of it.
The last package I looked at was webmentiond. It looked perfect in many ways. It’s a (more or less) single-user Webmentions server written in Go, with a moderation queue and UI. The only snag was that it requires an SMTP server for authentication and notifications.
After much deliberation and a lengthy review of tools for sending Webmentions, servers that receive them, and services that handle them for you, I felt I had the best chance of success with webmentiond, which meant dealing with SMTP servers.
How to send an email
Simple. Open a connection to the target server on port 25 and send a bit of specially formatted text.
Well, no.
If you set up a Postfix server and start emailing all and sundry, none of your messages will ever arrive at their destinations. Email services have been building barriers against spam for as long as they’ve existed. At this point, SPF—a specially-formatted DNS record indicating which servers are allowed to send email claiming to be from the domain—is only the beginning. In the course of my research, I came across the excellent docker-postfix, a send-only SMTP server designed for containers. Its concise, thorough, and well-written documentation has this to say on the subject (emphasis added):
If you're sending messages directly, you'll need to:
- have a fixed IP address;
- configure a reverse PTR record;
- configure SPF and/or DKIM as explained in this document;
- it's also highly advisable to have your own IP block.
Daunting, to say the least. Still, I merely wanted to send authentication emails from my Kubernetes cluster to my primary domain, so I felt I ought to be able to manage most of these steps. Setting up the SPF record only took a few minutes, which bolstered my confidence. Proceeding to the PTR record, I followed link upon link until I reached the Wikipedia page for reverse DNS lookup. My eyes began to glaze over and I found myself wondering what past sins I was atoning for.
In defeat, I turned to the rather less complex alternative: relaying my messages through Google Apps, which was already active for the domain. Let Google fret over SPF, DKIM, and message delivery.
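(Google handles delivery, but the domain still needs an SPF record pointing at its servers. With the relay in place, that reduces to a single TXT record authorizing Google; a sketch, with the soft-fail `~all` being my own choice:)

```
redacted.com.  IN  TXT  "v=spf1 include:_spf.google.com ~all"
```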
My Kubernetes cluster only had a single node at first, given my minimal requirements, so I configured my domain to allow email from its IP. I could now set up Postfix using the handy Helm chart provided by docker-postfix. An hour of deploying, redeploying & testing later, the example emails were reaching my inbox. I did have to find a working image version before Kubernetes would recognize the container as ready, and the documentation, although superb in general, was unclear on which of `hostname` and `myhostname` was required by Postfix. (It was the latter.) I also had to examine the code to understand how the Helm parameters were translated in the newer version. Nevertheless, each of these little challenges fell before the combined might of Google and reading. This was the values file I used:
```yaml
image:
  tag: "v2.2.2"
persistence:
  enabled: false
config:
  general:
    ALLOWED_SENDER_DOMAINS: redacted.com
    RELAYHOST: smtp-relay.gmail.com:587
  postfix:
    myhostname: postfix-kubernetes
```
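With that file saved, deployment boiled down to a couple of commands. A sketch, assuming the chart repository lives where I remember it; note that a release named postfix of the mail chart in the postfix namespace is what yields the postfix-mail.postfix.svc.cluster.local address used later:

```shell
# Add the docker-postfix chart repository (URL as I understand it; adjust if it differs)
helm repo add docker-postfix https://bokysan.github.io/docker-postfix/
# Install the chart with the values file above
helm install postfix docker-postfix/mail --namespace postfix --create-namespace --values postfix-values.yaml
```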
webmentiond & me
The webmentiond repository already maintained an image on Docker Hub. I needed a Helm chart, but I couldn’t find one, so I created my own through hours of cross-referencing the getting started page, the configuration page, the Bitnami Helm chart for Ghost, the docker-postfix Helm chart, the webmentiond Dockerfile, and, in the end, the webmentiond source code itself.[3] The first two pages didn’t quite agree on the options available, which is why I had to refer to the last two.
Many key webmentiond settings have to be provided as command-line arguments rather than as environment variables. I had to specify the `args` like so:
```yaml
args:
  - "--addr"
  - "{{ .Values.config.address }}:{{ .Values.config.port }}"
  {{- with .Values.config.publicUrl }}
  - "--public-url"
  - "{{ . }}"
  {{- end }}
  {{- with .Values.sendNotifications }}
  - "--send-notifications"
  {{- end }}
  {{- with .Values.config.auth }}
  - "--auth-jwt-secret"
  - "$AUTH_JWT_SECRET" # expanded at runtime by the container’s shell, not by Helm
  # - "--auth-jwt-ttl"
  # - "{{ .jwtTtl }}"
  - "--auth-admin-emails"
  - "{{ .adminEmails }}"
  - "--allowed-target-domains"
  - "{{ .allowedTargetDomains }}"
  {{- end }}
```
It isn’t possible to populate a command-line argument directly from a Kubernetes Secret, so I instead created an environment variable for the image’s shell to expand.
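Concretely, that meant something along these lines in the chart’s Deployment template (the Secret key name here is hypothetical):

```yaml
# Expose the JWT secret to the container as an environment variable;
# the shell entrypoint then expands $AUTH_JWT_SECRET in the args above.
env:
  - name: AUTH_JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: "{{ .Values.config.auth.secretRef }}"
        key: jwt-secret # hypothetical key name
```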
Note the commented-out lines. Two hours of debugging culminated in the realization that the example value of `7d` given in the documentation for the TTL option makes the program die with no output. My brilliant solution was to leave it at the default.
This was the configuration I used for the new chart (with the ingress disabled for the moment):
```yaml
config:
  mail:
    secretRef: "" # no username or password
    host: "postfix-mail.postfix.svc.cluster.local"
    port: "587"
    from: "user@redacted.com"
    disableTls: true
  auth:
    secretRef: "redacted"
    adminEmails: "admin@redacted.com"
    allowedTargetDomains: "shivjm.blog"
  publicUrl: "https://webmentions.shivjm.blog"
persistence:
  storageClass: "do-block-storage"
ingress:
  enabled: false
```
Too many volume(s)
Before I ever got to any of this, however, I had to solve the volume problem. Whenever I deployed the chart, it failed to create a `PersistentVolume`.
Investigating the resources during creation revealed that DigitalOcean was refusing to create more volumes now that I had reached my limit of… one. I guessed this was because my account was new. I had already increased my Droplet limit from one to three by means of a link and a simple questionnaire, but I could find no way to increase the volume limit. The error message said to contact support; the support website only repeatedly asked me to log in. I was left to open a thread in the forums and pray for answers.
Fortunately, I realized I could solve the immediate problem by giving Postfix an ephemeral volume instead. What did I care if it restarted and lost its queue of zero messages? Therefore, I set `persistence.enabled` to `false` in the Helm chart. I then had the dubious felicity of spending a further two hours discovering why the `Pod` couldn’t be scheduled at all. I found the root cause to be an issue with the logic around mounting volumes. I opened a merge request to fix it and successfully used my corrected version to deploy the chart.
Let the world in
I could now access an apparently fully-functional webmentiond instance through port forwarding, but, from outside the cluster, it could only be accessed by its IP address. It was time to christen it.
When I’ve used it in the past, I’ve found Traefik’s ingress controller pleasantly flexible without an undue amount of complexity, so I didn’t anticipate any difficulties adapting it to my scenario. I installed the controller via Helm, making sure to use the new chart, with the following values file:
```yaml
image:
  tag: "v2.4.8"
ports:
  websecure:
    tls:
      enabled: true
globalArguments:
  - "--global.checknewversion"
additionalArguments:
  - "--providers.kubernetesIngress.ingressClass=traefik-cert-manager"
  - "--ping"
  - "--metrics.prometheus"
```
I added a DNS record pointing to the automatically-created `LoadBalancer`.
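The address to point it at shows up on the Service once DigitalOcean has provisioned the balancer; something like this prints it (the Service name and namespace depend on how the chart was installed):

```shell
# Print the external IP of the Traefik LoadBalancer Service
kubectl get service traefik --namespace traefik \
  --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
```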
Once the change had propagated, I could see a plain-text 404 page and a self-signed SSL certificate at https://webmentions.shivjm.blog. I tackled the certificate first.
On Kubernetes, cert-manager can handle requesting, retrieving, managing, and, crucially, renewing the free certificates from Let’s Encrypt. After installing it via Helm, I created a `ClusterIssuer`, initially using the staging environment, and a `Certificate` using that issuer for the webmentions.shivjm.blog domain name:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: user@redacted.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: redacted-staging-private-key
    # Add a single challenge solver: HTTP01 using the Traefik ingress class.
    solvers:
      - http01:
          ingress:
            class: traefik-cert-manager
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: redacted-cert-staging
spec:
  secretName: redacted-cert-staging
  dnsNames:
    - webmentions.shivjm.blog
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
    group: cert-manager.io
```
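After applying the manifests, cert-manager reports its progress on the `Certificate` resource, which makes it easy to tell when issuance has succeeded (the file name here is hypothetical):

```shell
kubectl apply -f cert-staging.yaml
# Wait for READY to become True; `kubectl describe certificate` explains failures
kubectl get certificate redacted-cert-staging --watch
```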
The new Helm chart I had written included an `IngressRoute` where / (the root) was handled by the `Service` I had created and the `https` version used the created certificate. The domain name would be inferred from the request. There was also a simplistic `Middleware` definition to redirect `http` to `https`. I enabled the ingress in my webmentiond values file from earlier:
```yaml
ingress:
  enabled: true
  tls:
    secretRef: "redacted-cert-staging"
```
All of this was quite straightforward and standard. Of course, I did have to waste an hour and a half staring at Traefik logs, re-deploying my chart, and deleting & re-creating my issuers because I kept using different names in different places and forgetting the syntax that various parts of the system expected.
Once the routing was ready, I repeated the process of creating a ClusterIssuer and Certificate, this time using the normal Let’s Encrypt environment, and updated my IngressRoute so there would be no more privacy error warnings.
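The production issuer differs from the staging one only in its names and the ACME URL; a sketch:

```yaml
# Production ClusterIssuer: identical to staging apart from names and the server URL
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    email: user@redacted.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: redacted-private-key # hypothetical name
    solvers:
      - http01:
          ingress:
            class: traefik-cert-manager
```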
Stymied by rejection
I authenticated with my new webmentiond server. It worked. I updated the site to include the appropriate metadata, deployed it, and ran a Go CLI tool for sending Webmentions on a random page. It was rejected with a 405 Method Not Allowed. I realized my mistake, checked the server source to see where the endpoint actually was, updated the site, and deployed it again. I tried the request again. This time, it failed with a 400 Bad Request.
I started to experience a sinking feeling. Thinking the program I was using might be buggy, I used xh to manually send a Webmention. It was rejected again. Thinking xh might not be formatting my form data correctly, I used the venerable cURL tool to send the same request. It was rejected again. Not thinking any more, I tried Postman. The outcome was the same. I checked the webmentiond logs. This is all there was:
```
8:26PM INF UI path served from /webmentiond/frontend
8:26PM INF Listening on 0.0.0.0:8080...
```
And this was the HTTP response cURL showed me:
```
HTTP/1.1 400 Bad Request
cache-control: no-cache, no-store, no-transform, must-revalidate, private, max-age=0
content-length: 6
content-type: text/plain; charset=utf-8
date: Sat, 01 May 2021 22:31:43 GMT
expires: Thu, 01 Jan 1970 00:00:00 UTC
pragma: no-cache
vary: Origin
x-accel-expires: 0
x-content-type-options: nosniff

Error
```
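For the record, the requests themselves were nothing exotic: a Webmention is just a form-encoded POST with source and target parameters. Mine looked roughly like this (the /receive path being where webmentiond listens, if memory serves):

```shell
# Send a Webmention by hand: a form-encoded POST with source and target
curl -i \
  -d source=https://example.com/some-reply \
  -d target=https://shivjm.blog/some-post/ \
  https://webmentions.shivjm.blog/receive
```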
Upon inspection, I found this lack of explanation was because the server squelches most errors and sets the log level too high to see the details, with no way to change it. I thought I might tinker with the source locally to expose its workings. Sadly, the sqlite3 dependency wouldn’t build on Windows without gcc, and installing gcc would be complicated. I resorted to creating an Alpine Linux container I could install the tools in and doing my testing there, using a nifty quickserve Git alias to copy my repository and good old docker cp to update files every time I changed one. Thus armed, I increased the verbosity and added my own logging, which revealed… that the list of ‘allowed target domains’ I had passed webmentiond was incorrect.
Once I had corrected this elementary oversight, I was able to submit Webmentions without errors. I found that relative links in pages were not correctly verified by the server, so I opened a merge request to fix that.
There you have it, then. There were hurdles aplenty, but I now have a working Webmentions server. At last, I can start recording all those responses people have no doubt been waiting to write for weeks! As for displaying those responses, or sending my own… well… one thing at a time.
- Incidentally, installing kube-state-metrics for advanced metrics caused conflicts this time, which may be because of updated versions, but that’s neither here nor there.↩
- A single-user mode was suggested in 2018 but as far as I can see there’s been no move to implement it.↩
- I don’t know Go, but I’ve read enough about the language to understand a lot of it.↩