I used to use broken-links-inspector to verify links during development in GitLab. A single run would last more than 15 minutes, because the tool can only check either all links simultaneously or one link at a time. Requesting thousands of links at once would obviously be counterproductive, so I ran it in the second mode, which drastically slowed down the process. It would also add thousands of views per run to my live site. Having recently switched to self-hosted builds, I spent some time considering my options.

lychee sounded perfect, but it doesn’t support recursion yet. The performance-focused muffet was promising too; I’d considered it when I initially implemented the check, but it (still) lacked the ability to produce a JUnit report that GitLab could parse. Aside from that, though, it seemed superior to the alternatives, so I opened a pull request to add JUnit output, which was soon accepted. I use it like this:

check_links:
  stage: test
  image: raviqqe/muffet:2.7.0
  variables:
    # No need to clone the repository.
    GIT_STRATEGY: none
  only:
    - merge_requests
  script:
    - |
      /muffet "$DRAFT_URL" \
        --max-connections-per-host 15 \
        --timeout 5 \
        --max-redirections 10 \
        --buffer-size 32768 \
        -e '^https://shivjm.blog/' \
        -e '^https://www.reddit.com/r/Shivalicious' \
        -e '/this-page-doesnt-exist/$' \
        -e 'https://whatismyipaddress.com/smtp' \
        -e '^https://www.igdb.com/' \
        --junit \
        --ignore-fragments > junit-report.xml
  artifacts:
    when: always
    paths:
      - junit-report.xml
    reports:
      junit: junit-report.xml
  allow_failure: true

As a point of reference, broken-links-inspector took over 17 minutes to check 13,152 links (at most about 13 links per second); muffet takes 43 seconds to check 12,879 links (about 300 per second) and doesn’t add extra views to my site. It’s blindingly fast. In fact, it’s so fast that sites like npm started rate limiting it. I had to reduce the number of connections per host from the (rather optimistic) default of 512 to 32, then 25, then 15.

One thing I hadn’t realized earlier is that I should also run the checker against the production site in order to save the report, which allows GitLab to provide comparisons in merge requests. I doubt this will work correctly with different source URLs between the draft and production deployments, though, so I’ll probably have to find a way around that.
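
A minimal sketch of what such a production job might look like, assuming the default branch is called main and that a CI/CD variable named PRODUCTION_URL (hypothetical, not part of my current setup) holds the live site’s address. It reuses only the flags from the job above and leaves out the exclusion patterns, which would need reviewing for production (the first one excludes the live site itself). I haven’t confirmed that GitLab will match the two reports up:

```yaml
# Hypothetical companion job: the same checker, run on the default branch
# so that GitLab has a baseline JUnit report to compare merge requests against.
check_links_production:
  stage: test
  image: raviqqe/muffet:2.7.0
  variables:
    # No need to clone the repository.
    GIT_STRATEGY: none
  only:
    # Assumption: the default branch is called "main".
    - main
  script:
    # PRODUCTION_URL is a hypothetical CI/CD variable holding the live site's address.
    - |
      /muffet "$PRODUCTION_URL" \
        --max-connections-per-host 15 \
        --timeout 5 \
        --max-redirections 10 \
        --buffer-size 32768 \
        --junit \
        --ignore-fragments > junit-report.xml
  artifacts:
    when: always
    paths:
      - junit-report.xml
    reports:
      junit: junit-report.xml
  allow_failure: true
```

Unlike the draft check, this one would hit the live site on every run, so it brings back some of the extra views I was trying to avoid.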
