Rewrote It In Rust: A Tiny Go Scraper
There’s a news website I frequent that helpfully dropped its Atom feed a few years ago as part of a redesign. I promptly wrote a user script to convert the many fragmented lists of articles on its homepage into a single, linear list. From then on, I had to open the website separately every day before my main feed reader. This was a constant source of irritation given that every other site cheerfully allows Inoreader to consolidate its articles into a single list with months of history and no gaps.
Since the website was redesigned as an SPA, it provides an endpoint for the frontend to fetch a list of articles from. Therefore, about a year ago, as part of forcing myself to learn some Go, I put together a tiny program to generate an HTML listing using said endpoint. Instead of opening the website and relying on my user script, I could peruse a minimal, greatly simplified list of articles. I could also fetch an arbitrary number of articles (necessary when catching up) on my own schedule.
This manually-driven system was acceptable as a temporary workaround. My ultimate goal was to build something that would let me emulate the old feed. I did just that last month. The first step was to extend the program. I split it into two commands, one for the HTML file and one to generate an Atom feed from the same data. That was when things went sour.
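To make the split concrete, here is a rough sketch of one list of articles feeding both outputs. It is written in Rust, since that is where the program ended up, and it uses only the standard library so that it stands alone; the real program builds its feed with the atom_syndication crate. The `Article` and `FeedInfo` types, their fields, and the function names are all assumptions for illustration, not the site's schema or my actual code.

```rust
/// Hypothetical article record; the field names are assumptions.
#[derive(Debug, Clone)]
pub struct Article {
    pub title: String,
    pub url: String,
    /// RFC 3339 timestamp kept as text, e.g. "2022-05-01T12:00:00Z".
    pub published: String,
}

/// Hypothetical feed-level metadata required by Atom.
#[derive(Debug, Clone)]
pub struct FeedInfo {
    pub id: String,
    pub title: String,
    pub author: String,
    pub updated: String,
}

/// Escape the characters that are special in XML/HTML text and attributes.
pub fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}

/// Render a bare-bones HTML list of links.
pub fn render_html(articles: &[Article]) -> String {
    let mut out = String::from("<!DOCTYPE html>\n<title>Articles</title>\n<ul>\n");
    for a in articles {
        out.push_str(&format!(
            "  <li><a href=\"{}\">{}</a> ({})</li>\n",
            escape(&a.url),
            escape(&a.title),
            escape(&a.published)
        ));
    }
    out.push_str("</ul>\n");
    out
}

/// Render a minimal Atom feed from the same articles.
/// Each entry uses the article URL as its id, which is an assumption.
pub fn render_atom(info: &FeedInfo, articles: &[Article]) -> String {
    let mut out = String::from("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
    out.push_str("<feed xmlns=\"http://www.w3.org/2005/Atom\">\n");
    out.push_str(&format!("  <id>{}</id>\n", escape(&info.id)));
    out.push_str(&format!("  <title>{}</title>\n", escape(&info.title)));
    out.push_str(&format!("  <updated>{}</updated>\n", escape(&info.updated)));
    out.push_str(&format!(
        "  <author><name>{}</name></author>\n",
        escape(&info.author)
    ));
    for a in articles {
        out.push_str("  <entry>\n");
        out.push_str(&format!("    <id>{}</id>\n", escape(&a.url)));
        out.push_str(&format!("    <title>{}</title>\n", escape(&a.title)));
        out.push_str(&format!("    <link href=\"{}\"/>\n", escape(&a.url)));
        out.push_str(&format!("    <updated>{}</updated>\n", escape(&a.published)));
        out.push_str("  </entry>\n");
    }
    out.push_str("</feed>\n");
    out
}
```

Both renderers take the same `&[Article]`, so a single fetch can feed either command. Everything that ends up in markup goes through `escape`, because titles and URLs routinely contain `&`.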
I really don’t like Go. It’s better than, say, C++, but it’s designed to hide complexity at the expense of correctness and reliability. At the same time, it feels primitive, as though it discards all prior experience designing programming languages. Writing even this minuscule utility was disproportionately difficult to begin with—as a rule, Go documentation is sub-par, simple things are hard, the compiler is fussy about formatting instead of more important things, and the package system makes me tear my hair out. The design choices (such as time formatting and parsing) and patterns (such as type assertions and switches instead of generics until the newest release) make no sense. Trying to rewrite my program to use the popular Cobra CLI library and Viper configuration library broke my spirit as I struggled to encode the most basic constraints. So I jumped ship.
Writing just the original Go took most of a day. Rewriting the entire expanded version in Rust took under two hours, this time with correct handling of commands using argh, a practically free test, and safeguards against all manner of error conditions. The biggest issue was that the atom_syndication crate uses chrono to handle dates & times whereas I used the newer time crate. Instead of switching to chrono everywhere, I figured out a hack to convert from one crate’s types to the other (involving adding chrono to my own dependencies) just so I could generate the Atom feed. Once it was working, I added a command to upload the resultant file to a predetermined S3 bucket and quickly and easily threw in structured logging with the tracing crate.
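By "correct handling of commands" I mean something like the following sketch, which models the commands as an enum so that invalid combinations cannot be represented. It is a standard-library stand-in for what argh derives, not argh's API and not my actual code; the variant names, the `--count` flag, and the default of 20 are assumptions made up for the example.

```rust
/// Hypothetical default number of articles to fetch.
pub const DEFAULT_COUNT: u32 = 20;

/// One variant per command; only the commands that fetch articles carry a count.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Html { count: u32 },
    Atom { count: u32 },
    Upload,
}

/// Parse the arguments that follow the program name.
pub fn parse_args<I, S>(args: I) -> Result<Command, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let args: Vec<String> = args.into_iter().map(|s| s.as_ref().to_string()).collect();
    let (name, rest) = args
        .split_first()
        .ok_or_else(|| "missing command: expected html, atom, or upload".to_string())?;
    match name.as_str() {
        "html" => Ok(Command::Html { count: parse_count(rest)? }),
        "atom" => Ok(Command::Atom { count: parse_count(rest)? }),
        "upload" => {
            if rest.is_empty() {
                Ok(Command::Upload)
            } else {
                Err(format!("upload takes no arguments, got {:?}", rest))
            }
        }
        other => Err(format!("unknown command: {}", other)),
    }
}

/// Accept either nothing (use the default) or `--count N` with N >= 1.
fn parse_count(rest: &[String]) -> Result<u32, String> {
    match rest {
        [] => Ok(DEFAULT_COUNT),
        [flag, value] if flag.as_str() == "--count" => {
            let n: u32 = value
                .parse()
                .map_err(|_| format!("invalid count: {}", value))?;
            if n == 0 {
                Err("count must be at least 1".to_string())
            } else {
                Ok(n)
            }
        }
        _ => Err(format!("unexpected arguments: {:?}", rest)),
    }
}

/// The match is exhaustive: adding a new variant to `Command` without
/// handling it here is a compile error rather than a runtime surprise.
pub fn describe(cmd: &Command) -> String {
    match cmd {
        Command::Html { count } => format!("write an HTML listing of {} articles", count),
        Command::Atom { count } => format!("write an Atom feed of {} articles", count),
        Command::Upload => "upload the generated file".to_string(),
    }
}
```

In a real binary, `parse_args(std::env::args().skip(1))` would supply the arguments. With a derive-based library the enum stays the same and the hand-written parsing disappears, which is most of why the rewrite went so quickly.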
Finally, I needed to set all this up to run every 10 minutes in my Kubernetes cluster. The unexpectedly hard part was building the binary in Docker. Once that was done, I wrote a trivial CronJob and had it working immediately. As is usual with Rust, it’s been silently doing its job ever since. All that’s missing is metrics; I’ll set those up sometime using Prometheus’s Pushgateway.
What’s my point? Well, although the Rust version is more memory- and CPU-efficient, it’s not about performance: the time spent in my code is dwarfed by the time spent on the single HTTP request. It isn’t about putting Go down, either. As with any language, people write all sorts of things in Go, from the tiny and trivial to the massive and critical, with the same spectrum of quality as anywhere else.
Rather, this is about how I was able to quickly and easily build something correct, reliable, and efficient thanks to Rust. I didn’t have to fight the language; by convention and by design, behaviour is explicit and comprehensive instead of implicit and partial, which suits how I think. It’s true that Rust expects me to put in more initial thought and effort (though the borrow checker was a negligible factor in this instance). In exchange, it gives me confidence in my work.
That’s all I want.