Splitting Lines of Code with rehype
Eleventy uses markdown-it by default to render Markdown, but I use remark and rehype. My original motivation was more control. While I’m not sure remark really offers that, I’ve customized it a fair bit at this point.
Either way, code blocks on A Place For My Head are highlighted by rehype-prism. I wanted to split each block into individually-wrapped lines, so that:
<pre><code>foo
bar
baz</code></pre>
Becomes:
<pre><code><span class="line">foo</span>
<span class="line">bar</span>
<span class="line">baz</span></code></pre>
However, pre elements can contain any phrasing content, such as spans. If I just split the text based on newlines, the results could be incorrect. This:
<pre><code><span class="something">foo
bar
baz</span></code></pre>
Would become this, in which the first span class="line" is interpreted as wrapping the entire contents of the pre:
<pre><code><span class="line"><span class="something">foo</span>
<span class="line">bar</span>
<span class="line">baz</span></span></code></pre>
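For illustration, the naive approach amounts to something like the following TypeScript sketch. The function name naiveWrapLines is made up for this example, and it operates on the inner HTML of the code element as a plain string; it is not the plugin's actual code.

```typescript
// Naive approach: wrap every newline-separated chunk in a span,
// ignoring any tags that are still open when the line break occurs.
// `naiveWrapLines` is a hypothetical name used only for this sketch.
function naiveWrapLines(inner: string): string {
  return inner
    .split("\n")
    .map((line) => `<span class="line">${line}</span>`)
    .join("\n");
}

// The span with class "something" crosses two line breaks, so the
// wrappers end up interleaved with it instead of nested around it.
const naive = naiveWrapLines('<span class="something">foo\nbar\nbaz</span>');
```

The string in naive is the broken markup shown above: the first closing tag ends the "something" span rather than the first line wrapper.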
Such a scenario might not be likely in practice, since as far as I can tell Prism highlights one line at a time anyway, but I wanted to do better. My solution is a tad convoluted:
- Turn the rehype AST into a string with hast-util-to-html.
- Split into raw lines, like the naïve approach.
- Trim trailing newlines.
- Parse each line with a regex (I know, I know… this is a specific scenario where the input is constrained) to get a list of tags left open at the end of that line.
- For each line except the last, close the open tags at the end and re-open them on the following line.
- Add data-line to individual lines and data-digits to the block to simplify layout and styling later.
- Parse this back into an AST with parse5. I originally intended to parse incomplete fragments, which I no longer need, so I might move away from parse5.
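The middle of that list (split, trim, track open tags, close and re-open) can be sketched on a raw string roughly as follows. This is a minimal TypeScript sketch under some assumptions, not the real implementation: the names splitBalanced, OpenTag, and VOID_ELEMENTS are hypothetical, the input is assumed to be well-nested serialized HTML with any literal < in text already escaped, and the serialization and re-parsing steps (hast-util-to-html and parse5) are left out.

```typescript
// One entry per tag that is still open at the current position.
interface OpenTag {
  name: string; // tag name, used to write the closing tag
  source: string; // full opening tag text, reused to re-open it
}

// Void elements never get a closing tag, so they are never "left open".
// (Hypothetical short list; extend as needed.)
const VOID_ELEMENTS = ["br", "wbr", "img", "hr", "input"];

function splitBalanced(html: string): string[] {
  // Matches opening and closing tags; quoted attribute values may contain ">".
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)(?:"[^"]*"|'[^']*'|[^>"'])*>/g;
  // Trim trailing newlines, then split into raw lines.
  const rawLines = html.replace(/\n+$/, "").split("\n");
  const openTags: OpenTag[] = [];

  return rawLines.map((raw, index) => {
    // Re-open whatever the previous line left open.
    const prefix = openTags.map((tag) => tag.source).join("");

    // Update the stack of open tags while scanning this line.
    tagPattern.lastIndex = 0;
    let match: RegExpExecArray | null;
    while ((match = tagPattern.exec(raw)) !== null) {
      const source = match[0];
      const name = match[2].toLowerCase();
      if (match[1] === "/") {
        openTags.pop(); // assumes well-nested input
      } else if (VOID_ELEMENTS.indexOf(name) === -1 && source.slice(-2) !== "/>") {
        openTags.push({ name, source });
      }
    }

    // Close the tags still open, innermost first, on every line but the last.
    const isLast = index === rawLines.length - 1;
    const suffix = isLast
      ? ""
      : openTags.slice().reverse().map((tag) => `</${tag.name}>`).join("");
    return prefix + raw + suffix;
  });
}

// Each of the three lines comes back as its own complete
// <span class="something">…</span> fragment.
const balanced = splitBalanced('<span class="something">foo\nbar\nbaz</span>\n');
```

Because every returned line is balanced on its own, wrapping each one in a line span afterwards can no longer interleave with the highlighter's markup.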
The final product is even more complicated because I needed a wrapper element within a wrapper element. Here’s an example of the results (which I unfortunately can’t pretty-print since it’s preformatted):
<pre><code data-digits="1"><span class="line" data-line="1"><span class="line-content"><span class="token punctuation">(</span><span class="token defun"><span class="token keyword">defun</span> <span class="token function">aankh/activate-rfc1345</span> <span class="token punctuation">(</span><span class="token arguments"></span><span class="token punctuation">)</span></span>
</span></span><span class="line" data-line="2"><span class="line-content"> <span class="token punctuation">(</span><span class="token car">setq-local</span> default-input-method <span class="token quoted-symbol variable symbol">'rfc1345</span><span class="token punctuation">)</span>
</span></span><span class="line" data-line="3"><span class="line-content"> <span class="token punctuation">(</span><span class="token car">activate-input-method</span> <span class="token quoted-symbol variable symbol">'rfc1345</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
</span></span><span class="line" data-line="4"><span class="line-content">
</span></span><span class="line" data-line="5"><span class="line-content"><span class="token punctuation">(</span><span class="token defun"><span class="token keyword">defun</span> <span class="token function">aankh/activate-input-method-in-insert-state</span> <span class="token punctuation">(</span><span class="token arguments"><span class="token rest-vars"><span class="token lisp-marker">&rest</span> <span class="token argument variable">args</span></span></span><span class="token punctuation">)</span></span>
</span></span><span class="line" data-line="6"><span class="line-content"> <span class="token punctuation">(</span><span class="token car">activate-input-method</span> default-input-method<span class="token punctuation">)</span><span class="token punctuation">)</span>
</span></span><span class="line" data-line="7"><span class="line-content">
</span></span><span class="line" data-line="8"><span class="line-content"><span class="token punctuation">(</span><span class="token car">advice-add</span> <span class="token quoted-symbol variable symbol">'evil-insert-state</span> <span class="token lisp-property property">:after</span> <span class="token quoted-symbol variable symbol">'aankh/activate-input-method-in-insert-state</span><span class="token punctuation">)</span></span></span></code></pre>
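The nested wrappers and data attributes in that output could be generated along these lines once each line is balanced. This is a hedged TypeScript sketch: wrapLines is a hypothetical name, the markup shape is copied from the example above, and it assumes data-digits is the number of digits in the highest line number, which is inferred from the example rather than stated anywhere.

```typescript
// Wraps already-balanced lines in the nested line / line-content spans
// and adds the data attributes. `wrapLines` is a hypothetical helper.
function wrapLines(lines: string[]): string {
  // Assumption: data-digits is the digit count of the last line number.
  const digits = String(lines.length).length;

  const body = lines
    .map((line, index) => {
      // As in the example, the newline stays inside line-content,
      // and the last line has no trailing newline.
      const newline = index < lines.length - 1 ? "\n" : "";
      return (
        `<span class="line" data-line="${index + 1}">` +
        `<span class="line-content">${line}${newline}</span></span>`
      );
    })
    .join("");

  return `<pre><code data-digits="${digits}">${body}</code></pre>`;
}

const wrapped = wrapLines(["foo", "bar"]);
```

Keeping the digit count on the block would let a stylesheet reserve a fixed-width gutter for line numbers without measuring anything at runtime.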
The core algorithm should be usable inside any plugin architecture that allows conversion between an AST and a string, or even just working with the raw string. I’ll publish my rehype implementation on GitHub eventually, once I’ve added some documentation and answered some release-related questions. (I’ve already added tests, which are in fact the first on this blog, but more on that another time.)