Adventures in Preformatted Text
I improved the markup for output. Something like this:
```output
terminal output
```
Normally produces markup similar to:
<pre class="language-output"><code class="language-output">terminal output</code></pre>
But it should actually use `samp` to indicate quoted output instead of `code`. To fix this, I’ve
added a post-processing step. The same Markdown now produces:
<pre class="output"><samp class="">terminal output</samp></pre>
The result can be styled differently from `code` if I so choose. (The empty `class` attribute is a
bit of laziness on my part and could be omitted.)
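The post-processing step itself isn't shown here, so the following is only a minimal sketch of how such a step might look. It assumes the tree is hast-style plain objects and that output blocks arrive as `pre` > `code` with the class `language-output`; the function name `convertOutputBlocks` is hypothetical.

```javascript
// Sketch of a post-processing step over a hast-style tree (plain objects).
// Assumption: output blocks arrive as <pre><code class="language-output">.
// The function name is hypothetical, not taken from the original plugin.
function convertOutputBlocks(node) {
  if (node.type === "element" && node.tagName === "pre") {
    const code = (node.children || []).find(
      (child) => child.type === "element" && child.tagName === "code"
    );
    const classes = (code && code.properties && code.properties.className) || [];
    if (classes.includes("language-output")) {
      // <pre class="language-output"> becomes <pre class="output">.
      node.properties = { ...node.properties, className: ["output"] };
      // <code class="language-output"> becomes <samp> with no class at all.
      code.tagName = "samp";
      code.properties = { ...code.properties };
      delete code.properties.className;
      return node;
    }
  }
  // Otherwise keep walking the tree.
  (node.children || []).forEach(convertOutputBlocks);
  return node;
}
```

Unlike the markup above, this version drops the `class` attribute on `samp` entirely rather than leaving it empty.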
One complication is that a shell session, for example, might include commands typed by the user
intermingled with the output, so it might be more correct to have parts of the session in `samp`
and parts in `kbd`. However, Markdown has no support for anything like this, and I can’t imagine a
straightforward syntax for it. Nor do I think the current markup is incorrect, strictly speaking:
technically, the shell session includes the result of the user pressing keys.
Line numbers and code blocks
I wanted to add line numbers to code blocks. The hitch was that I would need to do it after
converting them to HTML, by adding `span` elements around each line. I expected this to require a
complicated algorithm that kept track of the stack of open elements. Then I realized I could do it
in a more roundabout way, with a remark plugin that:
- Converts the contents of each `pre` into a string.
- Splits it into lines.
- Uses an existing library to parse each line, automatically closing open tags at the end.
- Converts those parsed lines back into strings.
- Parses those strings into the hast format remark needs.
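To make the split-and-close idea in the steps above concrete, here is a dependency-free sketch. It is not the plugin described here: the "existing library" step is swapped for a small hand-rolled tag tracker, which also reopens on the next line any elements that were still open. The name `splitHighlightedLines` is hypothetical, and the code assumes highlighter-style HTML with no void elements and no `>` inside attribute values.

```javascript
// Hand-rolled stand-in for the "existing library" step; not the original plugin.
// Assumes highlighter-style HTML: no void elements, no ">" inside attributes.
function splitHighlightedLines(html) {
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  let open = []; // elements still open at the end of the previous line

  return html.split("\n").map((line) => {
    // Reopen whatever the previous line left open.
    const prefix = open.map((tag) => tag.raw).join("");
    const stack = [...open];

    for (const [raw, slash, name] of line.matchAll(tagPattern)) {
      if (slash) {
        // Closing tag: pop it if it matches the innermost open element.
        if (stack.length > 0 && stack[stack.length - 1].name === name) {
          stack.pop();
        }
      } else {
        stack.push({ name, raw });
      }
    }

    // Close everything still open, innermost first.
    const suffix = stack
      .slice()
      .reverse()
      .map((tag) => `</${tag.name}>`)
      .join("");

    open = stack;
    return `<span class="line">${prefix}${line}${suffix}</span>`;
  });
}
```

For instance, a highlighted comment that spans two lines comes back as two self-contained `span` elements, each of which can then be parsed on its own.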
I tried this approach with sanitize-html, which didn’t work because it double-escaped `>`, and then
clean-html, which didn’t preserve whitespace. Through trial and error, I determined I could instead
parse the incomplete lines with parse5, the output from which can be directly converted to hast.
This worked to wrap the lines in `span`s. Then I added the numbers. I found myself adding another
layer of `span`s around the contents of each line for styling purposes, and had to make the remark
plugin communicate with the CSS through `style` attributes and Custom Properties (though this
would have been unnecessary if `display: subgrid` were supported), but I managed to get it working.
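Which values the plugin actually passed to the CSS isn't stated, so the following is a guess at one plausible shape rather than a reconstruction. It assumes the number of digits in the largest line number is what the stylesheet needs in order to size the gutter. The names `numberLines`, `--line-number-digits`, `line`, `line-content`, and `data-line-number` are all hypothetical.

```javascript
// Hypothetical sketch: builds hast-style nodes as plain objects.
// The custom property --line-number-digits and all class names are
// assumptions, not taken from the original plugin.
function numberLines(lines) {
  // Width of the widest line number, e.g. 2 for a ten-line block.
  const digits = String(lines.length).length;

  return {
    type: "element",
    tagName: "code",
    // hast stores style as a plain string.
    properties: { style: `--line-number-digits: ${digits}` },
    children: lines.map((children, index) => ({
      type: "element",
      tagName: "span",
      // dataLineNumber serializes as the data-line-number attribute.
      properties: { className: ["line"], dataLineNumber: String(index + 1) },
      children: [
        {
          // Extra wrapper around the line's contents, for styling only.
          type: "element",
          tagName: "span",
          properties: { className: ["line-content"] },
          children,
        },
      ],
    })),
  };
}
```

Under these assumptions, the stylesheet could size the gutter from `var(--line-number-digits)` and render each number with `attr(data-line-number)` in a pseudo-element.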
This revealed three new issues. First, my main reason for wanting to split the blocks into lines was
so I could use `white-space: pre-wrap` and avoid horizontal scrollbars, but the blocks looked very
odd with that change. Second, when I reverted to the default of `white-space: pre`, the blocks
collapsed into a single line. Why? Because, unaware that it was parsing preformatted text, parse5
discarded the newline characters I had added between the `span`s. Third, I had to use
`word-break: break-all` to make it wrap the lines properly, which is acceptable in an editor like
Emacs, which can display continuation markers in the margins, but doesn’t look right on the web,
which can’t.
Now, there are solutions. For example, I could explicitly wrap the lines in `pre` elements before
passing them to parse5 and extract only the children when returning them to remark… but why do that?
The result is unappealing. Considering the lengths I went to only to arrive at a disappointing
conclusion, I decided to undo the changes, though I made sure to keep them in the repository
history for future inspiration.