I've recently taken to reading blog posts and other internet articles on my ereader. And I don't mean using my tablet's browser and wifi connection to load up websites. Instead, I convert the articles I want to read to PDF and read them like I would any other ebook (I have a large screen tablet on which reading PDFs is very comfortable; I would probably be playing around with EPUB conversion if I had a smaller screen).
The obvious way to get a PDF of a website would be to use my browser's built-in print-to-PDF feature. But this has some minor problems for me:
That second point, about automation and scripting, was
particularly important to me. So the obvious tool for the job was the Swiss-army
knife of document conversions, pandoc.
For a while I was wondering if I would have to write some clever script that
downloads all of the article's HTML and other resources (like images) and then
inputs them to pandoc. Fortunately, it turns out that
pandoc <article url> -o <output file> does exactly what you think it
does. The article ends up converted to PDF, with LaTeX used as an intermediate
step, so everything is in the beautiful LaTeX font. pandoc also
takes care of downloading and including images.
I wrote a short script that calls pandoc and saves the PDF in a
specific directory. With that script available and working, I added hotkeys to
my browser and RSS reader that invoke it. These are the two programs in which I
might find articles to read, and now I can easily generate PDFs from both.
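
A script along these lines could look like the following. This is only a sketch in Python, not my actual script: the output directory (`~/articles`), the file-naming scheme, and the helper names `output_path_for`, `build_command`, and `convert` are all assumptions made for illustration. The only part taken from above is the pandoc <article url> -o <output file> invocation.

```python
#!/usr/bin/env python3
"""Hypothetical article2pdf sketch: convert a web article to PDF with pandoc."""
import re
import subprocess
import sys
from pathlib import Path
from urllib.parse import urlparse

# Assumed location; the post only says "a specific directory".
OUTPUT_DIR = Path.home() / "articles"


def output_path_for(url: str, directory: Path = OUTPUT_DIR) -> Path:
    """Derive a filesystem-safe PDF name from the URL's host and path."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of non-alphanumeric characters into one dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return directory / f"{slug or 'article'}.pdf"


def build_command(url: str, output: Path) -> list[str]:
    """Build the invocation: pandoc <article url> -o <output file>."""
    return ["pandoc", url, "-o", str(output)]


def convert(url: str) -> int:
    """Run pandoc on the URL and return its exit status."""
    output = output_path_for(url)
    output.parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_command(url, output)).returncode


# Only act when called with exactly one argument, the article URL.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(convert(sys.argv[1]))
```

Returning pandoc's exit status means whatever launches the script (a browser hotkey, an RSS reader macro) can tell whether the conversion succeeded.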
Here's what the newsboat config looks like:

    macro p set browser "article2pdf %u" ; open-in-browser ; set browser "elinks %u"

And here's the qutebrowser binding:

    config.bind('"P', 'spawn article2pdf {url}')

(article2pdf being the name of my script)
This doesn't work perfectly.
Passing the --pdf-engine=xelatex flag when calling pandoc
doesn't fully mitigate the issue, but it will produce reasonable output
without completely failing.
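
For scripted use, the flag can be added when the pandoc command is assembled. Here is a minimal sketch, assuming the script is written in Python. The function `pandoc_command` and its parameters are hypothetical names chosen for illustration; `--pdf-engine` is pandoc's own option, and it requires xelatex to be installed.

```python
from typing import Optional


def pandoc_command(url: str, output: str,
                   pdf_engine: Optional[str] = "xelatex") -> list[str]:
    """Assemble a pandoc call, optionally selecting the PDF engine."""
    cmd = ["pandoc", url, "-o", output]
    if pdf_engine:
        # Equivalent to passing --pdf-engine=xelatex on the command line.
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd
```

Passing `pdf_engine=None` leaves the flag out, so pandoc falls back to its default engine.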