Downloading Articles for my Ebook Reader

(April 8, 2021)

I've recently taken to reading blog posts and other internet articles on my ereader. And I don't mean using my tablet's browser and wifi connection to load up websites. Instead, I convert the articles I want to read to PDF and read them like I would any other ebook. (I have a large-screen tablet on which reading PDFs is very comfortable; I would probably be playing around with EPUB conversion if I had a smaller screen.)

The obvious way to get a PDF of a website would be to use my browser's built-in print-to-PDF feature. But this has some minor problems for me:

That second point — about automation and scripting — was particularly important to me. So the obvious tool for the job was the Swiss-army knife of document conversions, pandoc.

For a while I was wondering if I would have to write some clever script that downloads all of the article's HTML and other resources (like images) and then inputs them to pandoc. Fortunately, it turns out that pandoc <article url> -o <output file> does exactly what you think it does. The article ends up converted to PDF, with LaTeX used as an intermediate step, so everything is in the beautiful LaTeX font. pandoc also takes care of downloading and including images.

Hotkeys

I wrote a short script that calls pandoc and saves the PDF in a specific directory. With that script available and working, I added hotkeys to my browser and RSS reader that invoke it. These are the two programs in which I might find articles to read, and now I can easily generate PDFs from both.
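For illustration, here is a minimal sketch of what such a script could look like. It is not my actual script: the output directory (~/articles), the file-naming scheme, and the helper names output_name, build_command and convert are all assumptions made up for this example. The only part taken from above is the pandoc <article url> -o <output file> invocation.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of an article2pdf-style wrapper around pandoc."""
import re
import subprocess
import sys
from pathlib import Path
from urllib.parse import urlparse

# Assumption: PDFs are collected in ~/articles (the directory is not specified in the post).
OUTPUT_DIR = Path.home() / "articles"


def output_name(url):
    """Derive a filesystem-friendly PDF name from the URL's host and path."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    return f"{slug or 'article'}.pdf"


def build_command(url, output_dir=OUTPUT_DIR):
    """Build the `pandoc <article url> -o <output file>` invocation."""
    return ["pandoc", url, "-o", str(Path(output_dir) / output_name(url))]


def convert(url):
    """Run pandoc and return its exit status (non-zero means the conversion failed)."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    # pandoc downloads the page and its images itself, so no separate fetch step is needed.
    return subprocess.run(build_command(url)).returncode


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(convert(sys.argv[1]))
    print("usage: article2pdf <article url>", file=sys.stderr)
```

Saved as an executable file named article2pdf somewhere on PATH, a sketch like this would be invoked as article2pdf <article url>, which is the form the hotkeys use.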

Here's what the newsboat config looks like:

macro p set browser "article2pdf %u" ; open-in-browser ; set browser "elinks %u"

And here's the qutebrowser binding:

config.bind(
    '"P',
    'spawn article2pdf {url}'
)

(article2pdf being the name of my script)

Caveats

This doesn't work perfectly.

If you have any questions or comments about this post or site in general, feel free to email me.