How to print a book from a WordPress blog
Even in our digital world, you might want to turn your WordPress blog into a paper book. There are services that claim to do this, but the ones I found didn’t meet my specific requirements:
- Minor content tweaks
- Custom formatting
- Inline comments
- High resolution pictures for offline processing
- Picture captions and mouse-over labels
- DIY attitude
So after a lot of trial & error, I found out that I can
- export the WordPress content as XML,
- create a single page HTML file with all content locally,
- automatically tweak some styles,
- import the HTML into Word,
- re-format as needed,
- save as PDF and optionally create an eBook version.
If you want to follow my steps, you need the tools below. Be aware that the process is highly ‘personalized’ and quite developer-driven, so it might not be the right one for you.
- Firefox & text editor
- Microsoft Word (for Windows)
- Ruby development environment with nokogiri gem
- My WordPress-to-single-HTML-page script from GitHub
- Optionally Calibre for eBook creation
Export the WordPress content as XML
This is easily done through your WordPress Dashboard (Tools > Export), which gives you a WXR file, WordPress’s XML export format.
Create a single page HTML file with all content locally
Run the attached Ruby script to convert the WXR file into a single page HTML document. This is where most of the magic happens, and it is also the most fragile part. The script is tailored to the elements I typically use on my blogs, so yours might differ. But with a little bit of Ruby knowledge it shouldn’t be too hard to tune it. Basically it takes the XML file, filters for the published posts, tags the various elements with different HTML classes and does some processing around images to include captions and mouse-over titles. It prints the HTML to the console, so it is best to invoke it like this: wordpress_to_single_html.rb ‘your wordpress export’ > single_html.html
Now the pictures are still on the WordPress server. Use Firefox to open the HTML and save it again with the option ‘Web Page, complete’ to have everything on your system (incl. pictures) for faster offline access.
Finally, open the newly saved HTML file in a text editor and search & replace all relative img URLs with absolute paths (e.g. substitute ‘myblog_files/’ with ‘file:///c/myblog_files/’). This is sadly required for the Word import.
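To give an idea of what such a conversion involves, here is a minimal sketch. It is not the actual script from GitHub: it uses REXML, which ships with Ruby, instead of nokogiri so that it runs without extra gems. The class names `_heading`, `_post_date` and `_content` are assumptions chosen for illustration, and comments and image handling are left out.

```ruby
require 'rexml/document'
require 'cgi'

# Return the text of the direct child with the given (prefixed) name, or nil.
def child_text(item, name)
  child = item.elements.find { |e| e.expanded_name == name }
  child&.text
end

# Convert a WXR export (as a string) into one HTML page.
# Only items that are posts with status "publish" are kept.
def wxr_to_single_html(wxr_xml)
  doc = REXML::Document.new(wxr_xml)
  sections = []

  doc.elements.each('rss/channel/item') do |item|
    next unless child_text(item, 'wp:post_type') == 'post'
    next unless child_text(item, 'wp:status') == 'publish'

    title = CGI.escapeHTML(child_text(item, 'title').to_s)
    date  = CGI.escapeHTML(child_text(item, 'wp:post_date').to_s)
    body  = child_text(item, 'content:encoded').to_s # already HTML

    # Hypothetical class names; Word turns classes into styles on import.
    sections << [
      %(<h1 class="_heading">#{title}</h1>),
      %(<p class="_post_date">#{date}</p>),
      %(<div class="_content">#{body}</div>)
    ].join("\n")
  end

  "<html><body>\n#{sections.join("\n")}\n</body></html>"
end
```

Calling `puts wxr_to_single_html(File.read(ARGV[0]))` at the end of such a file would mirror the console invocation described above.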
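If you would rather not do this replacement by hand, a few lines of Ruby can do it. This is only a sketch under the assumption that the image folder prefix appears directly at the start of each img `src` attribute. The function name and the example paths are made up for illustration.

```ruby
# Replace a relative folder prefix at the start of every <img src="...">
# with an absolute one. Other attributes (e.g. <a href>) are left alone.
def absolutize_img_paths(html, relative_prefix, absolute_prefix)
  pattern = /(<img\b[^>]*?\bsrc=["'])#{Regexp.escape(relative_prefix)}/
  html.gsub(pattern) { "#{Regexp.last_match(1)}#{absolute_prefix}" }
end

# Example (hypothetical file names):
#   html  = File.read('myblog.htm')
#   fixed = absolutize_img_paths(html, 'myblog_files/', 'file:///c/myblog_files/')
#   File.write('myblog_absolute.htm', fixed)
```

Because paths that are already absolute no longer start with the relative prefix, running it a second time leaves the file unchanged.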
Import the HTML into Word & reformat styles
After opening/importing the HTML in Microsoft Word you can modify styles and ‘pimp’ the content as you want. Check for styles beginning with an _; these are created by the Ruby script to mark the different elements of the blog (content, headings, comments, post_date, …). Save as docx for future needs (and always keep the images folder with the docx).
Save as PDF and optionally create an eBook version
Most print-on-demand services take a PDF, so simply save your document as a PDF. If you want to create an eBook version as well, you enter the ‘format hell’ of eBook content. Calibre seems to understand most formats and can also load the HTML export from Word, e.g. to create a version in the EPUB format. (More general info about ebooks.)
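Calibre also ships a command-line tool, `ebook-convert`, which takes an input file and an output file. If you prefer scripting over the GUI, the conversion could be wrapped roughly like this. The file names are placeholders, and the sketch assumes `ebook-convert` is on your PATH.

```ruby
# Build the argument list for Calibre's ebook-convert tool.
# The output format is derived by Calibre from the output file extension.
def ebook_convert_command(input_html, output_file, title: nil)
  cmd = ['ebook-convert', input_html, output_file]
  cmd += ['--title', title] if title
  cmd
end

# Run the conversion. Returns true on success, false on a non-zero exit,
# and nil if the command could not be run (e.g. Calibre is not installed).
def convert_ebook(input_html, output_file, title: nil)
  system(*ebook_convert_command(input_html, output_file, title: title))
end

# Example (hypothetical file names):
#   convert_ebook('myblog_word_export.htm', 'myblog.epub', title: 'My Blog')
```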
That’s all. Isn’t it simple?
Note about Apple Pages: Using Apple Pages seems the more obvious choice for text processing on a Mac. However, recent Pages versions removed the HTML import, so there wasn’t an easy way to get the WordPress content nicely formatted into Pages. Two workarounds are available: one is to simply copy & paste the content from Safari, and the other is to use TextEdit (which still has an HTML import) to create an RTFD (RTF including attachments) and then load it into Pages. Unfortunately, all pictures are scaled up to the full page, which makes it painful if you have plenty of pictures embedded.
Note about Microsoft Word for Mac: It turned out that my Mac version had multiple hiccups with a few hundred pages of text and plenty of included pictures. Switching to Windows made it less stressful for me.
Note about Microsoft Word: It seems plain wrong to me that recent versions of Word have problems with images you want to link in. My impression is that if you include a picture via a link to an external file, Word creates an absolute file path reference to it. Of course this makes it impossible to move the document and files around, even on your own local system. And when I tried to embed the files right into the docx (which of course can seriously bloat the file size), many pictures changed their scaling, at least for me. Some of them were even distorted in ugly ways.