date 22.Oct.2017

■ Plain text preview and keyword searches in EPUB documents

EPUB is one of the popular ebook formats, but like most of them it is poorly integrated with Windows search and preview. Ebook fans use specialized tools like Calibre, which maintains its own search index, to manage their book collections. But a generic IFilter text extraction plugin was missing, and without it you cannot have fast indexed search in Windows.

For TL;DR types, here is the EPUB text filter to download right away. It delivers plain text previews and fast indexed search in EPUB documents, with xplorer², Windows Explorer and all shell-compatible file managers.

Click to download EPUB search filter (3.3MB, build 1.000)

Minimum requirements: Windows XP or later (32 or 64 bit)

Figure 1. xplorer² showing text preview of EPUB ebook

EPUB IFilter misadventure in more detail

I searched for an EPUB IFilter component to download, but oddly there was none to be found for such an easy format (EPUB is just a disguised zip archive). As I was contemplating writing my own IFilter (a shell extension DLL), I came across a forum post claiming that SumatraPDF (one of the recommended lightweight PDF viewers) already had such a component, but for some unknown reason they wouldn't distribute it.
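
To illustrate the "disguised zip archive" remark, here is a minimal Python sketch that pulls plain text out of an EPUB using only the standard zipfile and html.parser modules. It is not how the SumatraPDF filter works (that is C++ code behind the IFilter interface). The names epub_to_text and _TextCollector are made up for this example, and it assumes an unencrypted book whose content files are UTF-8 (X)HTML, read in archive order rather than in the reading order declared by the book's package file.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <head>, <script> and <style> content."""

    SKIPPED = {"head", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside the skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def epub_to_text(source):
    """Return the plain text of every (X)HTML file inside an EPUB.

    `source` is a file path or a binary file object. Files are visited in
    archive order (an assumption made to keep the sketch short).
    """
    parts = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                parts.extend(parser.chunks)
    return "\n".join(parts)
```

Calling epub_to_text("book.epub") returns the book text as a single string, where book.epub is a placeholder path. A real text filter would also have to honor the reading order and deal with other encodings, which this sketch skips.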

The good news is that SumatraPDF is open source, and anyone can grab the C++ source code from GitHub and play with it. Apparently all that was required was defining the preprocessor symbol BUILD_EPUB_IFILTER, and the EPUB code would be included in the IFilter DLL.

It also includes a LaTeX filter, but the last time I used .TEX was 25 years ago in college.

The downside, for romantic types who exalt the simplicity of yesteryear and despise the pointless bloat of modern tools, is that building the EPUB-inclusive PDFFilter.DLL requires Visual Studio 2015 or later. So I had to install 10+ GB of VS2015 JavaScript rubbish on a virtual machine to compile and link the code.

An even bigger obstacle proved to be Git, the source code repository and collaboration fashion du jour. SumatraPDF has some external dependencies, so getting the ZIP from GitHub wasn't enough. I had to install Git for Windows, open a GitHub account and follow the initial setup instructions, all to no avail. Cloning from the Windows command line interface git-cmd.exe would just fail:

git clone --recursive git@github.com:sumatrapdfreader/sumatrapdf.git
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

After wasting time trying to understand what that error meant, I ended up downloading GitHub Desktop, which delivered an old git from such undeserved command line misery, and I got all the Sumatra source code cum dependencies in one go.
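
A note on that error: "Permission denied (publickey)" is what GitHub answers when a git@github.com: address (which goes over SSH) is used from a machine with no SSH key registered on the account. Public repositories can normally be cloned anonymously through their https:// address instead. The Python sketch below only rewrites the address and prints the resulting command. The helper name ssh_to_https is made up for this example, and whether the clone succeeds still depends on the repository (and the addresses of its submodules) being reachable over HTTPS, which is an assumption here.

```python
import re


def ssh_to_https(remote):
    """Rewrite an SSH-style GitHub address (git@github.com:owner/repo.git)
    as its HTTPS equivalent; any other address is returned unchanged."""
    match = re.fullmatch(r"git@github\.com:(?P<path>.+)", remote)
    if match is None:
        return remote
    return "https://github.com/" + match.group("path")


remote = ssh_to_https("git@github.com:sumatrapdfreader/sumatrapdf.git")
command = ["git", "clone", "--recursive", remote]
print(" ".join(command))
# prints: git clone --recursive https://github.com/sumatrapdfreader/sumatrapdf.git
```

The printed line can be pasted into the same git-cmd.exe window, with no key setup needed for a public repository.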

I built both 32 and 64 bit IFilter DLLs, and wrapped them in an installer for the benefit of ebook readers at large who haven't read anything about REGSVR32 (manual registration would also ruin any PDF filters you had in place, which you probably don't want). The download location is further up.

The main credit for the EPUB plugin (and any faults) lies with the SumatraPDF developers. I merely repackaged their work.

What about other ebook formats?
Arguably EPUB isn't the most popular ebook format. With my Amazon Kindle I mostly read MOBI and AZW ebooks. How do you preview and search in these? The bad news is that many of these books are encrypted and linked to the Kindle device, so there's not much chance of a generic text preview. Still, the xplorer² plugins page lists some shell extensions that provide thumbnails and attributes like book title and author (see the MobiHandler plugin). The plugin page also lists text filters for Sony/Canon ebooks (LRF and FB2) and DjVu ebooks, but personally I don't have any books in these formats.


©2002-2017 ZABKAT LTD, all rights reserved