Modify ePub Plugin Help

Introduction

The primary purpose of this plugin is to allow you to make repetitive cleanup or modifications to your ePub files to improve their appearance, file size or specification quality without the need to perform a calibre ePub->ePub conversion.

Performing a calibre conversion will force changes to your file that you have no control over, some of which are undesirable. For instance the CSS file is completely rewritten, margins are changed, files are split based on images being detected, etc. Often there may be no visible harm, however in other cases it can and does cause issues.

This plugin can be seen as a companion to the “Quality Check” plugin which provides a number of menu options for detecting ePubs showing symptoms of issues which this plugin can resolve for you.

Known Artifacts Modifications

Option

What it does

How to detect

Remove iTunes files

When you view an ePub in iTunes, it inserts a plist (property list) file inside the ePub and sometimes an artwork file used to display the cover in iTunes. These files can be considered “cruft”, particularly if you do not intend to use iTunes to view your ePubs in future.

Use this option to remove iTunes plist and artwork files from within the ePub.

Quality Check:

“Check iTunes files”

Remove calibre bookmark files

When you view ePub files in the calibre viewer, it inserts a bookmarks file (much as iTunes does above) that stores your last reading position and any bookmarks you added. You can disable this feature in the viewer preferences.

Use this option to remove calibre bookmark txt files from within the ePub.

Quality Check:

“Check calibre bookmark files”

Remove OS artifacts

When you extract the contents of an ePub to your hard drive and then rezip them, your operating system may add files that are unrelated to the ePub. For instance, on Windows an images folder may gain a Thumbs.db file generated for the thumbnail preview of that folder. Mac OS X may add .DS_Store files.

Use this option to remove OS files from within the ePub.

Quality Check:

“Check OS artifacts”

Remove unused images

Looks for orphaned jpg, png and gif files that are not referred to from the HTML content pages and can be removed from the ePub. This can happen if for instance in Sigil you delete a page without removing the associated image(s). WARNING: This does not inspect CSS files.

Use this option to remove unused image files from within the ePub to reduce space required.

Quality Check:

“Check unused image files”

Unpretty

De-indents HTML code and makes sure paragraphs/headers have their own lines.

 

Strip spans

Removes empty formatting elements (e.g. <i/> or <b></b>) and SPAN elements that have no attributes.

This also converts empty container elements into self-closing elements (e.g. <br></br> becomes <br/>).

 

Strip Kobo DRM remnants

Removes elements related to Kobo's DRM.

This also removes the kobo.js and rights.xml files from the book, any references to them, and the Kobo CSS definition in the documents.
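
For the file-based options in this section (iTunes files, calibre bookmark files and OS artifacts), detection comes down to matching entry names inside the ePub's zip container. The following is only a rough sketch of that idea, not the plugin's actual code: the function name find_artifacts and the three name lists are assumptions made for illustration, and the plugin's own lists may differ.

```python
import io
import posixpath
import zipfile

# Assumed file names; the plugin's own lists may be longer or differ.
ITUNES_NAMES = {"itunesmetadata.plist", "itunesartwork"}
BOOKMARK_NAMES = {"calibre_bookmarks.txt"}
OS_NAMES = {"thumbs.db", ".ds_store"}


def find_artifacts(epub):
    """Group zip entries that look like leftover artifacts by category.

    `epub` is a file path or a binary file-like object.
    """
    found = {"itunes": [], "bookmarks": [], "os": []}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            # Compare only the file name, case-insensitively.
            base = posixpath.basename(name).lower()
            if base in ITUNES_NAMES:
                found["itunes"].append(name)
            elif base in BOOKMARK_NAMES:
                found["bookmarks"].append(name)
            elif base in OS_NAMES:
                found["os"].append(name)
    return found


# Build a tiny in-memory archive to try the function on.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    for entry in ("mimetype", "iTunesMetadata.plist",
                  "META-INF/calibre_bookmarks.txt",
                  "OEBPS/images/Thumbs.db", "OEBPS/ch1.xhtml"):
        zf.writestr(entry, b"")
sample.seek(0)
report = find_artifacts(sample)
```

Removing the reported entries would then mean writing a new zip without them, since the zip format does not support deleting entries in place.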

 

Manifest (.opf File) Modifications

Option

What it does

How to detect

Remove missing file entries from manifest

If the ePub has been manually tweaked, it is possible that someone deleted a file from the directory but did not remove the entry from the .opf manifest xml file. Most tools will ignore missing files when viewing or editing that ePub, however it cannot be guaranteed that is always the case. It could also be from a typing error if the manifest was manually edited or the file renamed afterwards.

Use this option to clean up the manifest by removing references to any missing files.

Quality Check:

“Check manifest files missing”

Add unmanifested files to manifest

The ePub may contain files that are not listed in the .opf manifest. These could result from mismatched names in the manifest, from orphaned files that should be deleted, or from third party tools that leave “cruft” inside the ePub file. Note that iTunes files, calibre bookmarks and OS artifacts are explicitly ignored by this check.

Use this option to add all files not currently listed in the manifest into it.

Quality Check:

“Check unmanifested files”

Remove unmanifested files from ePub

See above for the causes.

Use this option to delete all of the orphaned files in the ePub that are not listed in the manifest file.

Quality Check:

“Check unmanifested files”
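
The three manifest options above all rest on the same comparison: the set of files the .opf manifest lists versus the set of files actually present in the zip. A minimal sketch of that comparison follows. The function name compare_manifest is hypothetical, and the ignore rules here (only mimetype, the .opf itself and META-INF/) are a simplification of what the plugin is described as ignoring.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = "{http://www.idpf.org/2007/opf}"


def compare_manifest(opf_xml, opf_path, zip_names):
    """Return (missing, unmanifested) as sorted lists of zip paths.

    missing      -> listed in the manifest but absent from the zip
    unmanifested -> present in the zip but not listed in the manifest
    """
    root = ET.fromstring(opf_xml)
    opf_dir = posixpath.dirname(opf_path)

    # Manifest hrefs are relative to the folder holding the .opf file.
    listed = set()
    for item in root.iter(OPF_NS + "item"):
        href = unquote(item.get("href", ""))
        listed.add(posixpath.normpath(posixpath.join(opf_dir, href)))

    # Skip directory entries and container-level files that are
    # never expected to appear in the manifest.
    present = {
        name for name in zip_names
        if not name.endswith("/")
        and not name.startswith("META-INF/")
        and name not in ("mimetype", opf_path)
    }
    return sorted(listed - present), sorted(present - listed)
```

“Remove missing file entries from manifest” would act on the first list, while “Add unmanifested files to manifest” and “Remove unmanifested files from ePub” would act on the second.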

Adobe Modifications

Option

What it does

How to detect

Remove margins from Adobe .xpgt files

ePubs that have been created using Adobe tools will contain a .xpgt file that enforces margins. These are in conflict with the traditional css styles and can cause wasted space when viewing on devices. Recent versions of calibre when converting will zero any margins in such a file.

Use this option to zero the margins in the .xpgt file without needing to perform a conversion.

This option is redundant and ignored if you tick “Remove Adobe .xpgt files and links.”

Quality Check:

“Check Adobe .xpgt margins”

Remove Adobe .xpgt files and links

As a more extreme version of the remove margins option above, users may choose to obliterate the xpgt file completely along with any links to it from the xhtml files.

Use this option to remove Adobe margins and all associated xpgt cruft without needing to perform a conversion.

Quality Check:

“Check Adobe inline .xpgt links”

Remove Adobe resource DRM meta tags

Books that have had DRM protection removed will still contain an Adobe <meta> tag with a urn identifier in the xhtml files.
Use this option to remove Adobe DRM cruft.

Important: Using this can break obfuscated fonts in the book.

Quality Check:

“Check Adobe DRM meta tag”

Remove page maps

Adobe has defined a proprietary extension to the ePub standard which identifies where pages break in the print version of a book.
Use this option to remove this file and, for Google Play books, the related anchors in the HTML code.
(This does not affect pagelists found in NCX files, which are part of the ePub standard.)

Quality Check:

Remove only Google Play page maps

The Google Play bookstore adds an Adobe page map file which does not correspond to any print version of the book.
Use this option to remove only Google Play page map files, as opposed to those from other sources.

This option is redundant if you tick “Remove page maps.”

Quality Check:
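
As an illustration of the “Remove Adobe resource DRM meta tags” option in this section, the sketch below strips such a <meta> tag from one xhtml page with a regular expression. It is not the plugin's implementation. It assumes the tag's name attribute begins with "Adept." (for example "Adept.resource"), and the function name strip_adept_meta is hypothetical.

```python
import re

# Assumption: the leftover tag looks like
#   <meta name="Adept.resource" content="urn:uuid:..."/>
# Attribute order may vary, so anything other than ">" is allowed
# before and after the name attribute.
ADEPT_META = re.compile(
    r'<meta\b[^>]*\bname\s*=\s*["\']Adept\.[^"\']*["\'][^>]*>\s*',
    re.IGNORECASE,
)


def strip_adept_meta(xhtml):
    """Return the page text with Adobe DRM <meta> tags removed."""
    return ADEPT_META.sub("", xhtml)
```

As noted for that option, removing this tag can break obfuscated fonts, so a real tool would want to check for them first.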

TOC Modifications

Option

What it does

How to detect

Flatten TOC hierarchy in NCX file

Some devices do not work well with hierarchical TOC (table of contents) navPoint entries in the ncx file. This option will flatten such entries to all be at the same top level.

Use this option to make a flat TOC to work better with some devices.

Quality Check:

“Check TOC hierarchical”

Remove broken TOC entries in NCX file

An NCX file containing broken links to missing html pages will cause errors when viewed. Broken links are most frequently caused by calibre conversions (orphaned cover page links) or by manually editing the book via Tweak ePub or Sigil without updating the NCX.

Use this option to ensure your TOC does not contain any entries which will cause errors due to missing content.

Quality Check:

“Check TOC with broken links”
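
To make the “Flatten TOC hierarchy in NCX file” option concrete, here is a small sketch of flattening nested navPoint entries while keeping their reading order. It is written for illustration only: the function name flatten_navmap is hypothetical, and renumbering playOrder is a choice made here, not something this help text says the plugin does.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
# Keep the NCX namespace as the default one when serialising.
ET.register_namespace("", NCX_NS)


def flatten_navmap(ncx_xml):
    """Return NCX text in which every navPoint sits directly under navMap."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find(f"{{{NCX_NS}}}navMap")
    tag = f"{{{NCX_NS}}}navPoint"

    # iter() walks in document order, so parents come before children.
    points = list(nav_map.iter(tag))

    # Detach each navPoint from wherever it is currently nested.
    for point in points:
        for child in point.findall(tag):
            point.remove(child)
    for child in nav_map.findall(tag):
        nav_map.remove(child)

    # Re-attach everything at the top level and renumber sequentially.
    for order, point in enumerate(points, start=1):
        point.set("playOrder", str(order))
        nav_map.append(point)

    return ET.tostring(root, encoding="unicode")
```

Removing broken entries could follow the same pattern, dropping any navPoint whose content src does not resolve to a file in the ePub.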

Metadata Jacket Modifications

Option

What it does

How to detect

Remove all metadata jackets

Removes any calibre-generated jackets listing book metadata such as title, authors, comments and rating. This covers both jackets from the latest versions of calibre and “legacy” jackets generated using versions of calibre prior to 0.6.50. The “newer” jackets can be identified by a metadata tag in the xhtml.

Use this option if you do not want jackets in your books.

Quality Check:

“Check having any jacket”

Remove legacy metadata jackets

Removes calibre generated jackets that were created using versions of calibre prior to 0.6.50. These jackets cause a problem when the file is reconverted, as the calibre code does not detect them and will duplicate and potentially split them.

Use this option if you do not want the legacy jackets, or intend to reconvert in future and wish to avoid issues.

This option is redundant and ignored if you tick “Remove all metadata jackets.”

Quality Check:

“Check having legacy jacket”

Add/replace metadata jacket

Creates a metadata jacket page in the ePub if it does not exist, or replaces any existing one if it is found.

Use this option if you want to add a metadata jacket without performing a conversion.

 

Jacket at the end of the book

If a jacket is added/replaced, it is placed at the end of the book instead of the beginning.

 

HTML & Style Modifications

Option

What it does

How to detect

Encode HTML in UTF-8

Some ePubs declare an invalid encoding in their HTML pages, which means they render incorrectly in readers like the calibre ebook viewer, Sigil or a web browser. Most often this shows up as garbled characters appearing in place of quotes etc. These pages will however render correctly in ADE. Rather than doing an ePub->ePub conversion, we can instead strip the invalid <meta> tag from the html pages and insert an xml declaration indicating utf-8, which is most often sufficient to resolve the issue.

Use this option to fix invalid encoding declarations on html pages that do not render correctly.

Visually in Sigil or calibre ebook viewer

Remove embedded fonts

Some ePubs carry embedded fonts as .ttf or .otf files, to ensure that their content is rendered with a font representing all the characters they contain. Some devices may not support embedded fonts, and these do significantly increase the ePub size so some users prefer to remove them. This also removes any @font-face declarations from css or html files.

Use this option to remove embedded font files.

Quality Check:

“Check embedded fonts”

Modify @page and body margin styles

An ePub that you have not converted yourself in calibre may have body or @page styles with margins set to values that differ from your desired defaults. You can set your calibre conversion defaults using Preferences -> Conversion -> Common Options -> Page Setup. If you set negative values then this option will remove the margin attributes from the ePub, and if a CSS file is left empty as a result it will be removed from the ePub completely. Otherwise it will write whatever default value you have specified into a new @page style in each CSS file. Note that it does not currently support changing named body styles.

Use this option to remove @page and body margin values and if your calibre defaults are non-negative then rewrite into an @page style.

Quality Check:

“Check CSS book margins” and “Check inline @page margins”

Smarten punctuation

Processes any html files in the ePub to ensure quotes and apostrophes are converted to smart quotes. In addition, double hyphens (--) are converted to an em dash (—).

Use this option to prettify your ePub without a conversion.

Quality Check:

“Check smarten punctuation”

Remove inline javascript and files

Looks for any .js files forming part of the ePub and any inline <script type="text/javascript"> blocks. Javascript is usually a leftover from an original conversion from html and is unnecessary in an ePub.

Use this option to remove javascript cruft from your ePubs.

Quality Check:

“Check javascript <script>”
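
The “Encode HTML in UTF-8” option in this section describes two steps: strip the invalid <meta> tag, then insert an xml declaration indicating utf-8. A hedged sketch of those two steps on a single page is shown below. It assumes the page bytes really are UTF-8 and only the declaration is wrong. The function name declare_utf8 is hypothetical and the regular expressions are deliberately simple (they do not handle the HTML5 <meta charset> form, for example).

```python
import re

# Matches <meta http-equiv="Content-Type" ...> in either attribute order.
META_CONTENT_TYPE = re.compile(
    r'<meta\b[^>]*\bhttp-equiv\s*=\s*["\']content-type["\'][^>]*>\s*',
    re.IGNORECASE,
)
# Matches an existing xml declaration at the very start of the page.
XML_DECLARATION = re.compile(r'^\s*<\?xml[^>]*\?>\s*')

UTF8_DECLARATION = '<?xml version="1.0" encoding="utf-8"?>\n'


def declare_utf8(raw):
    """Return page bytes that declare utf-8 and nothing else.

    Assumes `raw` is already valid UTF-8 (a leading BOM is dropped).
    """
    text = raw.decode("utf-8-sig")
    text = XML_DECLARATION.sub("", text, count=1)  # drop the old declaration
    text = META_CONTENT_TYPE.sub("", text)         # drop the misleading meta tag
    return (UTF8_DECLARATION + text).encode("utf-8")
```

If decoding as UTF-8 fails, the page is genuinely in another encoding and this shortcut does not apply.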

 

Cover Modifications

Option

What it does

How to detect

Remove broken image pages

Looks for html pages that contain nothing but an <img> or <image> tag that links to a non-existent image file. If that html page body contains no other text content, then the html page will be completely removed from the ePub.

Use this option to remove orphaned cover pages that result from some calibre epub conversions due to the way it replaces some cover pages.

Quality Check:

“Check broken image links” will find a superset of all broken image links. View the log for details.

Remove existing cover

Examines the ePub to see if it has an existing cover identified by guide and/or meta entries in the opf manifest. If such a cover can be found, the relevant entries that indicate it is a cover page are removed, along with the cover page itself if it has no other images/text on it.

Use this option to remove cover pages from an ePub if you do not want them, for example to reduce file size or to remove calibre generated default covers.

This option is redundant and ignored if you tick “Insert or replace cover.”

 

Insert or replace cover

Performs the same steps as “Remove existing cover” above to identify and remove any existing cover. A new cover page will be generated using your default ePub output options and inserted as the first page in your ePub using the image associated with that book in your library. This option is far more reliable than using the “Update metadata” option below to replace your covers, as it handles far more scenarios when identifying an existing cover.

Use this option to insert a new cover (replacing the existing one if detectable) to your ePub without requiring a conversion.
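
The test described under “Remove broken image pages” above (a page whose body has no text and whose only image links to a missing file) can be sketched as follows. This is an illustration under assumptions rather than the plugin's code: the names PageScan and is_broken_image_page are hypothetical, and the sketch treats a page as broken only when every image it references is missing.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote


class PageScan(HTMLParser):
    """Collect image targets and note whether the body has visible text."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_text = False
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("img", "image"):
            attr = dict(attrs)
            # <img src=...> or SVG <image xlink:href=...> / <image href=...>
            target = attr.get("src") or attr.get("xlink:href") or attr.get("href")
            if target:
                self.images.append(target)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.has_text = True


def is_broken_image_page(xhtml, page_path, zip_names):
    """True if the page holds only images and none of them exist in the zip."""
    scan = PageScan()
    scan.feed(xhtml)
    scan.close()
    if scan.has_text or not scan.images:
        return False
    base = posixpath.dirname(page_path)
    names = set(zip_names)
    # Image links are relative to the page that contains them.
    return all(
        posixpath.normpath(posixpath.join(base, unquote(src))) not in names
        for src in scan.images
    )
```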

 

 

Metadata Modifications

Option

What it does

How to detect

Update metadata

Updates the ePub metadata in the manifest (such as title, description, authors etc). In some limited circumstances it will also update the cover, however for more reliable cover replacement you should also check the “Insert or replace cover” option.

Use this option to update the title/author/description internal metadata for your ePub to get it “up to date” for use in the calibre ebook viewer.

 

Remove non dc: metadata elements

Applications like calibre and Sigil will insert metadata elements of their own in the manifest opf file that have no relevance to the ePub and are either informational or only for use by that tool. The “Update metadata” option above will insert such elements. These elements do no harm, but if you are publishing your ePub via other websites, you may want such evidence of editing removed first.

Use this option to remove any metadata elements from the manifest xml that are not in the dc: namespace for a clean ePub.

Quality Check:

“Check non dc: metadata”
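
As a rough illustration of “Remove non dc: metadata elements”, the sketch below keeps only Dublin Core children of the <metadata> element in an .opf file. The function name strip_non_dc is hypothetical. Note that this naive version would also drop a <meta name="cover"> entry, and this help text does not say whether the plugin preserves such entries.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Preserve the usual prefixes when the tree is serialised again.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def strip_non_dc(opf_xml):
    """Return (new_opf_text, removed) where removed names the dropped elements."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    removed = []
    # Copy the child list first because we mutate it while looping.
    for child in list(metadata):
        if not child.tag.startswith(f"{{{DC_NS}}}"):
            removed.append(child.get("name") or child.tag)
            metadata.remove(child)
    return ET.tostring(root, encoding="unicode"), removed
```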

 

Further Help

To report problems or obtain the latest version of this plugin, please refer to the MobileRead forums for calibre plugins.