Improving Ebook Data Quality: A Frank Assessment and the Path Forward

Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives.


The level of data quality in EPUBs is generally low, whether the EPUB is generated in-house or by a service provider. Publishers give ebooks short shrift: the quality of publishers’ EPUBs is far lower than that of their print products. This does not surprise production managers and publishing services organizations (those involved in the nitty-gritty of data conversion). It often shocks senior management, however, who thought that EPUB creation was a problem solved long ago.

EPUB data quality has a number of aspects, including:

  • Standards Conformance: Does the EPUB file conform to the IDPF standards? Does it pass validation?
  • Consistency of Data: Are a publisher’s EPUB titles implemented in a similar way or is there variability from title to title?
  • Robust Metadata: Has the publisher consistently implemented Dublin Core bibliographic metadata, subject terms, and structural tags?
  • Quality of CSS: The Cascading Style Sheets (CSS) file in the EPUB package determines how the ebook will look on a device or in a browser. Was the CSS created with precision and with sensitivity to design best practices, as well as to the publisher’s own design standards?
  • Cross-browser/device behavior: Different devices and browsers may handle display in different ways. Do the CSS and other EPUB components include the information needed to optimize the EPUB’s appearance in these varying scenarios?
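Part of the standards-conformance question can be checked automatically. The Python sketch below tests a few container rules from the EPUB Open Container Format (OCF). The `mimetype` file must be the first entry in the ZIP archive, must be stored uncompressed, and must contain exactly `application/epub+zip`. In addition, `META-INF/container.xml` must point to a package document that actually exists. This is a minimal illustration, not a substitute for a full validator such as EPUBCheck. The function names `check_ocf_container` and `build_minimal_epub` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def check_ocf_container(epub_bytes: bytes) -> list[str]:
    """Return basic OCF container problems; an empty list means none were found."""
    problems: list[str] = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        entries = zf.infolist()
        # Rule: 'mimetype' must be the first entry and stored (not deflated).
        if not entries or entries[0].filename != "mimetype":
            problems.append("'mimetype' is not the first entry in the archive")
        else:
            first = entries[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' is compressed; it must be stored")
            if zf.read(first) != b"application/epub+zip":
                problems.append("'mimetype' does not contain 'application/epub+zip'")

        # Rule: META-INF/container.xml must exist and name a package document.
        names = set(zf.namelist())
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfiles = root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
        if not rootfiles:
            problems.append("container.xml declares no rootfile")
        for rootfile in rootfiles:
            path = rootfile.get("full-path")
            if path not in names:
                problems.append(f"rootfile {path!r} is not in the archive")
    return problems


def build_minimal_epub(compress_mimetype: bool = False) -> bytes:
    """Build a tiny in-memory archive for trying the checker (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        method = zipfile.ZIP_DEFLATED if compress_mimetype else zipfile.ZIP_STORED
        zf.writestr("mimetype", "application/epub+zip", compress_type=method)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", "<package/>", compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


print(check_ocf_container(build_minimal_epub()))  # []
print(check_ocf_container(build_minimal_epub(compress_mimetype=True)))
```

A compressed `mimetype` entry is a common symptom of an EPUB re-zipped with a general-purpose archiving tool. That is why the check looks at how the entry is stored as well as what it contains.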

To validate my observations, I turned to my friend and colleague Joshua Tallent, chief ebook architect at eBook Architects and one of my gurus for all things EPUB. Joshua echoed what I am seeing in terms of shaky EPUB data quality. “Publishers are either unaware of the issues of data quality in their EPUB files or unable to make changes needed to improve it. Publishers who outsource their work to vendors are at the mercy of the vendor’s practices and tools. Publishers who create their eBooks in-house run into other problems if they are relying on one-button conversion tools (as found in InDesign) without first defining well-formed manuscripts. These tools tend to be garbage in, garbage out, so the quality of the EPUB files coming out the other side is usually not very good.”

Publishers must start to raise the level of data quality of their EPUB titles. If they do not, they will find it difficult to hold price points in the marketplace as customers become savvier and their expectations rise. Further, EPUB data quality will increasingly be a competitive differentiator, not only for customer sales but also for author acquisition: authors will sign with the publishers that can create the best EPUB experience for their titles.

The EPUB quality status quo appears untenable. Change is required.

It starts with the organization itself. “Anyone involved in the creation or distribution of eBook files needs to learn more about the formats and best practices,” says Tallent. “That is true regardless of whether they are a developer, a manager, or the person doing QA. The more you know about what EPUB can do, the better your eBook files will become. That means digging in and learning. It means reading the specs, taking classes about ebook development, and testing how these things work.”

“I have been impressed by some movement on this front in the past year or so. Production managers at some publishing houses are becoming more knowledgeable about the need for quality code and are starting to implement standards for their EPUB files. Some are moving to EPUB 3 with the goal of rethinking their processes and taking their quality to the next level.”

Book publishers are fortunate to benefit from a robust digital formatting standard in EPUB. The EPUB 3 specification is a comprehensive technical standard for digital book content. It defines a standard way to format book content (using the XHTML syntax of HTML5) and a comprehensive set of metadata specifications for defining book structure and labeling subject matter. EPUB 3 is a tremendous asset for publishers of all sizes in most market segments. However, the strategic value a publishing organization derives from EPUB will be determined in large part by the data quality of the EPUBs it produces.
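As a concrete example of the metadata side, an EPUB 3 package document (the OPF file) must carry at least one `dc:identifier`, `dc:title`, and `dc:language`, plus exactly one `dcterms:modified` property. The `unique-identifier` attribute on its `package` element must also reference the `id` of a `dc:identifier`. The Python sketch below checks only those minimum requirements and ignores the manifest and spine. The function name and the placeholder identifier in the sample are hypothetical, and a publisher’s own standard would normally require much more.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def missing_required_metadata(opf_xml: str) -> list[str]:
    """List EPUB 3 minimum-metadata problems in a package document string."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["<metadata> element is missing"]

    problems: list[str] = []
    # At least one non-empty dc:identifier, dc:title, and dc:language.
    for name in ("identifier", "title", "language"):
        elements = metadata.findall(f"dc:{name}", NS)
        if not any((el.text or "").strip() for el in elements):
            problems.append(f"dc:{name} is missing or empty")

    # Exactly one <meta property="dcterms:modified">.
    modified = [m for m in metadata.findall("opf:meta", NS)
                if m.get("property") == "dcterms:modified"]
    if len(modified) != 1:
        problems.append(f"expected one dcterms:modified, found {len(modified)}")

    # The package's unique-identifier must point at the id of a dc:identifier.
    uid = root.get("unique-identifier")
    ids = {el.get("id") for el in metadata.findall("dc:identifier", NS)}
    if uid is None or uid not in ids:
        problems.append("unique-identifier does not reference a dc:identifier")
    return problems


# Placeholder values for illustration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:isbn:0000000000000</dc:identifier>
    <dc:title>Example Title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
  </metadata>
</package>"""

print(missing_required_metadata(SAMPLE_OPF))  # []
```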

A Recipe for Success

Publishing organizations can improve the quality of the EPUBs they produce by addressing the following issues, in this order:

  1. People: EPUB quality rests on the quality of the publishing team. Starting with training is a great idea, as Joshua described, and Joshua’s own eBook Ninjas is a good place to start. The IDPF can provide additional options for ramping up the EPUB chops of publishing teams.
  2. Standards Definition: Publishers must define and document their own EPUB standards, including detailed specifications for CSS, subject tagging, and structural tagging. These standards can then serve as acceptance criteria for external EPUB service providers, and all EPUBs produced must conform to them.
  3. Defined Workflow: A well-defined, documented production workflow supports the creation of high-quality EPUBs.
  4. Automation Platforms: An organization that has an EPUB-savvy team and well-defined, documented EPUB standards and workflow may be in a position to use an automated approach to create many of its EPUB titles directly from source manuscripts. Organizations that embark on automation initiatives without first putting a team, standards, and workflow in place will find themselves on a longer and more painful course. A caveat: as Joshua points out, there are limits to this approach, and it may not be suitable for all titles.
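Steps 2 and 4 reinforce each other when a house standard is written down as data that a script can enforce, whether on in-house output or on vendor files as acceptance criteria. The Python sketch below assumes a hypothetical house standard. It requires `dc:creator`, `dc:publisher`, and `dc:subject` elements and bans pixel font sizes and `!important` declarations in CSS. The rules, names, and file paths are illustrative, not any publisher’s actual specification. The regular-expression CSS check is deliberately crude and will also flag matches inside CSS comments.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical house standard; substitute your organization's documented rules.
HOUSE_STANDARD = {
    "required_dc": ("creator", "publisher", "subject"),
    "banned_css": {
        "absolute font size in px": re.compile(r"font-size\s*:\s*\d+(?:\.\d+)?px", re.I),
        "!important override": re.compile(r"!\s*important", re.I),
    },
}


def check_house_standard(opf_xml: str, stylesheets: dict[str, str],
                         standard: dict = HOUSE_STANDARD) -> list[str]:
    """Return house-standard violations for one title's OPF and CSS files."""
    violations: list[str] = []
    metadata = ET.fromstring(opf_xml).find("opf:metadata", NS)
    # Metadata rules: each required Dublin Core element must be present and non-empty.
    for name in standard["required_dc"]:
        found = [] if metadata is None else metadata.findall(f"dc:{name}", NS)
        if not any((el.text or "").strip() for el in found):
            violations.append(f"metadata: dc:{name} is required by the house standard")

    # CSS rules: report every banned pattern with its file path and line number.
    for path, css in stylesheets.items():
        for label, pattern in standard["banned_css"].items():
            for match in pattern.finditer(css):
                line_no = css.count("\n", 0, match.start()) + 1
                violations.append(f"{path}:{line_no}: {label}")
    return violations


opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:creator>Example Author</dc:creator>
    <dc:publisher>Example Press</dc:publisher>
  </metadata>
</package>"""
css = {"OEBPS/styles/book.css": "p { margin: 0; }\nh1 { font-size: 24px; }\n"}
for violation in check_house_standard(opf, css):
    print(violation)
```

Keeping the rules in one data structure, separate from the checking logic, means the documented standard and the automated gate can evolve together. Running the same check on every incoming title gives in-house teams and vendors the same pass/fail criteria.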

For many publishing organizations, there is much to do in order to raise the bar on EPUB quality. But there is a bright side: the path forward is clear, far clearer than it was even a few years ago.

We have robust digital book standards. We have identified opportunities in the marketplace. Now is the time to put the two together.