Handling Malformed EPUB files

EpubReader provides configuration options for handling malformed EPUB files (that is, files that deviate from the EPUB specification). To use them, create an instance of the EpubReaderOptions class, set the appropriate properties (see the sections below for examples), and pass it to one of the methods of the EpubReader class.

Alternatively, you can use one of the three configuration presets available via the EpubReaderOptionsPreset enumeration:

EpubReaderOptionsPreset.STRICT (default): all EPUB validations are enabled. If an EPUB book fails any of them, an exception is thrown.
EpubReaderOptionsPreset.RELAXED: disables the EPUB validation checks that most commonly fail for real-world EPUB books.
EpubReaderOptionsPreset.IGNORE_ALL_ERRORS: disables all EPUB validation checks. EpubReader will try to salvage as much data as possible without throwing any EPUB validation exceptions.

Keep in mind that these options and presets affect only EPUB validation checks; they don't prevent EpubReader from throwing other exceptions. For example, if you call EpubReader.ReadBook(filePath, EpubReaderOptionsPreset.IGNORE_ALL_ERRORS) with a filePath value that points to a non-existent file, EpubReader will still throw a FileNotFoundException.

If you call one of the OpenBook or ReadBook method overloads of the EpubReader class without an EpubReaderOptions or EpubReaderOptionsPreset parameter, EpubReader uses the EpubReaderOptionsPreset.STRICT preset to handle the book. In this case, the result (either EpubBook or EpubBookRef, depending on the method) is guaranteed not to be null. If you provide a preset or a custom EpubReaderOptions configuration, EpubReader may return null if none of the data within the book could be salvaged.
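The same null check applies when opening a book lazily. The following is a minimal sketch rather than a definitive usage pattern. It assumes that the types live in the VersOne.Epub and VersOne.Epub.Options namespaces, that OpenBook has an overload taking a file path and a preset, and that EpubBookRef is disposable.

```csharp
using VersOne.Epub;
using VersOne.Epub.Options;

// Assumption: an OpenBook(string, EpubReaderOptionsPreset) overload exists
// and EpubBookRef implements IDisposable.
// Because a preset is supplied, the result may be null.
using EpubBookRef? bookRef = EpubReader.OpenBook("test.epub", EpubReaderOptionsPreset.RELAXED);
if (bookRef == null)
{
    // None of the book's data could be salvaged.
}
```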

`EpubReaderOptionsPreset` examples

Using `EpubReaderOptionsPreset.STRICT` preset

try
{
    // Load the book into memory and enable all EPUB validations.
    // Because we are using the STRICT (default) preset, the book is guaranteed not to be null.
    EpubBook book = EpubReader.ReadBook("test.epub");
}
catch (EpubReaderException ex)
{
    // The book failed one of the EPUB validations.
}
catch (Exception ex)
{
    // An exception unrelated to EPUB validations has occurred.
}

Using `EpubReaderOptionsPreset.RELAXED` preset

try
{
    // Load the book into memory and ignore common EPUB validation errors.
    EpubBook? book = EpubReader.ReadBook("test.epub", EpubReaderOptionsPreset.RELAXED);
    if (book == null)
    {
        // None of the book's data could be salvaged.
    }
}
catch (EpubReaderException ex)
{
    // The book failed one of the EPUB validations not disabled by the preset.
}
catch (Exception ex)
{
    // An exception unrelated to EPUB validations has occurred.
}

Using `EpubReaderOptionsPreset.IGNORE_ALL_ERRORS` preset

try
{
    // Load the book into memory and ignore all EPUB validation errors.
    EpubBook? book = EpubReader.ReadBook("test.epub", EpubReaderOptionsPreset.IGNORE_ALL_ERRORS);
    if (book == null)
    {
        // None of the book's data could be salvaged.
    }
}
catch (Exception ex)
{
    // An exception unrelated to EPUB validations has occurred.
}

`EpubReaderOptions` examples

If none of the EpubReaderOptionsPreset presets fit your needs, you can customize the behavior of EpubReader by creating your own instance of the EpubReaderOptions class and passing it to one of the methods of the EpubReader class. You can also use one of the EpubReaderOptionsPreset presets as the basis for a custom EpubReaderOptions instance. For example:

EpubReaderOptions options = new(EpubReaderOptionsPreset.RELAXED);
options.BookCoverReaderOptions.Epub3IgnoreMissingContentFile = true;

If no preset is specified when creating an instance of the EpubReaderOptions class, the EpubReaderOptionsPreset.STRICT preset is used as the basis.
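Once configured, the options instance is passed to the reader. A minimal sketch, assuming the VersOne.Epub and VersOne.Epub.Options namespaces and a ReadBook overload that accepts a file path followed by an EpubReaderOptions instance:

```csharp
using VersOne.Epub;
using VersOne.Epub.Options;

// Start from the RELAXED preset and relax one more check on top of it.
EpubReaderOptions options = new(EpubReaderOptionsPreset.RELAXED);
options.BookCoverReaderOptions.Epub3IgnoreMissingContentFile = true;

// Assumption: ReadBook(string, EpubReaderOptions) is the overload being used.
// With custom options the result may be null.
EpubBook? book = EpubReader.ReadBook("test.epub", options);
if (book == null)
{
    // None of the book's data could be salvaged.
}
```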

The following sections provide a few examples of creating custom EpubReaderOptions.

Missing TOC attribute in EPUB 2 spine

The spine element of the EPUB package document has a toc attribute, which is optional for EPUB 3 books but required for EPUB 2 books. Some EPUB 2 books are missing the toc attribute, which causes EpubReader to throw the "Incorrect EPUB spine: TOC is missing" exception.

The PackageReaderOptions.IgnoreMissingToc property can be used to instruct EpubReader to ignore this error:

EpubReaderOptions options = new()
{
    PackageReaderOptions = new PackageReaderOptions()
    {
        IgnoreMissingToc = true
    }
};

Invalid EPUB manifest items

The item element within the EPUB manifest has three required attributes: id, href, and media-type. Some EPUB books have items with at least one of those three attributes missing, which causes EpubReader to throw the "Incorrect EPUB manifest: item ... is missing" exception.

The PackageReaderOptions.SkipInvalidManifestItems property can be used to instruct EpubReader to skip such items instead of throwing:

EpubReaderOptions options = new()
{
    PackageReaderOptions = new PackageReaderOptions()
    {
        SkipInvalidManifestItems = true
    }
};
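Both PackageReaderOptions properties described so far can be set on the same instance when a book exhibits both problems. A short sketch, assuming the VersOne.Epub.Options namespace:

```csharp
using VersOne.Epub.Options;

EpubReaderOptions options = new()
{
    PackageReaderOptions = new PackageReaderOptions()
    {
        // Tolerate an EPUB 2 spine without the toc attribute.
        IgnoreMissingToc = true,
        // Skip manifest items that lack id, href, or media-type.
        SkipInvalidManifestItems = true
    }
};
```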

Missing content files

The item element within the EPUB manifest has a required href attribute, which points to a content file in the EPUB archive. Some EPUB books declare content files in the manifest that do not exist in the actual archive. This causes EpubReader to throw the "EPUB parsing error: file ... was not found in the EPUB file" exception. The exception is thrown immediately if the application uses the EpubReader.ReadBook or EpubReader.ReadBookAsync methods, because they try to load the whole content of the book into memory. The EpubReader.OpenBook and EpubReader.OpenBookAsync methods don't load the content, so the exception is thrown only when the application attempts to read the content of a missing file.

The ContentReaderOptions.ContentFileMissing event can be used to detect those issues and to instruct EpubReader how to handle missing content files. Alternatively, the ContentReaderOptions.IgnoreMissingFileError property can be used to suppress the error. The application can choose one of the following options:

1. Get notified about missing content files

EpubReaderOptions options = new();
options.ContentReaderOptions.ContentFileMissing += (sender, e) =>
{
    Console.WriteLine($"Content file is missing: content file name = '{e.FileName}', content file path in the EPUB archive = '{e.FilePathInEpubArchive}', content type = {e.ContentType}, MIME type = {e.ContentMimeType}.");
};

This lets the application be notified about the missing content file but does not prevent EpubReader from throwing the exception.

2. Suppress exceptions

EpubReaderOptions options = new();

// Option 2.1 (get notified of missing files):
options.ContentReaderOptions.ContentFileMissing += (sender, e) =>
{
    e.SuppressException = true;
};

// Option 2.2 (if application doesn't need to be notified of missing files):
options.ContentReaderOptions.IgnoreMissingFileError = true;

This suppresses all missing content file exceptions. EpubReader will treat missing content files as existing but empty files.

3. Provide replacement content

EpubReaderOptions options = new();
options.ContentReaderOptions.ContentFileMissing += (sender, e) =>
{
    if (e.FileName == "chapter1.html")
    {
        e.ReplacementContentStream = new FileStream(@"C:\Temp\chapter1-replacement.html", FileMode.Open);
    }
};

This lets the application substitute other content for a missing file. The value of the ReplacementContentStream property can be any Stream. The content of the stream is read only once, after which it is cached in the EPUB content reader. The stream is closed after its content has been fully read.
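Because any Stream is accepted, the replacement does not have to come from disk. The sketch below serves an in-memory placeholder page for missing HTML files. The placeholder markup and the file extension check are illustrative choices, and the VersOne.Epub.Options namespace is an assumption.

```csharp
using System;
using System.IO;
using System.Text;
using VersOne.Epub.Options;

EpubReaderOptions options = new();
options.ContentReaderOptions.ContentFileMissing += (sender, e) =>
{
    // Illustrative rule: only replace missing HTML/XHTML files.
    if (e.FileName.EndsWith(".html", StringComparison.OrdinalIgnoreCase) ||
        e.FileName.EndsWith(".xhtml", StringComparison.OrdinalIgnoreCase))
    {
        byte[] placeholder = Encoding.UTF8.GetBytes(
            "<html><body><p>This part of the book is missing.</p></body></html>");
        // A new stream is created for every missing file.
        e.ReplacementContentStream = new MemoryStream(placeholder);
    }
};
```

A fresh MemoryStream is created on each invocation because, as noted above, the reader closes the stream after reading it.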

Missing content for EPUB 2 NCX navigation points

The navPoint element within the EPUB 2 NCX navigation document must contain a nested content element pointing to the content file associated with this navigation item. Some EPUB 2 books have navigation points without a nested content element, which causes EpubReader to throw the "EPUB parsing error: navigation point X should contain content" exception.

The Epub2NcxReaderOptions.IgnoreMissingContentForNavigationPoints property can be used to instruct EpubReader to skip such navigation points (as well as all their child navigation points):

EpubReaderOptions options = new()
{
    Epub2NcxReaderOptions = new Epub2NcxReaderOptions()
    {
        IgnoreMissingContentForNavigationPoints = true
    }
};

Handling XML 1.1 schema files

.NET doesn't have built-in support for XML 1.1 files (only XML 1.0 files are currently supported). Some EPUB books have at least one of their schema files (typically the OPF package file) saved in XML 1.1 format, even though they don't use any XML 1.1 features. This causes EpubReader to throw an XmlException with the "Version number '1.1' is invalid" message.

The XmlReaderOptions.SkipXmlHeaders property can be used to enable a workaround for handling XML 1.1 files in EpubReader:

EpubReaderOptions options = new()
{
    XmlReaderOptions = new XmlReaderOptions()
    {
        SkipXmlHeaders = true
    }
};

Important

Keep in mind that enabling this workaround adds overhead to the processing of every schema file within the EPUB book. If this property is set to true, EpubReader checks whether each XML file contains a declaration (<?xml version="..." encoding="UTF-8"?>) and, if it does, skips the declaration before passing the file to the underlying .NET XDocument class.
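The idea behind the workaround can be illustrated with a standalone sketch. This is not EpubReader's actual implementation. ParseWithoutXmlDeclaration is a hypothetical helper that uses only the .NET base class library and operates on text that has already been decoded into a string.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper (not part of EpubReader): drops a leading XML declaration
// so that a document marked as XML 1.1 can still be parsed by XDocument.
static XDocument ParseWithoutXmlDeclaration(string xml)
{
    // Ignore a byte order mark and whitespace in front of the declaration.
    string text = xml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

    // "<?xml" followed by whitespace is a declaration;
    // "<?xml-stylesheet" and similar processing instructions are not.
    if (text.StartsWith("<?xml", StringComparison.Ordinal) &&
        text.Length > 5 && char.IsWhiteSpace(text[5]))
    {
        int end = text.IndexOf("?>", StringComparison.Ordinal);
        if (end >= 0)
        {
            text = text.Substring(end + 2);
        }
    }
    return XDocument.Parse(text);
}

string opf = "<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n<package version=\"2.0\"><metadata /></package>";
XDocument document = ParseWithoutXmlDeclaration(opf);
Console.WriteLine(document.Root?.Name.LocalName); // prints "package"
```

Dropping the declaration also discards the declared encoding, so a sketch like this only makes sense once the bytes have been decoded correctly.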