Module sndjvu_format::parsing
Low-level parser for the DjVu transfer format.
“Low-level” means that only the grittiest details of the file format are abstracted away, and that flexibility is prioritized over convenience in the design of the interface. Callers only need to load a relatively small amount of data into memory to begin parsing, and can choose exactly which parts of the document they want to parse.
Entry points are the document and indirect_component free functions; other key items
are Progress and the pointer-like types ComponentP, ElementP.
Incremental parsing
A DjVu document in the transfer format is a sequence of bytes. A major design goal of this module is that you can parse parts of a document without having all the bytes in memory at any one time. The parser keeps track of a correspondence between positions in an abstract byte stream and features of the document structure, and the caller provides a chunk of bytes from a specific position when it wants to parse the corresponding feature. The functions that work this way are:
The return type of each of these looks like Result<Progress<T, N>, Error>, which is how we
represent this set of possible outcomes:
- some part of the provided data was invalid (Result::Err)
- not enough bytes were provided to parse the requested document feature (Result::Ok(Progress::None))
- parsing succeeded (Result::Ok(Progress::Advanced))
- there is no feature to parse at this position (Result::Ok(Progress::End)); this is a possible outcome only for ComponentP::feed and ElementP::feed
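The caller-side pattern implied by these outcomes can be sketched as follows. This is a minimal illustration, not this module's API: the Progress and Error types below are hypothetical stand-ins that only mirror the shape described above, the payloads carried by each variant are assumptions, and feed and parse_record are a made-up toy parser for a length-prefixed record format rather than DjVu.

```rust
// Hypothetical stand-ins that only mirror the shape described above;
// the real sndjvu_format types may carry different data in each variant.
#[derive(Debug, PartialEq)]
enum Progress<T, N> {
    /// Not enough bytes were provided; `N` says how many are needed in total (assumed).
    None(N),
    /// Parsing succeeded and produced a value.
    Advanced(T),
    /// There is nothing to parse at this position.
    End,
}

#[derive(Debug, PartialEq)]
struct Error;

/// Toy parser: a record is one length byte followed by that many payload bytes.
/// A length byte of 0 means "no record here"; 0xFF is treated as invalid data.
fn feed(data: &[u8]) -> Result<Progress<Vec<u8>, usize>, Error> {
    let Some(&len) = data.first() else {
        return Ok(Progress::None(1));
    };
    match len {
        0 => Ok(Progress::End),
        0xFF => Err(Error),
        n => {
            let total = 1 + n as usize;
            if data.len() < total {
                Ok(Progress::None(total))
            } else {
                Ok(Progress::Advanced(data[1..total].to_vec()))
            }
        }
    }
}

/// Caller-side loop: load only as many bytes as the parser asks for,
/// starting at position `start` (which must not exceed `source.len()`).
fn parse_record(source: &[u8], start: usize) -> Result<Option<Vec<u8>>, Error> {
    let mut have = 0; // bytes currently loaded from position `start`
    loop {
        let end = (start + have).min(source.len());
        match feed(&source[start..end])? {
            Progress::Advanced(payload) => return Ok(Some(payload)),
            Progress::End => return Ok(None),
            Progress::None(needed) => {
                if start + needed > source.len() {
                    return Err(Error); // the stream ended before the record did
                }
                have = needed;
            }
        }
    }
}
```

In this sketch the `?` operator handles the Result::Err case, while the match covers the three non-error outcomes. The in-memory slice stands in for whatever the caller reads bytes from, such as a file or a network stream.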
Structs
- A single bookmark record from the (decompressed) NAVM chunk.
- Data from the DIRM chunk that’s only present in bundled documents.
- Iterator over “pointers” to each component of a bundled multi-page document.
- A chunk from a DJVU or DJVI component whose type wasn’t recognized.
- Basic metadata about one component of a multi-page document.
- Pointer-like immutable cursor to the start or end of a component.
- Contents of an ANTz chunk after BZZ decompression.
- Contents of the NAVM chunk after BZZ decompression.
- Parsed representation of the DIRM chunk.
- Compressed portion of the DIRM chunk.
- Pointer-like immutable cursor to the start or end of an element.
- An error encountered while parsing.
- Parsed representation of an FGbz chunk.
- Compressed portion of an FGbz chunk.
- Parsed representation of an INFO chunk.
- Parsed representation of an FG44, BG44, or TH44 chunk.
- Raw MMR-compressed data from an Smmr chunk.
- “Striped” MMR-compressed data from an Smmr chunk.
- Fallible iterator for parsing the annotations from a string.
- Fallible iterator for parsing the bookmarks from a decompressed NAVM chunk.
- Fallible iterator for parsing the stripe data from an Smmr chunk.
- Unparsed representation of an ANTa chunk.
- Unparsed representation of an ANTz chunk.
- Unparsed representation of a BG44 chunk.
- Represents a BGjp chunk, which doesn’t need parsing.
- Unparsed representation of the DIRM chunk.
- Represents a Djbz chunk, which doesn’t need further parsing.
- Unparsed representation of an FG44 chunk.
- Unparsed representation of an FGbz chunk.
- Represents an FGjp chunk, which doesn’t need parsing.
- Represents an INCL chunk, which doesn’t need parsing.
- Unparsed representation of an INFO chunk.
- Unparsed representation of the NAVM chunk.
- Represents an Sjbz chunk, which doesn’t need further parsing.
- Unparsed representation of an Smmr chunk.
- Unparsed representation of an Smmr chunk.
- Unparsed representation of a TXTa chunk.
- Unparsed representation of a TXTz chunk.
- Parsed representation of an Smmr chunk.
- Pointer-like immutable cursor to the start or end of a thumbnail.
- Parsed representation of a TXTa or (decompressed) TXTz chunk.
Enums
- Parsed representation of the start of a component.
- Parsed representation of the start of a document.
- An unparsed chunk that represents a page element.
- Subtypes of the FG44/BG44/TH44 chunk types.
- The outcome of a parsing operation, if no Error was encountered.
- Possible formats for the MMR data in an Smmr chunk.
Functions
- Start parsing annotations from the string content of an ANTa or ANTz chunk.
- Start parsing a DjVu document from some bytes.
- Start parsing a component of an indirect multi-page DjVu document from some bytes.