This is an unofficial transcription, with minor orthographic and stylistic changes, of the Celartem/Lizardtech DjVu Reference, version 3. Official information about DjVu is available at djvu.sourceforge.net.

DjVu Reference

Document Date:November 2005
From:Lizardtech, a Celartem Company
Status of Standard:Released

1 Introduction

Although the Internet has given us a worldwide infrastructure on which to build the universal library, much of the world’s knowledge, history, and literature is still trapped on paper in the basements of the world’s traditional libraries. Many libraries and content owners are in the process of digitizing their collections. While many such efforts involve the painstaking process of converting paper documents to computer-friendly form, such as SGML-based formats, the high cost of such conversions limits their extent. Scanning documents and distributing the resulting images electronically is not only considerably cheaper, but also more faithful to the original document because it preserves its visual aspect.

Despite the quickly-improving speed of network connections and computers, the number of scanned document images accessible on the Web today is relatively small. There are several reasons for this.

The first reason is the relatively high cost of scanning anything else but unbound sheets in black and white. This problem is slowly going away with the appearance of fast and low-cost color scanners with sheet feeders.

The second reason is that long-established image compression standards and file formats have proved inadequate for distributing scanned documents at high resolution, particularly color documents. Not only are the file sizes and download times impractical, the decoding and rendering times are also prohibitive. A typical magazine page scanned in color at 100 dpi in JPEG would typically occupy 100 KB to 200 KB, but the text would be hardly readable: insufficient for screen viewing and totally unacceptable for printing. The same page at 300 dpi would have sufficient quality for viewing and printing, but the file size would be 300 KB to 1000 KB at best, which is impractical for remote access. Another major problem is that a fully-decoded 300 dpi color image of a letter-size page occupies 24 MB of memory and easily causes disk swapping.

The third reason is that digital documents are more than just a collection of individual page images. Pages in a scanned document have a natural serial order. Special provision must be made to ensure that flipping pages be instantaneous and effortless so as to maintain a good user experience. Even more important, most existing document formats force users to download the entire document first before displaying a chosen page. However, users often want to jump to individual pages of the document without waiting for the entire document to download. Efficient browsing requires efficient random page access, fast sequential page flipping, and quick rendering. This can be achieved with a combination of advanced compression, pre-fetching, pre-decoding, caching, and progressive rendering. DjVu decomposes each page into multiple components (text, backgrounds, images, libraries of common shapes…) that may be shared by several pages and downloaded on demand. This allows a suitably designed DjVu-viewing application to handle on-demand downloading, pre-fetching, decoding, caching, and progressive rendering of the page images.

2 Document organization

This document describes the DjVu file format. It is written “from top down”, providing first a high-level understanding of the features and techniques used in DjVu (see §3), then a mid-level view at the IFF85 level (see §7), and finally a very detailed description of the underlying algorithms and byte-by-byte makeup of DjVu files (see §8).

3 Overview

This section describes the DjVu file format at a high level: how DjVu uses the Mixed Raster Content model, how images are composed into documents, and the non-raster data that such documents can also contain.

3.1 DjVu images

The principal imaging model used in DjVu is the “Mixed Raster Content” (MRC) model described in ITU-T Recommendation T.44, ISO/IEC 16485. In this model, an image is decomposed into foreground and background layers. To select whether a particular pixel comes from the foreground or background, a bitonal “selection” or “mask” layer is provided. These three layers are compressed separately using techniques which are optimized for each type of data.

The foreground and background layers are compressed using a wavelet-based continuous-tone image compression technique known as “IW44”.

The mask layer is compressed using a bitonal image compression technique that takes advantage of repetitions of nearly identical shapes on the page (such as characters) to efficiently compress text images.

A DjVu image need not contain all three layers and alternative compression techniques are available for each layer.

3.2 DjVu documents

DjVu documents can be single- or multi-page. Each page consists of a DjVu image as described above (photo, bitonal, or an MRC-based composition). Such a page, by itself, is a valid DjVu document. Multi-page documents can take either of two forms: bundled or indirect.

3.2.1 Bundled multi-page documents

A bundled multi-page DjVu document uses a single file to represent the entire document. This single file contains all the pages as well as ancillary information (e.g. the page directory, data shared by several pages, thumbnails, etc.). Using a single-file format is very convenient for storing documents or for sending email attachments.

3.2.2 Indirect multi-page documents

There are problems inherent to storing multiple pages in a single file. A viewer may not be able to utilize a byte-serving mechanism such as that available in HTTP/1.1. Therefore any request for any page of such a file will necessarily result in the entire document being transmitted. Furthermore, a reasonable work pattern is to read the first few pages (perhaps a table of contents) and then navigate to a page much further into the document. However, in such a file, data for page 100 can not be viewed until after data for pages 1–99 have been downloaded.

Indirect multi-page documents address these problems. Such a document is composed of several files. The main file is named the index file. You can view document using the URL of the index file, just like you do with a bundled multi-page document. However, the index file is very small. It simply contains the document directory and the URLs of secondary files containing the page data. When you view an indirect multi-page document, the viewer only needs to download the files corresponding to the pages you are viewing.

3.3 Non-raster data

3.3.1 Annotations

Every DjVu image optionally includes several different kinds of annotations. These annotations are often used to define hyperlinks to other document pages or to arbitrary Web pages. They can also be used for other purposes such as setting the initial viewing mode of a page and defining highlighted zones.

3.3.2 Hidden text

Every DjVu image optionally includes a hidden text layer that associates graphical features with the corresponding text. The hidden text layer is usually generated by running optical character recognition software. This textual information provides for indexing DjVu documents and copying/pasting text from DjVu page images.

3.3.3 Thumbnails

DjVu documents sometimes contain pre-computed page thumbnails. These allow a viewer to display a graphical representation of many pages by downloading a very small “thumbnail” file instead of the actual pages themselves.

4 What’s new in the DjVu file format

Since the last update to the file format documentation, Reference 1, the file format has been extended to include:

Multi-page formats
DjVu documents can span more than one page. There are two multi-page formats available: bundled (single file) and indirect (separate file for each page). See §3.2 and §7.2.
Annotations
Both initial viewing parameters (background color, initial zoom) and overlayed annotations (hyperlinks, text boxes) can be specified either at the document level (“shared”) or at the page level. See §8.3.4.
Hidden text
Text and the associated layout information can be stored with each image. This allows documents to be searched and indexed. See §8.3.5.
Document outline
A hierarchical outline can be specified at the document level. This allows the document to present an integrated outline for overview and navigation. See §8.3.2.
Colorized JB2
A palettized extension is provided for the bitonal encoder. See §8.3.10.

5 Acknowledgements

This work is significantly based on Reference 1 and the summary of file format changes described in the DjVuLibre project maintained by Leon Bottou and others.

6 References

6.1 DjVu 2

The DjVu file format specification that was originally released by AT&T in 1999. http://djvuzone.org/djvu/djvu/djvuspec/001.djvu

6.2 IFF

EA IFF 85 format, Electronic Arts’ public-domain IFF standard for an Interchange File Format, released in January, 1985. http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/IFF.txt

6.3 JPEG

JPEG File Interchange Format, Version 1.01 (ISO DIS 10918-1, JPEG JFIF). The specification is located at http://www.w3.org/Graphics/JPEG/jfif.txt.

6.4 TIFF

http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf

6.5 G4

ITU-T (CCITT) T.6. Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus. https://www.itu.int/rec/T-REC-T.6-198811-I/en

6.6 UTF-8

All text in DjVu files is Unicode-encoded using the UTF-8 encoding. http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf

6.7 DjVuLibre

An open-source reference implementation of this file format specification is available at http://sourceforge.net/projects/djvu/. Throughout this specification, there are numerous references to source files in this implementation.

7 Component pieces (IFF chunks) of DjVu documents and images

This section describes the DjVu file format at a middle level. This includes types of chunks which can go into various types of documents but not a detailed layout of the contents of those chunks.

DjVu documents are IFF85 files (see Reference 2 for details). The IFF85 structure provides a hierarchy of containers which hold various types of information in a DjVu file. The containers are called “chunks”. How the chunk is used (what it holds) can be determined by its “chunk type” or “chunk ID”. For example, the list of files contained in a multi-page document is held in the DIRM (“directory”) chunk; annotations are held in a ANTz chunk.

FORM chunks are composite (contain other chunks). Their specific use is exposed by a secondary chunk ID. For example, a single page consists of several different chunks all contained within a single FORM:DJVU chunk. A multi-page document consists of several pages (and other chunks) all contained in a FORM:DJVM chunk.

This section discusses the various kinds of DjVu documents and the corresponding chunks of which they consist.

7.1 Single-page documents

A single-page document is composed of a single FORM:DVJU composite chunk. This composite chunk always begins with one INFO chunk describing the image size, resolution and related information (see §8.3.11). The document contains exactly one DjVu image, whose content varies as described below.

7.1.1 Photo DjVu image

Photo DjVu image files are best used for encoding photographic images in colors or in shades of gray. The data compression model relies on the IW44 wavelet representation. This format is designed such that the IW44 decoder is able to quickly perform progressive rendering of any image segment using only a small amount of memory. One or more additional BG44 chunks contain the image data encoded with the IW44 representation. The image size specified in the INFO chunk and the image size specified in the IW44 data must be equal.

7.1.2 Bi-level DjVu image

Bi-level DjVu image files are used to compress black and white images representing text and simple drawings. The JB2 data compression model uses the soft pattern-matching technique, which essentially consists of encoding each character by describing how it differs from a well-chosen already-encoded character. A Sjbz chunk contains the bi-level data encoded with the JB2 representation (see Appendix 2). The image size specified in the INFO chunk and the image size specified in the JB2 data must be equal.

7.1.3 Compound DjVu image

Compound DjVu image files are an extremely efficient way to compress high-resolution compound document images containing both images and text, such as a page of a magazine. Compound DjVu files represent the document image using two layers. The background layer is used for encoding the pictures and the paper texture.

The foreground layer is used for encoding the text and the drawings. Additional chunks hold the components of either the foreground or the background layers.

The main component of the foreground layer is a bi-level image named the foreground mask. The pixel size of the foreground mask is equal to the size of the DjVu image. It contains a black-on-white representation of the text and the drawings. This image is encoded by a Sjbz chunk using the JB2 representation. There may also be a companion chunk Djbz containing a shape dictionary that defines bi-level shapes referenced by the Sjbz chunk.

7.1.3.1 Foreground encoding

The foreground colors can be encoded according to two models:

  • Using a small color image, the foreground color image, encoded as a single FG44 chunk using the IW44 representation (see IW44Image.h).
  • By specifying one solid color per object described by the JB2-encoded mask. These JB2 colors are color-quantized and stored in a single FGbz chunk (see §6.3.10). Such compound DjVu images are rendered by painting each foreground object on top of the background color image using the solid color specified by the FGbz chunk.
7.1.3.2 Background encoding

The background layer is a color image, the background color image, encoded by an arbitrary number of BG44 chunks containing successive IW44 refinements (see Appendix 1). The size of this image is computed by rounding up the quotient of the mask size by an integer sub-sampling factor ranging from 1 to 12. Most compound DjVu images use a background sub-sampling factor equal to 3. Smaller sub-sampling factors are adequate for images with a very rich paper texture. Larger sub-sampling factors are adequate for images containing no pictures.

There are no ordering or interleaving constraints on these chunks except that (a) the INFO chunk must appear first, and (b) the successive BG44 refinements must appear with their natural order. The chunk order simply affects the progressive rendering of DjVu images in a web browser.

7.1.3.3 Alternative encodings

Besides the JB2 and IW44 encoding schemes, the DjVu format supports alternative encdoding methods for its components.

The foreground mask may be represented by a single Smmr chunk instead of Sjbz. The Smmr chunk contains a bi-level image encoded with the Fax-G4/MMR method. Although the resulting files are typically six times larger, this capability can be useful when DjVu is used as a front-end for fax machines and scanners with embedded Fax-G4/MMR capabilities.

The background color image may be represented by a single BGjp chunk instead of several BG44 chunks. The BGjp chunk contains a JPEG-encoded color image (see JPEGDecoder.cpp). The resulting files are significantly larger and lack the progressivity of the usual DjVu files. This is useful because some scanners have embedded JPEG capabilities.

The foreground color image may be represented by a single FGjp chunk instead of a single FG44 chunk. This is useful because some scanners have embedded JPEG capabilities.

7.1.3.4 Annotations and textual information

All types of DjVu images may contain annotation chunks. Annotation chunks are used to describe hyperlinks, to specify more viewer settings (page background, initial zoom, etc.), and to hold metadata information. Annotations are contained in ANTa or ANTz chunks.

All types of DjVu image files may also contain a computer-readable description of the text appearing on the page. This information is contained in either a TXTa chunk or a TXTz chunk.

7.2 Multi-page documents

A multi-page document is composed of a FORM:DJVM whose first chunk is a DIRM chunk containing the document directory. This directory lists all component files composing the given document, helps to access every component file and identifies the pages of the document.

In a bundled multi-page file, the component files are stored immediately after the DIRM chunk, within the FORM:DJVM composite chunk.

In an indirect multi-page file, the component files are stored in different files whose URLs are composed using information stored in the DIRM chunk.

7.2.1 Component files

A multi-page DjVu document necessarily references other FORM (composite) chunks. Specifically:

  • Each page is a single-page document (FORM:DJVU chunk).
  • Embedded thumbnails (if any) are contained in one or more FORM:THUM chunks.
  • Shared annotations (if any) and shape dictionaries (if any) are contained in one or more FORM:DJVI chunks.

Each of these composite chunks (FORM:DJVU, FORM:THUM, FORM:DJVI) is a well-formed IFF byte stream in its own right and can be held in a separate disk file. In the context of a multi-page — either bundled or indirect — document, we refer to these composite chunks as component files.

7.2.2 Including shared information

In many cases, efficiencies can be achieved by sharing JB2 shape definitions and/or annotations across pages. To facilitate this, any DjVu image file contained in a multi-page file may contain an INCL chunk containing the ID of a shared component file. The decoder processes the chunks contained in the shared component file as if the DjVu image file contained them. All relevant pages include this shared component file. Although they appear in several pages, these shared shapes are encoded only once in the document.

A shared component file is composed of a single FORM:DJVI potentially containing any information otherwise allowed in a DjVu image file (except for the INFO chunk of course).

8 Low-level chunk structure and definition

This section describes the DjVu file format at a low level. This includes the binary layout of the IFF85 wrapper and, of course, the layout of each contained chunk.

8.1 Header

The first four bytes of a DjVu file are 0x41 0x54 0x26 0x54. This preamble is not part of the EA IFF85 format, but it is required in order to identify DjVu files.

8.2 DjVu file structure

8.2.1 IFF wrapper

An IFF file consists of a number of chunks. Each chunk is laid out in three fields:

BYTE[4] Chunk ID. Describes the use of the chunk. The strings that identify the types of chunks used in DjVu are listed below.
BE32 The length of the data.
BYTE[] The contained data.

A chunk whose type is not recognized by the application is to be ignored. In the IFF format, chunks may be nested: a chunk may contain other chunks as part of its data. In the DjVu format, there is only one chunk at the outermost nesting level, a FORM chunk. All other chunks appear within the FORM chunk, sequentially, with no nesting.[1]

Example:

00000000 41 54 26 54 "AT&T"; magic described in §8.1.
00000004 46 4f 52 4d "FORM"; chunk ID = FORM
00000008 00 00 68 a6 0xa668 = 26970; length of this FORM chunk
0000000b 44 4a 56 55 "DJVU"; first four bytes of contained data. Since this is a FORM chunk, this starts with the sub-identifier. This is a FORM:DJVU chunk, a single-page document.

8.2.2 Chunk summary

The chunks used in the DjVu file format are summarized in Table 1.

Table 1: chunk summary
Chunk ID Usage
FORM The composite chunk. The first four bytes of the FORM chunk are a secondary identifier. Such chunks are referred to as FORM:XXXX where XXXX stands for the secondary identifier.
FORM:DJVM A multi-page DjVu document. Composite chunk that contains the DIRM chunk, possible shared/included chunks and subsequent FORM:DJVU chunks which make up a multi-page document.
FORM:DJVU A DjVu page/single-page DjVu document. Composite chunk that contains the chunks that make up a page in a DjVu document.
FORM:DJVI A “shared” DjVu file which is included via the INCL chunk. Shared annotations, shared shape dictionary.
FORM:THUM Composite chunk that contains the TH44 chunks that are the embedded thumbnails.
DIRM Page name information for multi-page documents.
NAVM Bookmark information.
ANTa, ANTz Annotations, including both initial view settings and overlaid hyperlinks, text boxes, etc.
TXTa, TXTz Unicode text and layout information.
Djbz Shared shape table.
Sjbz BZZ-compressed JB2 bitonal data used to store mask.
FG44 IW44 data used to store foreground.
BG44 IW44 data used to store background.
TH44 IW44 data used to store embedded thumbnail images.
WMRM JB2 data required to remove a watermark.[1]
FGbz Color JB2 data. Provides a color for each [blit or shape?] in the corresponding Sjbz chunk.
INFO Information about a DjVu page.
INCL The ID of an included FORM:DJVI chunk.
BGjp JPEG-encoded background.
FGjp JPEG-encoded foreground.
Smmr G4-encoded mask.

8.3 IFF chunk types

8.3.1 Container chunk: FORM

The FORM chunk is used as a chunk container. The first four bytes of the FORM chunk are a secondary ID used to identify the contained chunks.

8.3.1.1 FORM:DJVM

As discussed in §7.2, a multi-page DjVu document is composed of a single (composite) FORM:DJVM chunk. The first nested chunk is always a DIRM chunk containing the document directory (see DjVmDir.h), which represents the list of the component files that make up the document. An optional NAVM chunk, which describes the outline of the document, may follow the DIRM chunk.

Example:

FORM:DJVM [126475 bytes]
  DIRM [59 bytes] Document directory (bundled, 3 files, 2 pages)
  FORM:DJVI [3493 bytes] dict0002.iff
  FORM:DJVU [115016 bytes] p0001.djvu
  FORM:DJVU [7869 bytes] p0002.djvu
8.3.1.2 FORM:DJVU

As discussed in §7.1, a single page in a DjVu document is contained in a single (composite) FORM:DJVU chunk. The nested first chunk must be the INFO chunk. The chunks after the INFO chunk may occur in any order, although the order of the BG44 chunks, if there is more than one, is significant.

Example:

FORM:DJVU [26790 bytes]
  INFO [10 bytes] 2202 × 967, version 26, 300 dpi, gamma = 2.2
  Sjbz [13133 bytes] JB2 bi-level data
  FG44 [185 bytes] IW44 data #1, 76 slices, version 1.2 (color), 184 × 81
  BG44 [935 bytes] IW44 data #1, 74 slices, version 1.2 (color), 734 × 323
  BG44 [1672 bytes] IW44 data #2, 10 slices
  BG44 [815 bytes] IW44 data #3, 4 slices
  BG44 [9976 bytes] IW44 data #4, 9 slices
8.3.1.3 FORM:DJVI

Multi-page DjVu documents can share information between pages by nesting a chunk inside a FORM:DJVI chunk (which is itself held inside the FORM:DJVM chunk) and referencing the contained chunk from within a page. Individual pages reference the shared chunks via the INCL chunk.

Example:

FORM:DJVM [126475 bytes]
  DIRM [59 bytes] Document directory (bundled, 3 files, 2 pages)
  FORM:DJVI [3493 bytes] dict0002.iff
    Djbz [3481 bytes] JB2 shared dictionary
  FORM:DJVU [115016 bytes] p0001.djvu
    INFO [10 bytes] 2539 × 3295, version 25, 300 dpi, gamma = 2.2
    INCL [12 bytes] Indirection chunk → dict0002.iff
    Sjbz [70497 bytes] JB2 bi-level data
8.3.1.4 FORM:THUM

Pre-rendered thumbnails may be included. This allows very large documents to render thumbnails of pages without downloading and decoding them. FORM:THUM chunks contain several TH44 chunks. Each of these chunks contains the thumbnails of the pages that follow.

Example:

FORM:DJVM [2272012 bytes]
  DIRM [108 bytes] Document directory (bundled, 7 files, 4 pages)
  FORM:THUM [5960 bytes] p0001.thumb
    TH44 [5984 bytes] Thumbnail icon for page 1
  FORM:DJVU [1413380 bytes] p0001.djvu
    INFO [10 bytes] 4728 × 6300, version 25, 600 dpi, gamma = 2.2
    …
  FORM:THUM [12148 bytes] p0004.thumb
    TH44 [3418 bytes] Thumbnail icon for page 2
    TH44 [4150 bytes] Thumbnail icon for page 3
    TH44 [4552 bytes] Thumbnail icon for page 4
  FORM:DJVU [777858 bytes] p0002.djvu

8.3.2 Directory chunk: DIRM

As described in §7.2, a multi-page document will contain “component files” such as individual pages (FORM:DJVU) or shared annotations (FORM:DJVI).

The first contained chunk in a FORM:DJVM composite chunk is the DIRM chunk containing the document directory. It contains information the decoder will need to access the component files (see §7.2).

8.3.2.1 Unencoded data

The first part of the DIRM chunk is unencoded:

BYTE

Bit 7 (the MSB) is the bundled flag: 1 for bundled, 0 for indirect.

Bits 6 through 0 are the version, currently 1.

BE16 Number of component files.
BE32[]

When the document is a bundled document (i.e. the flag bundled is set), the header above is followed by the offsets of each of the component files within the FORM:DJVM. These offsets allow for random component file access.

When the document is indirect, these offsets are omitted.

8.3.2.2 BZZ-encoded data

The rest of the chunk is entirely compressed with the BZZ general-purpose compressor. We describe now the data fed into (or retrieved from) the BZZ codec (see BSByteStream.cpp and Appendix 4).

BE24[] Size of each component file. May be 0 for indirect documents.
BYTE[]

Flag byte for each component file.

  • 0bhasname⟩⟨hastitle000000 for a file included by other files
  • 0bhasname⟩⟨hastitle000001 for a file representing a page
  • 0bhasname⟩⟨hastitle000010 for a file containing thumbnails

Flag hasname is set when the name of the file is different from the file ID. Flag hastitle is set when the title of the file is different from the file ID. These flags are used to avoid encoding the same string three times.

Note: in practice, the hasname and hastitle bits are poorly tested and not used.

ZSTR[] There are one to three zero-terminated strings per component file. The first string contains the ID of the component file. If hasname is set, then there is a second string which contains the name of the component file (in the case of an indirect file, this is the disk filename). If hastitle is set, then there is a third string which contains the name of the component (for display — for example, alternate page numberings in the Foreword or Preface).

Examples:

81
00 03
00 00 00 54
00 00 0e 02
00 01 cf 52

00 0d ad
01 c1 50
00 1e c5
00
01
01
64 69 63 74 30 30 30 32 2e 69 66 66 00
70 30 30 30 31 2e 64 6a 76 75 00
70 30 30 30 32 2e 64 6a 76 75 00
01
00 03

00 0d ad
01 c1 50
00 1e c5
00
01
01
64 69 63 74 30 30 30 32 2e 69 66 66 00
70 30 30 30 31 2e 64 6a 76 75 00
70 30 30 30 32 2e 64 6a 76 75 00

8.3.3 Document outline chunk: NAVM

The NAVM chunk contains bookmarks which describe an outline of the document. The intent is to allow content authors to create an electronic table of contents which gives users rapid access to various parts of the document.

This chunk is optional, but, if present, must immediately follow the DIRM chunk.

The entire chunk is BZZ-encoded and starts with a single field specifying the total number of bookmark records:

BE16 The total number of bookmarks in the document.

And then the individual bookmark records, nested as necessary:

BYTE The number of immediate child bookmark records.
BE24 Size of the description text.
STR The description text.
BE24 Size of the URL text.
STR The URL text. This may (and typically does) use the syntax described for the URLs in the ANTa chunk (and similarly, is not URL-encoded).

Example (as passed to the BZZ codec):

Consider a small document outline as follows:

There is no hyperlink associated with the single root entry "Table of Contents". At a binary level, the chunk looks like this:

8.3.4 Annotation chunk: ANTa, ANTz

Annotations are contained in ANTa or ANTz chunks. The ANTa chunks contain the annotation in plain text. The ANTz chunks contain the same information compressed with the BZZ encoder (see BSByteStream.h).

The use of the ANTa chunk is discouraged.

Pages can share annotations using an INCL chunk as explained in §7.2.2. The complete annotation text is obtained by concatenating all annotation chunks present in the page. A restriction of the current reference library implementation limits the number of shared annotation files to one.[1]

The syntax of the annotation text uses a simple parenthesized notation. All text is standard UTF-8.

8.3.4.1 Initial document view
8.3.4.1.1 Background color
8.3.4.1.2 Initial zoom
8.3.4.1.3 Initial display level
8.3.4.1.4 Alignment
8.3.4.2 Maparea (overprinted annotations)
8.3.4.2.1 url
8.3.4.2.2 comment
8.3.4.2.3 area

9 DjVu in the raw (binary and IFF-level dumps)

10 Appendix 1: IW44 coding

11 Appendix 2: JB2 coding

11.1 General considerations

Selection layer coding is used in compound DjVu images. In such images, there are three layers. The foreground layer is coded in one FG44 chunk, and is rendered as described in Appendix 1. The background layer is coded in one or more BG44 chunks, and is rendered as described in Appendix 1. The selection layer is coded using one Sjbz chunk. Black pixels in the selection layer specify those pixels that are to be rendered using the foreground color. All other pixels are to be rendered using the background color.

Black-and-white coding is used in bi-level DjVu images. In such images, there are three layers. The foreground layer is black. The background layer is white. The selection layer is coded using one Sjbz chunk. The selection layer specifies those pixels that are to be rendered in black. All other pixels are to be rendered in white.

An Sjbz chunk contains a single arithmetically-coded data stream, coded using the Z′-Coder (Appendix 3). All data, including headers and record types, is coded in this arithmetically-coded stream.

11.2 Arithmetic coding

The arithmetically-coded data in an Sjbz chunk consists logically of records. The record types are listed in Table 6, and described in §11.4. The records consist of fields. The fields present for records of each record type are listed in Table 6. The fields within a record are coded in the order listed in Table 6 for records of that type. Details of the coding for each field appear in §11.5.

A field may contain one or more data elements. The data elements consist of flags, pixel colors, and integers. Because of the nature of arithmetic coding, the records, fields, and data elements are not of fixed sizes, and do not necessarily begin on bit boundaries within the data stream.

Flags are binary decisions, each coded using the Z′-Coder with a particular context. There are two different contexts for flags, the eventual image refinement context and the offset type context.

Pixel colors are binary decisions, coded using the Z′-Coder with a particular context. For pixel colors, there are 3072 different contexts. There are 1024 contexts used for direct coding of bitmaps; these correspond to the 2 10 = 1024 different combinations of values that the pixels in the direct coding template can assume. There are 2048 contexts used for refinement coding of bitmaps; these correspond to the 2 11 = 2048 different combinations of values that the pixels in the refinement coding template can assume.

Integers are coded using the multivalue extension to the Z′-Coder, described below. There are 15 contexts for coding multivalued integers, as described in Table 7.

11.2.1 Initialization of the Z′-Coder

All Z′-Coder contexts are initialized to the value 0. This applies both to contexts used to encode single-bit values, including pixel colors, and to contexts that are part of an integer context used by the multivalue extension to the Z′-Coder.

11.2.2 The multivalue extension to the Z′-Coder for coding of numeric data

Quantities that can take on multiple values are coded as integers using the multivalue extension to the Z′-Coder. This extension of the Z′-Coder allows all data in the bit stream to be coded using the same coder, the Z′-Coder. There are 15 integer contexts, specified in Table 7. A single integer context includes a number of binary contexts.

One integer context consists of a binary decision tree. See Figure 1 for an example of part of such a tree. The root node of the tree corresponds to the decision about the sign of the number n being decoded. Each of the two sub-trees under the root corresponds to a set of decisions that eventually identify a range in which n lies. The sub-trees under the nodes corresponding to identified ranges are complete binary trees that identify the exact value of n.

Each node of the binary decision tree for an integer context maintains its own binary probability estimation context for the Z′-Coder. The trees for different integer contexts are completely independent. Thus each node of a tree contains probability information conditioned on a conditioning context. The conditioning context consists of both the type of value being coded (i.e., the selection of the integer context), and of the values of the decisions coded so far when encoding the current integer.

11.2.3 Record types

Table 6: record types and fields coded for each record type
Record type coded value Record type Fields coded
0 Start of image
  1. Record type
  2. Image size
  3. Eventual image refinement flag
1 New symbol, add to image and library
  1. Record type
  2. Absolute symbol size
  3. Bitmap by direct coding
  4. Location relative to a previous symbol
2 New symbol, add to library only
  1. Record type
  2. Absolute symbol size
  3. Bitmap by direct coding
3 New symbol, add to image only
  1. Record type
  2. Absolute symbol size
  3. Bitmap by direct coding
  4. Location relative to a previous symbol
4 Matched symbol with refinement, add to image and library
  1. Record type
  2. Index of matching symbol in bitmap library
  3. Relative symbol size
  4. Bitmap by refinement coding
  5. Location relative to a previous symbol
5 Matched symbol with refinement, add to library only
  1. Record type
  2. Index of matching symbol in bitmap library
  3. Relative symbol size
  4. Bitmap by refinement coding
6 Matched symbol with refinement, add to image only
  1. Record type
  2. Index of matching symbol in bitmap library
  3. Relative symbol size
  4. Bitmap by refinement coding
  5. Location relative to a previous symbol
7 Matched symbol, copy to image without refinement
  1. Record type
  2. Index of matching symbol in bitmap library
  3. Location relative to a previous symbol
8 Non-symbol data
  1. Record type
  2. Absolute symbol size
  3. Bitmap by direct coding
  4. Absolute location
9 Shared dictionary or reset
  1. Record type
  2. Shared dictionary size
10 Comment
  1. Record type
  2. Comment length
  3. Comment data
11 End of data
  1. Record type

11.2.4 Fields and contexts

Table 7: multivalued integer contexts for arithmetic coding
Context name Integer data coded using this context
Record type Record type
Image size Image height; image width
Matching symbol index Index within the symbol library of the symbol matching the current symbol
Symbol width Width of the current symbol in pixels
Symbol height Height of the current symbol in pixels
Symbol width difference Number of pixels that must be added to the width of the matching symbol to obtain the width of the current symbol
Symbol height difference Number of pixels that must be added to the height of the matching symbol to obtain the height of the current symbol
Symbol column number Column number of the absolute location of the left edge of the current symbol (leftmost column of the image is column number 1)
Symbol row number Row number of the absolute location of the top edge of the current symbol (bottom row of the image is row number 1)
Same line column offset Number of pixels that must be added to the column number of the right edge of the previous symbol on the current text line to obtain the column number of the left edge of the current symbol
Same line row offset Number of pixels that must be added to the row number of the current baseline on the current text line to obtain the row number of the bottom edge of the current symbol
New line column offset Number of pixels that must be added to the column number of the left edge of the first symbol on the current text line to obtain the column number of the left edge of the current symbol
New line row offset Number of pixels that must be added to the row number of the bottom edge of the first symbol on the current text line to obtain the row number of the top edge of the current symbol
Comment length The number of octets in the current comment
Comment octet One octet in the current comment
Dictionary size Number of shapes in the shared dictionary

11.2.5 Coding phases

Figure 1: Part of the coding tree for multivalue arithmetic coding. Each internal node represents one context with its own probability information, to be used by the Z′-Coder. The square node at the root of the tree represents the Phase 1 decision, whether the integer n being coded is negative. The filled circles are the Phase 2 nodes, moving down the tree in ever-increasing ranges. The open circles represent Phase 3 decisions, traversing a complete binary sub-tree to reach the specific value of n. A decoded value of 0 indicates a left branch in this tree. A decoded value of 1 indicates a right branch.

This method allows high compression efficiency by allowing the coder to adapt to the statistics of the data. In effect, the binary probability information stored collectively in the nodes of the decision tree closely approximates the probability distribution of the underlying multivalued integer.

The allowable range of values for n is always specified. The smallest value that n could possibly take is denoted by l. The largest value that n could possibly take is denoted by h. When l and h are equal, n is equal to both of them, and no Z′-Coder decoding is performed.

The decoder maintains a non-negative intermediate value v, defined as follows: v = { | n | if n 0 | n | 1 if n < 0 . At the end of the process of decoding an integer, v is converted to n, the value of the decoded integer.

The value of an integer is coded by making a sequence of binary decisions, each one narrowing the set of values that the integer can possibly take. The decisions are based on traversing a binary decision tree to one of its leaves. Note: although the tree conceptually has a large number of nodes, it is possible in an implementation to allocate memory only for those nodes actually traversed. Decoding proceeds in four phases.

11.2.5.1 Phase 1

Phase 1 determines the sign of n. A value of 0 returned by the Z′-Coder means that n < 0 . A value of 1 returned by the Z′-Coder means that n 0 .

11.2.5.2 Phase 2

Phase 2 determines a range of possible values for v. The Z′-Coder is invoked repeatedly to answer the question “Is the value of v in the range being tested?” The sequence of ranges tested is given in Table 8. A value of 0 returned by the Z′-Coder means that v is not in the specified range, and the next range in the sequence must be tested. A value of 1 returned by the Z′-Coder means that v is in the specified range, and decoding is to proceed to Phase 3.

Table 8: sequence of ranges in which v may fall
     0
     1–2
     3–6
     7–14
    15–30
    31–62
    63–126
   127–254
   255–510
   511–1022
  1023–2046
  2047–4094
  4095–8190
  8191–16382
 16383–32766
 32767–65534
 65535–131070
131071–262142
11.2.5.3 Phase 3

Phase 3 consists of determining the exact value of v within the range determined in Phase 2. If Phase 2 determined that v = 0 , then Phase 3 is skipped. Otherwise, since the size of the range is a power of 2, the corresponding sub-tree is a complete binary tree. The sequence of coding decisions is based directly on traversing the binary tree. At each node, 0 returned by the Z′-Coder means left branch (smaller values of v) and 1 means right branch (larger values of v. The bits returned by the Z′-Coder during Phase 3 are the bits of v, most significant bit first.

11.2.5.4 Phase 4

In Phase 4, the unsigned value v is converted to n, the signed value to be returned, as follows: n = { v if n is non-negative, as determined in Phase 1; v 1 if n is negative.

In any of the phases, if the input values of l and h (the range of allowable values) predetermine any decision, then the coding for that decision is not performed; the predetermined decision is assumed.

Each type of integer has its own set of binary contexts. Thus the probability information will reflect the underlying probability distribution of the particular type of integer. The Z′-Coder probability state indices of all the binary nodes are initialized to 0.

11.3 Image reconstruction

Records in an Sjbz chunk are interpreted in the order in which they appear. A "Start of data" record specifies the dimensions of the image. An "Image refinement data" record indicates […][1]. An "End of data" record indicates the end of the Sjbz chunk. A "Comment" record contains uninterpreted data.

A record identified by any other record type describes one bitmap. The model used in DjVu for the selection layer is based on symbol-based coding. Bitmaps are placed into the reconstructed image as follows: The image is initially entirely white. When a bitmap is placed into the image, the pixels that are black in the current symbol become black at the appropriate position in the reconstructed image. Once a pixel in the reconstructed image becomes black, it remains black.

Because symbols in the document images are often similar to each other, it is often possible to obtain more efficient coding by making use of previously coded symbols. As symbols are decoded, their bitmaps may be placed into a symbol bitmap library. There is exactly one symbol bitmap library. Once a symbol has been placed into the symbol bitmap library, later records may cause copies of the symbol to be placed into the image, or may define a new bitmap by refining the bitmap in the library.

Depending on the record type, the symbol bitmap may be described by direct coding, by refinement coding, or by a copy operation. In direct coding, all pixels of the bitmap are coded directly, without reference to any other bitmap. In refinement coding, all pixels of the bitmap are also coded directly, but a bitmap in the library is used to make the coding more efficient. In a copy operation, the pixels of the bitmap are the same as the pixels of a bitmap in the library.

Depending on the record type, the bitmap may or may not be placed into the image. If the bitmap is placed into the image, then depending on the record type, it may be placed either at an absolute location or at a location relative to a previously placed bitmap.

Depending on the record type, the bitmap may or may not be placed into the symbol bitmap library. The first symbol placed into the library has index 0. Subsequent symbols are assigned consecutive integer indices.

The pixels of the reconstructed image are arranged in a rectangular coordinate system. For the pixel in the lower left corner of the image, the column number is 1 and the row number is 1. All coordinates refer to the pixels themselves, not to the edges between pixels.

11.4 Records

Records in Sjbz chunks have the following interpretations.

11.4.1 Start of image

This record is the first record in an Sjbz chunk. It specifies the dimensions of the image.

11.4.2 New symbol, add to image and library

This record specifies the bitmap of a symbol that is coded directly and placed into the reconstructed image and into the symbol bitmap library.

11.4.3 New symbol, add to library only

This record specifies the bitmap of a symbol that is coded directly and placed into the symbol bitmap library but not into the image.

11.4.4 New symbol, add to image only

This record specifies the bitmap of a symbol that is coded directly and placed into the reconstructed image but not into the symbol bitmap library.

11.4.5 Matched symbol with refinement, add to image and library

This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the reconstructed image and into the symbol bitmap library.

11.4.6 Matched symbol with refinement, add to library only

This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the symbol bitmap library, but not into the reconstructed image.

11.4.7 Matched symbol with refinement, add to image only

This record specifies the bitmap of a symbol that is coded by refinement of a symbol in the symbol bitmap library and placed into the reconstructed image, but not into the symbol bitmap library.

11.4.8 Matched symbol, copy to image without refinement

This record specifies the location at which the bitmap of a symbol in the symbol bitmap library is to be placed into the reconstructed image.

11.4.9 Non-symbol data

This record specifies a direct-coded bitmap to be placed at an absolute location in the reconstructed image. A bitmap of non-symbol data is not placed into the symbol bitmap library.

11.4.10 Shared dictionary or reset

This record is overloaded and its meaning depends on its context. If the record occurs before a "Start of data" record, then it is a "Required dictionary" record. If the record occurs after a "Start of data" record then it is a "Reset" record.

11.4.10.1 Shared shape dictionaries

Starting with version 21, the JB2 format provides support for sharing symbol definitions between the pages of a document. To achieve this objective, the JB2 image data chunk must be able to address symbols defined elsewhere by a JB2 dictionary data chunk shared by all the pages of a document.

A "Shared dictionary or reset" record can appear before the "Start of data" record. The record type field is followed by a single number arithmetically encoded using the sixteenth dictionary size context. This record appears when the JB2 data chunk requires symbols encoded in a separate JB2 dictionary data chunk. The number (the dictionary size) indicates how many symbols should have been defined by the JB2 dictionary data chunk. The decoder should simply load these symbols into the symbol library and proceed as usual. New symbols potentially defined by the subsequent JB2 image data records will therefore be numbered with integers greater than or equal to the dictionary size.

11.4.10.2 Numcoder reset

The encoding of numbers potentially uses an unbounded number of binary coding contexts. These contexts are normally allocated when they are used for the first time (see ).

Starting with version 21, a "Shared dictionary or reset" record can appear after the "Start of data" record. The decoder should proceed with the next record after clearing all binary contexts used for coding numbers. This operation implies that all binary contexts previously allocated for coding numbers can be deallocated.

Starting with version 21, the JB2 encoder should insert a "Shared dictionary or reset" record whenever the number of these allocated binary contexts exceeds 20000. Only very large documents ever reach such a large number of allocated binary contexts (e.g. large maps). Hardware implementation however can benefit greatly from a hard bound on the total number of binary coding contexts. Old JB2 decoders will treat this record type as an "End of data" record and cleanly stop decoding (see ).

11.4.10.3 Record types in a shared dictionary

The shared JB2 dictionary data format is a pure subset of the JB2 image data format:

  • Required dictionary
  • Start of data
  • New symbol, add to library only
  • Matched symbol with refinement, add to library only
  • Numcoder reset
  • Comment
  • End of data

Note that each shared dictionary can itself include another shared dictionary.

The JB2 dictionary data is usually located in an Djbz chunk. Each page may directly contain a Djbz chunk, or may indirectly point to such a chunk using an INCL chunk (see ).

11.4.11 Comment

This record contains data whose interpretation is not specified by the standard.

11.4.12 End of data

This record is the last record of an Sjbz chunk.

11.5 Fields

The following fields are coded in records of the types specified in Table 6 and in §11.4.

11.5.1 Record type

The record type is coded by the multivalue extension to the Z′-Coder using the record type context. The range of allowable record types is from 0 to 11. The coded values are specified in the first column of Table 6.

11.5.2 Image size

The width and height of the image are coded by the multivalue extension to the Z′-Coder using the image size context. The width is coded first, then the height. The range of allowable values is from 0 to 262142. The width and height of a compound DjVu image or bi-level DjVu image must be the same as the width and height of the image specified in the INFO chunk.

11.5.3 Eventual image refinement flag

The eventual image refinement flag is coded once, in the "Start of image" record, to notify the decoder whether image refinement data will eventually be provided. It is a binary value, coded by the Z′-Coder using the eventual image refinement context. The coded value 1 means TRUE and the coded value 0 means FALSE. Note: this flag is always FALSE in the current version of the standard, but it may be TRUE in later versions.

11.5.4 Index of matching symbol in bitmap library

The index of a matching symbol in the bitmap library is coded with the multivalue extension to the Z′-Coder, using the matching symbol index context. The range of allowable values is from 0 to one less than the number of symbols currently in the bitmap library.

11.5.5 Absolute symbol size

The width of a symbol is coded by the multivalue extension to the Z′-Coder using the symbol width context. Then the height of the symbol is coded by the multivalue extension to the Z′-Coder using the symbol height context. The range of allowable values for both of these data elements is from 0 to 262142.

11.5.6 Relative symbol size

The signed differences between the width and height of the current symbol and the width and heigh of the matching symbol are coded by the multivalue extension to the Z′-Coder using the symbol width different context and the symbol height difference context for the height. The width difference is coded first, then the height difference. The coded signed difference is added to the width or height of the matching symbol to obtain the width or height of the current symbol. The range of allowable values for both of these data elements is from -262143 to 262142.

11.5.7 Absolute location

The horizontal and vertical positions of the upper left corner of the bitmap are coded by the multivalue extension to the Z′-Coder using the symbol column number context for the horizontal position and the symbol row number context for the vertical position. The horizontal position is coded first, then the vertical position. The range of allowable values for the horizontal position is from 1 to the width of the image in pixels. The range of allowable values for the vertical position is from 1 to the height of the image in pixels.

11.5.8 Location relative to a previous symbol

The offset type flag is coded by the Z′-Coder using the offset type context. It indicates the reference symbol for coding the offset of the location of the current symbol. The coded value 1 means FIRST, which means that the location of the current symbol is being specified relative to the first symbol on the current text line. The value 0 means PREVIOUS, which means that the location of the current symbol is being specified relative to the most recently coded symbol on the current text line.

If the offset type flag is FIRST, then the reference symbol is the first symbol on the current text line. The horizontal offset is the signed difference between the left edge of the current symbol and the left edge of the reference symbol. It is coded with the multivalue extension to the Z′-Coder using the new line column offset context.

11.5.9 Bitmap by direct coding

Non-symbol bitmaps and symbol bitmaps with no sufficiently closely matching symbol in the symbol library are coded directly. A directly-coded bitmap is coded by repeated applications of the Z′-Coder to the pixels of the bitmap left to right across the rows, starting with the top row. When one row has been coded, the next lower row is coded. Each pixel is coded by the Z′-Coder using an appropriate context based on the values of 10 previously coded pixels. A coded value of 1 means the pixel is black. A coded value of 0 means the pixel is white.

The colors of the pixels numbered 1 through 10 in Figure 2, taken collectively, form a 10-bit value. Each of these values is an index into a table of 1024 different directly-coded bitmap contexts. The pixel labeled P in Figure 2 is coded using the context indexed by the collective values of the other 10 numbered pixels in the template. Pixels outside the bounding box are considered to be white.

Figure 2: template for direct coding

11.5.10 Bitmap by refinement coding

Some bitmaps are coded by making use of data from another bitmap; this process is called refinement coding. Matched symbols other than those to be copied are coded using refinement coding.

A bitmap coded by refinement coding is coded by repeated application of the Z′-Coder to the pixels of the bitmap, left to right across the rows. When one row has been coded, the next lower row is coded. Each pixel is coded by the Z′-Coder using an appropriate context based on the values of four previously coded pixels from the bitmap being coded and seven pixels from the matching bitmap. (The pixels numbered 1 through 4 in Figure 3 are from the current symbol; the pixels numbered 5 through 11 are from the matching symbol.) A coded value of 1 means the pixel is black. A coded value of 0 means the pixel is white. The colors of the pixels numbered 1 through 11 in Figure 3, taken collectively, form an 11-bit value. Each of these values is an index into a table of 2048 different refinement-coded bitmap contexts. The pixel labeled P in Figure 3 is coded using the context indexed by the collective values of the 11 numbered pixels in the template. Pixel 7 is in the position in the matching symbol that corresponds to the position of pixel P in the current symbol when the two symbols are aligned.

Alignment of the current bitmap and the matching bitmap proceeds as follows. For matched symbols, the current symbol and the matching symbol are aligned according to the geometric centers of their bounding rectangles. If the number of columns or rows is even, the geometric center falls between two columns or rows, respectively. In this case, the leftmost of the two central columns or the lowermost of the two central rows is considered to be the central column or row, respectively.

Figure 3: template for refinement coding. (a) Pixels from symbol being coded from matching symbol. (b) Pixels from matching symbol.

It is possible for the current symbol to have empty edge rows or columns. These empty rows and columns are coded, and are included in the bounding rectangle. For symbols added to the library, the symbol is added to the library after it has been placed into the image. Any empty edge rows and columns are removed before the symbol is added to the library.

11.5.11 Comment length

The comment length is the number of octets in the comment. It is coded by the multivalue extension to the Z′-Coder using the comment length context. The range of allowable values for the comment length is from 0 to 262142.

11.5.12 Comment data

Comment data consists of the individual octets of the comment. The number of octets in the comment is given by the comment length field. Each of the octets is coded using the multivalue extension to the Z′-Coder using the comment octet context. The range of allowable values for each octet is from 0 to 255.

12 Appendix 3: Z′ coding

The Z′-Coder is an approximate binary arithmetic coder. Decoding proceeds as follows.[1]

See also files ZPCodec.h and ZPCodec.cpp in DjVuLibre.

12.1 Registers and data storage

In Figure 1 and Figure 2, the values of variables A, C, D, and Z are stored in registers of at least 16 bits each. A and C retain their values between invocations of the Z′-Coder. Note: if register overflow can be ignored, storing variables A and C in registers of exactly 16 bits allows a simplification of lines 11, 12, 16, and 17 of Figure 1 and lines 8, 9, 12, and 13 of Figure 2.

At the beginning of a chunk, the values of A and C are reinitialized. When the decoder is decoding a chunk, it may require more bits than are present within the chunk’s data. In this case, all additional required bits are to be assumed by the decoder to be 1. If there are excess bits at the end of a chunk, they are ignored.

K is conceptually an array with a single 8-bit entry for each binary decision context. (In practice, K consists of a number of of individual values, arrays, and tree nodes, but each one has a specific address and a single 8-bit value at any time.) This array is indexed by the value of i, which is the input to the decoder. K [ i ] is the current value of the probability state index for context i. K [ i ] may be updated as part of the decoding process.

In pass-through mode, the decoder is invoked with no input argument. No context is involved.

B is the 1-bit value returned by the decoder.

The Z′-Coder is state-based. Decoding is governed by four fixed tables, given in Table 9. The tables are indexed by K [ i ] , the probability state index for the current context. All probability state indices are initialized to 0. That is, at the beginning of coding, for all i, K [ i ] = 0 . These values are not reinitialized at the beginning of chunks after the first.

The more probably symbol is denoted by MPS. The MPS is 1 if the probability state index is an odd integer, and 0 if the probability state index is an even integer. The less probable symbol is denoted by LPS. The LPS is 0 if the probability state index is an odd integer, and 1 if the probability state context is an even integer.

Δ k is the amount by which the current arithmetic coding interval is reduced if the decoded symbol is the MPS. θ k is the threshold above which an MPS triggers a probability state update. μ k is the next probability state index for context k after an MPS triggers a probability state index update. An LPS always triggers a probability state index update. λ k is the next probability state index for context k after an LPS.

12.2 Initialization

Initially, A is set to 0x0000. Two octets are read from the input data stream into the lowest 16 bits of C. If the bits of C are numbered such that bit 15 is the most significant bit and bit 0 is the least significant bit, then the first input octet is stored in bits 15 through 8, and the second input octet is stored in bits 7 through 0.

12.3 Decoding

Figure 1 shows the steps involved in decoding a single binary decision. The input to the decoder is the index i of the appropriate context for the binary decision being decoded. The output from the decoder is a single bit B.

Z := A + ΔK[i]
D := 0x6000 + (Z + A)/4
if (Z > D) { Z := D }
if (C > Z) {
    B := K[i] mod 2
    if (A ≥ θK[i]) { K[i] = μK[i] }
    A := Z
}
else {
    B := 1 - (K[i] mod 2)
    A := A + 0x10000 - Z
    C := C + 0x10000 - Z
    K[i] = λK[i]
}
while (A ≥ 0x8000) {
    A := A + A - 0x10000
    A := C + C - 0x10000 + next code bit
}
return B
Figure 1: decoder for the Z′-Coder

12.3.1 Notes on specific lines of Figure 1

Line 2
The division is a right shift, discarding the two least significant bits.
Lines 4–8
These lines are executed when the decoded bit is the MPS.
Line 5
This line determines the MPS from from the parity of the probability state index.
Line 6
Sometimes an MPS event triggers an update of the probability state index, based on the value of θ k . Note that when the probability state index k = 0 or k > 83 , θ k = 0 , so an MPS will trigger an update of the probability state index. All probability state indices are initialized to 0, but the first coded decision for a context causes the index to become at least 83. When k = 0 or k 83 , the probability estimate for the context is in its early estimation phase. When 0 < k < 83 , the probability estimate for the context is in its steady-state phase, which it never leaves.
Lines 9–14
These lines are executed when the decoded bit is the LPS.
Line 10
This line determines the LPS from the parity of the probability state index.
Line 13
An LPS always triggers an update of the probability state index.
Lines 15–18
When the values in the registers are too large, they must be renormalized.
Lines 16–17
A + A and C + C may be accomplished by left shifts, leaving the least significant bit equal to 0.
Line 17
The least significant bit of C is filled with the next bit from the input stream. Bits are taken from each octet in the input stream most significant bit first.

12.4 Pass-through decoding

Figure 2 shows the steps involved in decoding a single binary decision using the Z′-Coder in pass-through mode. No input is required. No context is involved. No probability state index values are updated. The output from the decoder is the single bit B.

Z := 0x8000 + (A + A + A)/8
if (C > Z) {
    B := 0
    A := Z
}
else {
    B := 1
    A := A + 0x10000 - Z
    C := C + 0x10000 - Z
}
while (A ≥ 0x8000) {
    A := A + A - 0x10000
    A := C + C - 0x10000 + next code bit
}
return B
Figure 2: decoder for the Z′-Coder operating in pass-through mode

12.4.1 Notes on specific lines of Figure 2

Line 1
The division is a right shift, discarding the three least significant bits.
Lines 2–5
These lines are executed when the decoded bit is 0.
Lines 6–10
These lines are executed when the decoded bit is 1.
Lines 11–14
When the values in the registers are too large, they must be renormalized.
Lines 12–13
A + A and C + C may be accomplished by left shifts, leaving the least significant bit equal to 0.
Line 13
The least significant bit of C is filled with the next bit from the input stream. Bits are taken from each octet in the input stream most significant bit first.

13 Appendix 4: BZZ coding

Numerous streams in the DjVu file format are compressed using the general-purpose compressor described here called “BZZ”. BZZ transforms the input data using the well-documented Burrows–Wheeler transform. However, the traditional “Move To Front” permutation table is augmented with a frequency estimation provided by the Z′-Coder.

See also file BSByteStream.cpp.

13.1 Encoding

BZZ first takes as input a 24-bit integer as block size between 10K and 4M and an input stream (to be compressed). The stream is partitioned into blocks terminated with a special ⟨EOB⟩ symbol. It is then transformed using the well-documented Burrows–Wheeler (BW or “block sorting”) transform. Then, one block at a time, the block size and resulting output stream are passed as input to be compressed using the Z′-Coder (Appendix 3).

13.2 Decoding

We describe the decoding algorithm by means of pseudo-code.

13.2.1 Decoding pseudo-code

13.2.2 Notes

13.2.2.1 Overview of decoding a block

For each block, one must decode

  • the block size (with decode_raw)
  • the estimation speed FSHIFT ∈ {0, 1, 2} (two bits[1] with the pass-through decoder)
  • the sequence of symbols representing the Burrows–Wheeler transform of the block. At this point, the sequence of symbols is logically encoded as a sequence of numbers representing the position of each symbol in the MTF array.

Then one must perform the inverse Burrows–Wheeler transform to recover the decoded block.

The following points are significant when recovering the BWT and are discussed below:

  • The MTF array is reordered after decoding each number.
  • The numbers themselves are arithmetically encoded.
13.2.2.2 MTF array reordering

The MTF array contains 256 bytes initialized with the identity mapping, that is MTF[0] = 0, MTF[1] = 1, … , MTF[255] = 255.

Whenever one decodes a number MTFNO, the corresponding symbol to store in the Burrows–Wheeler buffer is MTF[MTFNO] (except for the ⟨EOB⟩ symbol — see §13.2.2.3) and the contents of the MTF array are rotated. The rotation moves the symbol that was at position MTF[MTFNO] to a position M that can be 0, 1, 2, or 3. Meanwhile the symbols MTF[M] to MTF[MTFNO - 1] are moved to positions M + 1 to MTFNO.

The position M is chosen using an estimate of the frequency of the symbol MTF[MTFNO]. One strives to position the most frequent symbols at the beginning of the MTF array. To that end, one maintains an array FREQ[0..3] that contains numbers representative of the instantaneous frequencies of the symbols MTF[0..3].

Of course this array must also be “rotated” when the rotation of the MTF array affects its first four elements.

Consider the frequency F ( T ) of a particular symbol S measured after decoding the Tth symbol. Ideally, F ( T ) = λ F ( T 1 ) + D where

  • 0 < λ 1 . This models how quickly one forgets past information, and
  • D = 1 if the Tth symbol is S, and D = 0 otherwise. This allows F ( T ) to grow each time the symbol S occurs.

To avoid multiplying all the frequencies by λ, the FREQ array contains instead G ( T ) = F ( T ) / λ T It is then easy to see that G ( T ) = G ( T 1 ) + D / λ T Therefore we only need to update the G corresponding to the symbol being decoded (i.e.  D = 1 ), since the G for the other symbols does not change.

A dedicated variable FADD contains λ T . Before each rotation we divide FADD by λ T . This is accomplished by the line

FADD = FADD + SHIFTRIGHT(FADD, FSHIFT)

The values 0, 1 or 2 of variable FSHIFT correspond to λ = 1 / 2 , 2/3, or 4/5. To avoid overflows we divide everything (FADD and FREQ[0..3]) by 0x10000000 whenever FADD becomes bigger than 0x10000000. This happens rarely enough to take very little time.

The G ( T ) of the freshly decoded symbol is therefore G ( T 1 ) + FADD. We can only compute this exactly when S is one of the first four symbols of the MTF because we only store FREQ[0..3]. If the decoded number MTFNO is greater than 3, we assume that G ( T 1 ) = 0 and simply consider G ( T ) = FADD.

The number M is then chosen to make sure the array FREQ remains sorted in decreasing order after the rotation.

13.2.2.3 Decoding the number MTFNO

Now we can discuss how the numbers MTFNO are stored. There are 262 arithmetic coding contexts. These are initialized to zero at the beginning of the stream decoding process. They should not be reset to zero at the beginning of the block decoding process.

Because the most frequently used symbols should appear near the front of the array, we expect small values for MTFNO (the index into the MTF array). By design, the number of bits and the number of contexts required to decode increases for larger values of MTFNO:

  • A first bit is decoded using context 0, 1 or 2. Context 0 or 1 is used if the previous MTFNO was 0 or 1. Otherwise context 2 is used. If this bit is set, the new MTFNO is 0.
  • Otherwise a second bit is decoded using context 3, 4 or 5. Context 3 or 4 is used if the previous MTFNO was 0 or 1. Otherwise context 5 is used. If this bit is set, the new MTFNO is 1.
  • Otherwise a third bit is decoded using context 6. If this bit is set, the new MTFNO is obtained by adding 2 to a 1-bit number decoded with decode_bin using context 7.
  • Otherwise a fourth bit is decoded using context 8. If this bit is set, the new MTFNO is obtained by adding 4 to a 2-bit number decoded with decode_bin using contexts 9..11.
  • And so forth until…
  • Otherwise a ninth bit is decoded using context 132. If this bit is set, the new MTFNO is obtained by adding 128 to a 7-bit number decoded with decode_bin using contexts 133..261.
  • Otherwise the next symbol is the ⟨EOB⟩ symbol. Since there is only one ⟨EOB⟩ symbol, we store a zero in the Burrows–Wheeler buffer and record its position in variable MARKERPOS.
13.2.2.4 Inverse Burrows–Wheeler transform

After decoding the BLOCKSIZE symbols composing the Burrows–Wheeler buffer, we need to perform the inverse Burrows–Wheeler transform to recover the BLOCKSIZE - 1 decoded bytes followed by the ⟨EOB⟩ symbol.

To start, we

  • copy the buffer into an array POSC[0..BLOCKSIZE - 1],
  • prepare an array COUNT[0..255] that counts how many occurrences of each symbol are found,
  • prepare an array POSN[0..BLOCKSIZE - 1] that indicates the rank of each occurrence of a symbol in the buffer.

Imagine that we are sorting the buffer in symbol order (⟨EOB⟩ being the smallest symbol). The buffer would be composed of a single ⟨EOB⟩, followed by a run of COUNT[0] symbols 0, followed by a run of COUNT[1] symbols 1, etc.

Using the COUNT array, we compute the position SORTEDPOS[0..255] of each run of symbol in this array.

To perform the inverse Burrows–Wheeler transform, it is now sufficient to follow the thread backwards:


The array DATA[0..BLOCKSIZE - 2] then contains the decoded bytes of the block.