╔══════════════════════════════════════════════════════════════════════════════╗
║               Elastic, Compressed, Content-Addressed Container               ║
║               ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍               ║
║                          File Format Specification                           ║
╚══════════════════════════════════════════════════════════════════════════════╝

version: 1.0

1 Introduction
══════════════

   This section provides a brief introduction to the goals that EC3 is
   intended to fulfill.

1.1 File Format Purpose and Design Goals
────────────────────────────────────────

   The primary goals of the EC3 image format can be found in its name:

    * Elastic: The format should be adaptable and useful in a wide range of
      use cases.
    * Compressed: The format should support compression to reduce file size
      and increase efficiency, without compromising random access to file
      data.
    * Content-Addressed: The format should support data de-duplication to
      further increase storage efficiency.
    * Container: The format should support storing multiple independent
      filesystems.

   At a low level, EC3 is designed to be a format for storing multiple
   independent streams of data in a single image, with support for optional
   features such as encryption and compression.

   On top of this base, EC3 provides facilities for storing multiple whole
   filesystems within an image file. With support for extended attributes, a
   directory (or whole filesystem) can be accurately captured within an EC3
   image, while compression and chunk-based data de-duplication greatly
   reduce the amount of disk space required.

1.2 Document Scope
──────────────────

   This document describes the general layout of an EC3 image and all of the
   data structures contained within it. It provides all of the information
   required to read and write fully-featured container images.

   This document does not describe how to implement software that reads or
   writes containers, beyond describing the algorithms that the format uses.

2 Overview
══════════

   This section provides a general overview of what an EC3 image is, how it
   works, and a preview of some of the internal data structures.

2.1 What Is An EC3 Image?
─────────────────────────

   An EC3 image is a data file that can contain, among other things, a set of
   zero or more logical filesystems, called volumes. Each volume has its own
   distinct tree of directories and files, while the actual file data is
   shared across all volumes within the container.

   An EC3 image is analogous to a traditional disk image containing a logical
   volume management (LVM) partition. Under an LVM partition scheme, a disk
   can have multiple "logical" partitions contained within a single
   "physical" partition. The logical partitions are separate, just like
   traditional partitions, but they all make use of the same contiguous range
   of sectors on the disk. Because of this, resizing partitions within an LVM
   group is as simple as changing the quota of blocks that a particular
   logical partition is allowed to allocate, and doesn't require physically
   moving any sectors around.

   EC3 builds upon this concept by employing cross-volume data
   de-duplication. Every file that is stored within an EC3 image is split
   into a set of fixed-size, content-addressed chunks. The size of these
   chunks is constant within a container. A typical chunk size is 32 KB.

   So, if two files within a container have the same contents, even if those
   files are in different volumes, the files will reference the same range of
   chunks. Only one copy of the file data is stored within the container.
   Even if the two files vary to some degree, as long as at least one chunk's
   worth of data is identical, some data can still be shared between the
   files.

   Chunks can also be compressed to further reduce file size. The chunking
   system provides some additional benefits when compression is in use.
   Seeking through a file is faster, as you don't have to decompress the
   entire file to reach the target offset. You can simply skip to the chunk
   that corresponds to the offset you're looking for. Editing files within a
   volume is also easier as, again, you only have to decompress and re-write
   the chunk that has changed.

   Alongside volumes, EC3 images can contain a range of other data,
   including:

    * Manifests.
    * Arbitrary binary blobs.
    * Executable files.
    * Digital signatures.
    * Certificates for digital signature verification.

   In contrast to volumes, these other data types are much simpler. An
   application can wrap its own binary data within an EC3 image and
   immediately make use of features like compression, encryption, and digital
   signature verification.

2.2 Tags: The Core Unit Of Data
───────────────────────────────

   At its most basic level, an EC3 image is just a set of one or more tags.
   A tag is a contiguous segment of binary data with an associated type and
   identifier. The contents of a tag can be optionally encrypted and signed.

   With the exception of the image header and tag table, all data contained
   within an EC3 image can be found in a tag. The tag table contains
   information about all of the tags in the image.

3 Types & Units
═══════════════

   This section describes the fundamental data types used within EC3 data
   structures, as well as some of the units used throughout this document.

3.1 Integral Types
──────────────────

   All integer values are stored in big-endian format. All signed integer
   values are stored in two's-complement format.

   The following integer types are used:

   Name      Size                 Sign
   ───────────────────────────────────────────────
   uint8     8 bits (1 byte)      Unsigned
   uint16    16 bits (2 bytes)    Unsigned
   uint32    32 bits (4 bytes)    Unsigned
   uint64    64 bits (8 bytes)    Unsigned
   int8      8 bits (1 byte)      Signed
   int16     16 bits (2 bytes)    Signed
   int32     32 bits (4 bytes)    Signed
   int64     64 bits (8 bytes)    Signed

3.2 String Types
────────────────

   All strings are stored in UTF-8 Unicode format with a trailing null
   terminator byte.
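
   As an illustration of the integer and string encodings described above,
   the following Python sketch decodes big-endian integers and
   null-terminated UTF-8 strings from a byte buffer. The names INT_FORMATS,
   read_int and read_string are hypothetical and are not defined by this
   specification; this is one possible way to structure a reader, not a
   reference implementation.

```python
import struct

# struct format codes for the integer types listed in this document.
# The ">" prefix selects big-endian byte order with no padding.
INT_FORMATS = {
    "uint8": ">B", "uint16": ">H", "uint32": ">I", "uint64": ">Q",
    "int8": ">b", "int16": ">h", "int32": ">i", "int64": ">q",
}


def read_int(buf: bytes, offset: int, type_name: str) -> tuple[int, int]:
    """Decode one integer at `offset`; return (value, offset after it)."""
    fmt = INT_FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, buf, offset)
    return value, offset + struct.calcsize(fmt)


def read_string(buf: bytes, offset: int) -> tuple[str, int]:
    """Decode a null-terminated UTF-8 string; return (text, offset after it)."""
    end = buf.index(b"\x00", offset)  # raises ValueError if unterminated
    return buf[offset:end].decode("utf-8"), end + 1


# A uint16, an int16 and a string packed back to back.
data = bytes.fromhex("0302") + bytes.fromhex("fffe") + "café".encode("utf-8") + b"\x00"
version, pos = read_int(data, 0, "uint16")  # 0x0302
delta, pos = read_int(data, pos, "int16")   # 0xFFFE is -2 in two's complement
name, pos = read_string(data, pos)          # "café"
```

   Returning the next offset alongside each value lets a caller walk through
   consecutive fields without tracking field sizes separately.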

3.3 Storage Size Units
──────────────────────

   Throughout this document, any reference to kilobytes, megabytes, etc.
   refers to the base-2 units, rather than the base-10 units. For example,
   1 kilobyte (or 1 KB) is equal to 1024 bytes (rather than 1000 bytes).

4 Algorithms
════════════

   EC3 uses a range of algorithms. A selection of hashing algorithms is used
   for fast data lookup and for ensuring data integrity.

4.1 Fast Hash
─────────────

   The Fast Hash algorithm is optimised for hashing string data. It is
   intended for use in string-based hashmaps. The algorithm used for this
   purpose is the Fowler-Noll-Vo FNV-1 hashing algorithm, with a 64-bit
   digest size.

   The implementation of this algorithm can be found elsewhere, but the
   integer constants that EC3 uses to calculate hashes are provided here:

    * Offset Basis: 0xCBF29CE484222325
    * Prime: 0x100000001B3

4.2 Slow Hash
─────────────

   The Slow Hash function is optimised for minimal chance of hash collisions.
   It is intended to generate the content hashes used to uniquely identify
   data chunks. The algorithm used for this purpose is the SHA-3 algorithm
   with a 256-bit digest size.

4.3 Checksum
────────────

   The Checksum algorithm is used to validate the contents of an EC3 image
   and detect any corruption. The algorithm used for this purpose is the
   CRC32 algorithm with a 32-bit digest size.

   Note that it is not intended to defend against intentional modification of
   an image, as this can be easily hidden by re-calculating the checksum. EC3
   provides other features to defend against malicious modifications.

5 Image Header
══════════════

   The Image Header can be found at the beginning of every EC3 image file. It
   provides critical information about the rest of the file, including the
   version of the file format that the file uses, and the location and size
   of the tag table.

   The header also includes two magic numbers:

    * A signature to validate that the file is in fact an EC3 image. This
      must have the value 0x45433358 ('EC3X' in ASCII).
    * An application magic number that is reserved for use by the creator of
      the image.

5.1 Image Header Layout
───────────────────────

   Offset    Description              Type
   ─────────────────────────────────────────────
   0x00      Signature                uint32
   0x04      Format Version           uint16
   0x06      Chunk Size               uint16
   0x08      Tag Table Offset         uint64
   0x10      Tag Count                uint64
   0x18      Application Magic        uint64

5.1.1 Signature

   The Signature is found at the very beginning of the image file. Like all
   integer values, it is stored in big-endian format. It always has the value
   0x45433358 (or 'EC3X' in ASCII).

5.1.2 Format Version

   This specifies which version of the EC3 image file format the rest of the
   file conforms to. Only the Signature and Format Version header items are
   guaranteed to be the same across all format versions.

   The format version is encoded as a 16-bit integer, with the following
   format:

      0               1
      0               6
      XXXXXXXXYYYYYYYY

   where X encodes the major version number and Y encodes the minor version
   number.

   For example, version 3.2 would be encoded as 0x0302.

5.1.3 Chunk Size

   This specifies the size of all data chunks stored within the image, before
   any transformation operations such as compression or encryption are
   applied.

   The following chunk size values are defined:

   Header Value   Chunk Size (bytes)   Chunk Size (kilobytes)
   ────────────────────────────────────────────────────────────────
   0x00           16,384               16
   0x01           32,768               32
   0x02           65,536               64
   0x03           131,072              128
   0x04           262,144              256
   0x05           524,288              512
   0x06           1,048,576            1,024

5.1.4 Tag Table Offset

   This specifies the offset in bytes from the beginning of the image file to
   the beginning of the tag table.

5.1.5 Tag Count

   This specifies the number of entries in the tag table.

5.1.6 Application Magic

   This is an application-defined value. The creator of an EC3 image can set
   this to any arbitrary value. Generic EC3 manipulation tools should
   preserve the value of this field and, if the tool supports creating EC3
   images, allow the user to specify the value to store in this field.
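
   The header fields described above could be decoded along the following
   lines. This is a minimal Python sketch, not a reference implementation.
   The names ImageHeader and parse_header are hypothetical, the
   `16384 << value` shift is just a compact way of expressing the Chunk Size
   table, and rejecting values above 0x06 is an assumption, since this
   document does not say how undefined values should be handled.

```python
import struct
from typing import NamedTuple

EC3_SIGNATURE = 0x45433358                # 'EC3X' in ASCII
HEADER_STRUCT = struct.Struct(">IHHQQQ")  # big-endian, 32 bytes in total


class ImageHeader(NamedTuple):            # hypothetical name, not from the spec
    version_major: int
    version_minor: int
    chunk_size: int                       # decoded chunk size in bytes
    tag_table_offset: int
    tag_count: int
    app_magic: int


def parse_header(buf: bytes) -> ImageHeader:
    """Decode the fixed-size header at the start of an image."""
    if len(buf) < HEADER_STRUCT.size:
        raise ValueError("buffer too short for an EC3 image header")
    (sig, version, chunk_code,
     table_off, tag_count, app_magic) = HEADER_STRUCT.unpack_from(buf)
    if sig != EC3_SIGNATURE:
        raise ValueError("not an EC3 image (bad signature)")
    if chunk_code > 0x06:                 # assumption: undefined values are errors
        raise ValueError(f"undefined chunk size value {chunk_code:#04x}")
    return ImageHeader(
        version_major=version >> 8,       # high byte of Format Version
        version_minor=version & 0xFF,     # low byte of Format Version
        chunk_size=16384 << chunk_code,   # 0x00 is 16 KB, doubling per step
        tag_table_offset=table_off,
        tag_count=tag_count,
        app_magic=app_magic,
    )


# Build a version 1.0 header with 32 KB chunks and an empty tag table that
# starts directly after the header, then decode it again.
raw = HEADER_STRUCT.pack(EC3_SIGNATURE, 0x0100, 0x01, 32, 0, 0)
header = parse_header(raw)
```

   A reader that supports more than one format version would likely check the
   version before interpreting the remaining fields, since only the Signature
   and Format Version are guaranteed to be the same across versions.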

6 Tags
══════

6.1 The Tag Table
─────────────────

6.2 Tag Types
─────────────

7 Manifest
══════════

8 Volumes
═════════

8.1 Filesystem Tree
───────────────────

8.2 Clusters
────────────

8.3 String Table
────────────────

8.4 Extended Attributes
───────────────────────

9 Binary Blobs
══════════════

10 Embedded Executables
═══════════════════════

11 Signature Verification
═════════════════════════

12 Encryption
═════════════

vim: shiftwidth=3 expandtab