╔══════════════════════════════════════════════════════════════════════════════╗
║               Elastic, Compressed, Content-Addressed Container               ║
║               ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍               ║
║                          File Format Specification                           ║
╚══════════════════════════════════════════════════════════════════════════════╝

version: 1.0

1 Introduction
══════════════

This section provides a brief introduction to the goals that EC3 is intended
to fulfill.


1.1 File Format Purpose and Design Goals
────────────────────────────────────────

The primary goals of the EC3 image format can be found in its name:

   * Elastic: The format should be adaptable and useful in a wide range of
     use-cases.

   * Compressed: The format should support compression to reduce file size
     and increase efficiency, without compromising random access to file
     data.

   * Content-Addressed: The format should support data de-duplication to
     further increase storage efficiency.

   * Container: The format should support storing multiple independent
     filesystems.

At a low level, EC3 is designed to be a format for storing multiple
independent streams of data in a single image, with support for optional
features such as encryption and compression.

On top of this base, EC3 provides facilities for storing multiple whole
filesystems within an image file. With support for extended attributes,
a directory (or whole filesystem) can be accurately captured within an
EC3 image, while compression and chunk-based data de-duplication greatly
reduce the amount of disk space required.


1.2 Document Scope
──────────────────

This document describes the general layout of an EC3 image, and all of the
data structures contained within. It provides all of the information required
to read and write fully-featured container images.

This document does not describe how to implement any software that can
read or write containers, with the exception of describing any algorithms
that are used.


2 Overview
══════════

This section provides a general overview of what an EC3 image is, how it
works, and a preview of some of the internal data structures.


2.1 What Is An EC3 Image?
─────────────────────────

An EC3 image is a data file that can contain, among other things, a set of
zero or more logical filesystems, called volumes. Each volume has its own
distinct tree of directories and files, while the actual file data is shared
across all volumes within the container.

An EC3 image is analogous to a traditional disk image containing a logical
volume management (LVM) partition. Under an LVM partition scheme, a disk
can have multiple "logical" partitions contained within a single "physical"
partition. The logical partitions are separate, just like traditional
partitions, but they all make use of the same contiguous range of sectors on
the disk. Because of this, resizing partitions within an LVM group is as
simple as changing the quota of blocks that a particular logical partition
is allowed to allocate, and doesn't require physically moving any sectors
around.

EC3 builds upon this concept by employing cross-volume data de-duplication.
Every file that is stored within an EC3 image is split into a set of
fixed-size, content-addressed chunks. The size of these chunks is constant
within a container. A typical chunk size would be 32 KB. So, if two files
within a container have the same contents, even if those files are in
different volumes, the files will reference the same range of chunks. Only
one copy of the file data is stored within the container. Even if the two
files vary to some degree, as long as at least one chunk's worth of data is
identical, some data can still be shared between the files.
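The de-duplication idea described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the in-memory
dictionary, the names store_file and CHUNK_SIZE, and the handling of a short
final chunk (stored as-is) are all hypothetical and are not defined by this
specification. The hash used is the Slow Hash from Section 4.2.

```python
import hashlib

CHUNK_SIZE = 32 * 1024  # the "typical" 32 KB chunk size; an assumption here


def store_file(data, chunk_store):
    """Split data into fixed-size chunks and return the list of chunk hashes.

    Each distinct chunk is kept once in chunk_store, keyed by its hash.
    A short final chunk is stored as-is (an assumption, not spec behaviour).
    """
    refs = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha3_256(chunk).digest()
        chunk_store.setdefault(digest, chunk)  # only the first copy is kept
        refs.append(digest)
    return refs


store = {}
file_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
file_b = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # first chunk matches file_a
refs_a = store_file(file_a, store)
refs_b = store_file(file_b, store)
print(len(store))  # 3 distinct chunks back 4 chunk references
```

The two files differ in their second chunk only, so the first chunk is
stored once and referenced by both.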

Chunks can also be compressed to further reduce file size. The chunking
system provides some additional benefits when compression is in use. Seeking
through a file is more performant, as you don't have to decompress the entire
file to reach the target offset. You can simply skip to the chunk that
corresponds to the offset you're looking for. Editing files within a volume
is also easier as, again, you only have to decompress and re-write the chunk
that has changed.
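As a small illustration of the seeking behaviour described above, the chunk
that holds a given file offset follows from integer division. The function
name locate is hypothetical, and the sketch assumes that a file's chunks are
numbered from zero in file order.

```python
def locate(offset, chunk_size):
    """Return (chunk index, offset within that chunk) for a file offset."""
    return divmod(offset, chunk_size)


print(locate(100_000, 32 * 1024))  # (3, 1696)
```

Only chunk 3 needs to be decompressed to read the byte at offset 100,000.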

Alongside volumes, EC3 images can contain a range of other data, including:
   * Manifests.
   * Arbitrary binary blobs.
   * Executable files.
   * Digital signatures.
   * Certificates for digital signature verification.

In contrast to volumes, these other data types are much simpler. An
application can wrap its own binary data within an EC3 image and
immediately make use of features like compression, encryption, and digital
signature verification.


2.2 Tags: The Core Unit Of Data
───────────────────────────────

At its most basic level, an EC3 image is just a set of one or more tags.
A tag is a contiguous segment of binary data with an associated type and
identifier. The contents of a tag can be optionally encrypted and signed.
With the exception of the image header and tag table, all data contained
within an EC3 image can be found in a tag. The tag table contains
information about all of the tags in the image.


3 Types & Units
═══════════════

This section describes the fundamental data types used within EC3 data
structures, as well as some of the units used throughout this document.


3.1 Integral Types
──────────────────

All integer values are stored in big-endian format. All signed integer values
are stored in two's-complement format. The following integer types are used:

   Name        Size                  Sign
   ───────────────────────────────────────────────
   uint8       8 bits (1 byte)       Unsigned
   uint16      16 bits (2 bytes)     Unsigned
   uint32      32 bits (4 bytes)     Unsigned
   uint64      64 bits (8 bytes)     Unsigned
   int8        8 bits (1 byte)       Signed
   int16       16 bits (2 bytes)     Signed
   int32       32 bits (4 bytes)     Signed
   int64       64 bits (8 bytes)     Signed
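For illustration, the types in the table above map directly onto the format
codes of Python's struct module, where the ">" prefix selects big-endian
byte order. The FORMATS table and the encode/decode helpers are hypothetical
names used only in this sketch.

```python
import struct

# ">" selects big-endian; each letter matches one type from the table above.
FORMATS = {
    "uint8": ">B", "uint16": ">H", "uint32": ">I", "uint64": ">Q",
    "int8": ">b", "int16": ">h", "int32": ">i", "int64": ">q",
}


def encode(type_name, value):
    """Serialise an integer as the named EC3 integer type."""
    return struct.pack(FORMATS[type_name], value)


def decode(type_name, raw):
    """Parse the named EC3 integer type from exactly sized raw bytes."""
    return struct.unpack(FORMATS[type_name], raw)[0]


print(encode("uint32", 0x45433358))  # b'EC3X'
print(encode("int16", -2).hex())     # fffe
```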


3.2 String Types
────────────────

All strings are stored in UTF-8 Unicode format with a trailing null
terminator byte.
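A minimal sketch of this string encoding follows. The helper names are
hypothetical, and the sketch assumes that a string never contains an
embedded null character, since the terminator would otherwise be ambiguous.

```python
def encode_string(text):
    """Encode text as UTF-8 followed by a single null terminator byte."""
    return text.encode("utf-8") + b"\x00"


def decode_string(buf, offset=0):
    """Decode one string starting at offset.

    Returns the text and the offset of the byte after the terminator.
    """
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1


packed = encode_string("bin") + encode_string("héllo")
first, next_offset = decode_string(packed)
second, _ = decode_string(packed, next_offset)
print(first, second)  # bin héllo
```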


3.3 Storage Size Units
──────────────────────

Throughout this document, any reference to kilobytes, megabytes, etc. refers
to the base-2 units, rather than the base-10 units. For example, 1 kilobyte
(or 1 KB) is equal to 1024 bytes (rather than 1000 bytes).


4 Algorithms
════════════

EC3 uses a range of algorithms. A selection of hashing algorithms is used
for fast data lookup and for ensuring data integrity.


4.1 Fast Hash
─────────────

The Fast Hash algorithm is optimised for hashing string data. It is intended
for use in string-based hashmaps. The algorithm used for this purpose is
the Fowler-Noll-Vo FNV-1 hashing algorithm, with a 64-bit digest size.

The implementation of this algorithm can be found elsewhere, but the integer
constants used to calculate hashes used by EC3 are provided here:

   * Offset Basis: 0xCBF29CE484222325
   * Prime:        0x100000001B3
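For reference, a minimal sketch of 64-bit FNV-1 using the constants above.
FNV-1 multiplies by the prime first and then XORs in the next input byte (the
FNV-1a variant does these two steps in the opposite order). The function name
fast_hash is hypothetical, and whether a string's null terminator is included
in the hashed bytes is not stated here, so the sketch hashes exactly the bytes
it is given.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fast_hash(data):
    """Return the 64-bit FNV-1 hash of a bytes object."""
    value = FNV_OFFSET_BASIS
    for byte in data:
        value = (value * FNV_PRIME) & MASK_64  # multiply, wrapping at 64 bits
        value ^= byte                          # then XOR in the next byte
    return value


print(hex(fast_hash(b"a")))  # 0xaf63bd4c8601b7be
```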


4.2 Slow Hash
─────────────

The Slow Hash function is optimised for minimal chance of hash collisions.
It is intended to generate the content hashes used to uniquely identify data
chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
256-bit digest size.


4.3 Checksum
────────────

The Checksum algorithm is used to validate the contents of an EC3 image
and detect any corruption. The algorithm used for this purpose is the CRC32
algorithm with a 32-bit digest size.

Note that the checksum is not intended to defend against intentional
modification of an image, as this can be easily hidden by re-calculating the
checksum. EC3 provides other features to defend against malicious
modifications.
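A short sketch of checksum calculation and verification follows. It assumes
that "CRC32" refers to the common CRC-32 variant implemented by zlib; this
document does not name the polynomial, so treat that as an assumption. The
function names are hypothetical.

```python
import zlib


def checksum(data):
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def verify(data, expected):
    """Return True if data matches a previously stored checksum."""
    return checksum(data) == expected


stored = checksum(b"123456789")
print(hex(stored))                   # 0xcbf43926
print(verify(b"123456780", stored))  # False: corruption detected
```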


5 Image Header
══════════════

The Image Header can be found at the beginning of every EC3 image file.
It provides critical information about the rest of the file, including the
version of the file format that the file uses, and the location and size of
the tag table. The header also includes two magic numbers:

   * A signature to validate that the file is in fact an EC3 image. This
     must have the value 0x45433358 ('EC3X' in ASCII).
   * An application magic number that is reserved for use by the creator of
     the image.


5.1 Image Header Layout
───────────────────────

   Offset   Description           Type
   ────────────────────────────────────────
   0x00     Signature             uint32
   0x04     Format Version        uint16
   0x06     Chunk Size            uint16
   0x08     Tag Table Offset      uint64
   0x10     Tag Count             uint64
   0x18     Application Magic     uint64
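The layout above can be read with a single big-endian struct format, since
the fields are packed back to back and occupy 32 bytes (0x20) in total. The
sketch below is illustrative only: parse_header and the dictionary keys are
hypothetical names, and the field values are returned undecoded.

```python
import struct

# signature, format version, chunk size, tag table offset, tag count, magic
HEADER_FORMAT = ">IHHQQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes
EC3_SIGNATURE = 0x45433358                    # 'EC3X' in ASCII


def parse_header(raw):
    """Parse the Image Header from the first bytes of an image file."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("file is too short to contain an image header")
    fields = struct.unpack_from(HEADER_FORMAT, raw, 0)
    if fields[0] != EC3_SIGNATURE:
        raise ValueError("signature mismatch: not an EC3 image")
    names = ("signature", "format_version", "chunk_size",
             "tag_table_offset", "tag_count", "application_magic")
    return dict(zip(names, fields))


sample = struct.pack(HEADER_FORMAT, EC3_SIGNATURE, 0x0100, 0x01, 32, 2, 0xBEEF)
header = parse_header(sample)
print(header["tag_table_offset"], header["tag_count"])  # 32 2
```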

5.1.1 Signature
   The Signature is found at the very beginning of the image file. It, like
   all integer types, is stored in big-endian. It always has the value
   0x45433358 (or 'EC3X' in ASCII).

5.1.2 Format Version
   This specifies which version of the EC3 image file format
   the rest of the file conforms to. Only the Signature and Format Version
   header items are guaranteed to be the same across all format versions.
   The format version is encoded as a 16-bit integer, with the following
   format:

      0              1
      0              6
      XXXXXXXXYYYYYYYY

   Where X encodes the major number of the format version, and Y encodes
   the minor number of the format version. For example, version 3.2 would
   be encoded as 0x0302.
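The version encoding described above amounts to one byte per component. A
minimal sketch, with hypothetical function names:

```python
def encode_version(major, minor):
    """Pack a major.minor version into the 16-bit header value."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def decode_version(value):
    """Split the 16-bit header value into (major, minor)."""
    return value >> 8, value & 0xFF


print(hex(encode_version(3, 2)))  # 0x302
print(decode_version(0x0100))     # (1, 0)
```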

5.1.3 Chunk Size
   This specifies the size of all data chunks stored within the image, before
   any transformation operations such as compression or encryption are
   applied.

   The following chunk size values are defined:

      Header Value   Chunk Size (bytes)   Chunk Size (kilobytes)
      ────────────────────────────────────────────────────────────────
      0x00           16,384               16
      0x01           32,768               32
      0x02           65,536               64
      0x03           131,072              128
      0x04           262,144              256
      0x05           524,288              512
      0x06           1,048,576            1,024
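Each header value in the table above doubles the previous chunk size, so the
mapping can be computed with a shift rather than a lookup table. The function
name is hypothetical, and the sketch rejects any value not listed above.

```python
def chunk_size_bytes(header_value):
    """Convert the Chunk Size header value into a size in bytes."""
    if not 0x00 <= header_value <= 0x06:
        raise ValueError("undefined chunk size value")
    return 16_384 << header_value


print(chunk_size_bytes(0x01))  # 32768
```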

5.1.4 Tag Table Offset
   This specifies the offset in bytes from the beginning of the image file
   to the beginning of the tag table.

5.1.5 Tag Count
   This specifies the number of entries in the tag table.

5.1.6 Application Magic
   This is an application-defined value. The creator of an EC3 image can
   set this to any arbitrary value. Any generic EC3 manipulation tools should
   preserve the value of this field and, if the tool supports creating EC3
   images, allow the user to specify the value to store in this field.


6 Tags
══════

Tags are the fundamental units of data storage in an EC3 image. Every image
contains one or more tags. A tag is essentially a contiguous range of data
within an image, with an associated type, identifier, and flags. Various
data processing layers can be applied to the contents of a tag, such as
encryption or compression. Every tag within an image can be referenced either
by its index within the tag table or by an optional 64-bit identifier.


6.1 The Tag Table
─────────────────

The Tag Table describes all of the tags in an image. Its location and size
can be found by parsing the Image Header. The Tag Table consists of a number
of entries, one for each tag in the image.

Each entry in the Tag Table has the following layout:

   Offset   Description           Type
   ────────────────────────────────────────
   0x00     Tag Type              uint32
   0x04     Flags                 uint32
   0x08     Checksum              uint32
   0x1C     Reserved              uint32
   0x20     Identifier            uint64
   0x28     Offset                uint64
   0x30     Size                  uint64
   0x38     Reserved              uint64
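A sketch of reading one entry follows. It takes the offsets exactly as listed
in the table above and reads each field at its stated position; bytes that
the table does not describe, and the reserved fields, are ignored. The entry
size of 0x40 bytes is an assumption derived from the last field (8 bytes at
0x38), and parse_tag_entry is a hypothetical name.

```python
import struct

TAG_ENTRY_SIZE = 0x40  # assumption: the uint64 at 0x38 is the last field


def parse_tag_entry(table, index=0):
    """Parse the tag table entry at the given index from raw table bytes."""
    base = index * TAG_ENTRY_SIZE
    if len(table) < base + TAG_ENTRY_SIZE:
        raise ValueError("tag table is too short for this index")
    tag_type, flags, checksum = struct.unpack_from(">III", table, base + 0x00)
    identifier, offset, size = struct.unpack_from(">QQQ", table, base + 0x20)
    return {
        "type": tag_type, "flags": flags, "checksum": checksum,
        "identifier": identifier, "offset": offset, "size": size,
    }


# Build a one-entry table by hand to exercise the parser.
entry = bytearray(TAG_ENTRY_SIZE)
struct.pack_into(">III", entry, 0x00, 0x1234, 0, 0xCBF43926)
struct.pack_into(">QQQ", entry, 0x20, 7, 0x100, 0x200)
tag = parse_tag_entry(bytes(entry))
print(tag["identifier"], tag["offset"], tag["size"])  # 7 256 512
```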

6.1.1 Tag Type
   A 32-bit integer indicating the type of the tag. EC3 defines a range
   of different tag types, which can be found in Section 6.2.

6.1.2 Flags
   Flags describing certain attributes of a tag, such as whether the tag
   is compressed, encrypted, or signed. The full set of flags can be found
   in Section 6.3.

6.1.3 Checksum
   A checksum of the tag data, calculated on the raw data as it appears
   on-disk, after any data processing layers (compression, encryption, etc.)
   have been applied. This checksum should be checked before the tag data is
   processed any further. The checksum is calculated using the algorithm
   described in Section 4.3.

6.1.4 Identifier
   An arbitrary 64-bit integer that can be used to identify a tag. Every tag
   within an image must have a unique identifier. The only exception is the
   identifier value 0x00, which any number of tags can use as their
   identifier and is used to indicate that a tag has no identifier.
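The identifier rules above can be checked while building a lookup table from
identifier to tag table index. This is a sketch only; the function name and
the choice to raise an error on a duplicate are assumptions.

```python
def build_identifier_index(identifiers):
    """Map each non-zero identifier to its index in the tag table.

    identifiers holds the Identifier field of every entry, in table order.
    """
    index = {}
    for position, identifier in enumerate(identifiers):
        if identifier == 0x00:
            continue  # 0x00 means "no identifier" and may repeat freely
        if identifier in index:
            raise ValueError("duplicate tag identifier")
        index[identifier] = position
    return index


print(build_identifier_index([0, 5, 0, 9]))  # {5: 1, 9: 3}
```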

6.1.5 Offset and Size
   The offset from the beginning of the image file to the beginning of the
   tag data, and the length of the tag data. Both values are measured in
   bytes.


6.2 Tag Types
─────────────

The type of a tag determines the format of the data contained within it.

6.2.1 VOLU: Volume
   Volume tags contain the filesystem tree and file/directory metadata for a
   single volume within the container.

6.2.2 CTAB: Chunk Table
   The Chunk Table contains the file data chunks for all volumes within the
   container.

6.2.3 XATR: Extended Attributes Table
   The Extended Attributes table contains any extended attributes referenced
   by any file or directory stored in any of the volumes in the container.

6.2.4 STAB: String Table
   The String Table contains all of the strings used as file/directory names
   for all files and directories stored in the container.

6.2.5 MFST: Manifest
   The manifest is a key-value data store that holds information describing
   the container. Apart from a few required keys, any arbitrary keys and
   values can be stored in the manifest.

6.2.6 BLOB: Binary Data
   Binary blobs are contiguous buffers of arbitrary binary data. EC3 places
   no requirements on the length or layout of this data, so these tags can
   be used for any application-defined purpose.

6.2.7 EXEC: Executable
   Executable tags are used to store embedded executable files. For certain
   executable file formats, these tags can also include auxiliary information
   about the executable file to allow readers to load and run the executable
   without having to implement a parser for the executable file format.

6.2.8 CERT: Digital Certificate
   If any part of the image is digitally signed, it will also contain one or
   more Digital Certificate tags. These tags contain either:

      a) the certificate used to sign the container; or
      b) (optionally) any intermediate certificates needed to link the
         signing certificate back to a trusted root certificate.

6.2.9 CSIG: Digital Signature
   If any part of the image is digitally signed, this tag contains the actual
   signature data.


6.3 Tag Flags
─────────────


6.4 Tag Identifiers
───────────────────


7 Manifest
══════════

8 Volumes
═════════

8.1 Filesystem Tree
───────────────────


8.2 Clusters
────────────


8.3 String Table
────────────────


8.4 Extended Attributes
───────────────────────


9 Binary Blobs
══════════════


10 Embedded Executables
═══════════════════════


11 Signature Verification
═════════════════════════


12 Encryption
═════════════


vim: shiftwidth=3 expandtab