╔══════════════════════════════════════════════════════════════════════════════╗
║               Elastic, Compressed, Content-Addressed Container               ║
║               ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍               ║
║                          File Format Specification                           ║
╚══════════════════════════════════════════════════════════════════════════════╝

version: 1.0

1 Introduction
══════════════

   This section provides a brief introduction to the goals that EC3 is intended
   to fulfill.


1.1 File Format Purpose and Design Goals
────────────────────────────────────────

   The primary goals of the EC3 image format can be found in its name:

      * Elastic: The format should be adaptable and useful in a wide range of
        use cases.

      * Compressed: The format should support compression to reduce file size
        and increase efficiency, without compromising random access to file
        data.

      * Content-Addressed: The format should support data de-duplication to
        further increase storage efficiency.

      * Container: The format should support storing multiple independent
        filesystems.

   At a low level, EC3 is designed to be a format for storing multiple
   independent streams of data in a single image, with support for optional
   features such as encryption and compression.

   On top of this base, EC3 provides facilities for storing multiple whole
   filesystems within an image file. With support for extended attributes,
   a directory (or whole filesystem) can be accurately captured within an
   EC3 image, while compression and chunk-based data de-duplication greatly
   reduce the amount of disk space required.


1.2 Document Scope
──────────────────

   This document describes the general layout of an EC3 image, and all of the
   data structures contained within. It provides all of the information required
   to read and write fully-featured container images.

   This document does not describe how to implement software that reads or
   writes containers, beyond describing the algorithms that the format uses.


2 Overview
══════════

   This section provides a general overview of what an EC3 image is, how it
   works, and a preview of some of the internal data structures.


2.1 What Is An EC3 Image?
─────────────────────────

   An EC3 image is a data file that can contain, among other things, a set of
   zero or more logical filesystems, called volumes. Each volume has its own
   distinct tree of directories and files, while the actual file data is shared
   across all volumes within the container.

   An EC3 image is analogous to a traditional disk image containing a logical
   volume management (LVM) partition. Under an LVM partition scheme, a disk
   can have multiple "logical" partitions contained within a single "physical"
   partition. The logical partitions are separate, just like traditional
   partitions, but they all make use of the same contiguous range of sectors on
   the disk. Because of this, resizing partitions within an LVM group is as
   simple as changing the quota of blocks that a particular logical partition
   is allowed to allocate, and doesn't require physically moving any sectors
   around.

   EC3 builds upon this concept by employing cross-volume data de-duplication.
   Every file that is stored within an EC3 image is split into a set of
   fixed-size, content-addressed chunks. The size of these chunks is constant
   within a container. A typical chunk size would be 32 KB. If two files
   within a container have the same contents, even if those files are in
   different volumes, the files will reference the same range of chunks, and
   only one copy of the file data is stored within the container. Even if the
   two files vary to some degree, as long as at least one whole chunk is
   identical in both files, some data can still be shared between them.

   Chunks can also be compressed to further reduce file size. The chunking
   system provides some additional benefits when compression is in use. Seeking
   through a file is faster, as a reader doesn't have to decompress the entire
   file to reach the target offset. It can simply skip to the chunk that
   contains the offset it is looking for. Editing files within a volume is also
   easier because, again, only the chunk that has changed needs to be
   decompressed and re-written.

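   The seek behaviour described above comes down to integer arithmetic. The
   Python sketch below is illustrative only: the function name `locate` and
   the 32 KB chunk size are assumptions made for this example, and the
   format itself does not define them.

```python
from typing import Tuple

CHUNK_SIZE = 32 * 1024  # assumed example chunk size (32 KB)


def locate(offset: int, chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


# Byte 100,000 of a file lives in the fourth chunk (index 3).
index, within = locate(100_000)
print(index, within)  # 3 1696
```

   Only the chunk at the returned index has to be decompressed to read the
   byte at that offset.
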
   Alongside volumes, EC3 images can contain a range of other data, including:
      * Manifests.
      * Arbitrary binary blobs.
      * Executable files.
      * Digital signatures.
      * Certificates for digital signature verification.

   In contrast to volumes, these other data types are much simpler. An
   application can wrap its own binary data within an EC3 image and
   immediately make use of features like compression, encryption, and digital
   signature verification.


2.2 Tags: The Core Unit Of Data
───────────────────────────────

   At its most basic level, an EC3 image is just a set of one or more tags.
   A tag is a contiguous segment of binary data with an associated type and
   identifier. The contents of a tag can be optionally encrypted and signed.
   With the exception of the image header and tag table, all data contained
   within an EC3 image can be found in a tag. The tag table contains
   information about all of the tags in the image.


3 Types & Units
═══════════════

   This section describes the fundamental data types used within EC3 data
   structures, as well as some of the units used throughout this document.

3.1 Integral Types
──────────────────

   All integer values are stored in big-endian format. All signed integer values
   are stored in two's-complement format. The following integer types are used:

   Name        Size                   Sign
   ───────────────────────────────────────────────
   uint8       8 bits  (1 byte)       Unsigned
   uint16      16 bits (2 bytes)      Unsigned
   uint32      32 bits (4 bytes)      Unsigned
   uint64      64 bits (8 bytes)      Unsigned
   int8        8 bits  (1 byte)       Signed
   int16       16 bits (2 bytes)      Signed
   int32       32 bits (4 bytes)      Signed
   int64       64 bits (8 bytes)      Signed

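   As an illustration of these encoding rules, the following sketch uses
   Python's standard `struct` module, where the `>` prefix selects big-endian
   byte order. The variable names are arbitrary and the values are examples
   only.

```python
import struct

# ">" selects big-endian byte order; "I" is a uint32 and "h" is an int16.
packed_u32 = struct.pack(">I", 0x45433358)
packed_i16 = struct.pack(">h", -2)

print(packed_u32.hex())  # 45433358
print(packed_i16.hex())  # fffe (two's complement of -2)

# Unpacking reverses the process.
(value,) = struct.unpack(">h", packed_i16)
print(value)  # -2
```
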
3.2 String Types
────────────────

   All strings are stored in UTF-8 Unicode format with a trailing null
   terminator byte.

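   A minimal sketch of reading and writing such strings in Python. The
   helper names `encode_string` and `decode_string` are hypothetical, and
   rejecting embedded null characters is an assumption that follows from the
   use of a null terminator rather than a rule stated in this document.

```python
from typing import Tuple


def encode_string(text: str) -> bytes:
    """Encode text as UTF-8 followed by a single null terminator byte."""
    if "\x00" in text:
        # Assumption: an embedded null would end the string early on read.
        raise ValueError("string must not contain an embedded null character")
    return text.encode("utf-8") + b"\x00"


def decode_string(buffer: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read one string starting at offset; return it and the next offset."""
    end = buffer.index(b"\x00", offset)
    return buffer[offset:end].decode("utf-8"), end + 1


print(encode_string("héllo"))  # b'h\xc3\xa9llo\x00'
```
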
3.3 Storage Size Units
──────────────────────

   Throughout this document, any reference to kilobytes, megabytes, etc. refers
   to the base-2 units, rather than the base-10 units. For example, 1 kilobyte
   (or 1 KB) is equal to 1024 bytes (rather than 1000 bytes).


4 Algorithms
════════════

   EC3 uses a range of algorithms. A selection of hashing algorithms is used
   for fast data lookup and for ensuring data integrity.


4.1 Fast Hash
─────────────

   The Fast Hash algorithm is optimised for hashing string data. It is intended
   for use in string-based hashmaps. The algorithm used for this purpose is
   the Fowler-Noll-Vo FNV-1 hashing algorithm, with a 64-bit digest size.

   This document does not reproduce the algorithm itself, but the integer
   constants that EC3 uses to calculate hashes are provided here:

      * Offset Basis: 0xCBF29CE484222325
      * Prime: 0x100000001B3

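   A minimal Python sketch of 64-bit FNV-1 using the constants above. The
   function name `fast_hash` is hypothetical. Hashing the UTF-8 bytes of a
   string without its null terminator is also an assumption, as this document
   does not state exactly which bytes are hashed.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keeps the running digest within 64 bits


def fast_hash(data: bytes) -> int:
    """64-bit FNV-1: multiply by the prime first, then XOR in each byte."""
    digest = FNV_OFFSET_BASIS
    for byte in data:
        digest = (digest * FNV_PRIME) & MASK_64
        digest ^= byte
    return digest


# Assumption: strings are hashed as UTF-8 without the null terminator.
print(hex(fast_hash("a".encode("utf-8"))))  # 0xaf63bd4c8601b7be
```

   Note the order of operations: FNV-1 multiplies before the XOR, whereas the
   related FNV-1a variant performs the XOR first and produces different
   digests.
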
4.2 Slow Hash
─────────────

   The Slow Hash function is optimised for minimal chance of hash collisions.
   It is intended to generate the content hashes used to uniquely identify data
   chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
   256-bit digest size.

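   The sketch below shows how content hashes of this kind can drive chunk
   de-duplication, using `hashlib.sha3_256` from the Python standard library.
   The names `slow_hash` and `store_chunks`, the use of a dictionary as the
   chunk store, and leaving a short final chunk unpadded are all assumptions
   made for illustration. They are not the on-disk layout.

```python
import hashlib
from typing import Dict, List


def slow_hash(chunk: bytes) -> bytes:
    """Return the 256-bit (32-byte) SHA-3 digest of a chunk."""
    return hashlib.sha3_256(chunk).digest()


def store_chunks(data: bytes, chunk_size: int,
                 store: Dict[bytes, bytes]) -> List[bytes]:
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns the list of content hashes that reconstructs the data.
    Assumption: a short final chunk is stored as-is, without padding.
    """
    ids = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = slow_hash(chunk)
        store.setdefault(digest, chunk)  # identical chunks are stored once
        ids.append(digest)
    return ids


store: Dict[bytes, bytes] = {}
ids_a = store_chunks(b"A" * 8 + b"B" * 8, 8, store)
ids_b = store_chunks(b"A" * 8 + b"C" * 8, 8, store)
print(len(store))            # 3
print(ids_a[0] == ids_b[0])  # True
```

   The two example inputs reference four chunks in total, but only three
   distinct chunks end up in the store because the first chunk is shared.
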
4.3 Checksum
────────────

   The Checksum algorithm is used to validate the contents of an EC3 image
   and detect any corruption. The algorithm used for this purpose is the CRC32
   algorithm with a 32-bit digest size.

   Note that the checksum is not intended to defend against intentional
   modification of an image, as this can be easily hidden by re-calculating the
   checksum. EC3 provides other features to defend against malicious
   modifications.

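   A short sketch of computing such a checksum in Python. This document does
   not name the CRC32 variant or the byte range that the checksum covers, so
   the use of `zlib.crc32` (the common CRC-32 with the IEEE 802.3 polynomial)
   is an assumption, as is the function name `checksum`.

```python
import zlib


def checksum(data: bytes) -> int:
    """Return a 32-bit CRC of data (assumed variant: zlib's CRC-32)."""
    return zlib.crc32(data) & 0xFFFFFFFF


# "123456789" is the conventional check input for CRC algorithms.
print(hex(checksum(b"123456789")))  # 0xcbf43926
```
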
5 Image Header
══════════════

   The Image Header can be found at the beginning of every EC3 image file.
   It provides critical information about the rest of the file, including the
   version of the file format that the file uses, and the location and size of
   the tag table. The header also includes two magic numbers:

      * A signature to validate that the file is in fact an EC3 image. This
        must have the value 0x45433358 ('EC3X' in ASCII).
      * An application magic number that is reserved for use by the creator of
        the image.


5.1 Image Header Layout
───────────────────────

   Offset   Description          Type
   ─────────────────────────────────────────────
   0x00     Signature            uint32
   0x04     Format Version       uint16
   0x06     Chunk Size           uint16
   0x08     Tag Table Offset     uint64
   0x10     Tag Count            uint64
   0x18     Application Magic    uint64

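   The layout above is 32 bytes long and can be read in one step. The
   following Python sketch is illustrative only: the names `ImageHeader`,
   `parse_header`, `HEADER_FORMAT`, and `EC3_SIGNATURE` are hypothetical, and
   the sample field values are invented for the demonstration.

```python
import struct
from typing import NamedTuple

# Big-endian: uint32, uint16, uint16, uint64, uint64, uint64.
HEADER_FORMAT = ">IHHQQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes
EC3_SIGNATURE = 0x45433358  # 'EC3X' in ASCII


class ImageHeader(NamedTuple):
    signature: int
    format_version: int
    chunk_size: int
    tag_table_offset: int
    tag_count: int
    application_magic: int


def parse_header(data: bytes) -> ImageHeader:
    """Decode the fixed-size header found at the start of an image."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated image header")
    header = ImageHeader(*struct.unpack_from(HEADER_FORMAT, data))
    if header.signature != EC3_SIGNATURE:
        raise ValueError("not an EC3 image")
    return header


# Invented sample: version 1.0, chunk size value 0x01, empty tag table.
raw = struct.pack(HEADER_FORMAT, EC3_SIGNATURE, 0x0100, 0x01, 32, 0, 0)
print(raw[:4])                          # b'EC3X'
print(parse_header(raw).format_version)  # 256
```
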
   5.1.1 Signature
      The Signature is found at the very beginning of the image file. Like
      all integer values, it is stored in big-endian byte order. It always
      has the value 0x45433358 (or 'EC3X' in ASCII).

   5.1.2 Format Version
      This specifies which version of the EC3 image file format the rest of
      the file conforms to. Only the Signature and Format Version header
      items are guaranteed to be the same across all format versions.
      The format version is encoded as a 16-bit integer, with the following
      format:

         0              1
         0              6
         XXXXXXXXYYYYYYYY

      Where X encodes the major number of the format version, and Y encodes
      the minor number. X occupies the first (most significant) byte. For
      example, version 3.2 would be encoded as 0x0302.

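      The encoding can be sketched with two small Python helpers. The
      function names `encode_version` and `decode_version` are hypothetical
      and are not part of the format.

```python
from typing import Tuple


def encode_version(major: int, minor: int) -> int:
    """Pack major and minor numbers into one 16-bit value (major in high byte)."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def decode_version(value: int) -> Tuple[int, int]:
    """Split a 16-bit format version into (major, minor)."""
    return value >> 8, value & 0xFF


print(hex(encode_version(3, 2)))  # 0x302
print(decode_version(0x0302))     # (3, 2)
```
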
   5.1.3 Chunk Size
      This specifies the size of all data chunks stored within the image,
      before any transformation operations such as compression or encryption
      are applied. The field holds one of the header values listed below,
      rather than the size in bytes.

      The following chunk size values are defined:

      Header Value     Chunk Size (bytes)     Chunk Size (kilobytes)
      ────────────────────────────────────────────────────────────────
      0x00             16,384                 16
      0x01             32,768                 32
      0x02             65,536                 64
      0x03             131,072                128
      0x04             262,144                256
      0x05             524,288                512
      0x06             1,048,576              1,024

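      Each defined value doubles the previous chunk size, so the table can be
      expressed as a shift. The Python sketch below uses a hypothetical
      function name, `chunk_size_bytes`, and treats any value outside the
      table as an error, which is an assumption of this example.

```python
MAX_CHUNK_SIZE_VALUE = 0x06  # largest header value defined in the table


def chunk_size_bytes(header_value: int) -> int:
    """Convert the Chunk Size header value into a size in bytes."""
    if not 0 <= header_value <= MAX_CHUNK_SIZE_VALUE:
        raise ValueError("undefined chunk size value")
    return 16_384 << header_value


print(chunk_size_bytes(0x01))  # 32768
print(chunk_size_bytes(0x06))  # 1048576
```
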
   5.1.4 Tag Table Offset
      This specifies the offset in bytes from the beginning of the image file
      to the beginning of the tag table.

   5.1.5 Tag Count
      This specifies the number of entries in the tag table.

   5.1.6 Application Magic
      This is an application-defined value. The creator of an EC3 image can
      set this to any arbitrary value. Generic EC3 manipulation tools should
      preserve the value of this field and, if they support creating EC3
      images, allow the user to specify the value to store in it.


6 Tags
══════

6.1 The Tag Table
─────────────────

6.2 Tag Types
─────────────


7 Manifest
══════════

8 Volumes
═════════

8.1 Filesystem Tree
───────────────────


8.2 Clusters
────────────


8.3 String Table
────────────────


8.4 Extended Attributes
───────────────────────


9 Binary Blobs
══════════════


10 Embedded Executables
═══════════════════════


11 Signature Verification
═════════════════════════


12 Encryption
═════════════


vim: shiftwidth=3 expandtab