add existing documentation
doc/format.txt
@@ -0,0 +1,329 @@
|
||||
╔══════════════════════════════════════════════════════════════════════════════╗
║               Elastic, Compressed, Content-Addressed Container               ║
║               ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍               ║
║                          File Format Specification                           ║
╚══════════════════════════════════════════════════════════════════════════════╝
|
||||
|
||||
version: 1.0
|
||||
|
||||
1 Introduction
|
||||
══════════════
|
||||
|
||||
This section provides a brief introduction to the goals that EC3 is intended
|
||||
to fulfill.
|
||||
|
||||
|
||||
1.1 File Format Purpose and Design Goals
|
||||
────────────────────────────────────────
|
||||
|
||||
The primary goals of the EC3 image format can be found in its name:
|
||||
|
||||
* Elastic: The format should be adaptable and useful in a wide range of
  use-cases.
|
||||
|
||||
* Compressed: The format should support compression to reduce file size
  and increase efficiency, without compromising random-access to file
  data.
|
||||
|
||||
* Content-Addressed: The format should support data de-duplication to
|
||||
further increase storage efficiency.
|
||||
|
||||
* Container: The format should support storing multiple independent
|
||||
filesystems.
|
||||
|
||||
At a low-level, EC3 is designed to be a format for storing multiple
|
||||
independent streams of data in a single image, with support for optional
|
||||
features such as encryption and compression.
|
||||
|
||||
On top of this base, EC3 provides facilities for storing multiple whole
filesystems within an image file. With support for extended attributes,
a directory (or whole filesystem) can be accurately captured within an
EC3 image, while compression and chunk-based data de-duplication greatly
reduce the amount of disk space required.
|
||||
|
||||
|
||||
1.2 Document Scope
|
||||
──────────────────
|
||||
|
||||
This document describes the general layout of an EC3 image, and all of the
|
||||
data structures contained within. It provides all of the information required
|
||||
to read and write fully-featured container images.
|
||||
|
||||
This document does not describe how to implement any software that can
|
||||
read or write containers, with the exception of describing any algorithms
|
||||
that are used.
|
||||
|
||||
|
||||
2 Overview
|
||||
══════════
|
||||
|
||||
This section provides a general overview of what an EC3 image is, how it
|
||||
works, and a preview of some of the internal data structures.
|
||||
|
||||
|
||||
2.1 What Is An EC3 Image?
|
||||
─────────────────────────
|
||||
|
||||
An EC3 image is a data file that can contain, among other things, a set of
|
||||
zero or more logical filesystems, called volumes. Each volume has its own
|
||||
distinct tree of directories and files, while the actual file data is shared
|
||||
across all volumes within the container.
|
||||
|
||||
An EC3 image is analogous to a traditional disk image containing a logical
|
||||
volume management (LVM) partition. Under an LVM partition scheme, a disk
|
||||
can have multiple "logical" partitions contained within a single "physical"
|
||||
partition. The logical partitions are separate, just like traditional
|
||||
partitions, but they all make use of the same contiguous range of sectors on
|
||||
the disk. Because of this, resizing partitions within an LVM group is as
|
||||
simple as changing the quota of blocks that a particular logical partition
|
||||
is allowed to allocate, and doesn't require physically moving any sectors
|
||||
around.
|
||||
|
||||
EC3 builds upon this concept by employing cross-volume data de-duplication.
|
||||
Every file that is stored within an EC3 image is split into a set of fixed-
|
||||
size, content-addressed chunks. The size of these chunks is constant within
|
||||
a container. A typical chunk size would be 32 KB. So, if two files within
|
||||
a container have the same contents, even if those files are in different
|
||||
volumes, the files will reference the same range of chunks. Only one copy
|
||||
of the file data is stored within the container. Even if the two files vary
|
||||
to some degree, as long as at least one chunk's worth of data is identical,
|
||||
some data can still be shared between the files.
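
The de-duplication described above can be sketched in Python as follows.
This is an illustration only, not the EC3 implementation: the ChunkStore
class and its add_file method are hypothetical names, and zero-padding the
final short chunk is an assumption, since this document does not say how
partial chunks are handled. Chunk identifiers use SHA-3 with a 256-bit
digest, as described in the Slow Hash section.

```python
import hashlib

CHUNK_SIZE = 32 * 1024  # 32 KB, the typical chunk size mentioned above


class ChunkStore:
    """Hypothetical in-memory store keeping one copy of each distinct chunk."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # content hash -> chunk bytes

    def add_file(self, data):
        """Split data into fixed-size chunks; return the list of chunk hashes."""
        refs = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            # Assumption: a short final chunk is zero-padded to the full size.
            chunk = chunk.ljust(self.chunk_size, b"\x00")
            digest = hashlib.sha3_256(chunk).digest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            refs.append(digest)
        return refs


store = ChunkStore()
file_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
file_b = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # first chunk matches file_a
refs_a = store.add_file(file_a)
refs_b = store.add_file(file_b)
print(len(store.chunks))  # number of distinct chunks actually stored
```

The two files reference four chunks in total, but the identical first chunk
is stored once, so only three chunks end up in the store.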
|
||||
|
||||
Chunks can also be compressed to further reduce file size. The chunking
|
||||
system provides some additional benefits when compression is in use. Seeking
|
||||
through a file is more performant, as you don't have to decompress the entire
|
||||
file to reach the target offset. You can simply skip to the chunk that
|
||||
corresponds to the offset you're looking for. Editing files within a volume
|
||||
is also easier as, again, you only have to decompress and re-write the chunk
|
||||
that has changed.
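
The seek behaviour described above comes down to simple arithmetic. The
sketch below uses a hypothetical helper named locate and assumes a 32 KB
chunk size; it shows how a byte offset within a file maps to a chunk index
and an offset within that chunk.

```python
def locate(offset, chunk_size=32 * 1024):
    """Return (chunk index, offset within that chunk) for a file byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


index, within = locate(100_000)
print(index, within)
```

Only the chunk at the returned index needs to be decompressed to read or
modify the byte at that offset.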
|
||||
|
||||
Alongside volumes, EC3 images can contain a range of other data, including:
|
||||
* Manifests
|
||||
* Arbitrary binary blobs.
|
||||
* Executable files.
|
||||
* Digital signatures.
|
||||
* Certificates for digital signature verification.
|
||||
|
||||
In contrast to volumes, these other data types are much simpler. An
application can wrap its own binary data within an EC3 image and
immediately make use of features like compression, encryption, and digital
signature verification.
|
||||
|
||||
|
||||
2.2 Tags: The Core Unit Of Data
|
||||
───────────────────────────────
|
||||
|
||||
At its most basic level, an EC3 image is just a set of one or more tags.
|
||||
A tag is a contiguous segment of binary data with an associated type and
|
||||
identifier. The contents of a tag can be optionally encrypted and signed.
|
||||
With the exception of the image header and tag table, all data contained
within an EC3 image can be found in a tag. The tag table contains
information about all of the tags in the image.
|
||||
|
||||
|
||||
3 Types & Units
|
||||
═══════════════
|
||||
|
||||
This section describes the fundamental data types used within EC3 data
|
||||
structures, as well as some of the units used throughout this document.
|
||||
|
||||
3.1 Integral Types
|
||||
──────────────────
|
||||
|
||||
All integer values are stored in big-endian format. All signed integer values
are stored in two's-complement format. The following integer types are used:
|
||||
|
||||
Name      Size                 Sign
───────────────────────────────────────────────
uint8     8 bits  (1 byte)     Unsigned
uint16    16 bits (2 bytes)    Unsigned
uint32    32 bits (4 bytes)    Unsigned
uint64    64 bits (8 bytes)    Unsigned
int8      8 bits  (1 byte)     Signed
int16     16 bits (2 bytes)    Signed
int32     32 bits (4 bytes)    Signed
int64     64 bits (8 bytes)    Signed
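
As an illustration of the encoding rules above, the following Python sketch
uses the standard struct module to write a uint32 and an int16 in big-endian,
two's-complement form. The variable names are chosen for this example only.

```python
import struct

# ">" selects big-endian byte order; "I" is uint32 and "h" is int16.
encoded_u32 = struct.pack(">I", 0x45433358)
encoded_i16 = struct.pack(">h", -2)

print(encoded_u32.hex())  # most significant byte comes first
print(encoded_i16.hex())  # two's-complement representation of -2

# Decoding reverses the process.
(decoded_i16,) = struct.unpack(">h", encoded_i16)
print(decoded_i16)
```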
|
||||
|
||||
|
||||
3.2 String Types
|
||||
────────────────
|
||||
|
||||
All strings are stored in UTF-8 Unicode format with a trailing null
|
||||
terminator byte.
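
A minimal Python sketch of this string encoding is shown below. The helper
names encode_string and decode_string are hypothetical. The sketch assumes
that strings never contain an embedded null character, since that would be
indistinguishable from the terminator.

```python
def encode_string(text):
    """Encode text as UTF-8 followed by a single null terminator byte."""
    if "\x00" in text:
        raise ValueError("embedded null characters cannot be represented")
    return text.encode("utf-8") + b"\x00"


def decode_string(buffer, offset=0):
    """Read a null-terminated UTF-8 string starting at offset.

    Returns the decoded text and the offset just past the terminator.
    """
    end = buffer.index(b"\x00", offset)
    return buffer[offset:end].decode("utf-8"), end + 1


raw = encode_string("héllo") + encode_string("world")
first, next_offset = decode_string(raw)
second, _ = decode_string(raw, next_offset)
print(first, second)
```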
|
||||
|
||||
|
||||
3.3 Storage Size Units
|
||||
──────────────────────
|
||||
|
||||
Throughout this document, any reference to kilobytes, megabytes, etc. refers
to the base-2 units, rather than the base-10 units. For example, 1 kilobyte
(or 1 KB) is equal to 1024 bytes (rather than 1000 bytes).
|
||||
|
||||
|
||||
4 Algorithms
|
||||
════════════
|
||||
|
||||
EC3 uses a range of algorithms. A selection of hashing algorithms is used
for fast data lookup and for ensuring data integrity.
|
||||
|
||||
|
||||
4.1 Fast Hash
|
||||
─────────────
|
||||
|
||||
The Fast Hash algorithm is optimised for hashing string data. It is intended
|
||||
for use in string-based hashmaps. The algorithm used for this purpose is
|
||||
the Fowler-Noll-Vo FNV-1 hashing algorithm, with a 64-bit digest size.
|
||||
|
||||
The implementation of this algorithm can be found elsewhere, but the integer
|
||||
constants used to calculate hashes used by EC3 are provided here:
|
||||
|
||||
* Offset Basis: 0xCBF29CE484222325
|
||||
* Prime: 0x100000001B3
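
For reference, a minimal Python sketch of 64-bit FNV-1 using the constants
above is given here. The function name fast_hash is hypothetical. The sketch
operates on raw bytes; whether EC3 hashes the UTF-8 bytes of a string with
or without its null terminator is not stated in this document, so the caller
is assumed to pass exactly the bytes to be hashed.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fast_hash(data):
    """64-bit FNV-1: multiply by the prime first, then XOR in each byte."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h = (h * FNV_PRIME) & MASK_64  # keep the result within 64 bits
        h ^= byte
    return h


print(hex(fast_hash(b"a")))
```

Note that FNV-1 multiplies before the XOR. The related FNV-1a variant
reverses the two steps and produces different digests.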
|
||||
|
||||
|
||||
4.2 Slow Hash
|
||||
─────────────
|
||||
|
||||
The Slow Hash function is optimised for minimal chance of hash collisions.
|
||||
It is intended to generate the content hashes used to uniquely identify data
|
||||
chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
|
||||
256-bit digest size.
|
||||
|
||||
|
||||
4.3 Checksum
|
||||
────────────
|
||||
|
||||
The Checksum algorithm is used to validate the contents of an EC3 image
|
||||
and detect any corruption. The algorithm used for this purpose is the CRC32
|
||||
algorithm with a 32-bit digest size.
|
||||
|
||||
Note that it is not intended to defend against intentional modification of an
|
||||
image, as this can be easily hidden by re-calculating the checksum. EC3
|
||||
provides other features to defend against malicious modifications.
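
The sketch below shows how such a checksum detects accidental corruption.
It assumes the common CRC-32 variant implemented by zlib; this document does
not state which CRC32 polynomial or parameters EC3 uses, nor which byte
ranges of the image are covered, so treat both as assumptions. The function
name checksum is hypothetical.

```python
import zlib


def checksum(data):
    """CRC-32 of data as an unsigned 32-bit integer (zlib variant assumed)."""
    return zlib.crc32(data) & 0xFFFFFFFF


original = b"example image contents"
stored = checksum(original)

# Flip a single bit to simulate corruption in storage or transit.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(checksum(original) == stored)   # data intact
print(checksum(corrupted) == stored)  # corruption detected
```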
|
||||
|
||||
|
||||
3 Image Header
|
||||
══════════════
|
||||
|
||||
The Image Header can be found at the beginning of every EC3 image file.
|
||||
It provides critical information about the rest of the file, including the
|
||||
version of the file format that the file uses, and the location and size of
|
||||
the tag table. The header also includes two magic numbers:
|
||||
|
||||
* A signature to validate that the file is in fact an EC3 image. This
|
||||
must have the value 0x45433358 ('EC3X' in ASCII).
|
||||
* An application magic number that is reserved for use by the creator of
|
||||
the image.
|
||||
|
||||
|
||||
3.1 Image Header Layout
|
||||
───────────────────────
|
||||
|
||||
Offset    Description          Type
─────────────────────────────────────────────
0x00      Signature            uint32
0x04      Format Version       uint16
0x06      Chunk Size           uint16
0x08      Tag Table Offset     uint64
0x10      Tag Count            uint64
0x18      Application Magic    uint64
|
||||
|
||||
3.1.1 Signature
|
||||
The Signature is found at the very beginning of the image file. It, like
all integer types, is stored in big-endian. It always has the value
0x45433358 (or 'EC3X' in ASCII).
|
||||
|
||||
3.1.2 Format Version
|
||||
This specifies which version of the EC3 Image file format
|
||||
the rest of the file conforms to. Only the Signature and Format Version
|
||||
header items are guaranteed to be the same across all format versions.
|
||||
The format version is encoded as a 16-bit integer, with the following
|
||||
format:
|
||||
0               1
0               6
XXXXXXXXYYYYYYYY
|
||||
|
||||
Where X encodes the major number of the format version, and Y encodes
the minor number of the format version. For example, version 3.2 would
be encoded as 0x0302.
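
The version encoding can be expressed as a pair of small helpers. The
function names below are hypothetical and the sketch is in Python.

```python
def encode_version(major, minor):
    """Pack major and minor numbers into a 16-bit format version value."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def decode_version(value):
    """Split a 16-bit format version value into (major, minor)."""
    return (value >> 8) & 0xFF, value & 0xFF


print(hex(encode_version(3, 2)))
print(decode_version(0x0100))
```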
|
||||
|
||||
3.1.3 Chunk Size
|
||||
This specifies the size of all data chunks stored within the image, before
|
||||
any transformation operations such as compression or encryption are
|
||||
applied.
|
||||
|
||||
The following chunk size values are defined:
|
||||
|
||||
Header Value    Chunk Size (bytes)    Chunk Size (kilobytes)
────────────────────────────────────────────────────────────────
0x00            16,384                16
0x01            32,768                32
0x02            65,536                64
0x03            131,072               128
0x04            262,144               256
0x05            524,288               512
0x06            1,048,576             1,024
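
Each defined header value doubles the previous chunk size, starting from
16 KB, so the table above can be computed rather than stored. The Python
helper below is a sketch with a hypothetical name. It rejects values outside
the defined range instead of guessing at their meaning.

```python
def chunk_size_bytes(header_value):
    """Map a Chunk Size header value (0x00 to 0x06) to a size in bytes."""
    if not 0 <= header_value <= 0x06:
        raise ValueError("undefined chunk size value: %r" % header_value)
    return 16384 << header_value


for value in range(0x07):
    print(hex(value), chunk_size_bytes(value))
```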
|
||||
|
||||
3.1.4 Tag Table Offset
|
||||
This specifies the offset in bytes from the beginning of the image file
|
||||
to the beginning of the tag table.
|
||||
|
||||
3.1.5 Tag Count
|
||||
This specifies the number of entries in the tag table.
|
||||
|
||||
3.1.6 Application Magic
|
||||
This is an application-defined value. The creator of an EC3 image can
|
||||
set this to any arbitrary value. Any generic EC3 manipulation tools should
|
||||
preserve the value of this field and, if the tool supports creating EC3
|
||||
images, allow the user to specify the value to store in this field.
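
Putting the layout above together, the 32-byte header can be read with a
single big-endian unpack. The following Python sketch is one possible
reader, not a reference implementation. The ImageHeader type, its field
names, and the parse_header function are hypothetical.

```python
import struct
from collections import namedtuple

# uint32, uint16, uint16, uint64, uint64, uint64, all big-endian.
HEADER_FORMAT = ">IHHQQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes
SIGNATURE = 0x45433358  # 'EC3X' in ASCII

ImageHeader = namedtuple(
    "ImageHeader",
    "signature version chunk_size tag_table_offset tag_count app_magic",
)


def parse_header(data):
    """Decode the Image Header from the first bytes of an image file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("image is too short to contain a header")
    header = ImageHeader(*struct.unpack_from(HEADER_FORMAT, data))
    if header.signature != SIGNATURE:
        raise ValueError("not an EC3 image: bad signature")
    return header


# Build a sample header: version 1.0, chunk size value 0x01 (32 KB),
# tag table immediately after the header, no tags, arbitrary magic.
sample = struct.pack(HEADER_FORMAT, SIGNATURE, 0x0100, 0x01, 32, 0, 0x1234)
header = parse_header(sample)
print(header.version >> 8, header.version & 0xFF, header.tag_count)
```

Because only the Signature and Format Version fields are guaranteed to be
stable across format versions, a reader would normally check the version
before interpreting the remaining fields.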
|
||||
|
||||
|
||||
4 Tags
|
||||
══════
|
||||
|
||||
4.1 The Tag Table
|
||||
─────────────────
|
||||
|
||||
4.2 Tag Types
|
||||
─────────────
|
||||
|
||||
|
||||
5 Manifest
|
||||
══════════
|
||||
|
||||
6 Volumes
|
||||
═════════
|
||||
|
||||
6.1 Filesystem Tree
|
||||
───────────────────
|
||||
|
||||
|
||||
6.2 Clusters
|
||||
────────────
|
||||
|
||||
|
||||
6.3 String Table
|
||||
────────────────
|
||||
|
||||
|
||||
6.4 Extended Attributes
|
||||
───────────────────────
|
||||
|
||||
|
||||
7 Binary Blobs
|
||||
══════════════
|
||||
|
||||
|
||||
8 Embedded Executables
|
||||
══════════════════════
|
||||
|
||||
|
||||
9 Signature Verification
|
||||
════════════════════════
|
||||
|
||||
|
||||
10 Encryption
|
||||
═════════════
|
||||
|
||||
|
||||
vim: shiftwidth=3 expandtab
|
||||