doc: lots more information about the image layout

2024-12-17 21:48:08 +00:00
parent 5ec115756c
commit 59d40ea4d8
2 changed files with 162 additions and 53 deletions


@@ -38,7 +38,7 @@ version: 1.0
On top of this base, EC3 provides facilities for storing multiple whole
filesystems within an image file. With support for extended attributes,
a directory (or whole filesystem) can be accurately captured within an
EC3 image, while compression and chunk-based data de-duplication greatly
EC3 image, while compression and cluster-based data de-duplication greatly
reduces the amount of disk space required.
@@ -54,6 +54,49 @@ version: 1.0
that are used.
1.3 Terminology
---------------
Several terms have a particular meaning in the context of EC3. These terms
and their meanings are listed here.
1.3.1 Image
An Image is any EC3 file. An Image contains one or more Tags containing
binary data.
1.3.2 Tag
A Tag is a contiguous range of binary data, with an associated type and
identifier. The type of a Tag determines the format of the data and how
it should be interpreted, while the identifier can be used to distinguish
one Tag from another.
1.3.3 Container
A Container refers to an EC3 file that contains one or more Volumes. It
is analogous to a storage device that contains one or more formatted
partitions. Containers represent a subset of Images: while all Containers
are Images, not all Images are Containers.
1.3.4 Volume
A Volume is a structured collection of logical files and directories
stored within a Container. It is analogous to a partition of a storage
device. The data that makes up a Volume is stored across a set of Tags
within an Image.
1.3.5 Image Key
The Image Key is the symmetric cryptographic key used to encrypt and
decrypt data within an Image.
1.3.6 Image Certificate
The Image Certificate is a cryptographic public key and certificate that
is embedded within an Image, and is used for digital signature
verification.
1.3.7 Image Signature
The Image Signature is the cryptographic signature that is calculated
from the data stored in the Image, and stored in a dedicated Tag.
2 Overview
==========
@@ -81,21 +124,21 @@ version: 1.0
EC3 builds upon this concept by employing cross-volume data de-duplication.
Every file that is stored within an EC3 image is split into a set of fixed-
size, content-addressed chunks. The size of these chunks is constant within
a container. A typical chunk size would be 32KB. So, if two files within
a container have the same contents, even if those files are in different
volumes, the files will reference the same range of chunks. Only one copy
of the file data is stored within the container. Even if the two files vary
to some degree, as long as at least one chunk's worth of data is identical,
some data can still be shared between the files.
size, content-addressed clusters. The size of these clusters is constant
within a container. A typical cluster size would be 32KB. So, if two files
within a container have the same contents, even if those files are in
different volumes, the files will reference the same range of clusters. Only
one copy of the file data is stored within the container. Even if the two
files vary to some degree, as long as at least one cluster's worth of data is
identical, some data can still be shared between the files.
Chunks can also be compressed to further reduce file size. The chunking
Clusters can also be compressed to further reduce file size. The clustering
system provides some additional benefits when compression is in use. Seeking
through a file is more performant, as you don't have to decompress the entire
file to reach the target offset. You can simply skip to the chunk that
file to reach the target offset. You can simply skip to the cluster that
corresponds to the offset you're looking for. Editing files within a volume
is also easier as, again, you only have to decompress and re-write the chunk
that has changed.
is also easier as, again, you only have to decompress and re-write the
cluster that has changed.
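The de-duplication and seeking behaviour described above can be sketched roughly as follows. This is an illustration only, not the EC3 implementation: `ClusterStore`, `split_into_clusters` and `cluster_index_for_offset` are hypothetical names, the store is a plain in-memory dictionary, and the handling of a final partial cluster is left as-is because it is not specified in this section. SHA3-256 is used as the content hash, as described for the Slow Hash function.

```python
from __future__ import annotations

import hashlib

CLUSTER_SIZE = 32 * 1024  # 32KB, the typical cluster size mentioned above


def split_into_clusters(data: bytes, cluster_size: int = CLUSTER_SIZE) -> list[bytes]:
    """Split data into fixed-size clusters (the last one may be shorter)."""
    return [data[i:i + cluster_size] for i in range(0, len(data), cluster_size)]


class ClusterStore:
    """Hypothetical in-memory cluster store keyed by content hash."""

    def __init__(self) -> None:
        self.clusters: dict[bytes, bytes] = {}

    def add_file(self, data: bytes) -> list[bytes]:
        """Store a file and return the content hashes of its clusters."""
        refs = []
        for cluster in split_into_clusters(data):
            digest = hashlib.sha3_256(cluster).digest()
            # Identical clusters hash to the same key, so only one copy is kept.
            self.clusters.setdefault(digest, cluster)
            refs.append(digest)
        return refs


def cluster_index_for_offset(offset: int, cluster_size: int = CLUSTER_SIZE) -> tuple[int, int]:
    """Return (cluster index, offset within that cluster) for a file offset."""
    return divmod(offset, cluster_size)
```

Two files that share their first cluster but differ in the second would occupy three clusters in the store rather than four, and reading from a given offset only requires the single cluster returned by `cluster_index_for_offset`.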
Alongside volumes, EC3 images can contain a range of other data, including:
* Manifests
@@ -186,7 +229,7 @@ version: 1.0
The Slow Hash function is optimised for minimal chance of hash collisions.
It is intended to generate the content hashes used to uniquely identify data
chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
clusters. The algorithm used for this purpose is the SHA-3 algorithm with a
256-bit digest size.
@@ -223,7 +266,7 @@ version: 1.0
----------------------------------------
0x00 Signature uint32
0x04 Format Version uint16
0x06 Chunk Size uint16
0x06 Cluster Size uint16
0x08 Tag Table Offset uint64
0x10 Tag Count uint64
0x18 Application Magic uint64
@@ -247,22 +290,22 @@ version: 1.0
the minor version of the format version. For example, version 3.2 would
be encoded as 0x0302.
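The version encoding can be illustrated with a short sketch. It assumes, based on the 0x0302 example above, that the major version occupies the high byte and the minor version the low byte of the uint16 value. The function names are hypothetical.

```python
from __future__ import annotations


def encode_version(major: int, minor: int) -> int:
    """Pack a major/minor pair into a 16-bit format version value."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in one byte")
    return (major << 8) | minor


def decode_version(value: int) -> tuple[int, int]:
    """Unpack a 16-bit format version value into (major, minor)."""
    return (value >> 8) & 0xFF, value & 0xFF
```

With these helpers, `encode_version(3, 2)` gives 0x0302, matching the example above.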
5.1.3 Chunk Size
This specifies the size of all data chunks stored within the image, before
any transformation operations such as compression or encryption are
5.1.3 Cluster Size
This specifies the size of all data clusters stored within the image,
before any transformation operations such as compression or encryption are
applied.
The following chunk size values are defined:
The following cluster size values are defined:
Header Value Chunk Size (bytes) Chunk Size (kilobytes)
----------------------------------------------------------------
0x00 16,384 16
0x01 32,768 32
0x02 65,536 64
0x03 131,072 128
0x04 262,144 256
0x05 524,288 512
0x06 1,048,576 1,024
Header Value Cluster Size (bytes) Cluster Size (kilobytes)
--------------------------------------------------------------------
0x00 16,384 16
0x01 32,768 32
0x02 65,536 64
0x03 131,072 128
0x04 262,144 256
0x05 524,288 512
0x06 1,048,576 1,024
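Each entry in the table is double the previous one, so a reader could derive the cluster size from the header value with a shift instead of a lookup table. The sketch below relies on that observation from the table. The shift is a convenience, not something the format mandates, and the names are hypothetical.

```python
CLUSTER_SIZE_BASE = 16 * 1024    # size in bytes for header value 0x00
MAX_CLUSTER_SIZE_VALUE = 0x06    # highest header value defined in the table


def cluster_size_bytes(header_value: int) -> int:
    """Convert a Cluster Size header value into a size in bytes."""
    if not 0 <= header_value <= MAX_CLUSTER_SIZE_VALUE:
        raise ValueError(f"undefined cluster size value: {header_value:#04x}")
    return CLUSTER_SIZE_BASE << header_value
```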
5.1.4 Tag Table Offset
This specifies the offset in bytes from the beginning of the image file
@@ -320,7 +363,7 @@ version: 1.0
6.1.3 Checksum
A checksum of the tag data, calculated on the raw data as it appears
on-disk, after any data processing layers (compression, encryption, etc)
on-disk, after any Data Filters (compression, encryption, etc)
have been applied. This checksum should be checked before the tag data is
processed any further. The checksum is calculated using the algorithm
described in Section 4.3.
@@ -346,9 +389,9 @@ version: 1.0
Volume tags contain the filesystem tree and file/directory metadata for a
single volume within the container.
6.2.2 CTAB: Chunk Table
The Chunk Table contains the file data chunks for all volumes within the
container.
6.2.2 CTAB: Cluster Table
The Cluster Table contains the file data clusters for all volumes within
the container.
6.2.3 XATR: Extended Attributes Table
The Extended Attributes table contains any extended attributes referenced
@@ -390,46 +433,112 @@ version: 1.0
6.3 Tag Flags
-------------
A Tag can have a number of different flags set. A full list of these flags,
including their values and meanings, is provided here.
6.3.1 0x00000001: Signed
The data in this Tag is included in the Image's digital
signature.
6.3.2 0x00000002: Compressed
The data in this Tag is compressed. Note that, in most cases, this flag
will not be enabled on the Cluster Table, as each Cluster is compressed
separately.
6.3.3 0x00000004: Encrypted
The data in this Tag is encrypted using the Image Key.
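The flag values listed above map naturally onto a bit-flag type. The following sketch uses the values from this section. The `TagFlags` and `describe_flags` names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from enum import IntFlag


class TagFlags(IntFlag):
    """Tag flag bits, using the values listed in Section 6.3."""
    SIGNED = 0x00000001
    COMPRESSED = 0x00000002
    ENCRYPTED = 0x00000004


def describe_flags(raw: int) -> list[str]:
    """Return the names of the known flags that are set in a raw flags value."""
    flags = TagFlags(raw)
    return [flag.name for flag in TagFlags if flag in flags]
```

A Tag that is both signed and encrypted would carry the value 0x00000005, for which `describe_flags` returns the names SIGNED and ENCRYPTED.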
6.4 Tag Identifiers
-------------------
Every Tag in an Image must have a unique Identifier. The Identifier is a
64-bit integer value, which can optionally be interpreted as a string of no
more than 8 ASCII characters.
7 Manifest
==========
8 Volumes
=========
8.1 Filesystem Tree
-------------------
If no Identifier is specified for a Tag, a sequential Identifier should be
assigned automatically.
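The relationship between the integer and string forms of an Identifier could look something like the sketch below. The byte order and padding are assumptions: this section does not say how the characters are packed, so the example places the first character in the most significant byte and pads short names with zero bytes. The function names are hypothetical.

```python
def identifier_from_string(name: str) -> int:
    """Pack up to 8 ASCII characters into a 64-bit Identifier.

    Assumption: first character in the most significant byte,
    zero-padded on the right.
    """
    raw = name.encode("ascii")  # raises ValueError for non-ASCII input
    if len(raw) > 8:
        raise ValueError("an Identifier string may be at most 8 characters")
    return int.from_bytes(raw.ljust(8, b"\x00"), "big")


def identifier_to_string(identifier: int) -> str:
    """Interpret a 64-bit Identifier as an ASCII string, dropping padding."""
    raw = identifier.to_bytes(8, "big").rstrip(b"\x00")
    return raw.decode("ascii")
```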
8.2 Clusters
------------
6.5 Data Filtering
------------------
The different types of processing that can be performed on a Tag's data, such
as encryption and compression, are called Filters. Filters are applied to a
Tag's data as it is being written, and are applied in reverse order when the
data is being read.
To facilitate multiple Filters being used together, the order in which
Filters are applied to a particular Tag's data is strictly defined.
It is critical that Filters are applied in the correct order to maximise
effectiveness. For example, Tag data must be compressed BEFORE it is encrypted.
Encrypting data greatly increases its entropy and "randomness", making it
essentially incompressible.
The types of Filters supported by EC3 are listed below, in the order they are
applied when writing data to a Tag. When reading Tag data, the Filters are
applied in the reverse order.
6.5.1 Compression
Tag data is compressed before being written to the Image to reduce
file size. This is the only Filter that changes the amount of data that
is written to a file.
Note that this Filter will reduce I/O performance and require that data
is read sequentially from the Tag. Random access to compressed Tag data
is not supported.
6.5.2 Encryption
Tag data is encrypted using the specified encryption key before being
written to disk.
6.5.3 Digital Signature
Tag data is included in the set of data that makes up the Image's digital
signature. Unlike the other Filters, this one does not modify the Tag
data that is written to the Image, but rather specifies that the data is
included as part of the whole Image's digital signature hash.
More information about how the Image Signature is calculated and verified
can be found in Section 12.
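The ordering rule for the Compression and Encryption Filters can be demonstrated with a small sketch. Everything here is a stand-in chosen for illustration: zlib is used because this section does not name a compression algorithm, and `toy_cipher` is a deliberately simple XOR keystream that is NOT secure and is not the EC3 encryption scheme. The Digital Signature Filter is omitted because it does not modify Tag data.

```python
import hashlib
import zlib


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder for real encryption: XOR with a SHAKE-256 keystream.

    Applying it twice with the same key restores the input. Not secure.
    """
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def write_filters(data: bytes, key: bytes) -> bytes:
    """Apply Filters in write order: compress first, then encrypt."""
    return toy_cipher(zlib.compress(data), key)


def read_filters(stored: bytes, key: bytes) -> bytes:
    """Apply Filters in reverse order: decrypt first, then decompress."""
    return zlib.decompress(toy_cipher(stored, key))
```

Running the two Filters in the wrong order, for example `zlib.compress(toy_cipher(data, key))`, still round-trips if it is reversed correctly, but the compressor then sees high-entropy input and the stored data ends up far larger than with the defined order.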
8.3 String Table
----------------
8.4 Extended Attributes
-----------------------
9 Binary Blobs
7 String Table
==============
10 Embedded Executables
8 Manifest
==========
9 Volumes
=========
9.1 Filesystem Tree
-------------------
9.2 Clusters
------------
9.3 Extended Attributes
-----------------------
10 Binary Blobs
===============
11 Embedded Executables
=======================
11 Signature Verification
12 Signature Verification
=========================
12 Encryption
13 Encryption
=============

BIN
doc/res/logo.png Normal file

Binary file not shown.
