doc: lots more information about the image layout
215
doc/format.txt
@@ -38,7 +38,7 @@ version: 1.0
|
|||||||
On top of this base, EC3 provides facilities for storing multiple whole
|
On top of this base, EC3 provides facilities for storing multiple whole
|
||||||
filesystems within an image file. With support for extended attributes,
|
filesystems within an image file. With support for extended attributes,
|
||||||
a directory (or whole filesystem) can be accurately captured within an
|
a directory (or whole filesystem) can be accurately captured within an
|
||||||
EC3 image, while compression and chunk-based data de-duplication greatly
|
EC3 image, while compression and cluster-based data de-duplication greatly
|
||||||
reduce the amount of disk space required.
|
reduce the amount of disk space required.
|
||||||
|
|
||||||
|
|
||||||
@@ -54,6 +54,49 @@ version: 1.0
|
|||||||
that are used.
|
that are used.
|
||||||
|
|
||||||
|
|
||||||
|
1.3 Terminology
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Several terms have particular meaning in the context of EC3. Those terms
|
||||||
|
and their meaning are listed here.
|
||||||
|
|
||||||
|
|
||||||
|
1.3.1 Image
|
||||||
|
An Image is any EC3 file. An Image contains one or more Tags containing
|
||||||
|
binary data.
|
||||||
|
|
||||||
|
1.3.2 Tag
|
||||||
|
A Tag is a contiguous range of binary data, with an associated type and
|
||||||
|
identifier. The type of a Tag determines the format of the data and how
|
||||||
|
it should be interpreted, while the identifier can be used to distinguish
|
||||||
|
one Tag from another.
|
||||||
|
|
||||||
|
1.3.3 Container
|
||||||
|
A Container refers to an EC3 file that contains one or more Volumes. It
|
||||||
|
is analogous to a storage device that contains one or more formatted
|
||||||
|
partitions. Containers represent a subset of Images: while all Containers
|
||||||
|
are Images, not all Images are Containers.
|
||||||
|
|
||||||
|
1.3.4 Volume
|
||||||
|
A Volume is a structured collection of logical files and directories
|
||||||
|
stored within a Container. It is analogous to a partition of a storage
|
||||||
|
device. The data that makes up a Volume is stored across a set of Tags
|
||||||
|
within an Image.
|
||||||
|
|
||||||
|
1.3.5 Image Key
|
||||||
|
The Image Key is the symmetric cryptographic key used to encrypt and
|
||||||
|
decrypt data within an Image.
|
||||||
|
|
||||||
|
1.3.6 Image Certificate
|
||||||
|
The Image Certificate is a cryptographic public key and certificate that
|
||||||
|
is embedded within an Image, and is used for digital signature
|
||||||
|
verification.
|
||||||
|
|
||||||
|
1.3.7 Image Signature
|
||||||
|
The Image Signature is the cryptographic signature that is calculated
|
||||||
|
from the data stored in the Image, and stored in a dedicated Tag.
|
||||||
|
|
||||||
|
|
||||||
2 Overview
|
2 Overview
|
||||||
==========
|
==========
|
||||||
|
|
||||||
@@ -81,21 +124,21 @@ version: 1.0
|
|||||||
|
|
||||||
EC3 builds upon this concept by employing cross-volume data de-duplication.
|
EC3 builds upon this concept by employing cross-volume data de-duplication.
|
||||||
Every file that is stored within an EC3 image is split into a set of fixed-
|
Every file that is stored within an EC3 image is split into a set of fixed-
|
||||||
size, content-addressed chunks. The size of these chunks is constant within
|
size, content-addressed clusters. The size of these clusters is constant
|
||||||
a container. A typical chunk size would be 32KB. So, if two files within
|
within a container. A typical cluster size would be 32KB. So, if two files
|
||||||
a container have the same contents, even if those files are in different
|
within a container have the same contents, even if those files are in
|
||||||
volumes, the files will reference the same range of chunks. Only one copy
|
different volumes, the files will reference the same range of clusters. Only
|
||||||
of the file data is stored within the container. Even if the two files vary
|
one copy of the file data is stored within the container. Even if the two
|
||||||
to some degree, as long as at least one chunk's worth of data is identical,
|
files vary to some degree, as long as at least one cluster's worth of data is
|
||||||
some data can still be shared between the files.
|
identical, some data can still be shared between the files.
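
The de-duplication described above can be sketched roughly as follows. This is
an illustration only, not the EC3 implementation: the function names, the
in-memory dictionary used as a cluster store, and the zero-padding of the final
cluster are all assumptions. The hash is SHA-3 with a 256-bit digest, as
specified for the Slow Hash.

```python
import hashlib

CLUSTER_SIZE = 32 * 1024  # the "typical" 32KB cluster size mentioned above


def split_clusters(data: bytes, size: int = CLUSTER_SIZE) -> list[bytes]:
    """Split data into fixed-size clusters.

    Zero-padding the final cluster is an assumption made for this sketch.
    """
    return [data[i:i + size].ljust(size, b"\0") for i in range(0, len(data), size)]


def store_file(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Add a file's clusters to a shared store and return its cluster hashes.

    'store' is a hypothetical stand-in for the container's cluster storage.
    """
    refs = []
    for cluster in split_clusters(data):
        digest = hashlib.sha3_256(cluster).digest()
        store.setdefault(digest, cluster)  # identical clusters are stored once
        refs.append(digest)
    return refs
```

Two files that share their first cluster but differ in the second would then
occupy three clusters in the store rather than four.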
|
||||||
|
|
||||||
Chunks can also be compressed to further reduce file size. The chunking
|
Clusters can also be compressed to further reduce file size. The clustering
|
||||||
system provides some additional benefits when compression is in use. Seeking
|
system provides some additional benefits when compression is in use. Seeking
|
||||||
through a file is more performant, as you don't have to decompress the entire
|
through a file is more performant, as you don't have to decompress the entire
|
||||||
file to reach the target offset. You can simply skip to the chunk that
|
file to reach the target offset. You can simply skip to the cluster that
|
||||||
corresponds to the offset you're looking for. Editing files within a volume
|
corresponds to the offset you're looking for. Editing files within a volume
|
||||||
is also easier as, again, you only have to decompress and re-write the chunk
|
is also easier as, again, you only have to decompress and re-write the
|
||||||
that has changed.
|
cluster that has changed.
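
As a rough illustration of the seeking behaviour described above, a file offset
can be mapped to a cluster index and an offset within that cluster with a
single division. The function name and the default cluster size are assumptions
made for this sketch.

```python
def locate(offset: int, cluster_size: int = 32 * 1024) -> tuple[int, int]:
    """Return (cluster index, offset within that cluster) for a file offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, cluster_size)
```

Only the cluster at the returned index needs to be decompressed in order to
read or modify the byte at that offset.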
|
||||||
|
|
||||||
Alongside volumes, EC3 images can contain a range of other data, including:
|
Alongside volumes, EC3 images can contain a range of other data, including:
|
||||||
* Manifests
|
* Manifests
|
||||||
@@ -186,7 +229,7 @@ version: 1.0
|
|||||||
|
|
||||||
The Slow Hash function is optimised for minimal chance of hash collisions.
|
The Slow Hash function is optimised for minimal chance of hash collisions.
|
||||||
It is intended to generate the content hashes used to uniquely identify data
|
It is intended to generate the content hashes used to uniquely identify data
|
||||||
chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
|
clusters. The algorithm used for this purpose is the SHA-3 algorithm with a
|
||||||
256-bit digest size.
|
256-bit digest size.
|
||||||
|
|
||||||
|
|
||||||
@@ -223,7 +266,7 @@ version: 1.0
|
|||||||
----------------------------------------
|
----------------------------------------
|
||||||
0x00 Signature uint32
|
0x00 Signature uint32
|
||||||
0x04 Format Version uint16
|
0x04 Format Version uint16
|
||||||
0x06 Chunk Size uint16
|
0x06 Cluster Size uint16
|
||||||
0x08 Tag Table Offset uint64
|
0x08 Tag Table Offset uint64
|
||||||
0x10 Tag Count uint64
|
0x10 Tag Count uint64
|
||||||
0x18 Application Magic uint64
|
0x18 Application Magic uint64
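
A minimal sketch of reading the header fields listed above is given here. The
byte order is not stated in this part of the document, so little-endian is an
assumption, as are the class and function names. Only the fields up to offset
0x20 are covered; any fields that follow are not shown in this excerpt.

```python
import struct
from typing import NamedTuple

# uint32, uint16, uint16, uint64, uint64, uint64 = 32 bytes.
# "<" selects little-endian with no padding; the byte order is an assumption.
HEADER_PREFIX = struct.Struct("<IHHQQQ")


class HeaderPrefix(NamedTuple):
    signature: int          # 0x00
    format_version: int     # 0x04
    cluster_size: int       # 0x06 (encoded header value, not a byte count)
    tag_table_offset: int   # 0x08
    tag_count: int          # 0x10
    application_magic: int  # 0x18


def parse_header_prefix(buf: bytes) -> HeaderPrefix:
    """Unpack the first 0x20 bytes of an image header."""
    return HeaderPrefix(*HEADER_PREFIX.unpack_from(buf, 0))
```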
|
||||||
@@ -247,22 +290,22 @@ version: 1.0
|
|||||||
the minor version of the format version. For example, version 3.2 would
|
the minor version of the format version. For example, version 3.2 would
|
||||||
be encoded as 0x0302.
|
be encoded as 0x0302.
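
The example above (version 3.2 encoded as 0x0302) implies that the major
version occupies the high byte and the minor version the low byte. A small
sketch of that encoding follows; the function names are hypothetical.

```python
def encode_version(major: int, minor: int) -> int:
    """Pack a major/minor pair into a 16-bit format version value."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in one byte")
    return (major << 8) | minor


def decode_version(value: int) -> tuple[int, int]:
    """Split a 16-bit format version value into (major, minor)."""
    return value >> 8, value & 0xFF
```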
|
||||||
|
|
||||||
5.1.3 Chunk Size
|
5.1.3 Cluster Size
|
||||||
This specifies the size of all data chunks stored within the image, before
|
This specifies the size of all data clusters stored within the image,
|
||||||
any transformation operations such as compression or encryption are
|
before any transformation operations such as compression or encryption are
|
||||||
applied.
|
applied.
|
||||||
|
|
||||||
The following chunk size values are defined:
|
The following cluster size values are defined:
|
||||||
|
|
||||||
Header Value Chunk Size (bytes) Chunk Size (kilobytes)
|
Header Value Cluster Size (bytes) Cluster Size (kilobytes)
|
||||||
----------------------------------------------------------------
|
--------------------------------------------------------------------
|
||||||
0x00 16,384 16
|
0x00 16,384 16
|
||||||
0x01 32,768 32
|
0x01 32,768 32
|
||||||
0x02 65,536 64
|
0x02 65,536 64
|
||||||
0x03 131,072 128
|
0x03 131,072 128
|
||||||
0x04 262,144 256
|
0x04 262,144 256
|
||||||
0x05 524,288 512
|
0x05 524,288 512
|
||||||
0x06 1,048,576 1,024
|
0x06 1,048,576 1,024
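
Each step in the table above doubles the cluster size, so the header value can
be converted to a size in bytes with a shift. The function name is an
assumption, and values outside the defined range are rejected rather than
extrapolated.

```python
def cluster_size_bytes(header_value: int) -> int:
    """Convert a Cluster Size header value (0x00 to 0x06) to a size in bytes."""
    if not 0x00 <= header_value <= 0x06:
        raise ValueError(f"undefined cluster size value: {header_value:#04x}")
    return 16_384 << header_value
```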
|
||||||
|
|
||||||
5.1.4 Tag Table Offset
|
5.1.4 Tag Table Offset
|
||||||
This specifies the offset in bytes from the beginning of the image file
|
This specifies the offset in bytes from the beginning of the image file
|
||||||
@@ -320,7 +363,7 @@ version: 1.0
|
|||||||
|
|
||||||
6.1.3 Checksum
|
6.1.3 Checksum
|
||||||
A checksum of the tag data, calculated on the raw data as it appears
|
A checksum of the tag data, calculated on the raw data as it appears
|
||||||
on-disk, after any data processing layers (compression, encryption, etc)
|
on-disk, after any Data Filters (compression, encryption, etc)
|
||||||
have been applied. This checksum should be checked before the tag data is
|
have been applied. This checksum should be checked before the tag data is
|
||||||
processed any further. The checksum is calculated using the algorithm
|
processed any further. The checksum is calculated using the algorithm
|
||||||
described in Section 4.3
|
described in Section 4.3
|
||||||
@@ -346,9 +389,9 @@ version: 1.0
|
|||||||
Volume tags contain the filesystem tree and file/directory metadata for a
|
Volume tags contain the filesystem tree and file/directory metadata for a
|
||||||
single volume within the container.
|
single volume within the container.
|
||||||
|
|
||||||
6.2.2 CTAB: Chunk Table
|
6.2.2 CTAB: Cluster Table
|
||||||
The Chunk Table contains the file data chunks for all volumes within the
|
The Cluster Table contains the file data clusters for all volumes within
|
||||||
container.
|
the container.
|
||||||
|
|
||||||
6.2.3 XATR: Extended Attributes Table
|
6.2.3 XATR: Extended Attributes Table
|
||||||
The Extended Attributes table contains any extended attributes referenced
|
The Extended Attributes table contains any extended attributes referenced
|
||||||
@@ -390,46 +433,112 @@ version: 1.0
|
|||||||
6.3 Tag Flags
|
6.3 Tag Flags
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
|
A Tag can have a number of different flags set. A full list of these flags,
|
||||||
|
including their values and meanings, is provided here.
|
||||||
|
|
||||||
|
6.3.1 0x00000001: Signed
|
||||||
|
The data in this Tag is included in the Image's digital
|
||||||
|
signature.
|
||||||
|
|
||||||
|
6.3.2 0x00000002: Compressed
|
||||||
|
The data in this Tag is compressed. Note that, in most cases, this flag
|
||||||
|
will not be enabled on the Cluster Table, as each Cluster is compressed
|
||||||
|
separately.
|
||||||
|
|
||||||
|
6.3.3 0x00000004: Encrypted
|
||||||
|
The data in this Tag is encrypted using the Image Key.
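
The flag values above combine as a bitmask. One possible way to model them is
sketched here; the class and member names are assumptions, while the values are
taken from the list above.

```python
from enum import IntFlag


class TagFlags(IntFlag):
    SIGNED = 0x00000001
    COMPRESSED = 0x00000002
    ENCRYPTED = 0x00000004


# Example: a Tag that is signed and encrypted, but not compressed.
flags = TagFlags(0x00000005)
is_encrypted = TagFlags.ENCRYPTED in flags
```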
|
||||||
|
|
||||||
|
|
||||||
6.4 Tag Identifiers
|
6.4 Tag Identifiers
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
|
Every Tag in an Image must have a unique Identifier. The Identifier is a
|
||||||
|
64-bit integer value, which can optionally be interpreted as a string of no
|
||||||
|
more than 8 ASCII characters.
|
||||||
|
|
||||||
7 Manifest
|
If no Identifier is specified for a Tag, a sequential Identifier should be
|
||||||
==========
|
assigned automatically.
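
The following sketch shows one way a string of up to 8 ASCII characters could
be mapped to and from a 64-bit Identifier. The byte order and the use of NUL
padding for shorter strings are not specified in this section and are
assumptions, as are the function names.

```python
def ident_from_str(name: str) -> int:
    """Pack up to 8 ASCII characters into a 64-bit integer.

    Assumes little-endian byte order and NUL padding.
    """
    raw = name.encode("ascii")
    if len(raw) > 8:
        raise ValueError("identifier strings are limited to 8 ASCII characters")
    return int.from_bytes(raw.ljust(8, b"\0"), "little")


def ident_to_str(ident: int) -> str:
    """Interpret a 64-bit integer as a NUL-padded ASCII string."""
    return ident.to_bytes(8, "little").rstrip(b"\0").decode("ascii")
```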
|
||||||
|
|
||||||
8 Volumes
|
|
||||||
=========
|
|
||||||
|
|
||||||
8.1 Filesystem Tree
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
|
|
||||||
8.2 Clusters
|
6.5 Data Filtering
|
||||||
------------
|
------------------
|
||||||
|
|
||||||
|
The different types of processing that can be performed on a Tag's data, such
|
||||||
|
as encryption and compression, are called Filters. Filters are applied to a
|
||||||
|
Tag's data as it is being written, and are applied in reverse order when the
|
||||||
|
data is being read.
|
||||||
|
|
||||||
|
To facilitate multiple Filters being used together, the order in which
|
||||||
|
Filters are applied to a particular Tag's data is strictly defined.
|
||||||
|
|
||||||
|
It is critical that Filters are applied in the correct order to maximise
|
||||||
|
effectiveness. For example, Tag data must be compressed BEFORE it is encrypted.
|
||||||
|
Encrypting data greatly increases its entropy and "randomness", making it
|
||||||
|
essentially incompressible.
|
||||||
|
|
||||||
|
The types of Filters supported by EC3 are listed below, in the order they are
|
||||||
|
applied when writing data to a Tag. When reading Tag data, the Filters are
|
||||||
|
applied in the reverse order.
|
||||||
|
|
||||||
|
6.5.1 Compression
|
||||||
|
Tag data is compressed before being written to the Image to reduce
|
||||||
|
file size. This is the only Filter that changes the amount of data that
|
||||||
|
is written to a file.
|
||||||
|
|
||||||
|
Note that this Filter will reduce I/O performance and require that data
|
||||||
|
is read sequentially from the Tag. Random access to compressed Tag data
|
||||||
|
is not supported.
|
||||||
|
|
||||||
|
6.5.2 Encryption
|
||||||
|
Tag data is encrypted using the specified encryption key before being
|
||||||
|
written to disk.
|
||||||
|
|
||||||
|
6.5.3 Digital Signature
|
||||||
|
Tag data is included in the set of data that makes up the Image's digital
|
||||||
|
signature. Unlike the other Filters, this one does not modify the Tag
|
||||||
|
data that is written to the Image, but rather specifies that the data is
|
||||||
|
included as part of the whole Image's digital signature hash.
|
||||||
|
|
||||||
|
More information about how the Image Signature is calculated and verified
|
||||||
|
can be found in Section 12.
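
The write and read ordering described in this section can be illustrated with
the sketch below. It is not the EC3 implementation: zlib stands in for
whichever compression algorithm is used, and the XOR routine is a toy
placeholder for encryption that provides no security. All function names are
hypothetical. The Digital Signature Filter is omitted because it does not
modify the Tag data.

```python
import zlib


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for encryption. NOT secure; used only to show ordering."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def write_filters(data: bytes, key: bytes) -> bytes:
    """Apply Filters in write order: compress first, then encrypt."""
    return xor_cipher(zlib.compress(data), key)


def read_filters(stored: bytes, key: bytes) -> bytes:
    """Apply Filters in reverse order: decrypt first, then decompress."""
    return zlib.decompress(xor_cipher(stored, key))
```

Swapping the two steps in write_filters would hand the compressor data that
already looks random, which is the situation the ordering rule is intended to
avoid.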
|
||||||
|
|
||||||
|
|
||||||
8.3 String Table
|
7 String Table
|
||||||
----------------
|
|
||||||
|
|
||||||
|
|
||||||
8.4 Extended Attributes
|
|
||||||
-----------------------
|
|
||||||
|
|
||||||
|
|
||||||
9 Binary Blobs
|
|
||||||
==============
|
==============
|
||||||
|
|
||||||
|
|
||||||
10 Embedded Executables
|
8 Manifest
|
||||||
|
==========
|
||||||
|
|
||||||
|
|
||||||
|
9 Volumes
|
||||||
|
=========
|
||||||
|
|
||||||
|
9.1 Filesystem Tree
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
|
||||||
|
9.2 Clusters
|
||||||
|
------------
|
||||||
|
|
||||||
|
|
||||||
|
9.3 Extended Attributes
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
|
||||||
|
10 Binary Blobs
|
||||||
|
===============
|
||||||
|
|
||||||
|
|
||||||
|
11 Embedded Executables
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
|
|
||||||
11 Signature Verification
|
12 Signature Verification
|
||||||
=========================
|
=========================
|
||||||
|
|
||||||
|
|
||||||
12 Encryption
|
13 Encryption
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
BIN
doc/res/logo.png
Normal file
Binary file not shown.
|
Size: 24 KiB