doc: lots more information about the image layout
201
doc/format.txt
@@ -38,7 +38,7 @@ version: 1.0
On top of this base, EC3 provides facilities for storing multiple whole
filesystems within an image file. With support for extended attributes,
a directory (or whole filesystem) can be accurately captured within an
EC3 image, while compression and chunk-based data de-duplication greatly
EC3 image, while compression and cluster-based data de-duplication greatly
reduce the amount of disk space required.


@@ -54,6 +54,49 @@ version: 1.0
that are used.


1.3 Terminology
---------------

Several terms have particular meaning in the context of EC3. Those terms
and their meanings are listed here.


1.3.1 Image
An Image is any EC3 file. An Image contains one or more Tags containing
binary data.

1.3.2 Tag
A Tag is a contiguous range of binary data, with an associated type and
identifier. The type of a Tag determines the format of the data and how
it should be interpreted, while the identifier can be used to distinguish
one Tag from another.

1.3.3 Container
A Container refers to an EC3 file that contains one or more Volumes. It
is analogous to a storage device that contains one or more formatted
partitions. Containers represent a subset of Images: while all Containers
are Images, not all Images are Containers.

1.3.4 Volume
A Volume is a structured collection of logical files and directories
stored within a Container. It is analogous to a partition of a storage
device. The data that makes up a Volume is stored across a set of Tags
within an Image.

1.3.5 Image Key
The Image Key is the symmetric cryptographic key used to encrypt and
decrypt data within an Image.

1.3.6 Image Certificate
The Image Certificate is a cryptographic public key and certificate that
is embedded within an Image, and is used for digital signature
verification.

1.3.7 Image Signature
The Image Signature is the cryptographic signature that is calculated
from the data stored in the Image, and stored in a dedicated Tag.


2 Overview
==========

@@ -81,21 +124,21 @@ version: 1.0

EC3 builds upon this concept by employing cross-volume data de-duplication.
Every file that is stored within an EC3 image is split into a set of fixed-
size, content-addressed chunks. The size of these chunks is constant within
a container. A typical chunk size would be 32KB. So, if two files within
a container have the same contents, even if those files are in different
volumes, the files will reference the same range of chunks. Only one copy
of the file data is stored within the container. Even if the two files vary
to some degree, as long as at least one chunk's worth of data is identical,
some data can still be shared between the files.
size, content-addressed clusters. The size of these clusters is constant
within a container. A typical cluster size would be 32KB. So, if two files
within a container have the same contents, even if those files are in
different volumes, the files will reference the same range of clusters. Only
one copy of the file data is stored within the container. Even if the two
files vary to some degree, as long as at least one cluster's worth of data is
identical, some data can still be shared between the files.

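As a rough illustration of the de-duplication described above, the following Python sketch splits file data into fixed-size clusters and stores each distinct cluster once, keyed by its SHA-3 256-bit digest. The `ClusterStore` class and its method names are hypothetical and exist only for this example; how EC3 actually lays out clusters on disk is not shown in this excerpt.

```python
import hashlib

CLUSTER_SIZE = 32 * 1024  # the "typical" cluster size mentioned in the text


class ClusterStore:
    """Hypothetical in-memory store holding one copy of each distinct cluster."""

    def __init__(self, cluster_size=CLUSTER_SIZE):
        self.cluster_size = cluster_size
        self.clusters = {}  # content digest -> cluster bytes

    def add_file(self, data):
        """Split data into clusters and return the digests the file references."""
        refs = []
        for start in range(0, len(data), self.cluster_size):
            cluster = data[start:start + self.cluster_size]
            digest = hashlib.sha3_256(cluster).digest()
            self.clusters.setdefault(digest, cluster)  # stored only if new
            refs.append(digest)
        return refs

    def read_file(self, refs):
        """Reassemble a file from its list of cluster digests."""
        return b"".join(self.clusters[digest] for digest in refs)
```

Two files that differ only in their second cluster would share the digests of every other cluster, so those clusters are stored once, whichever volume each file belongs to.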
Chunks can also be compressed to further reduce file size. The chunking
Clusters can also be compressed to further reduce file size. The clustering
system provides some additional benefits when compression is in use. Seeking
through a file is more performant, as you don't have to decompress the entire
file to reach the target offset. You can simply skip to the chunk that
file to reach the target offset. You can simply skip to the cluster that
corresponds to the offset you're looking for. Editing files within a volume
is also easier as, again, you only have to decompress and re-write the chunk
that has changed.
is also easier as, again, you only have to decompress and re-write the
cluster that has changed.

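The seeking benefit can be sketched as follows. Each cluster is compressed independently, so reading at a given offset only requires decompressing the clusters that overlap the requested range. zlib is used here purely as a stand-in, since the compression algorithm used by EC3 is not named in this excerpt, and both function names are hypothetical.

```python
import zlib


def compress_clusters(data, cluster_size):
    """Compress each fixed-size cluster of data on its own."""
    return [zlib.compress(data[i:i + cluster_size])
            for i in range(0, len(data), cluster_size)]


def read_at(clusters, cluster_size, offset, length):
    """Read up to `length` bytes at `offset`, touching only the needed clusters."""
    out = bytearray()
    while length > 0:
        index, within = divmod(offset, cluster_size)
        if index >= len(clusters):
            break  # offset is past the end of the file
        plain = zlib.decompress(clusters[index])
        piece = plain[within:within + length]
        if not piece:
            break  # ran off the end of a short final cluster
        out += piece
        offset += len(piece)
        length -= len(piece)
    return bytes(out)
```

The cluster index is simply the offset divided by the cluster size, which is why a constant cluster size within a container matters.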
Alongside volumes, EC3 images can contain a range of other data, including:
* Manifests
@@ -186,7 +229,7 @@ version: 1.0

The Slow Hash function is optimised for minimal chance of hash collisions.
It is intended to generate the content hashes used to uniquely identify data
chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
clusters. The algorithm used for this purpose is the SHA-3 algorithm with a
256-bit digest size.


@@ -223,7 +266,7 @@ version: 1.0
----------------------------------------
0x00    Signature            uint32
0x04    Format Version       uint16
0x06    Chunk Size           uint16
0x06    Cluster Size         uint16
0x08    Tag Table Offset     uint64
0x10    Tag Count            uint64
0x18    Application Magic    uint64
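To make the offsets above concrete, here is a minimal Python sketch that unpacks only the fields visible in this hunk (offsets 0x00 to 0x1F). It is not a complete header parser: the header may contain further fields beyond this excerpt, and the little-endian byte order is an assumption, as the byte order is not stated here. `HEADER_PREFIX` and `parse_header_prefix` are illustrative names.

```python
import struct

# uint32, uint16, uint16, uint64, uint64, uint64 with no padding.
# "<" (little-endian) is an ASSUMPTION; this excerpt does not state byte order.
HEADER_PREFIX = struct.Struct("<IHHQQQ")


def parse_header_prefix(buf):
    """Unpack the leading header fields from a bytes-like object."""
    (signature, format_version, cluster_size,
     tag_table_offset, tag_count, app_magic) = HEADER_PREFIX.unpack_from(buf, 0)
    return {
        "signature": signature,
        "format_version": format_version,
        "cluster_size": cluster_size,  # encoded header value, not a byte count
        "tag_table_offset": tag_table_offset,
        "tag_count": tag_count,
        "application_magic": app_magic,
    }
```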
@@ -247,15 +290,15 @@ version: 1.0
the minor version of the format version. For example, version 3.2 would
be encoded as 0x0302.

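The encoding in the example above can be sketched as a pair of helpers. The function names are hypothetical; the only fact taken from the text is that version 3.2 is encoded as 0x0302, that is, the major version in the high byte and the minor version in the low byte of the 16-bit field.

```python
def encode_version(major, minor):
    """Pack a major.minor version into a 16-bit value (major in the high byte)."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in one byte")
    return (major << 8) | minor


def decode_version(value):
    """Split a 16-bit version value back into (major, minor)."""
    return (value >> 8) & 0xFF, value & 0xFF
```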
5.1.3 Chunk Size
This specifies the size of all data chunks stored within the image, before
any transformation operations such as compression or encryption are
5.1.3 Cluster Size
This specifies the size of all data clusters stored within the image,
before any transformation operations such as compression or encryption are
applied.

The following chunk size values are defined:
The following cluster size values are defined:

Header Value      Chunk Size (bytes)      Chunk Size (kilobytes)
----------------------------------------------------------------
Header Value      Cluster Size (bytes)      Cluster Size (kilobytes)
--------------------------------------------------------------------
0x00              16,384                    16
0x01              32,768                    32
0x02              65,536                    64
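The rows shown above follow a simple pattern: each successive header value doubles the cluster size, starting from 16,384 bytes. A minimal lookup covering only the three values visible in this excerpt (the full table may define more) could look like the sketch below. `CLUSTER_SIZES` and `cluster_size_from_header` are illustrative names.

```python
# Only the values visible in this excerpt; the full table may define more.
CLUSTER_SIZES = {
    0x00: 16384,
    0x01: 32768,
    0x02: 65536,
}


def cluster_size_from_header(value):
    """Translate the header's Cluster Size field into a size in bytes."""
    try:
        return CLUSTER_SIZES[value]
    except KeyError:
        raise ValueError(f"unknown cluster size value: {value:#04x}") from None
```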
@@ -320,7 +363,7 @@ version: 1.0

6.1.3 Checksum
A checksum of the tag data, calculated on the raw data as it appears
on-disk, after any data processing layers (compression, encryption, etc.)
on-disk, after any Data Filters (compression, encryption, etc.)
have been applied. This checksum should be checked before the tag data is
processed any further. The checksum is calculated using the algorithm
described in Section 4.3
@@ -346,9 +389,9 @@ version: 1.0
Volume tags contain the filesystem tree and file/directory metadata for a
single volume within the container.

6.2.2 CTAB: Chunk Table
The Chunk Table contains the file data chunks for all volumes within the
container.
6.2.2 CTAB: Cluster Table
The Cluster Table contains the file data clusters for all volumes within
the container.

6.2.3 XATR: Extended Attributes Table
The Extended Attributes table contains any extended attributes referenced
@@ -390,46 +433,112 @@ version: 1.0
6.3 Tag Flags
-------------

A Tag can have a number of different flags set. A full list of these flags,
including their values and meanings, is provided here.

6.3.1 0x00000001: Signed
The data in this Tag is included in the Image's digital
signature.

6.3.2 0x00000002: Compressed
The data in this Tag is compressed. Note that, in most cases, this flag
will not be enabled on the Cluster Table, as each Cluster is compressed
separately.

6.3.3 0x00000004: Encrypted
The data in this Tag is encrypted using the Image Key.


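Since the three flags listed above are distinct bit values, they can be combined and tested with ordinary bitwise operations. The sketch below models them as a Python `IntFlag`. The flag values come from the list above, while the `TagFlags` and `flag_names` names are illustrative only.

```python
import enum


class TagFlags(enum.IntFlag):
    SIGNED = 0x00000001      # data is covered by the Image's digital signature
    COMPRESSED = 0x00000002  # data is compressed
    ENCRYPTED = 0x00000004   # data is encrypted using the Image Key


def flag_names(raw):
    """Return the names of the known flags set in a raw flags value."""
    flags = TagFlags(raw)
    return [flag.name for flag in TagFlags if flag in flags]
```

For instance, a raw flags value of 0x5 describes a Tag that is both signed and encrypted but not compressed.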
6.4 Tag Identifiers
-------------------

Every Tag in an Image must have a unique Identifier. The Identifier is a
64-bit integer value, which can optionally be interpreted as a string of no
more than 8 ASCII characters.

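One way to picture the dual interpretation of an Identifier is the sketch below, which converts between a short ASCII string and a 64-bit integer. The byte order (first character in the most significant byte) and the NUL padding of short strings are assumptions made for this example; the excerpt does not specify either. Both function names are hypothetical.

```python
def identifier_from_string(name):
    """Pack up to 8 ASCII characters into a 64-bit integer Identifier."""
    raw = name.encode("ascii")
    if len(raw) > 8:
        raise ValueError("an Identifier string is limited to 8 ASCII characters")
    # ASSUMPTION: big-endian layout, padded on the right with NUL bytes.
    return int.from_bytes(raw.ljust(8, b"\0"), "big")


def identifier_to_string(identifier):
    """Interpret a 64-bit Identifier as a string, dropping NUL padding."""
    raw = identifier.to_bytes(8, "big").rstrip(b"\0")
    return raw.decode("ascii")
```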
7 Manifest
==========

8 Volumes
=========

8.1 Filesystem Tree
-------------------
If no Identifier is specified for a Tag, a sequential Identifier should be
assigned automatically.


8.2 Clusters
------------
6.5 Data Filtering
------------------

The different types of processing that can be performed on a Tag's data, such
as encryption and compression, are called Filters. Filters are applied to a
Tag's data as it is being written, and are applied in reverse order when the
data is being read.

To facilitate multiple Filters being used together, the order in which
Filters are applied to a particular Tag's data is strictly defined.

It is critical that Filters are applied in the correct order to maximise
effectiveness. For example, Tag data must be compressed BEFORE it is encrypted.
Encrypting data greatly increases its entropy and "randomness", making it
essentially incompressible.

The types of Filters supported by EC3 are listed below, in the order they are
applied when writing data to a Tag. When reading Tag data, the Filters are
applied in the reverse order.

6.5.1 Compression
Tag data is compressed before being written to the Image to reduce
file size. This is the only Filter that changes the amount of data that
is written to a file.

Note that this Filter will reduce I/O performance and require that data
is read sequentially from the Tag. Random access to compressed Tag data
is not supported.

6.5.2 Encryption
Tag data is encrypted using the specified encryption key before being
written to disk.

6.5.3 Digital Signature
Tag data is included in the set of data that makes up the Image's digital
signature. Unlike the other Filters, this one does not modify the Tag
data that is written to the Image, but rather specifies that the data is
included as part of the whole Image's digital signature hash.

More information about how the Image Signature is calculated and verified
can be found in Section 12.


8.3 String Table
----------------


8.4 Extended Attributes
-----------------------


9 Binary Blobs
7 String Table
==============


10 Embedded Executables
8 Manifest
==========


9 Volumes
=========

9.1 Filesystem Tree
-------------------


9.2 Clusters
------------


9.3 Extended Attributes
-----------------------


10 Binary Blobs
===============


11 Embedded Executables
=======================


11 Signature Verification
12 Signature Verification
=========================


12 Encryption
13 Encryption
=============


BIN
doc/res/logo.png
Normal file
Binary file not shown.
Size: 24 KiB