diff --git a/doc/format.txt b/doc/format.txt
index 90f714c..e5c3a03 100755
--- a/doc/format.txt
+++ b/doc/format.txt
@@ -38,7 +38,7 @@ version: 1.0
  On top of this base, EC3 provides facilities for storing multiple whole
  filesystems within an image file. With support for extended attributes, a
  directory (or whole filesystem) can be accurately captured within an
- EC3 image, while compression and chunk-based data de-duplication greatly
+ EC3 image, while compression and cluster-based data de-duplication greatly
  reduces the amount of disk space required.
 
 
@@ -54,6 +54,49 @@ version: 1.0
  that are used.
 
 
+ 1.3 Terminology
+ ---------------
+
+ Several terms have particular meaning in the context of EC3. Those terms
+ and their meanings are listed here.
+
+
+ 1.3.1 Image
+ An Image is any EC3 file. An Image contains one or more Tags containing
+ binary data.
+
+ 1.3.2 Tag
+ A Tag is a contiguous range of binary data, with an associated type and
+ identifier. The type of a Tag determines the format of the data and how
+ it should be interpreted, while the identifier can be used to distinguish
+ one Tag from another.
+
+ 1.3.3 Container
+ A Container refers to an EC3 file that contains one or more Volumes. It
+ is analogous to a storage device that contains one or more formatted
+ partitions. Containers represent a subset of Images: while all Containers
+ are Images, not all Images are Containers.
+
+ 1.3.4 Volume
+ A Volume is a structured collection of logical files and directories
+ stored within a Container. It is analogous to a partition of a storage
+ device. The data that makes up a Volume is stored across a set of Tags
+ within an Image.
+
+ 1.3.5 Image Key
+ The Image Key is the symmetric cryptographic key used to encrypt and
+ decrypt data within an Image.
+
+ 1.3.6 Image Certificate
+ The Image Certificate is a cryptographic public key and certificate that
+ is embedded within an Image, and is used for digital signature
+ verification.
+
+ 1.3.7 Image Signature
+ The Image Signature is the cryptographic signature that is calculated
+ from the data stored in the Image, and is stored in a dedicated Tag.
+
+
 2 Overview
 ==========
 
@@ -81,21 +124,21 @@ version: 1.0
 
  EC3 builds upon this concept by employing cross-volume data de-duplication.
  Every file that is stored within an EC3 image is split into a set of fixed-
- size, content-addressed chunks. The size of these chunks is constant within
- a container. A typical chunk size would be 32KB. So, if two files within
- a container have the same contents, even if those files are in different
- volumes, the files will reference the same range of chunks. Only one copy
- of the file data is stored within the container. Even if the two files vary
- to some degree, as long as at least one chunk's worth of data is identical,
- some data can still be shared between the files.
+ size, content-addressed clusters. The size of these clusters is constant
+ within a container. A typical cluster size would be 32KB. So, if two files
+ within a container have the same contents, even if those files are in
+ different volumes, the files will reference the same range of clusters. Only
+ one copy of the file data is stored within the container. Even if the two
+ files vary to some degree, as long as at least one cluster's worth of data is
+ identical, some data can still be shared between the files.
 
- Chunks can also be compressed to further reduce file size. The chunking
+ Clusters can also be compressed to further reduce file size. The clustering
  system provides some additional benefits when compression is in use. Seeking
  through a file is more performant, as you don't have to decompress the entire
- file to reach the target offset. You can simply skip to the chunk that
+ file to reach the target offset. You can simply skip to the cluster that
  corresponds to the offset you're looking for. Editing files within a volume
- is also easier as, again, you only have to decompress and re-write the chunk
- that has changed.
+ is also easier as, again, you only have to decompress and re-write the
+ cluster that has changed.
 
  Alongside volumes, EC3 images can contain a range of other data, including:
  * Manifests
@@ -186,7 +229,7 @@ version: 1.0
 
  The Slow Hash function is optimised for minimal chance of hash collisions.
  It is intended to generate the content hashes used to uniquely identify data
- chunks. The algorithm used for this purpose is the SHA-3 algorithm with a
+ clusters. The algorithm used for this purpose is the SHA-3 algorithm with a
  256-bit digest size.
 
 
@@ -223,7 +266,7 @@ version: 1.0
  ----------------------------------------
  0x00    Signature             uint32
  0x04    Format Version        uint16
- 0x06    Chunk Size            uint16
+ 0x06    Cluster Size          uint16
  0x08    Tag Table Offset      uint64
  0x10    Tag Count             uint64
  0x18    Application Magic     uint64
@@ -247,22 +290,22 @@ version: 1.0
  the minor version of the format version. For example, version 3.2 would be
  encoded as 0x0302.
 
- 5.1.3 Chunk Size
- This specifies the size of all data chunks stored within the image, before
- any transformation operations such as compression or encryption are
+ 5.1.3 Cluster Size
+ This specifies the size of all data clusters stored within the image,
+ before any transformation operations such as compression or encryption are
  applied.
 
- The following chunk size values are defined:
+ The following cluster size values are defined:
 
- Header Value        Chunk Size (bytes)    Chunk Size (kilobytes)
- ----------------------------------------------------------------
- 0x00                16,384                16
- 0x01                32,768                32
- 0x02                65,536                64
- 0x03                131,072               128
- 0x04                262,144               256
- 0x05                524,288               512
- 0x06                1,048,576             1,024
+ Header Value        Cluster Size (bytes)    Cluster Size (kilobytes)
+ --------------------------------------------------------------------
+ 0x00                16,384                  16
+ 0x01                32,768                  32
+ 0x02                65,536                  64
+ 0x03                131,072                 128
+ 0x04                262,144                 256
+ 0x05                524,288                 512
+ 0x06                1,048,576               1,024
 
  5.1.4 Tag Table Offset
  This specifies the offset in bytes from the beginning of the image file
@@ -320,7 +363,7 @@ version: 1.0
 
  6.1.3 Checksum
  A checksum of the tag data, calculated on the raw data as it appears
- on-disk, after any data processing layers (compression, encryption, etc)
+ on-disk, after any Data Filters (compression, encryption, etc.)
  have been applied. This checksum should be checked before the tag data is
  processed any further. The checksum is calculated using the algorithm
  described in Section 4.3
@@ -346,9 +389,9 @@ version: 1.0
  Volume tags contain the filesystem tree and file/directory metadata for a
  single volume within the container.
 
- 6.2.2 CTAB: Chunk Table
- The Chunk Table contains the file data chunks for all volumes within the
- container.
+ 6.2.2 CTAB: Cluster Table
+ The Cluster Table contains the file data clusters for all volumes within
+ the container.
 
  6.2.3 XATR: Extended Attributes Table
  The Extended Attributes table contains any extended attributes referenced
@@ -390,46 +433,112 @@ version: 1.0
  6.3 Tag Flags
  -------------
 
+ A Tag can have a number of different flags set. A full list of these flags,
+ including their values and meanings, is provided here.
+
+ 6.3.1 0x00000001: Signed
+ The data in this Tag is included in the Image's digital
+ signature.
+
+ 6.3.2 0x00000002: Compressed
+ The data in this Tag is compressed. Note that, in most cases, this flag
+ will not be enabled on the Cluster Table, as each Cluster is compressed
+ separately.
+
+ 6.3.3 0x00000004: Encrypted
+ The data in this Tag is encrypted using the Image Key.
+
 
  6.4 Tag Identifiers
  -------------------
 
+ Every Tag in an Image must have a unique Identifier. The Identifier is a
+ 64-bit integer value, which can optionally be interpreted as a string of no
+ more than 8 ASCII characters.
 
-7 Manifest
-==========
-
-8 Volumes
-=========
-
- 8.1 Filesystem Tree
- -------------------
+ If no Identifier is specified for a Tag, a sequential Identifier should be
+ assigned automatically.
 
 
- 8.2 Clusters
- ------------
+ 6.5 Data Filtering
+ ------------------
+
+ The different types of processing that can be performed on a Tag's data, such
+ as encryption and compression, are called Filters. Filters are applied to a
+ Tag's data as it is being written, and are applied in reverse order when the
+ data is being read.
+
+ To facilitate multiple Filters being used together, the order in which
+ Filters are applied to a particular Tag's data is strictly defined.
+
+ It is critical that Filters are applied in the correct order to maximise
+ effectiveness. For example, Tag data must be compressed BEFORE it is encrypted.
+ Encrypting data greatly increases its entropy and "randomness", making it
+ essentially incompressible.
+
+ The types of Filters supported by EC3 are listed below, in the order they are
+ applied when writing data to a Tag. When reading Tag data, the Filters are
+ applied in the reverse order.
+
+ 6.5.1 Compression
+ Tag data is compressed before being written to the Image to reduce
+ file size. This is the only Filter that changes the amount of data that
+ is written to a file.
+
+ Note that this Filter will reduce I/O performance and require that data
+ is read sequentially from the Tag. Random access to compressed Tag data
+ is not supported.
+
+ 6.5.2 Encryption
+ Tag data is encrypted using the specified encryption key before being
+ written to disk.
+
+ 6.5.3 Digital Signature
+ Tag data is included in the set of data that makes up the Image's digital
+ signature. Unlike the other Filters, this one does not modify the Tag
+ data that is written to the Image, but rather specifies that the data is
+ included as part of the whole Image's digital signature hash.
+
+ More information about how the Image Signature is calculated and verified
+ can be found in Section 12.
 
 
- 8.3 String Table
- ----------------
-
-
- 8.4 Extended Attributes
- -----------------------
-
-
-9 Binary Blobs
+7 String Table
 ==============
 
 
-10 Embedded Executables
+8 Manifest
+==========
+
+
+9 Volumes
+=========
+
+ 9.1 Filesystem Tree
+ -------------------
+
+
+ 9.2 Clusters
+ ------------
+
+
+ 9.3 Extended Attributes
+ -----------------------
+
+
+10 Binary Blobs
+===============
+
+
+11 Embedded Executables
 =======================
 
 
-11 Signature Verification
+12 Signature Verification
 =========================
 
 
-12 Encryption
+13 Encryption
 =============
 
 
diff --git a/doc/res/logo.png b/doc/res/logo.png
new file mode 100644
index 0000000..d28962a
Binary files /dev/null and b/doc/res/logo.png differ
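
Review note: the header fields touched by this change (the Format Version encoding and the renamed Cluster Size field in Section 5.1) can be decoded roughly as follows. This is a minimal Python sketch and not part of the patch. The function names are hypothetical, and big-endian byte order is an assumption, since the hunks above do not state the byte order. Only the six fields visible in the header table hunk are parsed, and the Signature value is not validated because it is not given here.

```python
import struct

# The six header fields listed in the header-layout hunk (offsets 0x00-0x1F).
# ASSUMPTION: big-endian byte order; the diff does not specify it.
HEADER_STRUCT = struct.Struct(">IHHQQQ")

MIN_CLUSTER_SIZE = 16_384  # size in bytes for header value 0x00
MAX_CLUSTER_CODE = 0x06    # header value for 1,048,576 bytes


def decode_format_version(raw):
    """Split the uint16 version into (major, minor); 0x0302 means 3.2."""
    return (raw >> 8) & 0xFF, raw & 0xFF


def cluster_size_bytes(code):
    """Map a Cluster Size header value (0x00-0x06) to a size in bytes."""
    if not 0 <= code <= MAX_CLUSTER_CODE:
        raise ValueError(f"undefined cluster size value: {code:#04x}")
    # Each row of the table doubles the previous size.
    return MIN_CLUSTER_SIZE << code


def parse_header(data):
    """Decode the visible header fields from the first 32 bytes of an image."""
    (signature, version, cluster_code,
     tag_table_offset, tag_count, app_magic) = HEADER_STRUCT.unpack_from(data)
    return {
        "signature": signature,
        "format_version": decode_format_version(version),
        "cluster_size": cluster_size_bytes(cluster_code),
        "tag_table_offset": tag_table_offset,
        "tag_count": tag_count,
        "application_magic": app_magic,
    }
```

Shifting the 16,384-byte base size left by the header value reproduces every row of the cluster size table, so no lookup table is needed; values above 0x06 are rejected because the specification does not define them.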