From 72da03cc86405d6ef4f529687e95c1a54f577c7d Mon Sep 17 00:00:00 2001
From: !antona
Date: Thu, 22 Aug 2002 18:09:47 +0000
Subject: [PATCH] Add description of compression algorithm.

(Logical change 1.5)
---
 doc/compression.txt | 153 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)

diff --git a/doc/compression.txt b/doc/compression.txt
index e69de29b..90d2f0c1 100644
--- a/doc/compression.txt
+++ b/doc/compression.txt
@@ -0,0 +1,153 @@

Description of the NTFS (de)compression algorithm (based on a modified LZ77
algorithm)

Copyright (c) 2001 Anton Altaparmakov.

This document is published under the GNU General Public License.

Credits: This is based on notes taken from various places (most notably from
Regis Duchesne's NTFS documentation and from various LZ77 descriptions) and
further refined by looking at a few compressed streams to figure out some
uncertainties.

Note: You should also read the run list description with regard to compression
in linux-ntfs/include/layout.h. Just search for "Attribute compression".
FIXME: Should merge the info from there into this document some time.

Compressed data is organized in logical "compression" blocks (cb). Each cb has
a size (cb_size) of 2^compression_unit clusters. In all versions of Windows
NTFS (NT/2k/XP, NTFS 1.2-3.1), the only valid compression_unit is 4, IOW, each
cb is 2^4 = 16 clusters in size.

We detect and warn about a compression_unit != 4, but we try to decompress the
data anyway.

Compression is only supported for cluster sizes between 512 and 4096 bytes.
Thus a cb can be between 8kiB and 64kiB in size.

Each cb is independent of the other cbs and is thus the minimal unit we have
to parse, even if we wanted to decompress only one byte.

Also, a cb can be stored totally uncompressed, and it can be completely
sparse; both cases are indicated in the run list (see the "Attribute
compression" notes in layout.h mentioned above).

Thus, we need to look at the run list of the compressed data stream, starting
at the beginning of the first cb overlapping @page. So we convert the page
offset into units of clusters (vcn), and round the vcn down to a multiple of
cb_size clusters.

We then scan the run list for the appropriate position. Based on what we find
there, we decide how to proceed.

If the cb is not compressed at all, and covers the whole of @page, we pretend
to be accessing an uncompressed file, so we fall back to what we do in
aops.c::ntfs_file_readpage(), i.e. we do:
	return block_read_full_page(page, ntfs_file_get_block);

If the cb is completely sparse, and covers the whole of @page, we can just
zero out @page and complete the i/o (set @page up-to-date, unlock it, and
finally return 0).

In all other cases we initiate the decompression engine, but first some more
on the compression algorithm.

Before compression, the data of each cb is further divided into 4kiB blocks,
which we call "sub compression" blocks (sb). In the compressed stream, each sb
starts with a header specifying its compressed length. So we could just scan
the cb for the first sb overlapping @page and skip the sbs before that, or we
could decompress the whole cb, injecting the superfluous decompressed pages
into the page cache as a form of read-ahead (this is what zisofs does, for
example).

In either case, we then need to read and decompress all sbs overlapping @page,
potentially having to decompress one or more other cbs, too.

As soon as @page is completed, we could either stop or continue until we
finish the current cb, injecting pages as we go along (again following the
zisofs example).
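
As an aside, here is a minimal sketch of the page offset to cb arithmetic
described above (rounding the vcn of @page down to the start of its cb). All
of the names (page_to_cb_start_vcn, page_shift, etc.) are illustrative
assumptions for this document, not the driver's actual variables:

#include <stdint.h>

typedef int64_t VCN;

/*
 * Return the vcn of the first cluster of the cb containing the page with
 * index @page_index.  @cb_clusters is 2^compression_unit, i.e. 16, and is
 * thus a power of two, so rounding down is a simple mask operation.
 */
static VCN page_to_cb_start_vcn(uint64_t page_index, unsigned page_shift,
		unsigned cluster_shift, unsigned cb_clusters)
{
	/* Convert the page offset from bytes into units of clusters. */
	VCN vcn = (VCN)((page_index << page_shift) >> cluster_shift);

	/* Round the vcn down to a multiple of cb_size clusters. */
	return vcn & ~((VCN)cb_clusters - 1);
}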

Because the sbs follow each other directly, we need to read in the whole cb
anyway in order to be able to scan through it and find the first sb
overlapping @page, so it does make sense to follow the zisofs approach of
decompressing the whole cb and injecting pages as we go along. All discussion
from now on will therefore assume that we are going to do that. It might,
however, make sense not to decompress any sbs located before @page, because
this would be a kind of "read-behind", which is probably pointless unless
someone is reading the file backwards. Performing read-ahead by decompressing
all sbs following @page, OTOH, is very likely to be a good idea.

So, we read the whole cb from disk and start at the first sb.

As mentioned above, each sb starts with a header. The header is 16 bits, of
which the lower twelve bits (i.e. bits 0 to 11) hold the length (L) of the sb
minus 3, where L includes the two bytes for the header itself (equivalently,
they hold L minus 1, with L not counting the two header bytes). The higher
four bits are set to 1011 (0xb) by the compressor for a compressed block, or
to 0000 for an uncompressed block, but the decompressor only checks the most
significant bit, taking a 1 to signify a compressed block, and a 0 an
uncompressed block.

So from the header we know how many compressed bytes we need to decompress to
obtain the next 4kiB of uncompressed data, and if we did not want to
decompress this sb, we could just seek to the next one using the length read
from the header. We could then continue seeking until we reach the first sb
overlapping @page.

In either case, we will reach a sb which we want to decompress.

Having dealt with the 16-bit header of the sb, we now have length bytes of
compressed data to decompress. This compressed stream is further split into
tokens, which are organized into groups of eight tokens. Each token group (tg)
starts with a tag byte, which is an eight-bit bitmap whose bits specify the
type of each of the following eight tokens. The least significant bit (LSB)
corresponds to the first token and the most significant bit (MSB) corresponds
to the last token.

The two types of tokens are symbol tokens, specified by a zero bit, and phrase
tokens, specified by a set bit.

A symbol token (st) is a single byte which is to be taken literally and copied
into the sliding window (the decompressed data).

A phrase token (pt) is a pointer back into the sliding window (in bytes),
together with a length (again in bytes), starting at the byte the back pointer
is pointing to. Thus a phrase token defines a sequence of bytes in the sliding
window which need to be copied at the current position into the sliding window
(the decompressed data stream).

Each pt consists of 2 bytes split into the back pointer (p) and the length (l),
each of variable bit width (but the sum of the widths of p and l is fixed at
16 bits). p is at least 4 bits and l is at most 12 bits.

The most significant bits contain the back pointer (p), while the least
significant bits contain the length (l).

l is actually stored as the number of bytes minus 3 (unsigned), as anything
shorter than that would take at least as much space as the 2 bytes needed for
the pt itself, so no compression would be achieved.

p is stored as the positive number of bytes minus 1 (unsigned), as going back
zero bytes is meaningless.
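
To make the header layout concrete, here is a small illustrative sketch in C
of reading an sb header and seeking to the next sb. The function and macro
names are made up for this document; they are not the driver's API:

#include <stdint.h>

/* Read the 16-bit, little endian sb header at @sb. */
static unsigned sb_header(const uint8_t *sb)
{
	return sb[0] | ((unsigned)sb[1] << 8);
}

/* Total length L of the sb, including the two header bytes. */
#define SB_LENGTH(hdr)		(((hdr) & 0xfff) + 3)

/* The decompressor only checks the most significant bit. */
#define SB_IS_COMPRESSED(hdr)	((hdr) & 0x8000)

/* Seek to the next sb without decompressing the current one. */
static const uint8_t *next_sb(const uint8_t *sb)
{
	return sb + SB_LENGTH(sb_header(sb));
}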

Note that decompression has to occur byte by byte, as it is possible that some
of the bytes pointed to by the pt will only be generated in the sliding window
as the byte sequence pointed to by the pt is being copied into it!

To give a concrete example: a block full of the letter A would be compressed
by storing the byte A once as a symbol token, followed by a single phrase
token with back pointer -1 (p = 0, therefore go back by 0 + 1 = 1 byte) and
length 4095 (l = 0xffc, therefore length 0xffc + 3 bytes).

The widths of p and l are determined from the current position within the
decompressed data (cur_pos). We don't actually care about the widths as such,
however; instead, we want the mask (l_mask) with which to AND the pt to obtain
l, and the number of bits (p_shift) by which to right shift the pt to obtain
p. These are determined using the following algorithm:

/*
 * Start with the widest possible l (12 bits).  Each time cur_pos doubles
 * beyond 16 bytes, the back pointer needs one more bit, so l loses one.
 */
for (i = cur_pos, l_mask = 0xfff, p_shift = 12; i >= 0x10; i >>= 1) {
	l_mask >>= 1;
	p_shift--;
}

Note that, as usual in NTFS, the sb header, as well as each pt, is stored in
little endian format.
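
Putting the pieces together, here is a self-contained, illustrative sketch of
decompressing a single compressed sb, following the description above. It is
not the driver's actual decompression engine; all names are invented for this
document and error handling is reduced to returning -1:

#include <stddef.h>
#include <stdint.h>

/*
 * Decompress one compressed sb.  @src points just past the 2-byte sb
 * header, @src_end points just past the last compressed byte (as given by
 * the length in the header), and @dst is the 4kiB output buffer (the
 * sliding window).  Returns the number of bytes produced, or -1 on error.
 */
static int decompress_sb(const uint8_t *src, const uint8_t *src_end,
		uint8_t *dst, size_t dst_size)
{
	uint8_t *const dst_start = dst;
	uint8_t *const dst_end = dst + dst_size;

	while (src < src_end && dst < dst_end) {
		uint8_t tag = *src++;	/* bitmap typing the next 8 tokens */
		unsigned t;

		for (t = 0; t < 8 && src < src_end && dst < dst_end;
				t++, tag >>= 1) {
			unsigned pt, l_mask, p_shift, i, l, p;
			const uint8_t *back;

			if (!(tag & 1)) {
				/* Symbol token: copy the literal byte. */
				*dst++ = *src++;
				continue;
			}
			/* Phrase token: 2 bytes, little endian. */
			if (src_end - src < 2)
				return -1;
			pt = src[0] | ((unsigned)src[1] << 8);
			src += 2;

			/* Derive l_mask and p_shift from cur_pos. */
			for (i = (unsigned)(dst - dst_start), l_mask = 0xfff,
					p_shift = 12; i >= 0x10; i >>= 1) {
				l_mask >>= 1;
				p_shift--;
			}
			l = (pt & l_mask) + 3;	 /* stored as length - 3 */
			p = (pt >> p_shift) + 1; /* stored as distance - 1 */

			if (p > (size_t)(dst - dst_start) ||
			    l > (size_t)(dst_end - dst))
				return -1;
			back = dst - p;
			/* Copy byte by byte: the source may overlap the
			   destination inside the sliding window. */
			while (l--)
				*dst++ = *back++;
		}
	}
	return (int)(dst - dst_start);
}

A caller would read the whole cb into memory, check SB_IS_COMPRESSED() on each
header, and either copy the 4kiB of an uncompressed sb verbatim or feed the
bytes after the header to a routine like the above.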