GZIP is a lossless compression and decompression method.
GZIP supports the compression and decompression of a data stream to produce another data stream.
GZIP is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.
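As a quick illustration of the lossless round trip, here is a minimal sketch using Python's standard `gzip` module. The sample input is made up for the example; any byte string would do.

```python
import gzip

# Made-up sample input: repetitive data compresses well with LZ77 + Huffman.
original = b"hello gzip " * 100

compressed = gzip.compress(original)    # data stream in, gzip stream out
restored = gzip.decompress(compressed)  # gzip stream in, original data out

# Lossless: the restored bytes are identical to the input.
assert restored == original
print(len(original), "bytes ->", len(compressed), "bytes")
```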
The gzip format structure includes:
┌──────────────┐
│ │
│ │
│ Header & │
│ Trailer │
│ │
│ │
├──────────────┤
│ │
│ Extra fields │
│ (optional) │
│ │
├──────────────┤
│ │
│ File name │
│ (optional) │
│ │
├──────────────┤
│ │
│ File comment │
│ (optional) │
│ │
├──────────────┤
│ │
│ CRC16 │
│ (optional) │
│ │
├──────────────┤
│ │
│ │
│ │
│ │
│ Compressed │
│ blocks │
│ │
│ │
│ │
│ │
│ │
├──────────────┤
│ │
│ │
│ CRC32 │
│ │
│ │
│ │
├──────────────┤
│ │
│ │
│ ISIZE │
│ │
│ │
│ │
└──────────────┘
┌──────────┬─────────────────────┐
│ │ ID1 (0x1f) │
│ ├─────────────────────┤
│ │ ID2 (0x8b) │
│ ├─────────────────────┤
│ │ Compression method │
│ ├─────────────────────┤
│ │ Flags │
│ ├─────────────────────┤
│ │ │
│ │ │
│ Header & │ │
│ Trailer │ │
│ │ Modification time │
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
│ ├─────────────────────┤
│ │ eXtra flags │
│ ├─────────────────────┤
│ │ Operating system │
├──────────┼─────────────────────┘
│ │
│ ... │
│ │
└──────────┘
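The fixed part of the header shown above can be unpacked with a few lines of Python. This is a sketch, assuming the field widths from the gzip specification (RFC 1952): one byte per field except the 4-byte modification time, all little-endian. The function name `parse_fixed_header` is hypothetical.

```python
import gzip
import struct

def parse_fixed_header(data: bytes) -> dict:
    """Unpack the 10-byte fixed part of a gzip header (little-endian)."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream: bad ID1/ID2 bytes")
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl, "os": os_code}

# mtime=0 keeps the example reproducible (requires Python 3.8+).
sample = gzip.compress(b"example", mtime=0)
header = parse_fixed_header(sample)
print(header)
```

A compression method of 8 means deflate, which is what common gzip tools write.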
┌──────────┬─────────┐
│ │ │
│ │ ... │
│ │ │
│ ├─────────┼─────────────────────┐
│ │ │ Bit 0 - FTEXT │
│ │ ├─────────────────────┤
│ │ │ Bit 1 - FHCRC │
│ │ ├─────────────────────┤
│ │ │ Bit 2 - FEXTRA │
│ │ ├─────────────────────┤
│ Header & │ │ Bit 3 - FNAME │
│ Trailer │ Flags ├─────────────────────┤
│ │ │ Bit 4 - FCOMMENT │
│ │ ├─────────────────────┤
│ │ │ Bit 5 - reserved │
│ │ ├─────────────────────┤
│ │ │ Bit 6 - reserved │
│ │ ├─────────────────────┤
│ │ │ Bit 7 - reserved │
│ ├─────────┼─────────────────────┘
│ │ │
│ │ ... │
│ │ │
└──────────┴─────────┘
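The flag bits in the diagram above map directly to bit masks. The following sketch decodes a flags byte into named booleans; `FLAG_BITS` and `decode_flags` are hypothetical names chosen for this example.

```python
# Bit masks for the flags byte, in the order shown in the diagram.
FLAG_BITS = {
    "FTEXT": 0x01,     # bit 0
    "FHCRC": 0x02,     # bit 1
    "FEXTRA": 0x04,    # bit 2
    "FNAME": 0x08,     # bit 3
    "FCOMMENT": 0x10,  # bit 4
}

def decode_flags(flg: int) -> dict:
    """Return which optional sections a flags byte announces."""
    flags = {name: bool(flg & mask) for name, mask in FLAG_BITS.items()}
    # Bits 5-7 are reserved; a set bit here suggests an unknown or damaged header.
    flags["reserved_bits_set"] = bool(flg & 0xE0)
    return flags

print(decode_flags(0x0C))  # FEXTRA and FNAME set
```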
If FLAG.FEXTRA (bit 2 of the flags byte in the header section) is set, an extra fields section follows the header.
┌──────────┐
│ │
│ ... │
│ │
├──────────┼─────────────────────┐
│ │ SI1 │
│ ├─────────────────────┤
│ │ SI2 │
│ ├─────────────────────┤
│ │ │
│ │ LEN │
│ │ │
│ ├─────────────────────┤
│ │ │
│ │ │
│ Extra │ │
│ fields │ │
│ │ │
│ │ │
│ │ data │
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
├──────────┼─────────────────────┘
│ │
│ ... │
│ │
└──────────┘
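Each extra field has the following structure: two subfield ID bytes (SI1 and SI2), a two-byte length (LEN), and LEN bytes of data.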
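A sketch of walking the SI1 / SI2 / LEN / data layout follows. It assumes the caller has already read the 2-byte total length (XLEN in RFC 1952) that precedes the subfields and passes only the payload. The function name and the "Ap" subfield ID are hypothetical, used purely for illustration.

```python
import struct

def parse_extra_subfields(extra: bytes) -> list:
    """Split an extra-fields payload into (SI1, SI2, data) tuples."""
    subfields = []
    pos = 0
    while pos + 4 <= len(extra):
        # SI1 and SI2 are one byte each; LEN is a 2-byte little-endian integer.
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4
        data = extra[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated subfield data")
        subfields.append((chr(si1), chr(si2), data))
        pos += length
    if pos != len(extra):
        raise ValueError("trailing bytes after last subfield")
    return subfields

# Hypothetical subfield with ID "Ap" carrying three bytes of data.
payload = b"Ap" + struct.pack("<H", 3) + b"\x01\x02\x03"
print(parse_extra_subfields(payload))
```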
If FLAG.FNAME (bit 3 of the flags byte in the header section) is set, a file name section follows.
The file name section is a sequence of bytes terminated by a zero byte, encoded in ISO 8859-1 (Latin-1). If the source data comes from stdin, there is no file name.
If FLAG.FCOMMENT (bit 4 of the flags byte in the header section) is set, a file comment section follows.
The file comment section is a sequence of bytes terminated by a zero byte, encoded in ISO 8859-1 (Latin-1).
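Both the file name and the file comment are read the same way: scan forward to the zero byte and decode as Latin-1. The sketch below reads the file name from a stream written by Python's `gzip.GzipFile`. It assumes FEXTRA is not set, so the name starts right after the 10-byte fixed header; `read_zero_terminated` and the file name `notes.txt` are made up for the example.

```python
import gzip
import io

def read_zero_terminated(data: bytes, pos: int):
    """Return (text, next_pos) for a zero-terminated Latin-1 string at pos."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("latin-1"), end + 1

# Write a gzip stream in memory that carries a file name in its header.
buf = io.BytesIO()
with gzip.GzipFile(filename="notes.txt", mode="wb", fileobj=buf, mtime=0) as f:
    f.write(b"payload")
raw = buf.getvalue()

flg = raw[3]   # flags byte
pos = 10       # first byte after the fixed header (assumes no extra fields)
name = None
if flg & 0x08:  # FNAME
    name, pos = read_zero_terminated(raw, pos)
print(name)
```

If FCOMMENT (0x10) were also set, a second call to `read_zero_terminated(raw, pos)` would return the comment.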
If FLAG.FHCRC (bit 1 of the flags byte in the header section) is set, a CRC16 section follows.
CRC16 is a 16-bit cyclic redundancy check, an error-detecting code used to detect accidental changes to digital data. In this case, it detects accidental changes in the header section.
The CRC16 consists of the two least significant bytes of the CRC32 for all bytes of the gzip header up to and not including the CRC16.
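The rule above translates into a one-line computation with `zlib.crc32`. The sketch below builds a minimal gzip stream by hand with only FHCRC set, then lets zlib decode it as a sanity check. The helper name `header_crc16` and the sample data are assumptions for this example. The OS byte 255 means "unknown" in RFC 1952.

```python
import struct
import zlib

def header_crc16(header_bytes: bytes) -> int:
    """Low 16 bits of the CRC32 over the header bytes preceding the CRC16."""
    return zlib.crc32(header_bytes) & 0xFFFF

data = b"header crc example"

# Fixed 10-byte header: ID1, ID2, CM=8 (deflate), FLG=0x02 (FHCRC), MTIME=0, XFL=0, OS=255.
header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x02, 0, 0, 255)
crc16 = header_crc16(header)

# Raw deflate body (negative wbits means no zlib/gzip wrapper).
deflater = zlib.compressobj(wbits=-15)
body = deflater.compress(data) + deflater.flush()

# Trailer: CRC32 and size of the uncompressed data, both little-endian.
trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)

stream = header + struct.pack("<H", crc16) + body + trailer
assert zlib.decompress(stream, wbits=31) == data  # wbits=31 selects gzip decoding
```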
The main part of the file consists of compressed blocks that use the deflate algorithm (or whichever method the compression method field in the header specifies).
Deflate is designed as a stream of blocks, which allows lossless compression and decompression to be performed in a streaming fashion.
For details, see: https://thuc.space/posts/deflate/
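To illustrate the streaming property, here is a sketch that feeds a gzip stream to `zlib.decompressobj` in small chunks, as if the bytes were arriving over a network. The chunk size of 64 bytes and the sample data are arbitrary choices for the example.

```python
import gzip
import zlib

data = b"streaming example " * 1000
stream = gzip.compress(data)

# wbits=31 tells zlib to expect a gzip header and trailer.
decoder = zlib.decompressobj(wbits=31)
out = bytearray()
for start in range(0, len(stream), 64):
    # Each call returns whatever output can be produced from the input seen so far.
    out += decoder.decompress(stream[start:start + 64])
out += decoder.flush()

assert bytes(out) == data
```

The decoder never needs the whole compressed input in memory at once, which is what makes gzip suitable for pipes and network transfers.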
CRC32 is a 32-bit cyclic redundancy check, an error-detecting code used to detect accidental changes or transmission errors in the original uncompressed data.
ISIZE contains the size of the original (uncompressed) input data modulo 2^32:
ISIZE = uncompressed_size % (2^32)
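Both trailer fields can be checked by reading the last eight bytes of a gzip stream. This is a sketch, assuming a single-member stream and the little-endian layout from RFC 1952; the sample data is made up.

```python
import gzip
import struct
import zlib

data = b"The quick brown fox jumps over the lazy dog"
stream = gzip.compress(data)

# The trailer is the last 8 bytes: CRC32 then ISIZE, each a 4-byte little-endian integer.
crc32_stored, isize_stored = struct.unpack("<II", stream[-8:])

assert crc32_stored == zlib.crc32(data)      # CRC32 of the uncompressed data
assert isize_stored == len(data) % (2 ** 32)  # uncompressed size modulo 2^32
print(hex(crc32_stored), isize_stored)
```

Because ISIZE is stored modulo 2^32, it cannot be trusted as the true size for inputs of 4 GiB or more.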