I'm currently in the middle of writing a small PNG image I/O library myself for learning purposes. My problem is the following:
I created a small PNG only 2 by 2 pixels in dimension and opened it in a hex editor to study its contents. This is the image I created using GIMP and stored with a compression of "9".
(Please note that this is an enlarged image of the original 2 by 2 pixel image ;) )
So I guess uncompressed, this would look something like this in memory:
00 00 00 FF 00 00 00 00 FF 00 FF 00
when stored without an alpha channel.
(I only stated this here for clarity. I know of compression and did not expect to see this byte pattern in the file).
I extracted the IDAT chunk and stripped the chunk ID ("IDAT") and the tailing CRC value and got this sequence of bytes:
08 D7 05 C1 01 01 00 00 00 80 10 FF 4F 17 10 48 06 0F FE 02 FE
Now the first two bytes 08 D7
contain information on the encoded block.
And the last four bytes 0F FE 02 FE
must be the ADLER32 checksum.
This ultimately leaves me with the following bytes:
05 C1 01 01 00 00 00 80 10 FF 4F 17 10 48 06
Written in binary representation these bytes are:
0000 0101 1100 0001 0000 0001 0000 0001
0000 0000 0000 0000 0000 0000 1000 0000
0001 0000 1111 1111 0100 1111 0001 0111
0001 0000 0100 1000 0000 0110
To understand DEFLATE better I tried to "unpack" this sequence by hand at least before I understand it well enough to write a small tool. But I got stuck really quickly.
RFC 1951 ("DEFLATE Compressed Data Format Specification") states that every encoded block begins with a three bit header. One bit indicating whether or not this is the last block and two more blocks indicating the compression method. Since I assume the encoder used only one block here (meaning the first block automatically is the last) and used a non static Huffmann tree, I am looking for the bit sequence "101" but I cannot find it (nor do I find other likely headers "100" or "110").
Also the RFC says that there must be two two byte values LEN and NLEN storing the length of the block where NLEN is the one's complement of LEN but again I am unable to find four such bytes that satisfy this condition. I won't even start on my luck of finding anything that could represend the two Huffmann trees.
I read RFCs 1951 and 1950 ("ZLIB Compressed Data Format Specification" as well as the Wikipedia articles on zlib, DEFLATE, LZ77 and Huffman coding, as well as several small and rather unhelpful articles on the web and a few answers on Stack Overflow, but none could help me with my lack of understanding.
I would be really grateful for any help or hint!
See Question&Answers more detail:
os