[ Go back to WG-S20 index page. ]

====== RAW images ======

After analyzing all the images, there is always a header at the beginning with two 16-bit little-endian integers (4 bytes, going by the offsets below), followed by blocks with the following layout:

  * offset=0, 16-bit little-endian integer: TODO
  * offset=2, 16-bit little-endian integer: TODO
  * offset=4, up to offset 512: Sizes in bytes for the layers.
  * offset=512, 512*n bytes: Image layer...
  * offset=512*(n+1), m*512 bytes: Another optional image layer...
  * ...

=== Sizes block ===

This sizes block contains, for each layer, a 4-byte little-endian integer with that layer's size in bytes. For instance FORM, IMPT and NOTE files contain a single layer, and PAGE files only two: the first layer is the marker and the second one the pen. Since 512 bytes / 4 bytes = 128, most of the entries are just nil bytes.

The sizes block sits at offset 4; every remaining block starts at an offset aligned to 512 bytes. Each block is padded with just the needed amount of nil bytes to achieve this.

=== Un-compressed Image layers ===

These layers begin with a 1 stored as a 32-bit little-endian integer, meaning that the first byte of the layer is 0x01, followed by three 0x00 bytes. After that come 600*700/4 = 105000 16-bit words, also in little endian. Each 16-bit word codes 4 pixels, one nibble per pixel, with the most significant nibble as the leftmost pixel and the least significant nibble as the rightmost one. As in the compressed format, the pixels are stored left to right and top to bottom (like the words in an English text). Each pixel has 4 bits: 0 is black, 0xf is white, and the values in between are grays.

For example, suppose the first 16 pixels from left to right have the values 0xf, 0xe, ..., 0x1, 0x0.
Then the image layer begins with the bytes (expressed in base 16):

  01 00 00 00 dc fe 98 ba 54 76 10 32

After all those 16-bit words comes a final 16-bit word whose meaning I have not figured out. It is probably a checksum of some kind. I have already tested the obvious checksum algorithms: sum of each byte, sum of each word in LE and BE, crc16 and crc-x25, md5, sha1, sha256, sha512, among others.

=== Compressed Image layers ===

The images are stored in a non-ciphered, semi-compressed format. All the images have a resolution of 600x700 with 16 grays (4 bpp). Lines are stored from top to bottom, and within each line pixels are stored from left to right.

Each subblock is (maybe) separated by two consecutive 0xff bytes. In the special case of a white subblock, the content is just the bytes 0x5f 0x79 repeated 16 times (ff ff 5f 79 5f 79 ... 5f 79). Solid black subblocks consist of the bytes 0x0f 0x00, also repeated 16 times (ff ff 0f 00 0f 00 ... 0f 00). Here is a table with more examples:

  Table 1: solid colors. (fn=int(y*16/700))
  +-----+-------+----------------------+
  | #0  | 0f 00 | 0000 1111 0000 0000  | solid black
  | #1  | 8f e5 | 1000 1111 1110 0101  | solid very dark gray
  | #2  | 1f cb | 0001 1111 1100 1011  | ...
  | #3  | af b1 | 1010 1111 1011 0001  |
  | #4  | 3f 97 | 0011 1111 1001 0111  |
  | #5  | cf 7d | 1100 1111 0111 1101  |
  | #6  | 5f 63 | 0101 1111 0110 0011  |
  | #7  | ef 49 | 1110 1111 0100 1001  |
  +-----+-------+----------------------+
  | #8  | 7f 2f | 0111 1111 0010 1111  |
  | #9  | 0f 15 | 0000 1111 0001 0101  |
  | #a  | 8f fb | 1000 1111 1111 1011  |
  | #b  | 1f e1 | 0001 1111 1110 0001  |
  | #c  | af c7 | 1010 1111 1100 0111  |
  | #d  | 3f ad | 0011 1111 1010 1101  | ...
  | #e  | cf 93 | 1100 1111 1001 0011  | solid very light gray
  | #f  | 5f 79 | 0101 1111 0111 1001  | solid white
  +-----+-------+----------------------+

Interesting bits:

  * Except for solid black, all the solid colors have a 1 in the least significant bit of the second byte.
  * The first byte always has 1111 in its least significant nibble.

  Table 2: Coding for images made with 600 alternating black and white vertical bars.
  All share the same size as the solid color images.
  +------------+-------+----------------------+
  | #0, #f     | 5f 1a | 0101 1111 0001 1010  | fn=x%2*15
  | #f, #0     | 1f a1 | 0001 1111 1010 0001  | fn=15-x%2*15
  +------------+-------+----------------------+
  | #0, #0, #f | 1f 00 | 0001 1111 0000 0000  | for the three vertical ones: \
  | #0, #f, #0 | cf 0a | 1100 1111 0000 1010  | fn=and(2^(2-x%3), int(y*8/700))?15:0
  | #0, #f, #f | df a0 | 1101 1111 0000 1010  |
  | #f, #0, #0 | 7f 91 | 0111 1111 1001 0001  |
  | #f, #0, #f | 8f 91 | 1000 1111 1001 0001  |
  | #f, #f, #0 | 4f 79 | 0100 1111 0111 1001  |
  +------------+-------+----------------------+

A row made of repeated pixel pairs such as 0x12 gets compressed as well. Comparing it with the rows for those two solid colors gives this:

  +----+-------+----------------------+
  | 11 | 8- e5 | 1000 ---- 1110 0101  | solid very dark gray
  | 22 | 1- cb | 0001 ---- 1100 1011  | solid almost as dark gray
  +----+-------+----------------------+
  | 12 | f- b8 | 1111 ---- 1011 1000  |
  | 13 | 4- 5f | 0100 ---- 0101 1111  |
  | 21 | 4- d2 | 0100 ---- 1101 0010  |
  +----+-------+----------------------+

Other interesting bits:

  * Between the big chunks of bars in the images, there are sequences of bytes like: 7f91 7f91 7f91 7991 8f91 8f91 8f91 .... This makes me think that the 'f' seen in all those patterns and solid colors may be the number of times the color is repeated, in other words the "length" in RLE.
  * The compression method begins losing efficiency when the vertical bars have a cycle length of 5. For instance, when the cycle is #0#0#f#f#f, the pattern ends up being the following, which probably means that little to no compression is going on:

  1000 df0a 009f 100f 9d55 7f91 009f 10ef 895f 7900 ff5f 799f

Theories:

  * Regarding those ff ff:
    * [F] They are used to separate lines.
False: because on pure white images the number of ff pairs does not match the number of lines. False also because on the image with 1-pixel wide alternating white and black stripes (fn=y%2*15), the line boundaries drift out of alignment with the ff pairs as the lines go on.

    * [F] They are used to separate subblocks of a fixed number of bytes. When the image has more information, for instance when there is a pixel in the middle of those ff, the ff pairs move further apart, as more bytes are being used to encode that part of the image.
    * [?] They are used to separate a given number of pixels.

Test images were made with the following command, with fn replaced by the expression given for each image:

  gawk '
  function writeimage(I, w, h) {
    print "P2", w, h, 15
    s = ""
    for (y = 0; y < h; y++)
      for (x = 0; x < w; x++) {
        s = s (0 + I[x,y]) " "
        if (length(s) > 65) { print s; s = "" }
      }
    print s
  }
  BEGIN {
    # Replace fn with the expression for the wanted image, e.g. x%2*15.
    for (y = 0; y < 700; y++)
      for (x = 0; x < 600; x++)
        I[x,y] = fn
    writeimage(I, 600, 700)
  }' | ppmtobmp -bpp=4 > test.bmp
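
The container layout and the un-compressed layer format described earlier can be sketched in Python as follows. This is only a sketch under assumptions, not a verified parser: the function names are made up for illustration, the list of sizes is assumed to end at the first zero entry, the nibble order follows the byte example (dc fe decodes to the pixels f, e, d, c), and the unidentified final 16-bit word is simply ignored.

```python
import struct

WIDTH, HEIGHT = 600, 700         # resolution of every layer
BLOCK = 512                      # alignment of the blocks after the header
N_WORDS = WIDTH * HEIGHT // 4    # 16-bit words, 4 pixels per word


def read_layer_sizes(data):
    """Read the 4-byte little-endian sizes stored between offsets 4 and 512.

    Assumption: the list of layers ends at the first zero entry.
    """
    sizes = []
    for off in range(4, BLOCK, 4):
        (size,) = struct.unpack_from("<I", data, off)
        if size == 0:
            break
        sizes.append(size)
    return sizes


def layer_offsets(sizes):
    """Yield (offset, size) for each layer; layers start on 512-byte boundaries."""
    off = BLOCK
    for size in sizes:
        yield off, size
        off += -(-size // BLOCK) * BLOCK   # size rounded up to a multiple of 512


def decode_uncompressed(layer):
    """Decode an un-compressed layer into HEIGHT rows of WIDTH pixels (0..15).

    Assumption: the most significant nibble of each word is the leftmost
    pixel, as in the byte example. The trailing 16-bit word is not read.
    """
    (flag,) = struct.unpack_from("<I", layer, 0)
    if flag != 1:
        raise ValueError("layer does not start with 01 00 00 00")
    words = struct.unpack_from("<%dH" % N_WORDS, layer, 4)
    pixels = []
    for w in words:
        pixels.extend((w >> 12, (w >> 8) & 0xF, (w >> 4) & 0xF, w & 0xF))
    return [pixels[y * WIDTH:(y + 1) * WIDTH] for y in range(HEIGHT)]
```

With a whole file loaded into a bytes object called data (a hypothetical name), each layer would be obtained as data[off:off + size] for every pair produced by layer_offsets(read_layer_sizes(data)). Decoding only applies to layers that start with 01 00 00 00.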