wiki: blog/breaking-hp200lx-memo-cipher



In this post we are trying to first try to deduce the encryption algorithm of
the Memo app in the HP 200LX handheld (which encrypts documents when a password
is used at save time) and then if possible find a way to deduce the password
from any encrypted file without any help.

The reason I'm doing this is because this handheld computer is very good as a
note taking machine given its size, and it would be good to know what attacks
can to perform to it.  Being its software closed source and written by
corporation (HP in this case) workers I was expecting quite a poor level of
security, and truth be told, from the beginning I saw the classic signs that
the password protection for this application was done by people that ended up
"Rolling Their Own Crypto".

The problem of reading a couple of books on cryptography and doing some
college courses is that you will feel confident about having knowledge on the
area, and thus tempted at using it in your own software.  Open source standard
protocols on the other hand have multiple years of research and cryptoanalysis
done to them, so even in the best case it's very likely that you are going to
make a few mistakes here and there ending up with an insecure piece of code.
Furthermore, if you only work with people that knows less crypto than you, your
work won't ever get properly reviewed.

This Memo app is in summary a simple WYSIWYG text editor with its own binary
format that supports a few extra attributes over standard plain text: bold and
underlined text, headers and footers, custom tab positions, custom margins,
multi level outlines with a couple of formats and print fields; and in
particular, it supports saving documents with passwords.

The first thing we should do is try to detect whether a stream cipher or a
block cipher is used.  Old apps such as this one, are usually written with
really insecure algorithms, as the developers involved in them tend have little
to no knowledge in cryptography.  We can see that for other apps like the
builtin Pocket Quicken or the Note Taker there are available crack programs
that can very easily obtain the passwords from the files.  Now it's time to
break Memo as well.

The simplest way to use a Stream cipher is to generate a stream of bytes from a
password and then XOR those generated bytes with the clear text.  On the other
hand block ciphers have many possible operation modes like ECB, CBC, etc.
Since them are in general a bit more difficult to properly implement it's less
likely to be used.

There's a simple way to detect if the simplest and insecure mode of operation
for a stream cipher was used: Given two files in clear text t1 and t2, a stream
cipher c, we then encrypt them with the same password, resulting in:

  e1 = c(password) XOR t1
  e2 = c(password) XOR t2

then by using the commutative, associative and existence of a unique zero
properties of XOR, together with x XOR x = 0 for all x, we obtain:

  e1 XOR e2
    = (c(password) XOR t1) XOR (c(password) XOR t2)
    = (t1 XOR c(password)) XOR (c(password) XOR t2)
    = t1 XOR (c(password) XOR c(password)) XOR t2
    = t1 XOR 0 XOR t2,
    = t1 XOR t2.

This actually happens with our Memo documents, which btw could easily have been
prevented by adding a unique "Nonce" initialization vector.  This IV has to be
unique and should never be reused (otherwise it would become meaningless).
An easy way to produce a most-likely-nonce is by hashing the date, the plain
text to cipher and real random numbers (not pseudo-random), eg: the number of
microseconds between succesive keystrokes.  Once we have this IV, the output
would become:

  e = IV || (c(IV XOR password) XOR t)

Then:
  e1 XOR e2 =
    = (IV1 || (c(IV1 XOR password) XOR t1)) XOR
      (IV2 || (c(IV2 XOR password) XOR t2))
    = (IV1 XOR IV2) || (
         c(IV1 XOR password) XOR t1 XOR
         c(IV2 XOR password) XOR t2
      )
    = (IV1 XOR IV2) || (
         c(IV1 XOR password) XOR t1 XOR
         c(IV2 XOR password) XOR t2
      )

which would equal (IV1 XOR IV2) || (t1 || t2) only if there is a collision in c
with different initialization vectors (this is why ideally we would like a
cryptographically secure function f), or if IV1=IV2 which should not happen by
definition.

In any case when we save different documents using the same key we have seen
that the same stream is generated every time and simply XORed with the plain
text.

Note that we can obtain the stream for a given password by saving a document
with and without password, and then:
  c(password)
    = c(password) XOR 0
    = c(password) XOR (t1 XOR t1)
    = (c(password) XOR t1) XOR t1
    = e1 XOR t1



Now that we can obtain c(password), The next easy step is to see if there's a
certain period in the stream s, defined as min_p such that exists x where for
all i>=x: s[i]=s[i+p].

It happened that independently of the password length there was actually a
small period: 2159.  Every time that we see a period we should decompose it
into powers of their prime factors.  In this case 2159 = 17 * 127.

When the period is a composite number it might be the case that the stream s is
built by XORing two streams of smaller periods, which here would be s = s1 XOR
s2 with respective periods of 17 and 127.  We can check this hypothesis by
testing:

  s1[1..127] =? s[1..127] XOR s[128..254].
  and checking if s1 has period 17.

This once again happens to be the case in the documents saved by our Memo app.
Furthermore we can see that s1 is just the password padded with null bytes to
length 17 and then repeated an infinite number of times:

  mysecurepassword\0mysecurepassword\0mysecurepassword\0

Now that we know s1 for our document, this also means that s2 can be obtained
from simply s2 = s XOR s1.  Additionally by testing with a different password
we observed that s2 does not depend on the content of the document.  It's the
same password dependent 127 bytes stream repeated an infinite number
of times, thus a very poor Key Derivation Function (KDF).

In summary we have:

      



So, in order to obtain our s2 stream for a given document we could open a
certain clear text document, save it encrypted with the password in to a
different file, then XOR both files between them together with the cycled
password, and so in order to gain more info about the KDF I've simply tried
looking at the streams being generated from different passwords, in some
cases obtaining interesting data:
  * KDF('@@@@@@@A') and KDF('@@@@@@A@') resulted in the same stream!  This
    meant that the function had serious problematic symmetries.
  * This same stream was also obtained by KDF('@@@@@A@@'), KDF('@@@@A@@@'),
    KDF('@@@A@@@@'), KDF('@@A@@@@@') and KDF('@A@@@@@@'), but KDF('A@@@@@@@')
    generated a different ones.
  * The streams for KDF('123') and KDF('132') were different, so a permutation
    of the password not always kept the same stream.
  * The stream for KDF('122') was different than KDF('123') which shown that
    all of the password bytes were being considered.

Then I looked into the patterns with passwords of length 1 to see what
difference was obtained, basically an informal kind of basic differential
cryptoanalysis, showing that the probabilities of (specially the first few)
output bytes of the stream were heavily dependent on the password bits.

In particular for a password of length 1, I could easily observe that for
most values of b:
  * KDF(string.char(b+1))[1] - KDF(string.char(b))[1] = 5
  * KDF(string.char(b+1))[2] - KDF(string.char(b))[2] = 1
  * KDF(string.char(b+1))[3] - KDF(string.char(b))[3] = 40,
  * 10, 94, 91, 73, 44, 16, 29, 198, 224, 239, 35, 199, ...
  * Smelling really bad like a LCG:
    https://en.wikipedia.org/wiki/Linear_congruential_generator

But there was a particularly curious relationship between KDF('7') and
KDF('8'), they generated mostly the same bytes!, not only that they differed
only every 17 bytes!!! 😮

>  for i=1,127 do
      io.write(('%02x '):format(
         bit.band(secret_at0x38[i] - secret_at0x37[i], 0xff)
      ))
      if i%16==0 then print'' end
   end
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 f1 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 f5 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 f7 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 f9 00 00 00 00 00 00 00

Then I came back to the HP 200LX and tried opening that very same file (the
one encrypted with the password '7'), and as suspected, an invalid password
error was thrown; so this has to be a bug somewhere in the application,
although I currently don't know where...  I suspect it might be related to the
fact that when I saving the files with a passwords one by one, I made a few
mistakes here and there like non-matching passwords (the app asks for
confirmation) or opening previous files and/or re-saving them.

Tired of doing guesswork I changed tactics.



Before anything we need the executable that we want to decompile, which in the
case of the built-in HP 200LX apps is not so easy, as they are not visible on
the ROM drive, and no, they are not present as hidden files.

So there are two reasonable ways:

1. Dumping the RAM while the program is running.

This is doable in the HP 200LX because it lets you start a DOS shell while
other builtin apps are running, just start the Memo app, leave it open and
open DEBUG in the DOS shell, if you set bx=10 and cx=0, you can dump the
addressable 10 first 64KB segments of RAM (1MB), to do this just run:
  r bx
  10
  r cx
  0
  n RAMDUMP.BIN
  w 0:0
  q

The problem is that this DUMP will not contain the code of the Memo app, not
unless we tell the HP 200LX Memory Controller that it should make the Memo
app available.  For more information of what I mention here, it can be greatly
helpful the LXREF reference manual available at:
  http://mizj.com/

In particular the part of our 3MB ROM that contains the MEMO app is located
between 0x260000 and 0x280000 (128kB), so we need to map those address ranges
to the memory controller available pages: 1x64KB (page 0 at C0000h) or
8x16KB (pages 4..B at D0000h, D4000h, ..., EC000h).
This can be done by using INT 63h (AH=0) with:
  AL=page number (0, 4..B)
  BX=hardware page#
  CX=number of consecutive 16KB blocks to map
  DX=device select code (0=ROM, 1..4=RAM, 5=Plug-in slot 0, 6=Unused)
Return code is left in AH, being 0=ok, 83h=invalid DX, 8Ah=invalid BX,
8Bh=invalid AL or A4h=invalid CX.

Don't run the following DEBUG commands inside the built-in environment,
and also avoid using page 0 as it will hang-up the system, and you'll need
to do a hard reset (Control-Shift-ON).

So going back to DEBUG, we then need to map the RAM:
  a
  mov ax,4
  mov bx,98
  mov cx,1
  mov dx,0
  int 63
  int 20
  (blank line)
  g
  n DUMP98.bin
  r bx
  0
  r cx
  4000
  w d000:0
  q

Rinse and repeat for the 8 pages 98h to 9Fh.  I suggest saving the commands
into a script and then loading them with: debug <dump.txt

2. The easier alternative: downloading CMEMO.EXE directly.

When reading about this handheld computer in multiple backups and forums I've
learned that at some time HP did provide a version of the built-in software
as a software package to be run on any standard PC.  This is called CPACK200
or "HP 200LX Connectivity Pack", available after free registration on
VetusWare.com:

https://vetusware.com/download/200LX%20Connectivity%20Pack%20__/?id=8545

That's a "RAR" file containing 3 disk images, that can be mounted with:
mkdir /tmp/1; sudo mount HP200lx_1of3.IMA /tmp/1;  Inside of the first disk
image there's a file called APPS.ARJ (that can be extracted with 7z x), and
within it there's CMEMO.EXE, our desired executable.

Since I already did all of this, you can get this file from /hp200lx/cmemo

3.  Reverse Engineering CMEMO

To do this I've used ghidra:

  https://ghidra-sre.org/

Where I created a new project, imported CMEMO.EXE, analyzed the file, then at
"Window -> Memory Map" I created a new Memory RW entry at 0000:0000 with length
0x10000; and finally started adding annotations at the detected functions.

One easy one to start with is "fclose", which you can find by searching at
the bytes b3 3e (MOV AH, 0x3e) cd 21 (INT 0x21), being located at 1000:3244.

int fclose(int file_dos_handler)

{
  code *pcVar1;
  int iVar2;
  undefined in_CF;
  
                    /* close file (bx=handle) */
  pcVar1 = (code *)swi(0x21);
  (*pcVar1)();
  iVar2 = 0;
  if ((bool)in_CF) {
    iVar2 = -1;
  }
  return iVar2;
}


Then we search for the write syscall INT 21h with AH=40h, same as before we
can see at 1000:3278, write_to_file(int handler, char *buffer, int nbytes):

int write_to_file(int dos_file_handler,char *buffer,int nbytes)

{
  code *pcVar1;
  int iVar2;
  
                    /* Write To File or Device Using Handle */
  pcVar1 = (code *)swi(0x21);
  iVar2 = (*pcVar1)();
  return iVar2;
}


If we right click on the function address, then select "References -> Show
Call Tree", we will see 3 functions, one of them being "write_block", that
as we can see it's in charge of calling "write_to_file", yet if we look before
that we can see that it runs some extra stuff before saving the block
depending on whether a certain string is empty!  This is more or less where
the bytes to save are XORed with the content of two streams, one of 17 bytes
and another of 127 bytes!

Now we know that not only the PRNG KDF function was generating a weak stream
of bytes, it was also truncated to 127 bytes and then looped forever!, here
I added symbols and a bunch of comments:

int write_block(int dos_file_handler,char *buffer,int nbytes)

{
  byte *pbVar1;
  int *piVar2;
  int iVar3;
  undefined2 unaff_DS;
  char *buffer_00;
  int local_6;
  byte *local_4;
  
  if (*(char *)PASSWORD == '\0') {
    buffer_00 = (char *)((ulong)buffer & 0xffff | (ulong)buffer._2_2_ << 0x10);
  }
  else {

                    /* assume DS=0, here 0x51ee is TMP_WRITE_BUFFER */

    memcpy((char *)CONCAT22(unaff_DS,0x51ee),
           (char *)((ulong)buffer & 0xffff | (ulong)buffer._2_2_ << 0x10),nbytes);
    local_4 = (byte *)TMP_WRITE_BUFFER;
    for (local_6 = 0; local_6 < nbytes; local_6 = local_6 + 1) {
      pbVar1 = local_4;
      *pbVar1 = *pbVar1 ^ **(byte **)&PASSWORD_CURSOR ^ **(byte **)&PRNG_CURSOR;
      local_4 = local_4 + 1;
      piVar2 = (int *)&PASSWORD_CURSOR;
      *piVar2 = *piVar2 + 1;

/* easier to read as if (PASSWORD_CURSOR > (uint)PASSWORD + 0x10U),
   which is equivalent to (PASSWORD_CURSOR >= (uint)PASSWORD + 17) */

      if ((int)PASSWORD + 0x10U < *(uint *)&PASSWORD_CURSOR) {
        *(undefined2 *)&PASSWORD_CURSOR = (int)PASSWORD;
      }
      piVar2 = (int *)&PRNG_CURSOR;
      *piVar2 = *piVar2 + 1;

        /* OMG, this is: if (PRNG_CURSOR >= (int)PRNG_START+0x7f): */

      if ((int)PRNG_START + 0x7eU < *(uint *)&PRNG_CURSOR) {
        *(undefined2 *)&PRNG_CURSOR = (int)PRNG_START;
      }
    }
        /* assume DS=0, passing TMP_WRITE_BUFFER(=0x51ee) a write_to_file */
    buffer_00 = (char *)CONCAT22(unaff_DS,0x51ee);
  }
  iVar3 = write_to_file(dos_file_handler,buffer_00,nbytes);
  if (iVar3 != nbytes) {
    return 1;
  }
  return 0;
}


Once again, if we go to the callers, we will see the following snippet,
letting us know that this is the actual "save" function.  Note as well that
I also included an annotation for "init_prng" on the only function being
called from here that writes on top of the PASSWORD global variable:

[...]
    strcpy((char *)CONCAT22(unaff_DS,0x61f0),(char *)CONCAT22(unaff_DS,0x4b00));
    init_prng();
    pbVar1 = (byte *)0x3e81;
    *pbVar1 = *pbVar1 | 0x80;
    FUN_1000_0323(0x455d);
    local_8 = *(undefined2 *)0x3eb2;
    local_10 = *(undefined2 *)0x3eb4;
    iVar5 = *(int *)0x3eba;
    local_236 = (undefined *)*(int *)0x3eb8;
    iVar6 = FUN_1000_31f6(param_1,unaff_DS,0x8302);
    if (iVar6 == -1) {
      *(undefined *)PASSWORD = 0;
      return false;
    }
    FUN_1000_0323(0x456b);
    piVar4 = *(int **)0x3e72;
    dos_file_handler = *piVar4;
    local_a = piVar4[1];
    fp = (uint *)CONCAT22(local_a,dos_file_handler);
    local_c = dos_file_handler;
    iVar6 = write_to_file(dos_file_handler,(char *)fp,5);
    local_22a = (uint)(iVar6 != 5);
    local_c = local_c + 5;
    local_232 = (*(uint *)0x3ebe >> 0xc) + *(int *)0x3ebc * 4;
    uVar7 = *(uint *)0x3ebe & 0xfff;
    if (uVar7 < 5) {
      local_22e = uVar7 + 0xffb;
      local_232 = local_232 - 1;
    }
    else {
      local_22e = uVar7 - 5;
    }
    for (local_228 = 0; (iVar6 = (int)fp, local_22a == 0 && (local_228 < local_232));
        local_228 = local_228 + 1) {
      local_22a = write_block(local_c,(char *)CONCAT22(local_a,local_c),0x1000);
      fp = (uint *)CONCAT22(unaff_SS,&local_c);
      dos_file_handler = 0x212f;
      add_offset_to_fp(fp,0x1000);
    }
    if ((local_22a == 0) && (local_22e != 0)) {
      buffer = (char *)CONCAT22(local_a,local_c);
      dos_file_handler = local_c;
      local_22a = write_block(local_c,buffer,local_22e);
      iVar6 = (int)buffer;
    }
    FUN_1000_0323(0x456a);
    uVar10 = 0x2171;
    iVar8 = fclose(iVar6);
    if (iVar8 != 0) {
      local_22a = local_22a + 1;
    }
    *(undefined *)PASSWORD = 0;
    unaff_SS = iVar5;
[...]


Finally we have "init_prng", showing us the actual chunk of code that is used
to build the KDF stream.  Here we can also see that the generator is a
Lehmer RNG, an even weaker LCG variant:

  https://en.wikipedia.org/wiki/Lehmer_random_number_generator

With the state in a uint16_t variable PRNG_VAL_3b12, using 0x105 as the
multiplier and 0xfff1 as the prime modulus in order to achieve the maximum
period, m-1 = 0xfff0 (almost 64kB):

void init_prng(void)

{
  int *piVar1;
  uint *puVar2;
  int iVar3;
  undefined2 unaff_DS;
  uint local_4;
  
  if (*(char *)PASSWORD != '\0') {
    *(undefined2 *)&PRNG_VAL_3b12 = 0;
    for (local_4 = 0; ((int)local_4 < 0x11 && (*(char *)((int)PASSWORD + local_4) != '\0'));
        local_4 = local_4 + 1) {
      if ((local_4 & 1) == 0) {
        piVar1 = (int *)&PRNG_VAL_3b12;
        *piVar1 = *piVar1 + *(int *)((int)PASSWORD + local_4);
      }
      else {
        puVar2 = (uint *)&PRNG_VAL_3b12;
        *puVar2 = *puVar2 ^ *(uint *)((int)PASSWORD + local_4);
      }
    }
    for (; (int)local_4 < 0x11; local_4 = local_4 + 1) {
      *(undefined *)((int)PASSWORD + local_4) = 0;
    }
    if (*(int *)&PRNG_VAL_3b12 == 0) {
      *(undefined2 *)&PRNG_VAL_3b12 = 0x4b45;
    }
    local_4 = 0;
    do {
      iVar3 = (int)(((ulong)*(uint *)&PRNG_VAL_3b12 * 0x105) % 0xfff1);
      if (iVar3 == 0) {
        iVar3 = 0x4b45;
      }
      *(int *)&PRNG_VAL_3b12 = iVar3;
      *(undefined2 *)((int)PRNG_START + local_4) = *(undefined2 *)&PRNG_VAL_3b12;
      local_4 = local_4 + 2;
    } while ((int)local_4 < 0x80);
    *(undefined2 *)&PRNG_CURSOR = (int)PRNG_START;
    *(undefined2 *)&PASSWORD_CURSOR = (int)PASSWORD;
  }
  return;
}


Here is an easy to understand re-implementation in lua:

function password_seed(password)
   -- super dumb HP 200LX Memo's PRNG!
   local seed = 0 
   local function get_int16le(pos)
      local lo, hi = password:byte(pos, pos+1)
      return lo + (hi or 0)*256
   end
   for i = 1, #password do
      if i%2 == 1 then -- first, third, fifth... chars
         seed = seed + get_int16le(i)
      else -- second, fourth, sixth.. chars
         seed = bit.bxor(seed, get_int16le(i))
      end
   end
   seed = bit.band(seed, 0xffff)
   if seed == 0 then seed = 0x4b45 end
   return seed
end

function gensecret(seed)
   -- let's build 127 pseudo-random bytes
   local res = {}
   for i = 1, 127 do
      seed = seed * 0x105 % 0xfff1
      if seed == 0 then seed = 0x4b45 end
      res[#res+1] = bit.band(seed, 0xff) -- lo byte first
      res[#res+1] = bit.rshift(seed, 8)
   end
   return res
end

function kdf(password)
   return gensecret(password_seed(password))
end


And to fully decrypt the file we only need to tie the loose ends:

function crypt(text, password, secret)
   assert(#password<17)
   local p = (password..('\0'):rep(16)):sub(1,17)
   return text:sub(1,5)..text:sub(6):gsub('()(.)',
      function(i,c)
         return string.char(bit.bxor(
            c:byte(),
            secret[(i-1)%127+1],
            p:sub((i-1)%17+1):byte()))
      end)
end

>  crypt(plaintext, 'mypassword', kdf'mypassword') == encrypted
true
>  crypt(encrypted, 'mypassword', kdf'mypassword') == plaintext
true

Loading error: [string " return then 'COULD WE CRACK IT?' "]:1: unexpected symbol near 'then'

So finally we can present a few ideas on how we could crack the password with
only the encrypted file:

   Note that positions 0x32..0x35 are shared with 0xba..0xbd (because their
   indexes differ by 17*8), thus:
      secret[0x32-4]^secret[0xba] =
         = (c:byte(0x32+1)^0xff^p) ^ (c:byte(0xba+1)^0x87^p)
         =  c:byte(0x32+1)^0xff    ^  c:byte(0xba+1)^0x87, a known value!

   So we only need to hash the seeds by their xors at these shared positions
   to obtain the possible seeds, and if there's only one seed[*] then we can
   already deduce 4 characters of the password.

   [*] And there will be only one seed because the KDF dumps the full state
   every two bytes (there's no hidden state).

   Additionally the password stream has a null byte every 17 bytes, and some
   of them fall in positions that always have a known value independent of
   the content or parameters of the file, so we could use those to additionally
   filter out the state.

   Then by using the known fixed values of any file and we have obtained the
   kdf seed, we can obtain the password by XORing certain positions of the
   ciphertext with the kdf stream and the known plaintext values.

   This is an hex dump of a bare-bones example document where I enclosed
   in brackets the constant values (there may be more):

00000000: ff 32 00 45 4b 09 00 f9 00 00 00 00 00 02 00 03  .2.EK...........
00000010: 00 00 80 80 80 80 80 80 80 80 80 80 80 80 80 80  ................
00000020: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80  ................
00000030: 80(31 ff ff 88 01 00 00)00(00)00 00 00 00 00 00  .1..............
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000b0: 00 00 00 00 00 00 00 00 00 00(87 ff ff 88 02 00  ................
000000c0: 00)00(00)00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000140: 00 00 00(87 ff)74 68 69 73 20 69 73 20 65 6e 63  .....this is enc
00000150: 72 79 70 74 65 64 20 74 65 78 74 41 41 41 41 41  rypted textAAAAA
00000160: 41 41 41 41 41 41                                AAAAAA

   With this we can obtain the following characters: "yy3y567890yyyyyy"

   Additionally if either the header or the footer have less than 128 bytes
   then we can use the padded null bytes to continue obtaining more and more
   characters from the password.  Furthermore, if the tabs have not been
   changed on columns 200..248 (and I don't think nobody would mess with the
   tabs, especially on these columns) then we could also crack the password
   deterministically.

   And finally if that wasn't enough, then after obtaining the KDF seed and
   XORing its stream into the ciphertext; what would remain is just a simple
   Vigenère of length 17 (with the minor exception of having the addition
   done on GF(2) instead of Z(mod 256))...  We can finally perform basic
   frequency analysis (which is particularly easy to do when we have
   standard English text with spaces in between the words).



TLDR: Don't Roll Your Own Crypto.

Even though in this blog post I broke this joke of a cryptosystem (it took me
only a few days, less than a week), there are experts all around the world
that know what they're doing and can even perform attack on real world
protocols and implementations.

Always use popular open source implementations.  It is easier to check the
quality of the code and see if the coding practices are good, than to implement
your own library.  Crypto code can be surprisingly tricky, and even when the
source code is public and reviewed thousands of times the bugs can still be
hidden.  On closed source software this is even worse.

Always use open standards.  Most protocols in modern cryptography go through
serious review processes before being published, and after that it is
very common to see the scientific community finding attacks to them.

Always use initialization vectors with stream ciphers (with most ciphers btw),
and be very careful in how you obtain those IVs, as the security of the system
can be dependent on these values being used only once.

Be also careful on the boring details: protecting a piece of software from
side channel attacks (cache, timing, power, etc) is extremely difficult even
for experts on the subject.

How you use the algorithms is also very important.  Modes of operation,
padding construction, choosing and validation of session parameters, number of
rounds discarded and performed, among many other protocol characteristics
usually play a crucial role in the whole security and failing to properly
implement them can break the whole system.  Example:

  https://medium.com/@c0D3M/bleichenbacher-attack-explained-bc630f88ff25
  https://en.wikipedia.org/wiki/Padding_oracle_attack

Thank you for reading this blog post and feel free to leave any comments,
corrections are specially welcomed!



                                                                              π