In this post we are trying to first try to deduce the encryption algorithm of
the Memo app in the HP 200LX handheld (which encrypts documents when a password
is used at save time) and then if possible find a way to deduce the password
from any encrypted file without any help.
The reason I'm doing this is because this handheld computer is very good as a
note taking machine given its size, and it would be good to know what attacks
can to perform to it. Being its software closed source and written by
corporation (HP in this case) workers I was expecting quite a poor level of
security, and truth be told, from the beginning I saw the classic signs that
the password protection for this application was done by people that ended up
"Rolling Their Own Crypto".
The problem of reading a couple of books on cryptography and doing some
college courses is that you will feel confident about having knowledge on the
area, and thus tempted at using it in your own software. Open source standard
protocols on the other hand have multiple years of research and cryptoanalysis
done to them, so even in the best case it's very likely that you are going to
make a few mistakes here and there ending up with an insecure piece of code.
Furthermore, if you only work with people that knows less crypto than you, your
work won't ever get properly reviewed.
This Memo app is in summary a simple WYSIWYG text editor with its own binary
format that supports a few extra attributes over standard plain text: bold and
underlined text, headers and footers, custom tab positions, custom margins,
multi level outlines with a couple of formats and print fields; and in
particular, it supports saving documents with passwords.
The first thing we should do is try to detect whether a stream cipher or a
block cipher is used. Old apps such as this one, are usually written with
really insecure algorithms, as the developers involved in them tend have little
to no knowledge in cryptography. We can see that for other apps like the
builtin Pocket Quicken or the Note Taker there are available crack programs
that can very easily obtain the passwords from the files. Now it's time to
break Memo as well.
The simplest way to use a Stream cipher is to generate a stream of bytes from a
password and then XOR those generated bytes with the clear text. On the other
hand block ciphers have many possible operation modes like ECB, CBC, etc.
Since them are in general a bit more difficult to properly implement it's less
likely to be used.
There's a simple way to detect if the simplest and insecure mode of operation
for a stream cipher was used: Given two files in clear text t1 and t2, a stream
cipher c, we then encrypt them with the same password, resulting in:
e1 = c(password) XOR t1
e2 = c(password) XOR t2
then by using the commutative, associative and existence of a unique zero
properties of XOR, together with x XOR x = 0 for all x, we obtain:
e1 XOR e2
= (c(password) XOR t1) XOR (c(password) XOR t2)
= (t1 XOR c(password)) XOR (c(password) XOR t2)
= t1 XOR (c(password) XOR c(password)) XOR t2
= t1 XOR 0 XOR t2,
= t1 XOR t2.
This actually happens with our Memo documents, which btw could easily have been
prevented by adding a unique "Nonce" initialization vector. This IV has to be
unique and should never be reused (otherwise it would become meaningless).
An easy way to produce a most-likely-nonce is by hashing the date, the plain
text to cipher and real random numbers (not pseudo-random), eg: the number of
microseconds between succesive keystrokes. Once we have this IV, the output
would become:
e = IV || (c(IV XOR password) XOR t)
Then:
e1 XOR e2 =
= (IV1 || (c(IV1 XOR password) XOR t1)) XOR
(IV2 || (c(IV2 XOR password) XOR t2))
= (IV1 XOR IV2) || (
c(IV1 XOR password) XOR t1 XOR
c(IV2 XOR password) XOR t2
)
= (IV1 XOR IV2) || (
c(IV1 XOR password) XOR t1 XOR
c(IV2 XOR password) XOR t2
)
which would equal (IV1 XOR IV2) || (t1 || t2) only if there is a collision in c
with different initialization vectors (this is why ideally we would like a
cryptographically secure function f), or if IV1=IV2 which should not happen by
definition.
In any case when we save different documents using the same key we have seen
that the same stream is generated every time and simply XORed with the plain
text.
Note that we can obtain the stream for a given password by saving a document
with and without password, and then:
c(password)
= c(password) XOR 0
= c(password) XOR (t1 XOR t1)
= (c(password) XOR t1) XOR t1
= e1 XOR t1
Now that we can obtain c(password), The next easy step is to see if there's a
certain period in the stream s, defined as min_p such that exists x where for
all i>=x: s[i]=s[i+p].
It happened that independently of the password length there was actually a
small period: 2159. Every time that we see a period we should decompose it
into powers of their prime factors. In this case 2159 = 17 * 127.
When the period is a composite number it might be the case that the stream s is
built by XORing two streams of smaller periods, which here would be s = s1 XOR
s2 with respective periods of 17 and 127. We can check this hypothesis by
testing:
s1[1..127] =? s[1..127] XOR s[128..254].
and checking if s1 has period 17.
This once again happens to be the case in the documents saved by our Memo app.
Furthermore we can see that s1 is just the password padded with null bytes to
length 17 and then repeated an infinite number of times:
mysecurepassword\0mysecurepassword\0mysecurepassword\0
Now that we know s1 for our document, this also means that s2 can be obtained
from simply s2 = s XOR s1. Additionally by testing with a different password
we observed that s2 does not depend on the content of the document. It's the
same password dependent 127 bytes stream repeated an infinite number
of times, thus a very poor Key Derivation Function (KDF).
In summary we have:
So, in order to obtain our s2 stream for a given document we could open a
certain clear text document, save it encrypted with the password in to a
different file, then XOR both files between them together with the cycled
password, and so in order to gain more info about the KDF I've simply tried
looking at the streams being generated from different passwords, in some
cases obtaining interesting data:
* KDF('@@@@@@@A') and KDF('@@@@@@A@') resulted in the same stream! This
meant that the function had serious problematic symmetries.
* This same stream was also obtained by KDF('@@@@@A@@'), KDF('@@@@A@@@'),
KDF('@@@A@@@@'), KDF('@@A@@@@@') and KDF('@A@@@@@@'), but KDF('A@@@@@@@')
generated a different ones.
* The streams for KDF('123') and KDF('132') were different, so a permutation
of the password not always kept the same stream.
* The stream for KDF('122') was different than KDF('123') which shown that
all of the password bytes were being considered.
Then I looked into the patterns with passwords of length 1 to see what
difference was obtained, basically an informal kind of basic differential
cryptoanalysis, showing that the probabilities of (specially the first few)
output bytes of the stream were heavily dependent on the password bits.
In particular for a password of length 1, I could easily observe that for
most values of b:
* KDF(string.char(b+1))[1] - KDF(string.char(b))[1] = 5
* KDF(string.char(b+1))[2] - KDF(string.char(b))[2] = 1
* KDF(string.char(b+1))[3] - KDF(string.char(b))[3] = 40,
* 10, 94, 91, 73, 44, 16, 29, 198, 224, 239, 35, 199, ...
* Smelling really bad like a LCG:
https://en.wikipedia.org/wiki/Linear_congruential_generator
But there was a particularly curious relationship between KDF('7') and
KDF('8'), they generated mostly the same bytes!, not only that they differed
only every 17 bytes!!! đŸ˜®
> for i=1,127 do
io.write(('%02x '):format(
bit.band(secret_at0x38[i] - secret_at0x37[i], 0xff)
))
if i%16==0 then print'' end
end
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 f1 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 f5 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 f7 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 f9 00 00 00 00 00 00 00
Then I came back to the HP 200LX and tried opening that very same file (the
one encrypted with the password '7'), and as suspected, an invalid password
error was thrown; so this has to be a bug somewhere in the application,
although I currently don't know where... I suspect it might be related to the
fact that when I saving the files with a passwords one by one, I made a few
mistakes here and there like non-matching passwords (the app asks for
confirmation) or opening previous files and/or re-saving them.
Tired of doing guesswork I changed tactics.
Before anything we need the executable that we want to decompile, which in the
case of the built-in HP 200LX apps is not so easy, as they are not visible on
the ROM drive, and no, they are not present as hidden files.
So there are two reasonable ways:
1. Dumping the RAM while the program is running.
This is doable in the HP 200LX because it lets you start a DOS shell while
other builtin apps are running, just start the Memo app, leave it open and
open DEBUG in the DOS shell, if you set bx=10 and cx=0, you can dump the
addressable 10 first 64KB segments of RAM (1MB), to do this just run:
r bx
10
r cx
0
n RAMDUMP.BIN
w 0:0
q
The problem is that this DUMP will not contain the code of the Memo app, not
unless we tell the HP 200LX Memory Controller that it should make the Memo
app available. For more information of what I mention here, it can be greatly
helpful the LXREF reference manual available at:
http://mizj.com/
In particular the part of our 3MB ROM that contains the MEMO app is located
between 0x260000 and 0x280000 (128kB), so we need to map those address ranges
to the memory controller available pages: 1x64KB (page 0 at C0000h) or
8x16KB (pages 4..B at D0000h, D4000h, ..., EC000h).
This can be done by using INT 63h (AH=0) with:
AL=page number (0, 4..B)
BX=hardware page#
CX=number of consecutive 16KB blocks to map
DX=device select code (0=ROM, 1..4=RAM, 5=Plug-in slot 0, 6=Unused)
Return code is left in AH, being 0=ok, 83h=invalid DX, 8Ah=invalid BX,
8Bh=invalid AL or A4h=invalid CX.
Don't run the following DEBUG commands inside the built-in environment,
and also avoid using page 0 as it will hang-up the system, and you'll need
to do a hard reset (Control-Shift-ON).
So going back to DEBUG, we then need to map the RAM:
a
mov ax,4
mov bx,98
mov cx,1
mov dx,0
int 63
int 20
(blank line)
g
n DUMP98.bin
r bx
0
r cx
4000
w d000:0
q
Rinse and repeat for the 8 pages 98h to 9Fh. I suggest saving the commands
into a script and then loading them with: debug <dump.txt
2. The easier alternative: downloading CMEMO.EXE directly.
When reading about this handheld computer in multiple backups and forums I've
learned that at some time HP did provide a version of the built-in software
as a software package to be run on any standard PC. This is called CPACK200
or "HP 200LX Connectivity Pack", available after free registration on
VetusWare.com:
https://vetusware.com/download/200LX%20Connectivity%20Pack%20__/?id=8545
That's a "RAR" file containing 3 disk images, that can be mounted with:
mkdir /tmp/1; sudo mount HP200lx_1of3.IMA /tmp/1; Inside of the first disk
image there's a file called APPS.ARJ (that can be extracted with 7z x), and
within it there's CMEMO.EXE, our desired executable.
Since I already did all of this, you can get this file from /hp200lx/cmemo
3. Reverse Engineering CMEMO
To do this I've used ghidra:
https://ghidra-sre.org/
Where I created a new project, imported CMEMO.EXE, analyzed the file, then at
"Window -> Memory Map" I created a new Memory RW entry at 0000:0000 with length
0x10000; and finally started adding annotations at the detected functions.
One easy one to start with is "fclose", which you can find by searching at
the bytes b3 3e (MOV AH, 0x3e) cd 21 (INT 0x21), being located at 1000:3244.
int fclose(int file_dos_handler)
{
code *pcVar1;
int iVar2;
undefined in_CF;
/* close file (bx=handle) */
pcVar1 = (code *)swi(0x21);
(*pcVar1)();
iVar2 = 0;
if ((bool)in_CF) {
iVar2 = -1;
}
return iVar2;
}
Then we search for the write syscall INT 21h with AH=40h, same as before we
can see at 1000:3278, write_to_file(int handler, char *buffer, int nbytes):
int write_to_file(int dos_file_handler,char *buffer,int nbytes)
{
code *pcVar1;
int iVar2;
/* Write To File or Device Using Handle */
pcVar1 = (code *)swi(0x21);
iVar2 = (*pcVar1)();
return iVar2;
}
If we right click on the function address, then select "References -> Show
Call Tree", we will see 3 functions, one of them being "write_block", that
as we can see it's in charge of calling "write_to_file", yet if we look before
that we can see that it runs some extra stuff before saving the block
depending on whether a certain string is empty! This is more or less where
the bytes to save are XORed with the content of two streams, one of 17 bytes
and another of 127 bytes!
Now we know that not only the PRNG KDF function was generating a weak stream
of bytes, it was also truncated to 127 bytes and then looped forever!, here
I added symbols and a bunch of comments:
int write_block(int dos_file_handler,char *buffer,int nbytes)
{
byte *pbVar1;
int *piVar2;
int iVar3;
undefined2 unaff_DS;
char *buffer_00;
int local_6;
byte *local_4;
if (*(char *)PASSWORD == '\0') {
buffer_00 = (char *)((ulong)buffer & 0xffff | (ulong)buffer._2_2_ << 0x10);
}
else {
/* assume DS=0, here 0x51ee is TMP_WRITE_BUFFER */
memcpy((char *)CONCAT22(unaff_DS,0x51ee),
(char *)((ulong)buffer & 0xffff | (ulong)buffer._2_2_ << 0x10),nbytes);
local_4 = (byte *)TMP_WRITE_BUFFER;
for (local_6 = 0; local_6 < nbytes; local_6 = local_6 + 1) {
pbVar1 = local_4;
*pbVar1 = *pbVar1 ^ **(byte **)&PASSWORD_CURSOR ^ **(byte **)&PRNG_CURSOR;
local_4 = local_4 + 1;
piVar2 = (int *)&PASSWORD_CURSOR;
*piVar2 = *piVar2 + 1;
/* easier to read as if (PASSWORD_CURSOR > (uint)PASSWORD + 0x10U),
which is equivalent to (PASSWORD_CURSOR >= (uint)PASSWORD + 17) */
if ((int)PASSWORD + 0x10U < *(uint *)&PASSWORD_CURSOR) {
*(undefined2 *)&PASSWORD_CURSOR = (int)PASSWORD;
}
piVar2 = (int *)&PRNG_CURSOR;
*piVar2 = *piVar2 + 1;
/* OMG, this is: if (PRNG_CURSOR >= (int)PRNG_START+0x7f): */
if ((int)PRNG_START + 0x7eU < *(uint *)&PRNG_CURSOR) {
*(undefined2 *)&PRNG_CURSOR = (int)PRNG_START;
}
}
/* assume DS=0, passing TMP_WRITE_BUFFER(=0x51ee) a write_to_file */
buffer_00 = (char *)CONCAT22(unaff_DS,0x51ee);
}
iVar3 = write_to_file(dos_file_handler,buffer_00,nbytes);
if (iVar3 != nbytes) {
return 1;
}
return 0;
}
Once again, if we go to the callers, we will see the following snippet,
letting us know that this is the actual "save" function. Note as well that
I also included an annotation for "init_prng" on the only function being
called from here that writes on top of the PASSWORD global variable:
[...]
strcpy((char *)CONCAT22(unaff_DS,0x61f0),(char *)CONCAT22(unaff_DS,0x4b00));
init_prng();
pbVar1 = (byte *)0x3e81;
*pbVar1 = *pbVar1 | 0x80;
FUN_1000_0323(0x455d);
local_8 = *(undefined2 *)0x3eb2;
local_10 = *(undefined2 *)0x3eb4;
iVar5 = *(int *)0x3eba;
local_236 = (undefined *)*(int *)0x3eb8;
iVar6 = FUN_1000_31f6(param_1,unaff_DS,0x8302);
if (iVar6 == -1) {
*(undefined *)PASSWORD = 0;
return false;
}
FUN_1000_0323(0x456b);
piVar4 = *(int **)0x3e72;
dos_file_handler = *piVar4;
local_a = piVar4[1];
fp = (uint *)CONCAT22(local_a,dos_file_handler);
local_c = dos_file_handler;
iVar6 = write_to_file(dos_file_handler,(char *)fp,5);
local_22a = (uint)(iVar6 != 5);
local_c = local_c + 5;
local_232 = (*(uint *)0x3ebe >> 0xc) + *(int *)0x3ebc * 4;
uVar7 = *(uint *)0x3ebe & 0xfff;
if (uVar7 < 5) {
local_22e = uVar7 + 0xffb;
local_232 = local_232 - 1;
}
else {
local_22e = uVar7 - 5;
}
for (local_228 = 0; (iVar6 = (int)fp, local_22a == 0 && (local_228 < local_232));
local_228 = local_228 + 1) {
local_22a = write_block(local_c,(char *)CONCAT22(local_a,local_c),0x1000);
fp = (uint *)CONCAT22(unaff_SS,&local_c);
dos_file_handler = 0x212f;
add_offset_to_fp(fp,0x1000);
}
if ((local_22a == 0) && (local_22e != 0)) {
buffer = (char *)CONCAT22(local_a,local_c);
dos_file_handler = local_c;
local_22a = write_block(local_c,buffer,local_22e);
iVar6 = (int)buffer;
}
FUN_1000_0323(0x456a);
uVar10 = 0x2171;
iVar8 = fclose(iVar6);
if (iVar8 != 0) {
local_22a = local_22a + 1;
}
*(undefined *)PASSWORD = 0;
unaff_SS = iVar5;
[...]
Finally we have "init_prng", showing us the actual chunk of code that is used
to build the KDF stream. Here we can also see that the generator is a
Lehmer RNG, an even weaker LCG variant:
https://en.wikipedia.org/wiki/Lehmer_random_number_generator
With the state in a uint16_t variable PRNG_VAL_3b12, using 0x105 as the
multiplier and 0xfff1 as the prime modulus in order to achieve the maximum
period, m-1 = 0xfff0 (almost 64kB):
void init_prng(void)
{
int *piVar1;
uint *puVar2;
int iVar3;
undefined2 unaff_DS;
uint local_4;
if (*(char *)PASSWORD != '\0') {
*(undefined2 *)&PRNG_VAL_3b12 = 0;
for (local_4 = 0; ((int)local_4 < 0x11 && (*(char *)((int)PASSWORD + local_4) != '\0'));
local_4 = local_4 + 1) {
if ((local_4 & 1) == 0) {
piVar1 = (int *)&PRNG_VAL_3b12;
*piVar1 = *piVar1 + *(int *)((int)PASSWORD + local_4);
}
else {
puVar2 = (uint *)&PRNG_VAL_3b12;
*puVar2 = *puVar2 ^ *(uint *)((int)PASSWORD + local_4);
}
}
for (; (int)local_4 < 0x11; local_4 = local_4 + 1) {
*(undefined *)((int)PASSWORD + local_4) = 0;
}
if (*(int *)&PRNG_VAL_3b12 == 0) {
*(undefined2 *)&PRNG_VAL_3b12 = 0x4b45;
}
local_4 = 0;
do {
iVar3 = (int)(((ulong)*(uint *)&PRNG_VAL_3b12 * 0x105) % 0xfff1);
if (iVar3 == 0) {
iVar3 = 0x4b45;
}
*(int *)&PRNG_VAL_3b12 = iVar3;
*(undefined2 *)((int)PRNG_START + local_4) = *(undefined2 *)&PRNG_VAL_3b12;
local_4 = local_4 + 2;
} while ((int)local_4 < 0x80);
*(undefined2 *)&PRNG_CURSOR = (int)PRNG_START;
*(undefined2 *)&PASSWORD_CURSOR = (int)PASSWORD;
}
return;
}
Here is an easy to understand re-implementation in lua:
function password_seed(password)
-- super dumb HP 200LX Memo's PRNG!
local seed = 0
local function get_int16le(pos)
local lo, hi = password:byte(pos, pos+1)
return lo + (hi or 0)*256
end
for i = 1, #password do
if i%2 == 1 then -- first, third, fifth... chars
seed = seed + get_int16le(i)
else -- second, fourth, sixth.. chars
seed = bit.bxor(seed, get_int16le(i))
end
end
seed = bit.band(seed, 0xffff)
if seed == 0 then seed = 0x4b45 end
return seed
end
function gensecret(seed)
-- let's build 127 pseudo-random bytes
local res = {}
for i = 1, 127 do
seed = seed * 0x105 % 0xfff1
if seed == 0 then seed = 0x4b45 end
res[#res+1] = bit.band(seed, 0xff) -- lo byte first
res[#res+1] = bit.rshift(seed, 8)
end
return res
end
function kdf(password)
return gensecret(password_seed(password))
end
And to fully decrypt the file we only need to tie the loose ends:
function crypt(text, password, secret)
assert(#password<17)
local p = (password..('\0'):rep(16)):sub(1,17)
return text:sub(1,5)..text:sub(6):gsub('()(.)',
function(i,c)
return string.char(bit.bxor(
c:byte(),
secret[(i-1)%127+1],
p:sub((i-1)%17+1):byte()))
end)
end
> crypt(plaintext, 'mypassword', kdf'mypassword') == encrypted
true
> crypt(encrypted, 'mypassword', kdf'mypassword') == plaintext
true
Loading error: [string " return then 'COULD WE CRACK IT?' "]:1: unexpected symbol near 'then'
So finally we can present a few ideas on how we could crack the password with
only the encrypted file:
Note that positions 0x32..0x35 are shared with 0xba..0xbd (because their
indexes differ by 17*8), thus:
secret[0x32-4]^secret[0xba] =
= (c:byte(0x32+1)^0xff^p) ^ (c:byte(0xba+1)^0x87^p)
= c:byte(0x32+1)^0xff ^ c:byte(0xba+1)^0x87, a known value!
So we only need to hash the seeds by their xors at these shared positions
to obtain the possible seeds, and if there's only one seed[*] then we can
already deduce 4 characters of the password.
[*] And there will be only one seed because the KDF dumps the full state
every two bytes (there's no hidden state).
Additionally the password stream has a null byte every 17 bytes, and some
of them fall in positions that always have a known value independent of
the content or parameters of the file, so we could use those to additionally
filter out the state.
Then by using the known fixed values of any file and we have obtained the
kdf seed, we can obtain the password by XORing certain positions of the
ciphertext with the kdf stream and the known plaintext values.
This is an hex dump of a bare-bones example document where I enclosed
in brackets the constant values (there may be more):
00000000: ff 32 00 45 4b 09 00 f9 00 00 00 00 00 02 00 03 .2.EK...........
00000010: 00 00 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
00000020: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
00000030: 80(31 ff ff 88 01 00 00)00(00)00 00 00 00 00 00 .1..............
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000b0: 00 00 00 00 00 00 00 00 00 00(87 ff ff 88 02 00 ................
000000c0: 00)00(00)00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000140: 00 00 00(87 ff)74 68 69 73 20 69 73 20 65 6e 63 .....this is enc
00000150: 72 79 70 74 65 64 20 74 65 78 74 41 41 41 41 41 rypted textAAAAA
00000160: 41 41 41 41 41 41 AAAAAA
With this we can obtain the following characters: "yy3y567890yyyyyy"
Additionally if either the header or the footer have less than 128 bytes
then we can use the padded null bytes to continue obtaining more and more
characters from the password. Furthermore, if the tabs have not been
changed on columns 200..248 (and I don't think nobody would mess with the
tabs, especially on these columns) then we could also crack the password
deterministically.
And finally if that wasn't enough, then after obtaining the KDF seed and
XORing its stream into the ciphertext; what would remain is just a simple
Vigenère of length 17 (with the minor exception of having the addition
done on GF(2) instead of Z(mod 256))... We can finally perform basic
frequency analysis (which is particularly easy to do when we have
standard English text with spaces in between the words).
TLDR: Don't Roll Your Own Crypto.
Even though in this blog post I broke this joke of a cryptosystem (it took me
only a few days, less than a week), there are experts all around the world
that know what they're doing and can even perform attack on real world
protocols and implementations.
Always use popular open source implementations. It is easier to check the
quality of the code and see if the coding practices are good, than to implement
your own library. Crypto code can be surprisingly tricky, and even when the
source code is public and reviewed thousands of times the bugs can still be
hidden. On closed source software this is even worse.
Always use open standards. Most protocols in modern cryptography go through
serious review processes before being published, and after that it is
very common to see the scientific community finding attacks to them.
Always use initialization vectors with stream ciphers (with most ciphers btw),
and be very careful in how you obtain those IVs, as the security of the system
can be dependent on these values being used only once.
Be also careful on the boring details: protecting a piece of software from
side channel attacks (cache, timing, power, etc) is extremely difficult even
for experts on the subject.
How you use the algorithms is also very important. Modes of operation,
padding construction, choosing and validation of session parameters, number of
rounds discarded and performed, among many other protocol characteristics
usually play a crucial role in the whole security and failing to properly
implement them can break the whole system. Example:
https://medium.com/@c0D3M/bleichenbacher-attack-explained-bc630f88ff25
https://en.wikipedia.org/wiki/Padding_oracle_attack
Thank you for reading this blog post and feel free to leave any comments,
corrections are specially welcomed!