One of the main workhorses of modern cryptography are collision resistant hash functions, such as MD5. Given an (almost) arbitrarily long input M, they produce a fixed-size output H(M). Collision resistance means that it is infeasible to find two different inputs M and M' with the same hash H(M)=H(M'). Note that many collisions exist, but it has to be infeasible to actually find even a single collision!
Hash functions are almost omnipresent in today's cryptography, e.g. in digital signatures. Instead of signing a long message M, you simply sign its hash H(M). This is useful and simplifies many issues ... but if H(M) is identical to H(M') then the signature is also valid for M'.
Alice has been an intern, working some weeks in Rome at the office of, say, Julius Caesar. Depending on the point of view, the story develops quite differently.
At the day Alice is supposed to leave, Caesar writes a letter of recommendation for Alice -- on paper. The same day, she asks Caesar to digitally sign the letter. For his convenience she presents an electronic copy of the document. Caesar opens the document -- it looks exactly like the original document. So he signs the document.
Months later, Caesar discovers that there has been a breach of secrecy with his French affair files. Will he ever find out who tricked him and how?
Being an intern, Alice does not have any access to secret documents. Not enough for her ...
... tricky Alice decides to fool Caesar. Because Caesar is still relying on the widely used MD5 hash function, she implements the attack from Wang and Yu [WY05] to find MD5 collisions. When she receives her letter of recommendation (on paper), she prepares two postscript files with the same MD5 hash:
Now she asks Caesar to sign the letter ... who has no obvious reason to decline.
Due to the hash collision, Caesar's signature for the letter of recommendation is valid for the order, as well. She presents order and digital signature to the person in charge of Caesar's files ... and finally gains access to Caesar's secret documents!
Based on [WY05], we implemented an attack to find random collisions for the MD5 compression function. It took just a few hours on a customary PC. Using the length extension property present in MD5 and all other hash functions following the (almost omnipresent) Merkle-Damgaard design principle, we constructed a pair of documents to display either the letter or the order.
We are preparing a paper on this.
If you look into postscript documents, e.g. by opening them in a text or hex editor, you'll get an idea about what is going on. (We did not try to obfuscate the postscript Code.)
Oh, you can also look at the slides of our presentation at the Eurocrypt 2005 rump session.
Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called "practitioners" turned them down as "practically irrelevant". The point is that while it is possible to find colliding messages M and M', these messages appear to be more or less random - or rather, contain a random string of some fixed length (e.g., 1024 bit in the case of MD5). If you cannot exercise control over colliding messages, these collisions are theoretically interesting but harmless, right? In the past few weeks, we have met quite a few people who thought so.
With this page, we want to demonstrate how badly wrong this kind of reasoning is! We hope to provide convincing evidence even for people without much technical or cryptographical background.
There have already been a few exploits of the collision-finding attacks against MD5: Kaminski [Ka] and Mikle [Mi] presented different executables with the same MD5 hash. One of Kaminski's executables was quite harmless, the other one very harmful. Mikle's executables were self-extracting archives, extracting different stuff. Lenstra, Wang and de Weger [LWdW,LdW] constructed colliding X.509 certificates.
These exploits seemed to be rather technical to us. We wanted real documents to collide, which you just open with some standard browser or viewer. Also, we wanted to present some use case for such collisions -- the story of Alice and her boss. We hope, you liked our little story. :-)
Many thanks to Frederik Armknecht and Dirk Stegemann for useful comments and discussions on this work and for helping with the computations to find the necessary collision. Many thanks also to Benne de Weger, who provided some interesting comments and links to related work, and to Ernst Giessmann, who revealed Caesar's real address in (Via Appia instead of Main Road) and gave us a hint how to improve the robustness of our postscript code. Finally, many thanks to Andreas Lucks for his advice on this presentation and the drawing.
Magnus Daum
(CITS Research Group, RUB)
Stefan Lucks
(University of Mannheim)