md5sum vs. PHP's md5() function

Posted by sam Tue, 26 Jan 2010 22:06:00 GMT

I recently had cause to reproduce a client’s bespoke PHP-based site internally as part of a conversion process to the CMS that my employer sells. The site was a custom hand-rolled effort by a web design company, with all of the usual problems that accompany home-grown PHP-based code. Did I mention that I hate PHP?

The developer at least had applied some clue by hashing the passwords before they were stored in the database. No sign of a salt, but hey - at least some effort was made to keep things secure. As we needed a logon to this site pretty quickly, I did a quick and dirty grep of the code and discovered how the passwords were being stored.

grep -i password `find . -name "*.php"`
...
./site/update-profile.php:                      $password = md5($userPassword);

Cool. Thinking it would be a simple matter of cooking up an insert statement based on what I had learnt about the database schema, I decided I needed to hash me a password:

$ echo "password" | md5sum
286755fad04869ca523320acce0dc6a4  -

The hash was duly inserted and the first logon … failed. Under time pressure the simplest thing to do seemed to be to use the same function as the site itself to generate the hash:

$ php -r 'echo md5("password");'
5f4dcc3b5aa765d61d8327deb882cf99

Sure enough, the credentials worked and our team could continue porting the site away from the nasty PHP hack (Did I mention? Oh, never mind …).

The md5 issue was bugging me though. Surely an md5 hash is an md5 hash is an md5 hash. Time to examine some assumptions (those pesky assumptions will always come back to bite you!).

For the same input we had different md5 sums being generated. This cannot be true if the sums are both calculated correctly, so the input must be different. The shell in this case was bash(1); whether we’re using the builtin(1) or the independent binary version of echo, we should remember that, without some work, a newline is added implicitly. Despite being just one byte (or more, depending on a whole host of things), any difference between inputs will produce wildly different hashes.
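A quick way to see the extra byte is to count what each command sends down the pipe and then hash both versions side by side. This is only a sketch, assuming GNU coreutils (md5sum, wc, cut) and a POSIX printf; the variable names are mine, purely for illustration.

```shell
# echo appends a newline, so "password" goes down the pipe as 9 bytes.
echo "password" | wc -c

# printf '%s' writes the string verbatim: 8 bytes, no newline.
printf '%s' "password" | wc -c

# Hash both variants, keeping only the hex digest (the first field).
with_newline=$(echo "password" | md5sum | cut -d' ' -f1)
without_newline=$(printf '%s' "password" | md5sum | cut -d' ' -f1)

echo "with newline:    $with_newline"
echo "without newline: $without_newline"
```

The first digest matches what md5sum gave me earlier, and the second matches what PHP's md5() returned.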

Let’s check our theory:

$ php -r 'echo md5("password");'
5f4dcc3b5aa765d61d8327deb882cf99
$ /bin/echo -e "password\c" | md5sum
5f4dcc3b5aa765d61d8327deb882cf99  -

The -e switch and the \c in the echo command there enable the interpretation of backslash escapes and suppress the trailing newline, respectively.
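How echo treats switches and backslash escapes varies between shells and platforms, so a more portable route is printf, which only emits a newline if you ask for one. A minimal sketch, again assuming md5sum is available; hash_password is a hypothetical helper name of my own, not something from the client's site.

```shell
# hash_password (hypothetical helper): print the md5 hex digest of the
# first argument, hashing exactly the bytes given with no trailing newline.
hash_password() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

hash_password "password"
```

Because '%s' passes the argument through untouched, a password containing backslashes or a leading dash is hashed as typed rather than being reinterpreted.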

Theory proven! After that I had to recompile PHP with --enable-calendar so that the right Gregorian-Julian date conversion functions were present. Apart from that it was quite painless, and we’re well on the way to completing the customer conversion.