Understanding Password Hashing

30 September 2019

Password hashing is a required security measure to protect our users. In this article, I aim to give a brief overview of why that is the case, and the best practices around doing so, particularly around salting.

You see, our brains can only hold so much information. So it’s no wonder internet users today continue to use the same password for multiple websites. They often use weak passwords and only strengthen it when validation rules require it.

It’s easy to scoff and scorn at them as you continue to punch in your 2FA codes and use your password manager (you are using one, right?). However, you are not the average internet user so it requires us to have a little empathy towards the general public.

As with any sensitive data, it is our responsibility to ensure it is secure, particularly if the users themselves are not doing so.

Storing Passwords as Plain Text

We have moved on from the days of storing passwords in plain text and is something that should never be done.

If someone has read access to user records in our database that stores plain text passwords, then it’s game over. They have instant access to the credentials with no other measures in place to prevent them from obtaining them.

We might say “well our server is so secure, that would never happen!” and whilst I’m sure we have taken every reasonable measure to ensure security, we can never be certain. It takes a single vulnerability, mistake or person with read access to our database for that statement to be false.

So with the assumption that our server is not 100% secure, let’s talk about password hashing.

What is Hashing?

Hashing is a deterministic algorithm that takes data of arbitrary length and turns it into a fixed length value simply called a hash. There are several hashing algorithms, with the most popular being MD5, SHA-1, SHA-2 and SHA-3.

One typical use of a hash is to check the integrity of a file when downloading. You might have noticed somewhere on the download page of a website that an md5 hash is shown. The intention is that once your file is downloaded, you can compute the hash on your local machine, and compare the two. The MD5 hash function is preinstalled on MacOS, so it’s as simple as running md5 ./downloaded_program.dmg. If they do not match, then the file may been tampered or not been delivered successfully. In either case, you probably don’t want to execute that program.

Why Hashing?

Hashing is one-way, meaning it cannot be reversed. Therefore an attacker cannot take a hash and reverse the process in order to get the value. The only way to obtain the value is by computing hashes and making comparisons using clever guesswork.

Types of Password Attacks

Brute Force

In a brute force attack, the attacker tries every possible password combination to try to obtain the password.

It could be done online where a request is made to the server, or offline where if a single hash is known, they would generate a hash from an input and compare the two.

An incredibly naive approach might be to first check each character chronologically, a, b, c and then followed by 2 characters aa, ab etc. This probably wouldn’t be smart because it’s highly likely the password validation requires a minimum length. So to speed things up, an attacker would consider the validation requirements or assume if they are unable to retrieve them.

Truthfully, brute forcing passwords has, and always will be an unsolved problem. However I say problem lightly, because there are ways we can make it really, really hard.

Dictionary Attack

In a dictionary attack, an attacker creates a list of common passwords and the hashed values of them before even looking at your user table. With this list created, lookups are done to see if any of the hashes generated match the hashes in the user table. What makes a dictionary attack so easy is that the precomputed table could be used on various hacked user tables that use a simple hashing algorithm.

Rainbow Table Attack

A rainbow table attack is usually an offline-only attack. A table is created and hashes are precomputed based on what is known about the system like the hashing algorithm methods used or the password validation. It is similar to a brute force attack except that everything is pregenerated before comparison and is used to crack multiple passwords.

Best Practices

Now we know about the attacks we are dealing with and what hashing is, let’s go over some best practices.

Use a Slow Hash Function

As engineers we strive for fast code so this may sound counterintuitive, but a property of what makes a good hash function is one that is slow.

The reason is that the attacker will also be using the same algorithm. In our server we only have to run the algorithm once when either creating a user, updating their password or authenticating a user. However for an attacker they will need put their CPU to serious work and run the algorithm a multitude of times. With less password hashes being computed per second, the less chance the attacker obtaining the password.

This is why the popular algorithms MD5, SHA-1, SHA-2 and SHA-3 are not suited because they are all fast by design. So instead, use a slow algorithm like PBKDF2, bcrypt and scrypt. bcrypt appears to be the most widely used as the computation cannot be accelerated on GPUs as easily.

Salting

A salt is a random piece of data used as an additional input to the hash function. Typically the salt is just appended to the end of the password before the main hashing algorithm begins. A longer salt is best to increase complexity.

For the sake of simplicity, we could opt for a single salt for the whole system. This is great because a dictionary attack cannot occur as the system is now unique in the way it hashes passwords.

However, we do not prevent a rainbow table attack. This is because if a rainbow table is created with lots of hashes, this can be used on multiple user records to crack multiple passwords without further computation per-user. It also means that in a brute force attack where a single password is obtained, the same hash will also be stored on a user record if that user also has the same password. So the attacker might indeed actually obtain more passwords that just the one.

The way to solve this issue is to have a unique salt per user, it means that a rainbow table attack would only be able to be used for a single password.

There’s a couple of ways you may store this salt. In this some approaches you add the salt column in your users table. However in some cryptographic hash functions, the salt is actually included as part of the hash itself. For example bcrypt appends the salt to the value so there is no need for you to store it seperately.

Peppering

A pepper is a secret key that is used as a third parameter to the hash function. A secret key is not something you would store in your user records or expose for obvious reasons.

This is great if our attacker has limited access to our server. However, if someone has access to our database, it’s likely they will have access to the disk and possibly memory, where the pepper would be stored. In this case, a compromised secret key results in no increase in security. For this reason peppers are not commonly implemented.

Increasing Password Validation

Password hashing is not an alternative to forcing users to use strong passwords.

By adding some serious password validation, we increase the complexity of user passwords greatly which means more hashes will need to be computed. This will prevent common passwords which will often help against dictionary attacks. However they can still be prone to them as users often get around the validation by doing the bare minimum, e.g. password becomes password123.