Almost everything I know about passwords

We all have at least half a dozen passwords that give us access to countless contents and services, but what is behind the sign-in forms? How are these credentials stored, or how should they be, and how are they checked, or how should they be? In this article I will cover what I know about passwords.

Let’s start with how the credentials are stored

Basically there are three ways to store a password: unciphered, with symmetric ciphering and with asymmetric ciphering. Do not be scared by the terms, we will explain them easily.

Storing unciphered passwords is the easiest and most vulnerable way, besides being illegal in Spain because it violates the requirements of the Law on Protection of Personal Data. However, whether because of laziness or ignorance on the programmer’s part, it is still used on some occasions.

As part of the sign up, the website just stores in the database the username and the password exactly as the user wrote them. We can depict it like this:

User   Password
------ ---------
its-me mine-one

Obviously, whoever steals this information immediately has the passwords of all the users without any extra effort, which doesn’t mean that it is easy to steal.

After the sign up, when the user signs in, the system has two ways to check whether the user has written his password: the right one and the even-more-vulnerable one.

In the right way, or let’s say the one that at least doesn’t make things worse, the programming language tells the database language (SQL) something that we will translate for humans as «give me the record in the database whose field username is ‘its-me’», and then the programming language checks whether the field password contains the same text that the user has written in the sign in form. For nerds I will write some pseudocode:

if record[password] equals form[password] then
    login is true
else
    login is false
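
For those who prefer something runnable, here is a minimal Python sketch of the same check. The table name users, its columns and the helper check_login are assumptions made for illustration, and the password is kept unciphered only to mirror this scenario.

```python
import hmac
import sqlite3

# Hypothetical schema for illustration: one table with plain-text passwords,
# exactly the (unsafe) storage described in this scenario.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("its-me", "mine-one"))


def check_login(username: str, password: str) -> bool:
    """Fetch the record by username only, then compare in the program."""
    # The '?' placeholder passes the username as data, never as SQL text.
    row = conn.execute(
        "SELECT password FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False  # unknown user
    # hmac.compare_digest is a comparison designed to resist timing attacks.
    return hmac.compare_digest(row[0].encode(), password.encode())
```

The database is asked only for the record; the decision about the password is taken by the program, which is the point of the right way.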

In the wrong way, negligence allows the attacker to be identified as any user without knowing the password. It can also happen when the password is stored ciphered, but I explain it now that we are in the worst scenario.

In this case the programming language tells the database language (SQL) something that we translate for humans as «give me the record in the database whose field username is ‘its-me’ and password is ‘mine-one’». The problem here is that we are giving the supposed user password to the database without any validation. So what happens if I say that my password is «’ or 1=1»? The programming language tells the database language (SQL) something that we translate for humans as «give me the record in the database whose field username is ‘its-me’ and password is ‘’ or 1=1» (where we wrote ‘mine-one’ there is now this maliciously twisted password). This is always true because, although the stored password is not empty, 1 is always equal to 1, so the programming language gets the record and believes that the identification was legitimate.
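
The trick can be reproduced with a small Python sketch against an in-memory SQLite database. The table and the two functions are assumptions for illustration, and the twisted password used here is a variant, ' OR '1'='1, chosen so that the trailing quote is also closed and the resulting SQL stays valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('its-me', 'mine-one')")


def unsafe_login(username: str, password: str) -> bool:
    # WRONG: user input is pasted straight into the SQL text.
    query = (
        "SELECT * FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone() is not None


def safer_login(username: str, password: str) -> bool:
    # Placeholders keep the input as data, so the quote trick has no effect.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None


# A variant of the twisted password that also closes the trailing quote.
malicious = "' OR '1'='1"
```

Calling unsafe_login("its-me", malicious) finds a record even though the password is wrong, while safer_login with the same input finds nothing.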

One way to find out whether a website is this shabby, I mean unsafe, is to click on ‘remember password’. If they send it to you by mail, that is, if they know what your password is, cancel your account immediately.

Storing the password with symmetric ciphering is a fairly reliable method, but I think almost nobody uses it: whoever thinks about ciphering goes directly to asymmetric ciphering, unless there is a real reason for the website to be able to know your password.

During the sign up, the website writes down the username and the result of applying a cipher algorithm to the password chosen by the user. The symmetric algorithm has this name because it can cipher and also decipher, provided you know the key with which the text was ciphered. We can depict it like this:

CipheredPassword = symmetric_cipher(UserPassword, websitekey)
UserPassword = symmetric_decipher(CipheredPassword, websitekey)

For simplicity, we will assume that the symmetric ciphering simply shifts every letter and that the website key is +1, meaning that each letter is changed to the next one, so the result would be ‘njof-pof’ (after ‘m’ comes ‘n’, after ‘i’ comes ‘j’, after ‘n’ comes ‘o’, etc.).

User   Password
------ ---------
its-me njof-pof
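
The toy shift described above can be sketched in Python as follows. The function names follow the ones used in this article; treating only lowercase letters and leaving every other character untouched is an assumption that matches the ‘njof-pof’ example.

```python
import string

ALPHABET = string.ascii_lowercase


def symmetric_cipher(text: str, key: int) -> str:
    """Shift each lowercase letter by `key` positions; leave the rest as is."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # hyphens, digits, etc. pass through unchanged
    return "".join(out)


def symmetric_decipher(text: str, key: int) -> str:
    """Deciphering is just shifting in the opposite direction."""
    return symmetric_cipher(text, -key)
```

With key 1, symmetric_cipher("mine-one", 1) gives 'njof-pof', and symmetric_decipher brings it back.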

Now, if someone steals this information, he does not know what the passwords are. To find out, he needs two things: the ciphering algorithm and the key of the website. So it’s a lot safer, because he would have to steal not only data but also code. However, with an algorithm as simple as this one, an attacker who knows a single password (his own) could infer both the algorithm and the key. Luckily there are far more effective algorithms.
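
To see how little protection the toy shift gives, here is a hedged Python sketch of that inference. The attacker's own password ‘attacker-pw’ and the helper names are made up for illustration; the only assumption is that the attacker can find his own ciphered password in the stolen table.

```python
import string

ALPHABET = string.ascii_lowercase


def infer_shift(known_plain: str, known_ciphered: str) -> int:
    """Recover the shift from the first letter pair of a known password."""
    for p, c in zip(known_plain, known_ciphered):
        if p in ALPHABET and c in ALPHABET:
            return (ALPHABET.index(c) - ALPHABET.index(p)) % 26
    raise ValueError("no letters to compare")


def unshift(text: str, key: int) -> str:
    """Undo the shift on every lowercase letter."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key) % 26] if ch in ALPHABET else ch
        for ch in text
    )


# The attacker signed up with 'attacker-pw' (hypothetical) and finds
# 'buubdlfs-qx' next to his own username in the stolen table.
key = infer_shift("attacker-pw", "buubdlfs-qx")
```

Once the key is known, unshift("njof-pof", key) reveals the password of ‘its-me’ without touching the website's code.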

During a subsequent sign in, when checking whether the user has entered his password, the website will most probably opt for the safe method explained earlier, where the programming language tells the database language (SQL) something that we translate for humans as «give me the record in the users table whose username is ‘its-me’». Then it can do one of two things: either decipher the stored password and compare it with the password entered by the user, or cipher the password entered by the user and check whether it matches the stored one. For nerds I will write some pseudocode for the first case:

if symmetric_decipher(record[user_password], websitekey) equals form[user_password] then
    login is true
else
    login is false

In this scenario it could also happen that, when you press ‘remember password’, they mail it to you, and you might think you are in the previous situation. Assuming so would be a little unfair to them, but from the outside we cannot be sure.

Storing the password with asymmetric ciphering is the most reliable method that we will cover.

During the sign up, the website stores the username and the result of applying a cipher algorithm to the password chosen by the user. The asymmetric algorithm has this name because it can cipher but not decipher (strictly speaking, this kind of one-way algorithm is usually called a hash function). We can depict it like this:

CipheredPassword = asymmetric_cipher(UserPassword)

The asymmetric algorithm gives a result that, although not unique, is highly unlikely to match the result of another password. With the MD5 algorithm, the number of possible results is 2^128, roughly 3.4 × 10^38. Of course, the number of possible passwords is infinite, so there could be some collision, but that is not vital here. (MD5 serves only as a familiar example; it is no longer recommended for storing passwords.)

User   Password
------ --------------------------------
its-me 04fc6d58f3a7f1968dd8a89d1f44e511
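
A short Python sketch of the one-way step, using the standard hashlib module. The helper name one_way is an assumption, and MD5 appears only because it is the example used in this article.

```python
import hashlib


def one_way(password: str) -> str:
    """Return the MD5 digest of the password as 32 hexadecimal characters."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


digest = one_way("mine-one")
same_again = one_way("mine-one")
other = one_way("mine-onf")  # one letter changed

# MD5 always yields 128 bits, so there are 2**128 possible results.
possible_results = 2 ** 128
```

The same password always gives the same 32 characters, a slightly different password gives a different value, and there is no function to go back from the digest to the password.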

If you cannot decipher, how is it possible to know whether the given password is truly his password? You already know the answer, because the solution is one of the two options we saw in the previous scenario; that is why I showed the other one there and kept this one for here.

During a subsequent sign in, when it is time to check whether the user has given his password, the programming language tells the database language (SQL) something that we will translate for humans as «give me the record in the database whose field username is ‘its-me’», and then the programming language ciphers the submitted password again and checks whether it matches the stored value. For nerds I will write some pseudocode:

if record[userpassword] equals asymmetric_cipher(form[userpassword]) then
    login is true
else
    login is false
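
The same check as a runnable Python sketch. The record dictionary and the function names are assumptions for illustration, and MD5 is used only to stay with the article's example, not as a recommendation.

```python
import hashlib
import hmac


def asymmetric_cipher(password: str) -> str:
    # MD5 only to mirror the article's example; not a recommendation.
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical stored record: the site keeps the digest, never the password.
record = {"username": "its-me", "userpassword": asymmetric_cipher("mine-one")}


def login(record: dict, form_password: str) -> bool:
    """Cipher the submitted password again and compare with the stored value."""
    candidate = asymmetric_cipher(form_password)
    # Both values are ASCII hex strings, which compare_digest accepts.
    return hmac.compare_digest(record["userpassword"], candidate)
```

Note that the website never needs to recover the original password: comparing the two ciphered values is enough.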

Ways to attack

The most common way is just a person trying reasonable combinations of your personal data. If, for example, your name is Ainara and you were born on December 1, 1980, it would be reasonable to think that your password is ainara80 or 011280 or some even sillier combination. I will not spend more words on it: if you usually do things like that, read the ‘Healthy passwords’ section.

A classic way is called brute force. Immense lists of words commonly used in passwords circulate on the internet (when such a list is used, the attack is often called a dictionary attack). The attack consists essentially of trying all of them until one matches, if any does.
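
In code, the idea is nothing more than a loop. In this minimal Python sketch the word list, the secret and the try_login callback are all invented for illustration; a real attack would differ only in the size of the list and in what try_login does.

```python
from typing import Callable, Iterable, Optional


def dictionary_attack(
    words: Iterable[str], try_login: Callable[[str], bool]
) -> Optional[str]:
    """Try every candidate in order; return the first that works, or None."""
    for candidate in words:
        if try_login(candidate):
            return candidate
    return None


# Hypothetical word list and target, for illustration only.
common_words = ["123456", "password", "qwerty", "ainara80", "letmein"]
secret = "ainara80"
found = dictionary_attack(common_words, lambda guess: guess == secret)
```

If the password is not in the list, the function simply returns None, which is the best argument for choosing a password that nobody would put in such a list.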

Nowadays a widely used way is called phishing, which makes you think that you are where you are not. Somehow, usually by e-mail, it induces you to visit a website, usually a bank’s, but the link you click really goes to a server showing a very similar website, which expects you to naively enter your real identification data. After saving them, it notifies you of some unspecified issue because of which you cannot continue.

The most complex way is called ‘man in the middle’ because it involves spying on the communication between your computer and the destination server in order to ‘watch’ your password. In principle this is solved when the form is protected by a security certificate, which makes the data travel through an encrypted channel. I would not swear that this cannot be violated in any case, but it’s the best solution we have.

What does it mean that the passwords of a website have been stolen?

Now you can get an idea of what it means that the passwords of a website have been stolen. The thieves may have a bunch of unciphered passwords, either because they were stored that way or because they were intercepted, in which case all users should be really worried; they may have a lot of ciphered passwords and the method to decipher them, in which case all users should also be really worried; or they may have a bunch of ciphered passwords with no way to decipher them, which is the case we are going to cover now.

Many sites add some barriers to improve safety, such as a maximum number of failed sign-in attempts, after which you cannot try again until some time later or you additionally have to solve a captcha. So why would it be useful to have a lot of ciphered passwords? To run a brute force attack on an external computer, bypassing these limitations; once the attackers have found the right password, they can use it.
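
The difference between guessing on the website and guessing at home can be sketched in Python. Everything here is a toy made up for illustration: the Website class, the limit of three attempts and the list of guesses; MD5 is used only to stay with the article's example.

```python
import hashlib

MAX_ATTEMPTS = 3  # assumed limit of failed sign-ins


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class Website:
    """Toy site that locks an account after too many failed sign-ins."""

    def __init__(self, username, password):
        self.stored = {username: md5_hex(password)}
        self.failures = {username: 0}

    def sign_in(self, username, password):
        if self.failures[username] >= MAX_ATTEMPTS:
            raise PermissionError("too many failed attempts, try later")
        if md5_hex(password) == self.stored[username]:
            self.failures[username] = 0
            return True
        self.failures[username] += 1
        return False


def crack_online(site, username, words):
    """Guess through the sign-in form; the lockout stops the attempt."""
    try:
        for word in words:
            if site.sign_in(username, word):
                return word
    except PermissionError:
        return None  # locked out before reaching the right word
    return None


def crack_offline(stolen_digest, words):
    """With the stolen digest in hand, nothing limits the number of guesses."""
    for word in words:
        if md5_hex(word) == stolen_digest:
            return word
    return None


guesses = ["123456", "password", "qwerty", "letmein", "mine-one"]
site = Website("its-me", "mine-one")
stolen = site.stored["its-me"]  # what the thief took from the database
recovered = crack_offline(stolen, guesses)
```

Against the form, the fifth guess is never reached because the account is locked after the third failure; against the stolen value, the attacker reaches it and then signs in at the first attempt.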

Data encryption

Just because I have promised to tell everything I know, I will also go into this, but if it does not excite you, you can jump to the next block.

To be purists, we should encrypt more than your password: all of your data that is not public. This should be mandatory for every website containing sensitive information such as medical records. Obviously, in this scenario only symmetric algorithms can be used, since the data must be read back. The ciphering key would be your sign-in password, and therefore the website would have access to this information only while you are using it and the unciphered password is temporarily held in memory.

Let’s suppose a field with information on diseases with a value like ‘cardiac arrhythmia’. It would be stored ciphered with your password (‘mine-one’), and therefore its value would be different from the value of the same disease ciphered with the password of another user (‘his-one’). So you are not only protected from attacks (you don’t want anybody to know your password, but you also don’t want anybody to know your diseases if the information is stolen); you are also safe from statistical use of your data, which prevents, for example, someone from getting a list of how many people with the same surname also have a disease in common. That wouldn’t be possible because the disease has a different value depending on each user’s password, so there is no way to know whether two values are the same.

User: me
Password: 04fc6d58f3a7f1968dd8a89d1f44e511 ('mine-one', MD5 algorithm)
Diseases: Gko7dOAmH9xiB59PJQMJk9n0aLHkWRnyo38H0IpEyticyNZzDmWm1iIvb3hvLgtkpA3r6KLs9hQ
('cardiac arrhythmia', XTEA algorithm using 'mine-one' as key)

User: him
Password: fd2819e43489d1f419464b1a1e27d8eb ('his-one', MD5 algorithm)
Diseases: wBIF5aTKPwHvQxNbPOcPo8ujGyd6LBYzX3IwJMzBArJYg+nTUZmYMaVnFAbE6VRdbyppIUqYuLU
('cardiac arrhythmia', XTEA algorithm using 'his-one' as key)

Although it is known that your diseases are ciphered with your password, as they do not know your password they cannot know your diseases, except while you are using the website, which is the time during which you consent to your password being held in memory.

This layer of protection has an extra cost for the website, because when you change your password all your information must be deciphered and ciphered again.
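
The whole idea, including the cost of a password change, can be sketched in Python. Be warned that the cipher below is NOT XTEA: it is a toy stand-in built from SHA-256 and XOR with the standard library, assumed here only to show the behaviour, and it should not be used to protect real data. All function names are invented for the example.

```python
import base64
import hashlib


def _keystream(password: str, length: int) -> bytes:
    """Stretch the password into `length` pseudo-random bytes (toy construction)."""
    key = hashlib.sha256(password.encode("utf-8")).digest()
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]


def cipher_field(value: str, password: str) -> str:
    """XOR the value with the password-derived keystream; return base64 text."""
    data = value.encode("utf-8")
    mixed = bytes(a ^ b for a, b in zip(data, _keystream(password, len(data))))
    return base64.b64encode(mixed).decode("ascii")


def decipher_field(stored: str, password: str) -> str:
    """XOR is its own inverse, so the same keystream recovers the value."""
    mixed = base64.b64decode(stored)
    data = bytes(a ^ b for a, b in zip(mixed, _keystream(password, len(mixed))))
    return data.decode("utf-8")


def change_password(stored: str, old_password: str, new_password: str) -> str:
    """The extra cost: every field is deciphered and ciphered again."""
    return cipher_field(decipher_field(stored, old_password), new_password)


mine = cipher_field("cardiac arrhythmia", "mine-one")
his = cipher_field("cardiac arrhythmia", "his-one")
moved = change_password(mine, "mine-one", "new-one")
```

The same disease ends up as two different stored values, so nobody can tell that both users share it, and after a password change the stored value must be rewritten for every ciphered field.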

Healthy passwords

Many websites require a minimum password length and a certain level of ‘chaos’, mixing letters and numbers, upper and lower case, … Creativity in composing the rules can even make it impossible to find a password that is reasonably easy to remember and meets all the requirements, so some pages allow you freedom and simply report how secure your password seems to be, although if tomorrow they suffer a partial theft, they will be the ones who bear all the ignominy.
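
A report of that kind could look like the following Python sketch. The five rules and the minimum length of 8 are assumptions chosen for illustration; every website picks its own.

```python
import string


def strength_report(password: str, min_length: int = 8) -> dict:
    """Report which of the usual 'chaos' rules a password satisfies."""
    return {
        "long_enough": len(password) >= min_length,
        "has_lower": any(c in string.ascii_lowercase for c in password),
        "has_upper": any(c in string.ascii_uppercase for c in password),
        "has_digit": any(c in string.digits for c in password),
        "has_symbol": any(c in string.punctuation for c in password),
    }


def score(password: str) -> int:
    """Number of rules met, from 0 to 5; a rough hint, not a guarantee."""
    return sum(strength_report(password).values())
```

For example, ainara80 meets three of the five rules (length, lowercase, digits), which says nothing about how easy it is to guess for someone who knows Ainara.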

That outbreak of originality of ch4ng1ng l3tt3rs for numb3rs was a great way for many people to adopt a good practice. It makes the number of combinations to be tested, either manually or by brute force, grow exponentially.
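
A small Python sketch shows where the growth comes from: every letter that may or may not be replaced doubles the number of spellings. The substitution table is an assumption; people use many different ones.

```python
from itertools import product

# Assumed substitution table; real habits vary from person to person.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0"}


def leet_variants(word: str) -> list:
    """List every spelling obtained by optionally replacing each letter."""
    choices = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


variants = leet_variants("ainara")
```

The word ainara has four replaceable letters, so it yields 2 × 2 × 2 × 2 = 16 spellings, from ainara itself to 41n4r4.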

A common mistake is to use the same password on all websites, because obviously once they have obtained your password on one site, they have access to a lot of other sites. Of course, we cannot remember a different password for each site, or can we? … An easy solution to this problem is to have passwords that mix a fixed part and a variable part. The fixed part is the common part that we find easy to remember, and the variable part is something we find significant about that website. If our fixed part is ainara80, our Gmail password could be ainara80ggl (ggl stands for google), ainara80gml (gml stands for gmail) or … So the attacker must not only know the fixed part but also guess what you highlighted and how you built the variable part. It’s not hackerproof, but it makes the attacker’s work harder.
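
As an illustration, one possible rule for the variable part is to keep only the consonants of the site name, which happens to reproduce the ggl and gml examples. The rule and the function names in this Python sketch are assumptions; the point is that you choose a rule of your own and keep it to yourself.

```python
def site_tag(site_name: str) -> str:
    """Variable part: the consonants of the site name, e.g. 'google' -> 'ggl'."""
    return "".join(
        ch for ch in site_name.lower() if ch.isalpha() and ch not in "aeiou"
    )


def site_password(fixed_part: str, site_name: str) -> str:
    """Join the fixed part you remember with the tag derived from the site."""
    return fixed_part + site_tag(site_name)
```

With the fixed part ainara80, site_password gives ainara80ggl for google and ainara80gml for gmail.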