Mathematical permutations and statistical probabilities frequently pose a problem for befuddled high school students, struggling to cope with repetitive calculations and weird symbols. This often occurs because, behind the figures, there’s an underlying logic (which often seems to be anything but logical) that makes what appears on the surface to be an obvious situation or conclusion actually seem anything but.

And when the apparent disconnect between logic and science spills over into the “real” world, the results can have serious consequences. One such example concerns what’s known as the **Birthday Paradox**.

## The Birthday Paradox

Consider this: In order for there to be a 50% probability that someone in the same room with you has the same birthday as you, there need to be 253 people in the room.

BUT: In order for there to be a probability of 50% or more that two people in the same room share the same birthday, you only need 23 people to be in the room.

### The Underlying Theory of the Birthday Paradox

If you’re still scratching your head over that one, the answers lie in probability theory and the principles of number matching.

In the case of our Birthday Paradox, probability theory concerns the likelihood that in a group of randomly chosen people in a room, some pair of them will share the same birthday. The so-called “pigeonhole principle” of probability sets that likelihood at 100% if there are 367 people in the room – since in a standard or leap year (365 or 366 days) there are only 366 possible birthdays.

These likelihoods or probabilities are the result of simple third-party number matching: As if an outside observer is looking over all the numbers to find matching pairs. Under this condition – and assuming that each day of the year except February 29th is equally likely as a birth date – there’s a 99.9% probability that a match is observed with a group of 70 people and a 50% probability with 23.

Introduce the viewpoint of someone in the first person, looking through the room to find someone with a matching birthday to their own, and the situation goes paradoxical, requiring there to be 253 people in the room. This happens because the birthday matches are based on pairs – and it would take 253 pairings to achieve a 50% probability of matching to a single (the first person’s) birthday.

But if each person in the room is allowed to link with every other person to find a matching pair (without the single birthday limitation), the number of people required in the room to achieve a 50% likelihood falls to 23.

### The Actual Mathematics

Given a supply of random inputs, if some mathematical function returns one from a total of k equally likely values, then if that same function is evaluated repeatedly for different inputs, we can expect to obtain the same output after approximately 1.2k1/2 attempts. This is sometimes written in the form of 1.2(vk)1.2(k) evaluations.

In our Birthday Paradox case, the value of k may be taken as 365 (as in, days of the year – the leap year 366 not affecting the overall value significantly).

### The Gist of It – In a Nutshell

For any given set of independently generated numbers, finding matching pairs isn’t as hit or miss as you might think.

### Hash Functions and the Collision Problem

In cryptography, hash functions are mathematical processes used to reduce larger strings of data into small strings of fixed length (known as hash values or keys) that represent the larger string. For an ideal encryption scenario, each hash value is unique, and when decrypted can be traced back to the one specific string which produced it.

But encryption may be applied over very large data sets, and for a given hash function, the possibility exists that two different inputs may be hashed down to the same output value. When this happens, collision has been said to occur – and this duplication in hash values (also known as checksums, fingerprints, or digests) may have security implications.

### Birthday Attacks

Birthday attacks are a specialized form of brute force assault used to find collisions in a cryptographic hash function. The attacks exploit the underlying probability theory and mathematics of the Birthday Paradox situation to help reduce the number of data samples and iterations on a target required to find matching pairs.

In accordance with the mathematical equation above, one of k equally likely values can be expected to return the same output after around 1.2(vk)1.2(k) evaluations have been performed on different inputs.

And the underlying principle exploits the difference between the first and third-person approaches to the birthday matching problem, through repeated attempts to locate two inputs that hash to the same value, rather than taking the harder course of trying to find something that collides with a specific hash value.

### The Problem with Digital Signatures

Digital signatures may become vulnerable to birthday attacks.

Under normal conditions, a message can be digitally signed by computing a cryptographic hash function for it, then signing it with an associated secret key. But an attacker could produce two versions of a message: One that looks legitimate, but has been altered with (for example) extra commas, blank lines, or spaces to pad it out to a certain size, and another (fraudulent) document, also padded out.

Through trial and error (easy enough, with the right software and computing power), a digital signature’s hash function may be applied to both the legitimate (but padded) and fraudulent document until they return the same hash value. Any number of scams involving document switches and/or getting recipients to sign one or other version of the document may follow.

### Counter-Measures

Mathematical protection against a birthday attack may be derived by making the output length of the hash function for a digital signature scheme so large that a brute force birthday assault becomes impractical for the assailant. Typically, this would need to be about twice as many bits as would be required to counter an ordinary brute force attack.

The person signing a document may also provide some measure of protection by making some random changes of their own (like commas, and spaces) that don’t affect the overall form of the message – and keeping a personal record of these alterations and a copy of the signed contract, as proof of both their own signature, and of the alterations.

Share this Post