The same class of mathematical techniques used in databases to index or retrieve items may be employed in cryptography, for the encryption of messages and data. These **“hashing algorithms”** take a number of forms, and for encryption purposes, they cover a range of strengths and weaknesses.

We’ll begin by dealing with some terminology.

## What is Hashing?

As the name loosely suggests, hashing is a process for chopping up and mixing information. In a typical data operation, a string of characters will be transformed into a shorter string of fixed length which represents the original character string. This reduced-length value is also known as a key.

For database indexing and retrieval, it’s usually much quicker and easier to search for individual records on the basis of their shorter hashed keys, rather than their original values. With small enough data sets, it’s possible to generate a unique hash key for each record. And for encryption purposes, hashing a long string of characters (such as a message or data set) produces a shortened key that may be as unique as a fingerprint.

### Hash Values

In a hashing operation, each character set is reduced to its corresponding hash value according to a specific formula which is applied to all character strings in the data set or message.

Converting varied strings of characters into reduced keys of fixed length has the virtue of reducing complexity when searching for specific database entries, or when storing and transmitting the cryptographic form of an encrypted message.

A search for any entry begins with the calculation of its hash value, then sifting through the data to find a match. For cryptography, hashing is used to encrypt and decrypt the digital signatures which authenticate the senders and receivers of different messages. The hash values here (also known as message-digests) are the reduced forms of the digital signatures.

### Hash Functions

Hash functions or hashing algorithms are the mathematical procedures used in computing hash values from base input numbers and character strings. They may be highly complex, and can produce a hash value that’s almost impossible to derive from the original input data without knowing the applied hash function. For this reason, the keys used in public key encryption may be derived from such algorithms.

Hash functions are used for indexing original data string values or keys, and employed at each subsequent occasion when the information associated with a particular hash value or key needs to be retrieved. In this sense, the functions are one-way operations: They don’t need to be “reverse engineered” from the hash values derived – and hash functions are designed to be resistant to this kind of analysis.

### Collision

A well-designed hashing algorithm should not derive the same hash value from two different sets of input – a phenomenon known as collision. In data handling, hash functions are typically employed to determine the position of a particular data string in an array, from a search for its hash value or key. If two or more different keys were to hash to the same position in the data array, confusion (and perhaps a resultant system crash) would be the result.

So hashing algorithms must be designed so as to minimize the risk of collisions.

### The Division-remainder Method

This is a relatively basic hashing algorithm in which the total number of items in a data array is estimated. This figure is then divided into each data string value to give a quotient (which is disregarded) and a remainder – which is taken as the hash value. Clearly, with a large number of figures, many duplicate remainders emerge, leading to the possibility of multiple collisions. So search algorithms for such data sets must take this flaw into account.

### The Folding Method

For numeric strings, the folding method takes the original digits and divides them into several parts, adds these together, then uses a pre-determined number of the last digits resulting as the key or hash value.

### The Radix Transformation Method

Here the number base or radix (base 10, base 2, etc.) of a numeric data set is altered to give a different sequence of numbers. A binary (base 2) data set might for instance be transformed to decimal (base 10) or hexadecimal (base 16), giving a shorter form. Hash values of uniform length may be imposed by allowing higher order digits in the results to be discarded.

### The Digit Rearrangement Method

This method uses a simple substitution in which a pre-determined part of the original data string is reversed, and taken as the hash value for the string.

### Applications in Encryption

As noted earlier, hashing algorithms are used to scramble digital signatures. These are first transformed with the algorithm to give a message-digest, then sent to the recipient of a coded transmission along with the associated hash value or key using separate transmissions.

By employing the same hashing algorithm as the sender, the recipient obtains a message-digest from the unaltered digital signature which may be compared with the message-digest that came in the accompanying transmission. If they match, the transmission was secure.

### The Rivest Message-Digest Algorithms

Several cryptographic hashing algorithms were developed by Ronald Rivest (the “R” in RSA encryption). In 1989 he created the MD2 Message-Digest Algorithm, which produces a 128-bit hash value from a message of arbitrary length. Messages are first padded so that their length in bytes forms a multiple of 16. A checksum of 16 bytes is added to the padded message, and a hash value is calculated from the result.

MD2 was optimized for 8-bit computers, and specified in RFC 1319. The algorithm was designed for deployment with digital signatures, in transactions where a large file has to be condensed down securely before being signed with a secret or private key under a public key encryption system like RSA. On its first appearance, it was suggested that with MD2 it would be practically impossible to generate two messages with the same message-digest, or to reconstruct an original message from its digest alone. But hacking techniques quickly evolved, and new algorithms were required.

MD4 (developed by Rivest in 1990) and MD5 (1991) soon followed. MD5 was an improvement on its predecessor, with an algorithm requiring a four-part calculation to produce a message-digest of 128 bits, which is optimized for use on 32-bit computer systems.

Researchers uncovered some major weaknesses in the MD5 hashing algorithm in December 2008, while staging a successful attack on MD5-based signatures in X.509 digital certificates. In their simulated assault, the attackers were able to alter several aspects of the digital certificate including its validity period, serial number and subject line – without changing its apparent hash value. This signaled a need for stronger encryption algorithms.

### Secure Hash Algorithm (SHA)

The Secure Hash Algorithm (SHA) is an algorithm designed by the United States National Security Agency (NSA). Its first revision (SHA-1) was actually published in 1994 by the NIST and is now a U.S. Federal Information Processing Standard. Unlike the MD class of hashing functions, SHA-1 produces a 160-bit (or 20 byte) message-digest, which is more resistant than MD5 to brute force attacks.

The standard has since progressed to SHA-2, which is gaining acceptance and integration with a number of open source platforms and methods of secure transfer. Beyond its applications with digital signatures, the HMAC versions of SHA-1 and SHA-2 extend the range of functions, when used in conjunction with a password.

### Some Resources

To see some hashing algorithms in action, there are a number of free online tools for generating hash values – including the algorithm that’s used for mining Bitcoins.

Share this Post