Hash Function and MD5 Algorithm
A hash function is a mathematical function that takes a variable-length input message and produces a fixed-length output (called a hash value or message digest) that acts as a digital fingerprint of the message.
Properties of Hash Function
- Accepts variable-length input
- Produces fixed-length output
- One-way — computationally infeasible to reverse
- Collision-resistant — hard to find two different inputs with the same hash
- Even a small change in input produces a drastically different output
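The fixed-length and small-change properties can be observed with Python's standard `hashlib` module. This is only an illustrative sketch: the helper name `md5_hex` and the sample inputs are made up for this example.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

# Fixed-length output: 1 byte and 10,000 bytes both give 32 hex digits (128 bits).
short = md5_hex(b"a")
long_ = md5_hex(b"a" * 10_000)
print(len(short), len(long_))

# Small change in input: the two messages differ only in their last character.
h1 = md5_hex(b"hello world")
h2 = md5_hex(b"hello worle")
differing_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")
print(h1)
print(h2)
print("bits that differ out of 128:", differing_bits)
```

For a well-mixed hash, roughly half of the output bits are expected to differ between the two digests.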
MD5 Algorithm (Message Digest 5)
MD5 is a hash algorithm designed by Ron Rivest that takes an input message of arbitrary length and produces a 128-bit (16-byte) message digest.
Steps to Generate 128-bit Hash Value in MD5
Step A: Append Padding Bits
- The message is padded so that its length in bits is congruent to 448 mod 512 (i.e., 64 bits less than a multiple of 512).
- Padding is always added, even if the message length is already congruent to 448 mod 512; in that case a full 512 bits of padding are appended.
- Padding consists of a single '1' bit followed by as many '0' bits as needed (between 1 and 512 bits in total).
Step B: Append Length
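- A 64-bit representation of the original message length in bits (before padding) is appended, low-order 32-bit word first. If the message is longer than 2^64 bits, only the low-order 64 bits of the length are used.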
- Now the total message length is an exact multiple of 512 bits.
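Steps A and B can be sketched together in a few lines of Python. The function name `md5_pad` is hypothetical, and the sketch assumes the message is a whole number of bytes, so the single '1' bit plus seven '0' bits is written as the byte 0x80.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Apply MD5 padding (Step A) and append the 64-bit length (Step B)."""
    bit_length = (len(message) * 8) % (1 << 64)   # original length in bits, mod 2^64
    padded = message + b"\x80"                    # '1' bit followed by seven '0' bits
    # Add zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the length as a 64-bit little-endian integer.
    padded += struct.pack("<Q", bit_length)
    return padded

for n in (0, 55, 56, 64):
    print(n, "bytes ->", len(md5_pad(b"a" * n)), "bytes after padding")
```

A 55-byte message fits in one 64-byte block, while a 56-byte message already sits at 448 bits and therefore grows to two blocks, which matches the rule that padding is always added.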
Step C: Initialize MD Buffer (Chaining Variables)
- Four 32-bit registers (A, B, C, D) are initialized with fixed values (shown as 32-bit hexadecimal words):
- A = 67452301
- B = EFCDAB89
- C = 98BADCFE
- D = 10325476
- These form the 128-bit initial hash value (4 × 32 = 128 bits).
Step D: Process Each 512-bit Block
- The padded message is divided into 512-bit blocks.
- Each block is further divided into 16 sub-blocks of 32 bits each.
- MD5 performs 4 rounds, each round having 16 operations (total = 64 operations per block).
- Each round uses a different non-linear function:
- Round 1: F(B,C,D)=(B∧C)∨(¬B∧D)
- Round 2: G(B,C,D)=(B∧D)∨(C∧¬D)
- Round 3: H(B,C,D)=B⊕C⊕D
- Round 4: I(B,C,D)=C⊕(B∨¬D)
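- Each operation adds the output of the round's non-linear function, one message sub-block M[k], and a constant T[i] (derived from the sine function) to register A modulo 2^32, circularly left-shifts the sum by s bits, and then adds register B: A = B + ((A + f(B,C,D) + M[k] + T[i]) <<< s). The registers are then rotated so that each one is updated in turn.
- After the 64 operations, the four results are added (modulo 2^32) to the values A, B, C, D held before the block, giving the chaining variables for the next block.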
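The block processing described in Step D can be sketched as follows. The four functions and the sine-derived constants follow the description above; the shift amounts and the order in which sub-blocks are used in each round are not given in these notes and are taken from the MD5 specification (RFC 1321). The names `compress`, `rotl`, `ROUNDS`, and `SHIFTS` are illustrative only.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every result within 32 bits

# T[i] = floor(|sin(i + 1)| * 2^32), with the angle in radians.
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

# Left-rotation amounts for the 64 operations (from RFC 1321).
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
          [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)

def rotl(x, n):
    """Circular left shift of a 32-bit value by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def F(b, c, d): return (b & c) | (~b & d)
def G(b, c, d): return (b & d) | (c & ~d)
def H(b, c, d): return b ^ c ^ d
def I(b, c, d): return c ^ (b | (~d & MASK))

# (round function, sub-block index for operation i), one entry per round.
ROUNDS = [
    (F, lambda i: i),
    (G, lambda i: (5 * i + 1) % 16),
    (H, lambda i: (3 * i + 5) % 16),
    (I, lambda i: (7 * i) % 16),
]

def compress(state, block):
    """Process one 64-byte block and return the updated (A, B, C, D)."""
    M = struct.unpack("<16I", block)          # 16 little-endian 32-bit sub-blocks
    a, b, c, d = state
    for i in range(64):
        func, word_index = ROUNDS[i // 16]
        total = (a + func(b, c, d) + M[word_index(i)] + T[i]) & MASK
        # New B is B + rotated sum; the other registers rotate one position.
        a, b, c, d = d, (b + rotl(total, SHIFTS[i])) & MASK, b, c
    # Add the block result to the chaining values held before the block.
    return tuple((old + new) & MASK for old, new in zip(state, (a, b, c, d)))

IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
empty_block = b"\x80" + b"\x00" * 63          # padded form of the empty message
digest = struct.pack("<4I", *compress(IV, empty_block))
print(digest.hex() == hashlib.md5(b"").hexdigest())  # checks the sketch against hashlib
```

Comparing against `hashlib` is a convenient sanity check: a single wrong shift amount or index changes the whole digest.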
Step E: Output
- After all blocks are processed, the final values of A, B, C, D are concatenated in that order, each written low-order byte first.
- The result is the 128-bit message digest.
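The byte order of the concatenation matters when the digest is printed. As a small sketch, the register values below are the final A, B, C, D for the empty message, reconstructed here from its well-known digest; the variable names are illustrative.

```python
import hashlib
import struct

# Final register values for the empty message (derived from its known digest).
A, B, C, D = 0xD98C1DD4, 0x04B2008F, 0x980980E9, 0x7E42F8EC

# Concatenate A, B, C, D with each 32-bit word written low-order byte first.
little_endian = struct.pack("<4I", A, B, C, D).hex()
# Writing the words high-order byte first gives a different, incorrect string.
big_endian = struct.pack(">4I", A, B, C, D).hex()

print(little_endian)
print(big_endian)
print(little_endian == hashlib.md5(b"").hexdigest())
```

Only the little-endian form agrees with the digest reported by `hashlib`.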
Conclusion
MD5 processes any variable-length message through padding, appending length, initializing buffers, and 4 rounds of 16 operations on each 512-bit block to produce a 128-bit hash value. The digest is not guaranteed to be unique: different messages can share the same hash, and practical collision attacks on MD5 are known. MD5 is therefore considered cryptographically weak, but it remains important for understanding hash function design.