Updated on 15 Jan, 2026

What Is Hashing?

Hashing is the process of converting input data of arbitrary size into a fixed-size output, known as a hash value or digest, using a hash function. Unlike encoding or encryption, hashing is designed to be one-way, meaning the original data cannot be reconstructed from the hash.

In system design, hashing is widely used for data integrity, fast lookups, secure comparisons, and scalability. A well-designed hash function produces outputs that look random yet are fully repeatable: the same input always maps to the same hash.

In short, hashing converts data into a different, irreversible form using a hashing algorithm.
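As a minimal sketch of the idea, the snippet below uses Python's standard `hashlib` module with SHA-256. The helper name `sha256_hex` is just an illustrative choice, not part of any library.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes produce digests of the same length.
short_digest = sha256_hex(b"hi")
long_digest = sha256_hex(b"x" * 1_000_000)
print(len(short_digest), len(long_digest))  # 64 64 (64 hex characters = 256 bits)
```

Nothing in the digest lets you walk back to the input; the only practical way to "reverse" it is to guess inputs and hash them.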

Properties of a Good Hash Function

A reliable hash function must satisfy several critical properties:

  • Deterministic: The same input always produces the same hash
  • Fixed output size: The digest length is the same regardless of input size
  • Fast computation: Efficient enough for large-scale systems
  • Uniform distribution: Outputs are spread evenly across the output space, which minimizes collisions
  • Collision resistance: It is computationally difficult to find two inputs with the same hash
  • Avalanche effect: A small change in the input produces a drastically different output

These properties are essential for both security and performance.
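Determinism and the avalanche effect are easy to observe directly. The sketch below, again assuming SHA-256 from `hashlib`, counts how many of the 256 output bits differ between two inputs. The function name `bit_difference` is hypothetical.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 in every position where the two digests disagree.
    return bin(digest_a ^ digest_b).count("1")


print(bit_difference(b"hello", b"hello"))  # 0, because hashing is deterministic
print(bit_difference(b"hello", b"hellp"))  # roughly half of the 256 bits
```

For a good hash function, changing one character flips about half of the output bits, so the two digests look unrelated.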

Hashing is Defined by 3 Main Features

  1. Irreversible: Hashing is one-way; the input cannot be recovered from the hash
  2. Deterministic: The same input always produces the same hash
  3. Fixed length: The output is always the same size

Hashing vs Checksums

While both hashing and checksums detect data corruption, they differ significantly:

Aspect               | Hashing             | Checksum
Security             | Strong              | Weak
Collision resistance | High                | Low
Use cases            | Security, integrity | Error detection
Examples             | SHA-256             | CRC32

Checksums are faster but unsuitable for security-critical applications.
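To make the difference concrete, here is a small sketch that computes both a CRC32 checksum (via the standard `zlib` module) and a SHA-256 hash of the same data. The helper names are illustrative only.

```python
import hashlib
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the 32-bit CRC checksum of `data` as 8 hex characters."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def sha256_hex(data: bytes) -> str:
    """Return the 256-bit SHA-256 digest of `data` as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


payload = b"123456789"
print(crc32_hex(payload))        # cbf43926 (only 32 bits of output)
print(len(sha256_hex(payload)))  # 64 hex characters (256 bits of output)
```

With only 32 bits of output and no cryptographic design, CRC32 catches accidental corruption well, but an attacker can easily craft different data with the same checksum.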

Common Hashing Algorithms

MD5

  • Fast but cryptographically broken
  • Vulnerable to collisions
  • Should not be used for security

SHA Family

  • SHA-1: Deprecated due to collision attacks
  • SHA-256 / SHA-512: Widely used and secure
  • SHA-3: Newer standard with different design principles
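The members of the SHA family that are still considered secure are all available in Python's `hashlib`. The sketch below compares their output sizes; `digest_bits` is a hypothetical helper, and the algorithm names are the ones `hashlib` uses.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Return the output size, in bits, of a hashlib algorithm."""
    return hashlib.new(algorithm).digest_size * 8


for name in ("sha256", "sha512", "sha3_256"):
    print(name, digest_bits(name))

# SHA-256 and SHA3-256 have the same output size but different internals,
# so they produce unrelated digests for the same input.
same_input = b"system design"
print(hashlib.sha256(same_input).hexdigest() == hashlib.sha3_256(same_input).hexdigest())  # False
```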

Non-Cryptographic Hash Functions

  • MurmurHash
  • CityHash
  • xxHash

These are used in databases, caches, and hash tables, where speed and even distribution matter more than security.
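MurmurHash, CityHash, and xxHash are not part of Python's standard library, so the sketch below substitutes FNV-1a, another simple non-cryptographic hash, to show the typical use: mapping a key to a bucket. The function `bucket_for` and the key format are assumptions made for illustration.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime."""
    state = 0x811C9DC5  # FNV offset basis
    for byte in data:
        state ^= byte
        state = (state * 0x01000193) & 0xFFFFFFFF  # FNV prime, kept to 32 bits
    return state


def bucket_for(key: str, num_buckets: int) -> int:
    """Pick a bucket (for example a cache shard) for a key."""
    return fnv1a_32(key.encode("utf-8")) % num_buckets


print(hex(fnv1a_32(b"a")))       # 0xe40c292c
print(bucket_for("user:42", 8))  # a stable bucket index between 0 and 7
```

Functions like this are very cheap to compute, but they make no attempt to resist deliberately crafted collisions.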

Hash Collisions

A collision occurs when two different inputs produce the same hash.

Implications:

  • Security vulnerabilities
  • Data overwrites
  • Incorrect lookups

Design strategies to handle collisions

  • Use larger hash sizes
  • Apply collision resolution techniques (e.g., chaining or open addressing in hash tables)

Collisions cannot be completely eliminated, only minimized.
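One common way to handle collisions in a hash table is chaining, where each bucket holds a list of entries. The class below is a minimal sketch of that technique, not a production data structure; the name `ChainedHashTable` and the tiny default bucket count are choices made for illustration.

```python
class ChainedHashTable:
    """A tiny hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets: int = 4) -> None:
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Different keys may land in the same bucket; that is a collision.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[index] = (key, value)  # same key: update in place
                return
        bucket.append((key, value))  # new key: extend the chain

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(num_buckets=4)
table.put(1, "first")
table.put(5, "second")  # in CPython, 1 and 5 share a bucket when there are 4 buckets
print(table.get(1), table.get(5))  # first second
```

Because the full key is compared inside the bucket, a collision costs extra work but does not overwrite data or return the wrong value.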

Hashing for Data Integrity

Hashing is commonly used to verify data integrity:

  • File downloads
  • Message validation
  • Data synchronization

Workflow:

  1. Generate hash before transmission
  2. Transmit data and hash
  3. Recompute hash on receiver side
  4. Compare hashes

If the hashes differ, the data has been altered or corrupted in transit.
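The workflow above can be sketched in a few lines. This example assumes SHA-256 and uses hypothetical helper names (`digest`, `verify`); the sender and receiver are simulated in a single script.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Steps 1 and 3: compute the SHA-256 hash of the data."""
    return hashlib.sha256(data).hexdigest()


def verify(received: bytes, expected_hex: str) -> bool:
    """Step 4: compare the recomputed hash with the transmitted one."""
    return hmac.compare_digest(digest(received), expected_hex)


original = b"quarterly-report-v3"
sent_hash = digest(original)                       # sender side
print(verify(original, sent_hash))                 # True: data arrived intact
print(verify(b"quarterly-report-v4", sent_hash))   # False: data was altered
```

Note that a plain hash only helps if the hash itself comes from a trusted source. If an attacker can replace both the data and the hash, a keyed construction such as HMAC or a digital signature is needed instead.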

Hashing for Secure Password Storage

Storing plaintext passwords is a major security risk.

Best practices:

  • Store hashed passwords, never plaintext
  • Use salts to prevent rainbow table attacks
  • Use slow hashing algorithms (e.g., bcrypt, scrypt, Argon2)

Password hashing prioritizes security over speed.
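As a rough sketch of these practices, the example below uses `hashlib.scrypt` from Python's standard library (it requires a Python build linked against a recent OpenSSL). The cost parameters `n=2**14, r=8, p=1` and the 16-byte salt are illustrative assumptions; real systems should pick values based on current guidance and their own hardware.

```python
import hashlib
import hmac
import os


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage. A fresh random salt is used each time."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(key, expected_key)


salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))  # True
print(verify_password("wrong guess", salt, stored_key))                   # False
```

Because each user gets a different salt, two users with the same password end up with different stored values, which is what defeats precomputed rainbow tables.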

Limitations of Hashing

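  • Hashes cannot be reversed, so the original data cannot be recovered from them
  • Fast, unsalted hashes of guessable inputs (such as passwords) are vulnerable to brute-force and rainbow table attacks
  • Not suitable for confidentiality: hashing does not replace encryption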
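The brute-force limitation is easy to demonstrate. In the sketch below, a password stored as a plain, unsalted SHA-256 hash is recovered simply by hashing a list of guesses. The function names and the tiny word list are made up for illustration.

```python
import hashlib
from typing import Optional


def unsalted_hash(password: str) -> str:
    """A fast, unsalted hash: convenient, but a poor choice for passwords."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates: list) -> Optional[str]:
    """Hash each candidate and return the one that matches, if any."""
    for candidate in candidates:
        if unsalted_hash(candidate) == target_hash:
            return candidate
    return None


leaked = unsalted_hash("letmein")
print(dictionary_attack(leaked, ["123456", "password", "letmein"]))  # letmein
```

The hash was never reversed here; the attacker only guessed. Slow, salted password hashing makes each guess expensive and prevents reuse of precomputed tables.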

 
