How Hashing Actually Works (and Why You Can't Un-Hash a Password)
DEV Community

A Hash Is a Fingerprint, Not a Container

Feed any input - a three-character password, a 4 GB video, the entire text of War and Peace - into a hash function like SHA-256, and you get back a fixed-size output: exactly 256 bits, always, written as 64 hex characters. "a" and a movie file both come out the same length.
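
A minimal sketch of the fixed-size property, using Python's standard `hashlib`. The helper name `sha256_hex` and the two sample inputs are illustrative choices, not anything from a particular codebase.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

tiny = sha256_hex(b"a")                  # one byte of input
large = sha256_hex(b"x" * 10_000_000)    # ten million bytes of input

# Both digests are 64 hex characters (256 bits), regardless of input size.
print(len(tiny), len(large))
```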

That fact alone tells you reversal is impossible. There are infinitely many possible inputs and only 2²⁵⁶ possible outputs. By the pigeonhole principle, countless different inputs must map to the same hash. So even in principle, a hash can't contain enough information to reconstruct its input - the input was squeezed through a one-way funnel and most of it was thrown away.
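
The pigeonhole argument is easier to believe at a small scale. The sketch below shrinks the output to 16 bits by keeping only the first two bytes of SHA-256 (`tiny_hash` is a made-up toy, not a real hash function), so a collision has to appear within 65,537 distinct inputs. The same logic applies to the full 256 bits; the search is just far too large to run.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak 16-bit hash: the first two bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]

def find_collision() -> tuple[bytes, bytes]:
    """Hash distinct inputs until two share a digest; guaranteed within 65,537 tries."""
    seen: dict[bytes, bytes] = {}
    for i in range(2**16 + 1):
        data = str(i).encode("ascii")
        digest = tiny_hash(data)
        if digest in seen:
            return seen[digest], data
        seen[digest] = data
    raise AssertionError("unreachable: pigeonhole guarantees a collision")

first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

Given only the shared 16-bit digest, there is no way to tell which of the two inputs produced it, which is the "thrown away" information in miniature.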

A hash isn't a locked box holding your data; it's a fingerprint of your data. You can confirm a fingerprint matches a person, but you can't grow the person back from the fingerprint.

Encryption is the opposite deal: it's designed to be reversed by whoever holds the key. Hashing has no key and no reverse. If someone tells you they "decrypted the hash," they didn't - they guessed an input that produces the same hash. That distinction is the whole story of password security below.

The Four Properties That Make It Useful

A cryptographic hash function is defined by four behaviors, and each one maps to something you actually rely on:

  • Deterministic - the same input always yields the same output. This is why hashes work as cache keys, git object IDs, and deduplication fingerprints. Same bytes in, same ID out, every time, on every machine.
  • Fast to compute, one-way to invert - computing hash(x) is cheap; finding an x for a given hash is infeasible. (For passwords this "fast" part is actually a problem - see below.)
  • Avalanche effect - flip a single bit of the input and roughly half the output bits flip. There's no gradual similarity: "hello" and "hellp" produce hashes with no visible relationship. This is why you can't "walk toward" the right input by getting warmer.
  • Collision-resistant - it's infeasible to find two different inputs with the same hash. Collisions must exist (pigeonhole again), but a good function makes finding one astronomically hard.

That third property is easy to see for yourself. Hash "hello", then hash "hellp", and compare - the outputs look completely unrelated. A quick way to eyeball the avalanche effect is to paste both into a hash generator and watch how one-letter changes scramble the entire digest. It's the most intuitive demonstration of why a hash gives you no hint about how close a guess is to the real input.
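
If you'd rather measure than eyeball, here is a small sketch that counts how many of the 256 output bits differ between the two digests. The function name `differing_bits` is an illustrative choice; the expectation from the avalanche property is a count somewhere near 128, not an exact value.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

diff = differing_bits(b"hello", b"hellp")
print(f"{diff} of 256 bits differ")

# Determinism check: identical inputs give identical digests, so zero bits differ.
print(differing_bits(b"hello", b"hello"))
```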

Why "Fast" Is the Wrong Property for Passwords

Here's where the interesting mistake lives. SHA-256 was designed to be fast - modern hardware computes billions of SHA-256 hashes per second. That's great for verifying a 4 GB download. It's a disaster for passwords.

If a database of SHA-256(password) values leaks, the attacker doesn't try to "reverse" anything. They do this instead:

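import hashlib

# wordlist: iterable of common passwords; stolen_hash: a leaked hex digest
for guess in wordlist:
    if hashlib.sha256(guess.encode()).hexdigest() == stolen_hash:
        print("cracked:", guess)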

Because SHA-256 is so fast, a commodity GPU rips through billions of guesses per second. Every password in every common wordlist falls in seconds. The one-way property held perfectly - the attacker never reversed anything - and it didn't matter, because guessing was cheap.

Two defenses close this gap, and both are worth understanding:

  • Salt. Before hashing, prepend a unique random value to each password: hash(salt + password). Now two users with the same password get different hashes, and an attacker can't precompute one giant lookup table ("rainbow table") that cracks everyone at once. They have to attack each password separately. The salt isn't secret - it's stored alongside the hash - its only job is to make each hash unique.
  • Slowness on purpose. Use a hash built to be slow, and ideally memory-hard: bcrypt, scrypt, or Argon2 (scrypt and Argon2 are memory-hard; bcrypt is slow but uses little memory). Tuned to take, say, 100ms per hash, these are invisible to your one logging-in user, but they drag a billion-guesses-per-second attack down to a few thousand per second - a tax of five to six orders of magnitude on the attacker.
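
Both defenses together can be sketched with nothing but the standard library, assuming a Python build where `hashlib.scrypt` is available (it depends on the underlying OpenSSL). The cost parameters and the helper names `hash_password` and `verify_password` are illustrative assumptions, not tuned recommendations; in a real system a maintained Argon2 or bcrypt library is the more usual choice.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters (assumptions, not a production recommendation).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a new password, using a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")

print(hash_a != hash_b)                                   # same password, different salts
print(verify_password("correct horse", salt_a, hash_a))  # right password
print(verify_password("wrong guess", salt_a, hash_a))    # wrong password
```

Note that verification never reverses anything: it repeats the hash with the stored salt and compares the results.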

This is why "just SHA-256 the password" is wrong even with a salt: SHA-256 is the wrong tool because it's too fast.

So the rule that trips people up resolves cleanly: don't design your own password scheme, and don't use a general-purpose hash for it. Use a real password hash (Argon2/bcrypt) with a per-user salt. And on the front end, help users pick inputs worth protecting in the first place - a long random string from a password generator beats a memorable word, because length is what actually defeats guessing, salted or not.
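
To put a number on "length defeats guessing": a string of uniformly random characters has length × log2(alphabet size) bits of entropy, and each extra bit doubles the attacker's search space. The sketch below assumes a 62-symbol alphabet (letters and digits) purely for illustration; `generate_password` and `entropy_bits` are hypothetical helper names.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols, an illustrative choice

def generate_password(length: int = 20) -> str:
    """Pick each character independently using a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

print(generate_password())
print(f"8 chars:  {entropy_bits(8):.1f} bits")
print(f"20 chars: {entropy_bits(20):.1f} bits")
```

The formula only holds for randomly generated strings; a memorable word or phrase has far less entropy than its length suggests, which is why it shows up in wordlists.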

Where Hashing Shines (and It's Everywhere)

Password storage gets the headlines, but most hashing in your stack has nothing to do with secrecy:

  • Integrity checks. A download page lists a SHA-256 next to the file. You hash your copy and compare. A match means not one bit was corrupted in transit - because of the avalanche effect, any change at all produces a totally different digest. It also rules out tampering, as long as the published hash itself came from a source the attacker couldn't alter.
  • Git. Every commit, tree, and blob is named by the hash of its contents. That's how git detects corruption and how it knows two files are identical without comparing them byte by byte.
  • Deduplication & caching. Storage systems and CDNs hash content to generate a key; identical content hashes to the same key, so it's stored or fetched once. Deterministic-plus-collision-resistant is exactly what you want here.
  • Digital signatures & blockchains. You sign the hash of a document, not the whole thing - smaller and fixed-size - and collision resistance is what makes that safe.
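
The integrity check from the first bullet can be sketched as follows. A temporary file stands in for the download and a hard-coded digest stands in for the value published on the download page; both are assumptions for the sake of a self-contained example. The file is read in chunks so a multi-gigabyte download never has to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for a downloaded file and the digest published next to it.
with tempfile.TemporaryDirectory() as tmp:
    download = Path(tmp) / "example.bin"
    download.write_bytes(b"hello")
    published = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
    intact = file_sha256(download) == published

    download.write_bytes(b"hellp")  # simulate a single changed byte
    tampered = file_sha256(download) == published

print(intact, tampered)
```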

None of these need reversibility. They need a stable, tamper-evident fingerprint, which is precisely what a hash is.
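
As a rough sketch of the content-addressing idea behind Git and deduplicating stores: in a repository using Git's SHA-1 object format, a blob's ID is the SHA-1 of a short header (`blob`, the size, a NUL byte) followed by the file content. The `ContentStore` class is a hypothetical in-memory toy that reuses that ID as its key; it is not how any particular storage system is implemented.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>' plus a NUL byte, followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

class ContentStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = git_blob_id(content)
        self._blobs.setdefault(key, content)  # no-op if the key already exists
        return key

    def __len__(self) -> int:
        return len(self._blobs)

store = ContentStore()
key1 = store.put(b"same bytes")
key2 = store.put(b"same bytes")   # deduplicated: same content, same key
key3 = store.put(b"other bytes")

print(key1 == key2, len(store))
```

Comparing two keys is enough to decide whether two pieces of content are identical, without comparing the content byte by byte.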

The One-Sentence Version

A hash function is a one-way funnel that turns any input into a fixed-size fingerprint: identical inputs always match, any change scrambles everything, and you can verify an input but never reconstruct it. Encryption is a locked box you can reopen with a key; hashing is a fingerprint you can only compare. Use general-purpose hashes (SHA-256) for integrity and identity, and purpose-built slow hashes (Argon2/bcrypt) with a salt for passwords - and the "you can't un-hash it" fact stops being a gotcha and starts being the feature you were relying on all along.
