Hashing vs Encryption vs Encoding: The Three Things People Keep Confusing

Encoding, encryption, and hashing get mixed up constantly, and the mistakes break real auth code. Here is the difference and when to use which.

I have reviewed a lot of authentication code, and the same category error shows up again and again. Someone Base64-encodes a password and calls it “encrypted.” Someone hashes a credit card number when they actually needed to decrypt it later. Someone reaches for SHA-256 to store passwords because it sounds secure. These are not small mistakes. They are the difference between a system that protects users and one that leaks everything the first time a database gets dumped. The fix starts with knowing that encoding, encryption, and hashing are three separate things that solve three separate problems.

The one-sentence version of each

Before the detail, here is the whole thing compressed:

  • Encoding changes the format of data so a different system can read it. Reversible, no key, zero security value.
  • Encryption scrambles data so only someone with the key can read it. Reversible with the key, gives you confidentiality.
  • Hashing turns data into a fixed-size fingerprint. One-way, not reversible, used for integrity and verification.

The trap is that all three produce output that looks like random gibberish, so they get treated as interchangeable. They are not. The question that tells them apart is simple: do you ever need the original data back, and if so, who is allowed to get it?

Encoding: format, not protection

Encoding exists because data has to travel through systems that only accept certain byte ranges. Email bodies, URLs, and JSON payloads all have characters they cannot carry safely, so you re-express the raw bytes in a friendlier alphabet. Base64 maps arbitrary bytes onto 64 printable characters. Hex maps each byte to two characters in 0-9a-f. URL-encoding turns a space into %20 so it survives inside a query string.

The thing to internalize: encoding has no key and no secret. Anyone holding the encoded string can recover the original instantly. There is nothing to crack because nothing was locked.

echo "aGVsbG8=" | base64 -d
hello

That is the entire “attack.” One command, no key, done. You can prove it to yourself with the Base64 Encode/Decode tool by pasting any encoded string and watching it come straight back.
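The same holds for the other encodings mentioned above. As a minimal Python sketch using only the standard library, the snippet below round-trips one byte string through Base64, hex, and URL-encoding. The sample value is arbitrary, and no key appears anywhere:

```python
import base64
from urllib.parse import quote, unquote

raw = b"hello world"  # arbitrary sample data

# Base64: arbitrary bytes -> 64 printable characters (plus '=' padding)
b64 = base64.b64encode(raw).decode("ascii")

# Hex: each byte -> two characters in 0-9a-f
hexed = raw.hex()

# URL-encoding: unsafe characters -> %XX escapes
url = quote(raw.decode("utf-8"))

print(b64)    # aGVsbG8gd29ybGQ=
print(hexed)  # 68656c6c6f20776f726c64
print(url)    # hello%20world

# Every step reverses with no key at all
assert base64.b64decode(b64) == raw
assert bytes.fromhex(hexed) == raw
assert unquote(url).encode("utf-8") == raw
```

Nothing in the decode step depends on information the attacker lacks, which is the whole difference from encryption.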

Encoding is genuinely useful. It is how binary files get embedded in JSON, how data URIs work, how basic-auth headers are formatted. The problem is purely the misunderstanding. I have seen Authorization: Basic <base64> headers described as “encrypted credentials” in design docs. They are not encrypted at all. Basic auth Base64 is reversible by anyone who sees the request, which is exactly why it must only ever ride over TLS.
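To make that concrete, here is a short Python sketch that recovers the credentials from a Basic header with nothing but the standard library. The header value and the `alice` / `s3cret` credentials are made up for illustration:

```python
import base64

# Hypothetical header captured from a request sent without TLS
header = "Authorization: Basic YWxpY2U6czNjcmV0"

# Strip the scheme, decode the Base64, split on the first colon
token = header.split("Basic ", 1)[1]
username, password = base64.b64decode(token).decode("utf-8").split(":", 1)

print(username, password)  # alice s3cret
```

Anyone positioned to read the request can run the same three lines, so the only protection Basic auth ever has is the transport underneath it.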

If you take one rule from this section: encoding is never a security control. If your threat model includes someone who should not read the data, encoding does nothing for you.

Encryption: reversible, but only with the key

Encryption is where secrets actually enter the picture. You take plaintext, combine it with a key, and produce ciphertext that is useless without that key. The whole point is that the transformation is reversible, but only by the party holding the right key. That property is called confidentiality.

You use encryption when you need the original data back later. Storing a user’s OAuth refresh token, encrypting a file before uploading it to storage you do not fully trust, protecting a session payload in a cookie: all of these need the plaintext eventually, so they need encryption, not hashing.

Modern symmetric encryption means AES, almost always AES-256 in an authenticated mode like GCM. “Authenticated” matters: AES-256-GCM gives you both confidentiality and a built-in integrity tag, so if someone flips a byte of the ciphertext, decryption fails loudly instead of handing you garbage. Plain AES-CBC without a separate authentication step does not give you that, and the gap has caused real vulnerabilities. You can experiment with the round trip in the AES Encrypt/Decrypt tool: encrypt a string, then watch that the exact same passphrase is required to get it back, and a wrong one gives you nothing.
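A rough sketch of that round trip in Python follows. It assumes the third-party `cryptography` package, which the article does not name. It also uses a random 256-bit key instead of a passphrase; a passphrase-based tool would first derive the key with a KDF. The sample plaintext is invented:

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 32 random bytes; this is the secret
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # 96-bit nonce; must never repeat under the same key
plaintext = b"refresh-token-abc123"  # made-up sample value

# The result is the ciphertext with a 16-byte authentication tag appended
ciphertext = aesgcm.encrypt(nonce, plaintext, None)

# Round trip with the right key and nonce
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext

# Flip one bit: decryption raises instead of returning garbage
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    aesgcm.decrypt(nonce, tampered, None)
    tamper_detected = False
except InvalidTag:
    tamper_detected = True

print("tamper detected:", tamper_detected)
```

The nonce is not secret and is normally stored next to the ciphertext. The key is the only thing that has to stay private.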

Symmetric vs asymmetric in one section

There are two families, and the split is about how many keys are involved.

Symmetric encryption uses one shared key for both locking and unlocking. AES is the standard here. It is fast and handles large data well. The catch is key distribution: both sides need the same secret, and getting that secret to the other party safely is its own problem.

Asymmetric encryption uses a key pair: a public key that anyone can hold and a private key that stays secret. Anything encrypted with the public key can only be decrypted with the private key. RSA-2048 (or larger) and elliptic-curve schemes live here. This solves the distribution problem because you can hand out the public key freely. The cost is speed, so in practice asymmetric crypto is rarely used to encrypt bulk data. TLS, for example, uses asymmetric crypto to authenticate the server and agree on a shared symmetric key, then switches to a symmetric cipher such as AES for the actual traffic. You get the distribution benefit of one and the speed of the other.

A quick way to remember which is which: symmetric is one key both ways, asymmetric is a public/private pair. If you ever need to give someone the ability to encrypt without giving them the ability to decrypt, you need asymmetric.
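The "encrypt without being able to decrypt" property can be sketched in Python as below. This again assumes the third-party `cryptography` package, and the choice of RSA-2048 with OAEP padding and the sample message are illustrative assumptions, not something the article prescribes:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# The recipient generates the pair and publishes only the public half
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# OAEP padding with SHA-256 for RSA encryption
oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Anyone holding the public key can encrypt a short message...
message = b"short secret"  # made-up sample; RSA only fits small payloads
ciphertext = public_key.encrypt(message, oaep)

# ...but only the private key can decrypt it
recovered = private_key.decrypt(ciphertext, oaep)
print(recovered == message)
```

Because RSA only handles payloads smaller than the key size, real systems typically encrypt a random symmetric key this way and protect the bulk data with that key.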

Hashing: one-way fingerprints

A cryptographic hash takes any input and produces a fixed-size output. SHA-256 always returns 256 bits, which is 64 hex characters, whether you feed it one letter or a 4 GB video file. The output is deterministic (same input, same hash every time) and one-way (you cannot run it backwards to recover the input).

input:  "hello"
sha256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Change a single character and the output looks completely unrelated:

input:  "Hello"
sha256: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

That avalanche effect is what makes hashing useful for integrity. Download a file, hash it, compare against the published hash, and you know whether a single byte changed in transit. Hashing is also how you verify something without storing it: you keep the fingerprint, and later you can check whether a new input matches without ever holding the original. You can generate and compare SHA-256 output with the Hash Generator.
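The two digests above can be reproduced with Python's standard `hashlib`. The sketch below also includes a chunked file hasher for the download-verification case; the helper names `sha256_hex` and `sha256_file` are my own, not from any library:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(sha256_hex(b"hello"))  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(sha256_hex(b"Hello"))  # 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

# Output size is fixed no matter how large the input is
print(len(sha256_hex(b"x" * 1_000_000)))  # 64
```

To verify a download, compare `sha256_file(path)` against the published digest. The check only helps if the published digest came from a channel the attacker cannot also modify.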

Two notes on choosing an algorithm. MD5 and SHA-1 are broken for security purposes because neither is collision-resistant anymore: attackers can construct two different inputs that produce the same MD5 hash, and SHA-1 collisions have been demonstrated in practice. If you see either being used to verify integrity in a security context, that is a finding. Use SHA-256 or SHA-3. MD5 is acceptable only for non-adversarial checksums, like detecting accidental file corruption, where nobody is actively trying to fool you.

One thing hashing is not: encryption. You cannot “decrypt” a hash because there is no key and the original information is genuinely gone. The output is fixed-size while the input can be any size, so unboundedly many inputs share each output and the mapping cannot be inverted. What an attacker can do is guess inputs, hash each guess, and look for a match, which brings us to the part where most auth code goes wrong.

Password storage: where SHA-256 becomes a liability

Here is the most expensive confusion in the whole topic. People learn that hashing is one-way, conclude that hashing a password is safe, reach for SHA-256, and ship it. That is a vulnerability, and I have flagged it in code review more times than I can count.

The problem is that SHA-256 is fast. That is a feature for integrity checks and a disaster for passwords. When a database leaks, the attacker has every password hash. They do not need to reverse anything. They run guesses through the same hash function and compare. A modern GPU rig can compute billions of SHA-256 hashes per second. Most human passwords fall in hours.

It gets worse with plain hashing. If two users pick the same password, they get the same hash. An attacker hashes a dictionary of common passwords once, then matches that precomputed table against every leaked hash at the same time. These precomputed tables are why you cannot just hash and walk away.
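A toy version of that attack fits in a few lines of Python. The usernames, passwords, and four-word wordlist are invented for illustration; a real attacker would use wordlists with millions of entries:

```python
import hashlib


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table of unsalted SHA-256 password hashes
leaked = {
    "alice": sha256_hex("hunter2"),
    "bob": sha256_hex("correct horse"),
    "carol": sha256_hex("hunter2"),  # same password as alice
}

# Same password, same hash: the duplicate is visible before any cracking
assert leaked["alice"] == leaked["carol"]

# The attacker hashes a wordlist once...
wordlist = ["123456", "password", "hunter2", "qwerty"]
precomputed = {sha256_hex(word): word for word in wordlist}

# ...and matches it against every leaked hash in one pass
cracked = {user: precomputed[h] for user, h in leaked.items() if h in precomputed}
print(cracked)  # {'alice': 'hunter2', 'carol': 'hunter2'}
```

The cost of building `precomputed` is paid once and reused against every account in every breach that used the same unsalted hash.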

Two things fix this: salt and slowness.

Salt

A salt is a random value, unique per user, mixed into the input before hashing. Now two users with the same password get different stored hashes, which kills precomputed tables outright. The attacker has to attack each password individually instead of all at once. The salt is not secret; it gets stored alongside the hash. Its only job is to make every hash unique.
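The effect of a per-user salt can be sketched with Python's standard library. PBKDF2 is used here only because it ships with `hashlib`. The iteration count, salt length, and sample password are illustrative assumptions, not recommendations from the text:

```python
import hashlib
import secrets

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def salted_hash(password: str, salt: bytes) -> bytes:
    # The salt is mixed into the derivation, so it changes the whole output
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Two users pick the same password but get independent random salts
salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)

hash_alice = salted_hash("hunter2", salt_alice)
hash_bob = salted_hash("hunter2", salt_bob)

print(hash_alice != hash_bob)  # True: one precomputed table cannot cover both

# The salt is stored next to the hash, so verification can recompute it
print(salted_hash("hunter2", salt_alice) == hash_alice)  # True
```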

A deliberately slow KDF

Salt alone is not enough, because the attacker can still grind a single salted password fast if the function is fast. So you stop using a fast hash and switch to a key derivation function that is built to be slow and expensive on purpose. The current good choices:

  • Argon2id - the modern default. Tunable on memory, time, and parallelism, and the memory cost specifically defeats GPU and ASIC attacks. If you are choosing today, choose this.
  • scrypt - also memory-hard, a solid choice and widely available.
  • bcrypt - older but still respectable. It has a cost factor (often 10 to 12) that you raise as hardware gets faster. Note its quiet limitation: it only uses the first 72 bytes of input.
  • PBKDF2 - the most conservative option. It is not memory-hard, so it resists GPUs less well, but it is FIPS-approved and everywhere, which is why regulated environments still use it with a high iteration count. You can see how iteration count and salt feed into derived output with the PBKDF2 Hash Generator.

The mental model that keeps you out of trouble: a fast hash like SHA-256 is for integrity, where you want it to be quick. A slow KDF is for passwords, where slowness is the entire point. Tune the cost so a legitimate login takes something like 100 to 250 ms on your hardware. That is unnoticeable to a user logging in once and brutal to an attacker trying billions of guesses.
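As a minimal sketch of salted, slow password storage, the example below uses `hashlib.scrypt` because it is in Python's standard library; Argon2id would need a third-party package. The cost parameters and the `$`-separated record layout are assumptions for illustration, and the timing line is there so you can check the cost against the 100 to 250 ms target on your own hardware:

```python
import base64
import hashlib
import hmac
import secrets
import time

# Assumed cost parameters for illustration; measure and tune on your own hardware
N, R, P = 2**14, 8, 1


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def hash_password(password: str) -> str:
    """Return a self-describing record: parameters, salt, and derived key."""
    salt = secrets.token_bytes(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return f"scrypt${N}${R}${P}${_b64(salt)}${_b64(key)}"


def verify_password(password: str, record: str) -> bool:
    """Recompute with the stored salt and parameters, then compare in constant time."""
    _, n, r, p, salt_b64, key_b64 = record.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(key_b64)
    actual = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=int(n), r=int(r), p=int(p), dklen=len(expected)
    )
    return hmac.compare_digest(actual, expected)


start = time.perf_counter()
record = hash_password("hunter2")
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"one hash took {elapsed_ms:.0f} ms")
print(verify_password("hunter2", record))  # True
print(verify_password("hunter3", record))  # False
```

Storing the parameters inside the record means you can raise the cost later and still verify older hashes, upgrading each one the next time its owner logs in.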

HMAC: hashing plus a key, for authenticity

There is a fourth thing people run into, and it sits between hashing and encryption. HMAC is a keyed hash. You combine a secret key with the message and a hash function (HMAC-SHA-256, say), and the output proves two things at once: the message was not modified, and it came from someone who holds the key.

Why not just hash the message? Because a plain hash proves nothing about who produced it. Anyone can recompute a SHA-256 hash, so anyone can tamper with the message and recompute a matching hash. With HMAC, an attacker who does not have the key cannot produce a valid tag, so they cannot forge or alter the message undetected.

This is what signs webhook payloads, what protects API request signatures, and what backs the signature part of tokens. When a payment provider sends you a webhook with an X-Signature header, you recompute the HMAC with your shared secret and compare. If it matches, the message is authentic. You can try the construction with the HMAC Generator by changing the key and watching the tag change completely.
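The recompute-and-compare step can be sketched with Python's standard `hmac` module. The shared secret, the payload, and the hex encoding of the tag are assumptions for illustration; each real provider documents its own header format, and many also sign a timestamp:

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the HMAC-SHA-256 tag for a payload as a hex string."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Recompute the tag over the raw body and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature_header)


secret = b"shared-webhook-secret"  # hypothetical shared secret
body = b'{"event":"payment.succeeded","amount":4200}'  # hypothetical payload
header = sign(secret, body)  # stands in for the provider's X-Signature value

print(verify_webhook(secret, body, header))  # True
print(verify_webhook(secret, body.replace(b"4200", b"1"), header))  # False: body altered
print(verify_webhook(b"wrong-secret", body, header))  # False: no key, no valid tag
```

Two details matter in practice: verify against the raw request bytes before any JSON parsing, and use `hmac.compare_digest` over `==` so the comparison does not leak timing information.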

HMAC gives you integrity and authenticity. It does not give you confidentiality. The message itself is still readable; HMAC only proves it was not tampered with. If you need the contents hidden too, you encrypt as well.

The decision table

This is the part to bookmark. Start from what you actually want, then pick the tool.

| I want to… | Use | Not |
| --- | --- | --- |
| Send binary data through text-only channels | Encoding (Base64, hex) | Anything called “encryption” |
| Hide data but read it back later myself | Symmetric encryption (AES-256-GCM) | Hashing, encoding |
| Let others encrypt to me without sharing a secret | Asymmetric encryption (RSA, ECC) | Symmetric |
| Store user passwords | Slow KDF (Argon2id, bcrypt, scrypt) | SHA-256, MD5, encoding |
| Check a downloaded file was not corrupted or altered | Fast hash (SHA-256) | A slow KDF |
| Verify a message came from a trusted sender unchanged | HMAC (HMAC-SHA-256) | A plain hash |
| Detect accidental, non-malicious file corruption | Checksum (CRC32, even MD5) | A slow KDF |

If you can answer “do I need the original back, and who gets it,” the table picks itself.

Real failures I have actually seen

These are not hypotheticals. Each one came out of real code.

  1. Base64 passwords called “encrypted.” A service stored user passwords Base64-encoded and labeled the column encrypted_password. One database export and every password was plaintext, because Base64 is not encryption. This is the classic encoding-mistaken-for-security error, and it is depressingly common.

  2. MD5 for password storage. A legacy app hashed passwords with unsalted MD5. After a breach, attackers ran the leaked hashes against precomputed tables and recovered the bulk of them within a day. MD5 is fast and broken; both problems compound for passwords.

  3. Encrypting when they meant to hash. A team encrypted passwords with AES so they could “verify” logins by decrypting and comparing. That means the system can recover every plaintext password, which is exactly what you do not want. You never need a password back. You only need to check whether a login attempt matches. That is a hashing job, specifically a slow-KDF job, not an encryption job. The presence of a decrypt path is the bug.

  4. Hashing data they needed to recover. The mirror image of the last one. A system SHA-256’d account numbers it later needed to display, then discovered hashes are one-way and the data was unrecoverable. That data needed encryption, because they needed it back.

  5. Plain hash where HMAC belonged. A webhook endpoint verified payloads by hashing the body and comparing to a header, with no secret involved. Since anyone can compute that hash, anyone could forge a valid-looking request. The fix was HMAC with a shared secret so only the real sender could produce a matching tag.

The pattern across all five is the same: someone picked a tool by how the output looked instead of by what problem they were solving.

Summary

Three tools, three jobs. Encoding reformats data and protects nothing; if your concern is a reader who should not see the data, encoding does zero work. Encryption hides data so only a key holder can read it, and you reach for it whenever you need the original back later (AES-256-GCM for shared-key cases, RSA or ECC when you cannot share a secret). Hashing produces a one-way fixed-size fingerprint for integrity and verification, with SHA-256 as the workhorse and MD5 and SHA-1 retired from anything security-related.

Passwords are the special case that trips people up most. Never plain-hash them, never encrypt them, never encode them. Salt each one and run it through a slow KDF like Argon2id, bcrypt, or scrypt, tuned so a single login takes a fraction of a second and a billion-guess attack takes forever. And when you need to prove a message is both unaltered and from a trusted source, that is HMAC: a hash with a key. Pick by the problem, not by how the gibberish looks.
