Where the Problem Started

Password storage is not simply encrypting a string. Even if two users choose the same password, the stored values should differ, and a database leak should not make the original password recoverable. This post organizes hashing and salting as a security boundary rather than a vocabulary exercise.

When building a user information system, the most important element is protecting user passwords.

The technique used here is called salting. For a discussion of where the name comes from, see https://stackoverflow.com/questions/244903/why-is-a-password-salt-called-a-salt.

There is no especially satisfying answer, but the origins of common placeholder terms like foo and bar are often more arbitrary than they look.

Start with hashing. In short, hashing is a one-way process that transforms an input string into a fixed-length output. One-way means there is no practical way to compute the original input from the hashed value.
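To make that concrete, here is a minimal sketch using SHA-256 from Python's standard hashlib. The helper name sha256_hex is my own, and SHA-256 is used only to illustrate what a hash function does, not as a recommendation for storing passwords.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)

# The digest length is fixed (64 hex characters for SHA-256),
# no matter how long the input is.
print(len(short_digest), len(long_digest))

# The same input always produces the same digest.
print(sha256_hex("a") == short_digest)
```

There is no hashlib function that takes a digest and returns the input; the only way back is guessing inputs and hashing them.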

By contrast, two-way encryption uses a key (or a key pair), so anyone holding the right key can decrypt the ciphertext back into the original string.

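Various algorithms and libraries come up around hashing: hash functions such as SHA256 and SHA512, and libraries such as Phpass and libsodium. Before picking one, look up which algorithms are currently considered safe and officially recommended for password storage.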

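MD5, SHA1, and SHA2 variants such as SHA256 and SHA512 are based on the Merkle-Damgard construction and can be exposed to length extension attacks, which matters when they are used naively as keyed hashes. For password storage the bigger problem is speed: they are designed to be fast, so an attacker can test enormous numbers of guesses cheaply. Either way, they should not be used on their own for password storage.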

The password authentication flow is easier to reason about step by step.

  1. The user creates an account.
  2. The password is stored in the DB after hashing.
  3. When the user tries to log in, the entered password is hashed with the same algorithm and compared with the hash stored in the DB.
  4. If the hashes match, authentication passes.
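The four steps above can be sketched roughly as follows. This is an illustration under my own assumptions, not a production design: the users dict stands in for the DB, the function names and the iteration count are hypothetical choices, and the per-user salt it uses is the idea this post gets to a little later.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch

# Hypothetical in-memory stand-in for the DB: username -> (salt, password hash)
users = {}


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt with PBKDF2-HMAC-SHA512."""
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    """Steps 1 and 2: create the account and store only the salt and hash."""
    salt = os.urandom(16)
    users[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> bool:
    """Steps 3 and 4: hash the entered password the same way and compare."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    candidate = hash_password(password, salt)
    # compare_digest compares in constant time to avoid leaking timing information
    return hmac.compare_digest(candidate, stored_hash)


register("alice", "correct horse")
print(login("alice", "correct horse"))
print(login("alice", "wrong guess"))
```

Note that the plain password is never stored; only the salt and the derived hash are.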

In most login systems, when step 4 fails, the service does not reveal whether the ID or the password was wrong. That is not just a frustrating UX choice; it is a small security measure that keeps an attacker from learning which usernames exist, which makes targeted brute force attacks harder.

Is a password safe once it has been hashed? Of course not.

If the hashing algorithm used by a service is identified, then after the user database is breached, inferring the password is not necessarily difficult.

Even if the hashing algorithm is not known at first, an attacker can still brute force likely algorithms and candidate passwords. With enough preparation, they may also use lookup tables or rainbow tables.
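As a rough illustration of that guessing process, the sketch below tries a few likely algorithms and a tiny wordlist against one leaked, unsalted digest. The wordlist, the algorithm list, and the crack function are all made up for this example; a real attacker would use far larger lists and specialized tools.

```python
import hashlib

# A leaked, unsalted digest of a weak password (for demonstration only).
leaked_digest = hashlib.sha256(b"letmein").hexdigest()

candidate_algorithms = ["sha1", "sha256", "sha512"]
candidate_passwords = ["123456", "password", "letmein", "qwerty"]


def crack(digest_hex: str):
    """Return (algorithm, password) if some candidate pair reproduces the digest."""
    for algorithm in candidate_algorithms:
        # The digest length alone rules out most algorithms.
        if hashlib.new(algorithm).digest_size * 2 != len(digest_hex):
            continue
        for password in candidate_passwords:
            guess = hashlib.new(algorithm, password.encode("utf-8")).hexdigest()
            if guess == digest_hex:
                return algorithm, password
    return None


print(crack(leaked_digest))
```

Not knowing the algorithm only multiplies the work by the number of plausible algorithms, which is small.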

https://en.wikipedia.org/wiki/Rainbow_table#:~:text=A%20rainbow%20table%20is%20a,form%2C%20but%20as%20hash%20values.

How I Verified It


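In short, a lookup table is a key-value style table that maps precomputed hashes back to their original strings, so naturally the table size is huge. A rainbow table shrinks that storage by keeping only the endpoints of hash chains, at the cost of more computation per lookup.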
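A toy version of a lookup table fits in a few lines. This sketch assumes unsalted SHA-256 and a four-word wordlist of my own choosing; it shows the plain lookup table only, not the chain structure of a real rainbow table.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Precompute once: hash -> original string.
wordlist = ["123456", "password", "letmein", "qwerty"]
lookup_table = {sha256_hex(word): word for word in wordlist}

# Later, reversing any leaked hash that is in the table is a dictionary lookup.
leaked_hash = sha256_hex("qwerty")
print(lookup_table.get(leaked_hash))
```

The cost is paid once when building the table; after that, every leaked hash in the table is reversed instantly.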

The table is huge because cryptographic hash functions do not substitute each character independently. Every input character influences the entire output, so the hashes of “abcdef” and “abcdee” are completely different, and the table needs a separate entry for every candidate string.
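This is easy to check directly. The snippet below hashes the two strings with SHA-256 (my choice of algorithm for the demonstration) and counts how many hex positions differ.

```python
import hashlib

digest_a = hashlib.sha256(b"abcdef").hexdigest()
digest_b = hashlib.sha256(b"abcdee").hexdigest()

# Count the positions where the two hex digests disagree.
differing = sum(1 for x, y in zip(digest_a, digest_b) if x != y)

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex characters differ")
```

Changing one input character changes most of the output, so knowing the hash of one string tells you nothing about the hash of a similar string.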

That is where salt appears.

Lookup tables and rainbow tables work because hashing is deterministic: a given string always maps to exactly one hash value.

For example, suppose there is a password called “password.”

When a service hashes “password,” no matter how complex the hashing algorithm is, every user who chose “password” ends up with the same stored hash, and one precomputed table entry reveals all of them at once. That is why a special string is combined with “password” before hashing, typically appended or prepended, and that string is called salt.
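Here is a small sketch of the effect. It uses a single round of SHA-256 with the salt prepended, purely to show what the salt changes; the salted_hash helper is hypothetical and is not a recommended storage scheme on its own.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password with SHA-256 (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users pick the same password but get different random salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

hash_a = salted_hash("password", salt_a)
hash_b = salted_hash("password", salt_b)

print(hash_a == hash_b)  # the stored values no longer match

# Verification still works because the salt is stored next to the hash.
print(salted_hash("password", salt_a) == hash_a)
```

A table precomputed for plain “password” matches neither stored value, so the attacker has to redo the work per salt.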


There are several caveats when using salt.

The first caveat is reusing a salt. Once salts are no longer unique, the security is little better than before salting. An attacker who learns the shared salt can build a single lookup table with that salt added to each candidate password and use it against every user.
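The sketch below shows that failure mode with a deliberately bad, hard-coded salt. The user names, passwords, and the SHARED_SALT constant are invented for the example, and single-round SHA-256 is used only to keep it short.

```python
import hashlib

SHARED_SALT = b"same-for-everyone"  # deliberately bad: one salt for all users


def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password with SHA-256 (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Hypothetical leaked table: username -> stored hash
leaked_db = {
    "alice": salted_hash("password", SHARED_SALT),
    "bob": salted_hash("password", SHARED_SALT),
    "carol": salted_hash("hunter2", SHARED_SALT),
}

# Identical passwords are visible again as identical hashes.
print(leaked_db["alice"] == leaked_db["bob"])

# One table, built once with the shared salt, works against every row.
wordlist = ["123456", "password", "hunter2"]
table = {salted_hash(word, SHARED_SALT): word for word in wordlist}
cracked = {user: table.get(stored) for user, stored in leaked_db.items()}
print(cracked)
```

With a unique salt per user, the attacker would have to rebuild the table for each row instead.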

Salt must be different for each user. You could generate it with a language's general-purpose random function, but those are built for simulations rather than security and can be predictable, so use a CSPRNG instead.

https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator I do not really know the underlying principle, but a CSPRNG is a random number generator whose output is designed to be unpredictable enough for cryptographic use. Most languages ship one, for example Java’s java.security.SecureRandom and Python’s secrets module.
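In Python, generating a salt from the CSPRNG is a one-liner. The new_salt helper and the 16-byte default below are my own choices for the sketch.

```python
import os
import secrets


def new_salt(num_bytes: int = 16) -> bytes:
    """Return num_bytes of random data from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


salt = new_salt()
salt_hex = salt.hex()  # hex form is convenient for storing in a text column

# os.urandom reads from the same kind of OS-level source.
other_salt = os.urandom(16)

print(len(salt), len(salt_hex))
```

The random module, by contrast, is documented as unsuitable for security purposes, so it should not be used for salts.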

The next caveat is a salt that is too short. For example, if a salt uses only 4 ASCII characters, 95^4 = 81450625 salts can be generated. It may look like a lot, but given the speed and storage capacity of modern computers, it is nowhere near enough.
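The arithmetic is quick to check. For contrast, the sketch also computes the number of possible 16-byte random salts; the 16-byte figure is my own comparison point, not something from the calculation above.

```python
PRINTABLE_ASCII = 95  # number of printable ASCII characters

# Number of distinct salts made of 4 printable ASCII characters.
salts_4_chars = PRINTABLE_ASCII ** 4
print(salts_4_chars)

# Number of distinct salts made of 16 random bytes, for comparison.
salts_16_bytes = 256 ** 16
print(salts_16_bytes == 2 ** 128)
print(salts_16_bytes // salts_4_chars)
```

About 81 million salts means an attacker can afford to precompute a table for every one of them; 2^128 salts makes that approach hopeless.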

A common rule of thumb is to make the salt as long as the output of the hash function. For example, SHA256 produces 256 bits == 32 bytes == 64 hex characters, so the salt would be 32 bytes. If you use SHA512, that becomes 64 bytes, or 128 hex characters.
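Rather than memorizing those sizes, you can ask hashlib for them. The salt_for helper below is hypothetical and simply applies the same-size-as-the-digest rule of thumb.

```python
import hashlib
import os


def salt_for(algorithm: str) -> bytes:
    """Return a random salt as long as the algorithm's digest."""
    digest_size = hashlib.new(algorithm).digest_size  # size in bytes
    return os.urandom(digest_size)


for name in ("sha256", "sha512"):
    salt = salt_for(name)
    print(name, len(salt), "bytes,", len(salt.hex()), "hex characters")
```

The byte count is what matters for security; the hex character count only matters when sizing the storage column.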

Then hash the salted string.

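Fast general-purpose hashes such as SHA256, SHA512, and WHIRLPOOL were once considered acceptable here, but their speed works in the attacker's favor: the faster the hash, the more guesses per second. Algorithms such as PBKDF2, bcrypt, and scrypt are deliberately slower, with a tunable work factor, and are the ones aligned with modern security practice.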
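To get a feel for the difference, this rough sketch times one SHA-256 call against one PBKDF2 call. The sample password and the 100,000 iteration count are my own assumptions, and the absolute numbers will vary by machine.

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

# One pass of a fast general-purpose hash.
start = time.perf_counter()
hashlib.sha256(salt + password).digest()
fast_seconds = time.perf_counter() - start

# PBKDF2 repeats an HMAC many times on purpose.
start = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
slow_seconds = time.perf_counter() - start

print(f"sha256: {fast_seconds:.6f}s, pbkdf2 x100000: {slow_seconds:.6f}s")
```

A delay that a single login barely notices becomes a real cost when an attacker has to pay it for every guess.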

Depending on the service, the heaviest settings may be overengineering, so choose the algorithm and its work factor based on the scale and capability of the service.

https://cs.opensource.google/go/x/crypto/+/refs/tags/v0.22.0:scrypt/scrypt.go This is the Go library source code for scrypt, and seeing how extremely complex it is, it certainly looks safe.

The following is a short Python function that covers everything from creating the salt to hashing the password. Short functions are, as expected, where Python shines.

Implementation Path

import hashlib
import os


class EncryptPassword:
    @staticmethod
    def salt_password(password):
        salt = os.urandom(64)  # generate a random 64-byte salt from the OS CSPRNG
        salt_hex = salt.hex()  # convert the salt to hex for storage
        # hash with PBKDF2 (HMAC-SHA-512, 100,000 iterations)
        dk = hashlib.pbkdf2_hmac('sha512', password.encode(), salt, 100000, dklen=64)
        hashed_password_hex = dk.hex()  # convert the derived key to hex for storage
        return hashed_password_hex, salt_hex
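The function above only covers the storing side. A matching check at login time could look like the sketch below. The verify_password and make_record names are my own, and the parameters (SHA-512, 100,000 iterations, 64-byte output, hex storage) are assumed to mirror the function above.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # must match the value used when the hash was stored


def make_record(password: str):
    """Create a (hash_hex, salt_hex) pair in the same format as above."""
    salt = os.urandom(64)
    dk = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS, dklen=64)
    return dk.hex(), salt.hex()


def verify_password(password: str, stored_hash_hex: str, stored_salt_hex: str) -> bool:
    """Re-derive the hash from the entered password and the stored salt."""
    salt = bytes.fromhex(stored_salt_hex)
    dk = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS, dklen=64)
    # Constant-time comparison of the raw bytes
    return hmac.compare_digest(dk, bytes.fromhex(stored_hash_hex))


stored_hash_hex, stored_salt_hex = make_record("s3cret!")
print(verify_password("s3cret!", stored_hash_hex, stored_salt_hex))
print(verify_password("wrong", stored_hash_hex, stored_salt_hex))
```

Storing the iteration count next to the salt and hash would make it easier to raise the work factor later without breaking existing records.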

Security Takeaway

Knowing the words hash and salt is different from designing safe password storage. Algorithm choice, salt generation, work factor, verification flow, and migration strategy all matter. Security features should be judged not just by whether they work, but by what information an attacker can still obtain.