Cryptography Fundamentals
Cryptography is the science of securing information through techniques like encryption, hashing, and key exchange, ensuring properties like confidentiality, integrity, and authenticity. Below, I explain each concept in detail, building from basics to advanced applications, with examples for clarity.
Hashing Algorithm
A hashing algorithm (or hash function) is a one-way mathematical transformation that converts input data (of any size) into a fixed-length output string (hash/digest). It is deterministic (same input = same output) but irreversible (can't reconstruct input from hash).
Key Properties
- Deterministic: Identical inputs produce identical hashes.
- Fixed Size: Output always the same length (e.g., SHA-256 = 256 bits/32 bytes).
- Avalanche Effect: Small input change → drastically different hash.
- Collision Resistance: Computationally infeasible to find two inputs with the same hash.
- Pre-Image Resistance: Hard to find input for a given hash.
- Second Pre-Image Resistance: Hard to find different input with same hash as given input.
Common Algorithms
| Algorithm | Output Size | Security Level | Use Case |
|---|---|---|---|
| MD5 | 128 bits | Broken (collisions easy) | Legacy checksums |
| SHA-1 | 160 bits | Weak (collisions found) | Phasing out |
| SHA-256 | 256 bits | High (SHA-2 family) | Password hashing, digital signatures |
| SHA-3 | 256–512 bits | Highest (Keccak sponge) | Post-quantum candidate |
Example (Python):
import hashlib
data = "Hello, World!"
hash_obj = hashlib.sha256(data.encode())
print(hash_obj.hexdigest()) # a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
- Change to "Hello, World?!" → Completely different hash.
Applications: Password storage (salted hashes), file integrity (checksums), blockchain (Merkle trees).
Message Integrity
Message Integrity ensures data has not been altered during transmission or storage. Cryptography achieves this via hashing or Message Authentication Codes (MACs), verifying the receiver gets the exact sender's content.
Key Concepts
- Tamper Detection: Any change (even 1 bit) invalidates the hash/MAC.
- Hash vs MAC: Hash is one-way (no key); MAC uses shared secret for authenticity + integrity.
- Threats Mitigated: Eavesdropping (passive alteration), MITM (active tampering).
Mechanisms
- Hash-Based (Integrity Only): Sender appends hash of message; receiver recomputes and compares.
- Limitation: No authenticity (anyone can generate hash).
- MAC (Integrity + Authenticity): HMAC (Hash-based MAC) = hash(message + key).
- HMAC-SHA256: Secure, fast.
- Digital Signatures: Asymmetric (private key signs hash which means hash of message + key is encrypted using private key) for non-repudiation. Signing is typically performed on the hash of a message rather than the message itself, primarily because the message may be large, and cryptographic operations like modular exponentiation are computationally expensive when applied directly to large data
Process
- Sender: Compute hash/MAC; append to message.
- Receiver: Recompute; match? Valid.
Example (Python HMAC):
import hmac
import hashlib
key = b'secret-key'
message = b"Hello, World!"
mac = hmac.new(key, message, hashlib.sha256).digest()
# Verify: hmac.new(key, message, hashlib.sha256).digest() == mac
Applications: File verification (SHA checksums), API requests (HMAC in AWS signatures), blockchain blocks.
Confidentiality
Confidentiality (or privacy) ensures data is accessible only to authorized parties, preventing unauthorized reading (e.g., via encryption).
Key Concepts
- Plaintext vs Ciphertext: Original data (plaintext) → encrypted (ciphertext).
- Encryption Strength: Key length (e.g., AES-256 = 256-bit key, brute-force infeasible).
- Threats Mitigated: Eavesdropping, data breaches.
Mechanisms
- Encryption Algorithms: Transform data with key.
- Modes: ECB (simple, insecure), CBC (chaining), GCM (authenticated).
- Padding: PKCS#7 to block size.
Applications: HTTPS (TLS encrypts traffic), disk encryption (BitLocker), VPNs.
Symmetric Encryption
Symmetric Encryption uses the same key for encryption and decryption (shared secret).
Key Concepts
- Stream Ciphers: Encrypt bit-by-bit (e.g., RC4, ChaCha20).
- Block Ciphers: Encrypt fixed blocks (e.g., AES-128/192/256).
- Key Exchange Problem: Securely share key (solved by asymmetric).
AES Example (Block Cipher)
- Process: Input block (128 bits) + key → output via rounds (10 for AES-128).
- Modes: CBC (IV + chaining for security).
Python Example:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
key = b'16-byte-key!!'
iv = b'16-byte-iv!!'
cipher = Cipher(algorithms.AES(key), modes.CBC(iv), backend=default_backend())
encryptor = cipher.encryptor()
ciphertext = encryptor.update(b'plaintext') + encryptor.finalize()
Pros: Fast, simple. Cons: Key distribution risk.
Asymmetric Encryption
Asymmetric Encryption (public-key cryptography) uses a key pair: public (shareable) for encryption, private (secret) for decryption.
Key Concepts
- Trapdoor Functions: Easy one way, hard reverse (e.g., factoring large primes).
- Public Key Infrastructure (PKI): Certs bind keys to identities.
Algorithms
| Algorithm | Basis | Use |
|---|---|---|
| RSA | Prime factorization | Encryption/signing. |
| ECC (Elliptic Curve) | Elliptic math | Efficient mobile. |
Pros: Secure key exchange. Cons: Slower than symmetric.
Using Asymmetric Keys
- Encryption: Sender uses receiver's public key to encrypt; receiver decrypts with private.
- Signing: Sender signs hash with private; receiver verifies with public.
- Key Exchange: Diffie-Hellman for shared secret.
Nuance: Asymmetric for small data (keys, hashes); symmetric for bulk.
Authentication
Authentication verifies identity (who you are), using credentials or tokens.
Types
- Something You Know: Passwords (hashed, salted).
- Something You Have: Tokens (JWT, API keys).
- Something You Are: Biometrics.
Mechanisms
- Symmetric: Shared secret (e.g., HMAC).
- Asymmetric: Sign with private, verify with public.
- Zero-Knowledge: Prove knowledge without revealing (e.g., SRP).
Applications: Login, API auth.
Anti-Replay
Anti-Replay prevents attackers from reusing intercepted messages (replay attacks).
Mechanisms
- Timestamps: Include
expclaim (e.g., JWT); server rejects old. - Nonces: Unique random value per message; server tracks used nonces (short-lived).
- Sequence Numbers: Monotonic counters; reject out-of-order.
- Challenges: Server sends random challenge; client signs/response includes it.
Example (JWT): Include iat (issued-at) and nonce; validate server-side.
Nuance: Combine with TLS for freshness.
RSA Example
RSA (Rivest-Shamir-Adleman, 1977) is a public-key algorithm for encryption/signing based on prime factorization difficulty.
Key Generation
- Choose primes p, q (e.g., 1024-bit each).
- n = p × q (modulus, 2048 bits).
- φ(n) = (p-1)(q-1) (Euler's totient).
- e = public exponent (e.g., 65537, coprime to φ(n)).
- d = private exponent (d × e ≡ 1 mod φ(n)).
- Public Key: (n, e); Private Key: (n, d).
Encryption/Decryption
- Ciphertext c = m^e mod n (encrypt with public).
- Plaintext m = c^d mod n (decrypt with private).
Signing/Verification
- Hash message → sign hash^d mod n (with private).
- Verify: (signed hash)^e mod n == original hash (with public).
Python Example (RSA Signing):
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes, serialization
# Generate keys
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
message = b"Hello, RSA!"
# Sign
signature = private_key.sign(
message,
padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
hashes.SHA256()
)
# Verify
try:
public_key.verify(
signature,
message,
padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
hashes.SHA256()
)
print("Signature valid")
except:
print("Invalid signature")
Nuance: RSA for small data (use hybrid with AES for large).
Diffie-Hellman Key Exchange
Diffie-Hellman (DH) is a method for two parties to agree on a shared secret key over insecure channels, without exchanging the key directly (1976, Diffie/Hellman).
How It Works (Modular Exponentiation)
- Public Parameters: Shared prime p (large, e.g., 2048-bit) and generator g (primitive root mod p).
- Private Keys: Alice chooses a (secret); Bob chooses b (secret).
- Public Values: Alice computes A = g^a mod p, sends A. Bob computes B = g^b mod p, sends B.
- Shared Secret: Alice computes K = B^a mod p = (g^b)^a mod p = g^(ab) mod p.
Bob computes K = A^b mod p = g^(ab) mod p. - Result: Both have identical K (shared secret), used for symmetric encryption (e.g., AES key).
Security
- Discrete Log Problem: Hard to compute a from g, p, A.
- Variants: ECDH (Elliptic Curve DH, faster); ephemeral (per-session keys).
Nuance: Vulnerable to MITM (use authenticated DH, e.g., with signatures); ephemeral for forward secrecy.