Plaintext
Also known as: Clear text · Source text · Cleartext
The plaintext (French: clair) refers to the message in its initial form, directly readable by anyone who knows the language. It’s the “starting point” of an encryption operation and the “endpoint” of a successful decryption. All of cryptography revolves around that duality: transforming plaintext into ciphertext for transport, recovering it as plaintext at the destination.
Notation and vocabulary
In the cryptographic literature, plaintext is traditionally written P (plaintext) or M (message). Ciphertext is C. The key is K. An encryption operation reads C = E(K, P), its inverse P = D(K, C). This notation hides an important nuance: plaintext is not necessarily human language. It can be a binary image, an audio file, JSON, source code — any byte sequence. Modern cryptography treats everything as bits; the plaintext/ciphertext boundary is purely definitional, not linguistic.
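The notation can be made concrete with a deliberately insecure toy cipher. The sketch below assumes a repeating-key XOR for E and D (an illustrative choice, not a cipher used on CipherChronicle) and shows that the plaintext P can be arbitrary bytes rather than readable text:

```python
def E(K: bytes, P: bytes) -> bytes:
    """Toy encryption: XOR each plaintext byte with the repeating key (NOT secure)."""
    return bytes(p ^ K[i % len(K)] for i, p in enumerate(P))


def D(K: bytes, C: bytes) -> bytes:
    """Toy decryption: XOR is its own inverse, so D applies the same operation."""
    return E(K, C)


K = b"key"
# The plaintext is any byte sequence: here a PNG-style header followed by JSON.
P = bytes([0x89, 0x50, 0x4E, 0x47]) + '{"ok": true}'.encode("utf-8")

C = E(K, P)          # C = E(K, P)
recovered = D(K, C)  # P = D(K, C)
```

Whatever the cipher, the contract is the same: D(K, E(K, P)) must return P byte for byte.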
Normalization: why plaintext isn’t raw
On CipherChronicle, the plaintext is also called source text in the cipher workshop. Before it enters a historical cipher, it goes through several non-trivial transformations:
- Stripping diacritics: “éclair” becomes ECLAIR. Classical ciphers work on the 26-letter Latin alphabet; an “é” has no slot in a Caesar table.
- Conversion to uniform uppercase, the convention of nearly every paper cipher.
- Keeping or stripping spaces and punctuation, depending on the cipher. Traditional Vigenère strips them; modern pedagogical versions keep them for legibility. Atbash and Caesar work fine either way.
- Block segmentation of N characters for Playfair (digrams), Hill (n-grams sized to the matrix), ADFGVX (pairs).
- Padding when needed: if the plaintext length isn’t a multiple of N, a filler letter (often X) completes the last block.
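The steps above can be sketched in a few lines of Python. The function names `normalize` and `segment` are hypothetical, chosen for illustration; this is not CipherChronicle's actual pipeline, and it ignores cipher-specific rules such as Playfair's handling of doubled letters.

```python
import unicodedata


def normalize(text: str, keep_spaces: bool = False) -> str:
    """Strip diacritics, uppercase, and keep only A-Z (optionally spaces)."""
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    upper = stripped.upper()
    # Anything outside the 26-letter alphabet (digits, punctuation) is removed.
    return "".join(
        ch for ch in upper if "A" <= ch <= "Z" or (keep_spaces and ch == " ")
    )


def segment(text: str, n: int, filler: str = "X") -> list[str]:
    """Cut text into blocks of n characters, padding the last block with filler."""
    padded = text + filler * (-len(text) % n)
    return [padded[i:i + n] for i in range(0, len(padded), n)]


print(normalize("éclair"))                              # ECLAIR
print(normalize("Attack at dawn!"))                     # ATTACKATDAWN
print(normalize("Attack at dawn!", keep_spaces=True))   # ATTACK AT DAWN
print(segment("ATTACKATDAWN", 5))                       # ['ATTAC', 'KATDA', 'WNXXX']
```

Keeping space handling behind a flag mirrors the choice described above: some ciphers strip spaces, others keep them for legibility.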
This normalization has an under-appreciated pedagogical cost: a solver who decodes ATTACKATDAWN has to re-segment mentally into ATTACK AT DAWN to understand. The solution stored on CipherChronicle is always the normalized form — that’s what gets hashed.
Why plaintext leaks into ciphertext
Here’s where cryptography gets interesting. Ideally, ciphertext should look like random noise: no information about the plaintext should be extractable without the key. In practice, many older ciphers preserve properties of the plaintext:
- Caesar preserves length, spaces, and letter frequency (just shifted). A frequent letter stays frequent.
- Vigenère preserves length but flattens frequency — that’s why it held for three centuries.
- Transposition preserves every frequency identically (you permute positions, not letters).
- Polygraphic substitution (Playfair, Hill) preserves frequencies at the n-gram level rather than the individual letter level.
Each preservation is a leak. Classical cryptanalysis consists precisely of exploiting those leaks. Modern ciphers (AES, ChaCha20) leak nothing of the sort: apart from its length, the ciphertext is computationally indistinguishable from a random sequence to anyone without the key.
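A small experiment makes the first and third leaks visible. The sketch below uses a minimal Caesar implementation and, as an assumed stand-in for a real transposition cipher, a simple reversal of the text; both are illustrative only.

```python
from collections import Counter


def caesar(text: str, shift: int) -> str:
    """Shift A-Z by `shift` positions; leave every other character untouched."""
    return "".join(
        chr((ord(ch) - 65 + shift) % 26 + 65) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def frequency_profile(text: str) -> list[int]:
    """Letter counts sorted from most to least frequent, ignoring which letter is which."""
    return sorted(Counter(text.replace(" ", "")).values(), reverse=True)


plain = "ATTACK AT DAWN"
cipher = caesar(plain, 3)
print(cipher)  # DWWDFN DW GDZQ

# Caesar: same length, same space positions, same frequency profile.
same_length = len(cipher) == len(plain)
same_profile = frequency_profile(cipher) == frequency_profile(plain)

# Transposition (here: reversal) keeps the exact same letters, only reordered.
transposed = plain[::-1]
same_letters = Counter(transposed) == Counter(plain)
```

The most frequent plaintext letter (A, four times) simply reappears as the most frequent ciphertext letter (D, four times), which is exactly what frequency analysis exploits.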
The golden rule: plaintext is never stored
On CipherChronicle, plaintext is never stored in the database on the puzzle side. When an author publishes a puzzle, we compute a SHA-256 hash of the plaintext (prefixed with the puzzle ID, which defeats universal rainbow tables) and we store only that hash. Solvers compare their attempt locally: if SHA-256(puzzleId + attempt) == storedHash, it’s accepted. Server-side, nobody, not even platform administrators, can read a puzzle’s solution from the database after publication.
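A minimal sketch of that check, using Python's standard `hashlib`. The function names, the `:` separator, and the UTF-8 encoding are assumptions made for illustration; the platform's exact concatenation format is not specified here.

```python
import hashlib
import hmac


def solution_hash(puzzle_id: str, normalized_plaintext: str) -> str:
    """SHA-256 over the puzzle ID followed by the normalized plaintext."""
    # Assumed layout: "<puzzle_id>:<plaintext>", encoded as UTF-8.
    data = f"{puzzle_id}:{normalized_plaintext}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def check_attempt(puzzle_id: str, attempt: str, stored_hash: str) -> bool:
    """Accept the attempt only if its hash matches the stored one."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(solution_hash(puzzle_id, attempt), stored_hash)


# At publication time only the hash is kept; the plaintext is discarded.
stored = solution_hash("puzzle-42", "ATTACKATDAWN")

print(check_attempt("puzzle-42", "ATTACKATDAWN", stored))  # True
print(check_attempt("puzzle-42", "ATTACKATDUSK", stored))  # False
```

Because the puzzle ID is part of the hashed input, the same solution text produces a different hash for every puzzle, so a table precomputed for one puzzle is useless for another.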
The same rule applies broadly. Any system that stores passwords in plaintext is broken: database leaks happen, and a plaintext password is immediately compromised (no cracking needed). Serious databases store salted hashes (Argon2, bcrypt, scrypt) — derivatives of the same idea.
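For passwords, the same idea looks roughly like the sketch below, which uses `hashlib.scrypt` from Python's standard library. The cost parameters (n=2**14, r=8, p=1) and the helper names are illustrative assumptions, not a tuning recommendation for any particular system.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Only the salt and the digest are stored, never the password itself.
salt, digest = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, digest)
rejected = verify_password("wrong guess", salt, digest)
```

Unlike plain SHA-256, scrypt is deliberately slow and memory-hungry, which is what makes large-scale guessing against a leaked database expensive.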
Plaintext attacks: known, chosen, distinguishing
The “plaintext” concept also defines whole families of cryptanalytic attacks, ordered by how much access the adversary has:
- Ciphertext-only: the attacker only sees ciphertexts. Hardest scenario for the attacker, yet most classical ciphers fall even here, to frequency analysis.
- Known-plaintext: the attacker knows some plaintext-ciphertext pairs. Bletchley Park’s German weather-bulletin cribs fall in this category and contributed to breaking Enigma.
- Chosen-plaintext: the attacker can encrypt arbitrary plaintexts and observe the ciphertexts. Far stronger; modern ciphers must resist this (CPA security).
- Chosen-ciphertext: the attacker can decrypt chosen ciphertexts and observe the plaintexts. Even stronger (CCA security). The gold standard for production crypto.
An authenticated encryption scheme like AES-GCM, used with a unique nonce per message, is designed to be CCA-secure: even an attacker who can both encrypt and decrypt arbitrary inputs (except the target ciphertext itself) learns nothing meaningful about the target’s plaintext.
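To see why the known-plaintext scenario listed above is so damaging for weak ciphers, here is a toy sketch against a Caesar shift (not Enigma, whose cryptanalysis was far more involved). The crib "WETTER" (German for "weather") and the message are made up for illustration.

```python
def caesar(text: str, shift: int) -> str:
    """Shift A-Z by `shift` positions; leave every other character untouched."""
    return "".join(
        chr((ord(ch) - 65 + shift) % 26 + 65) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def recover_shift(known_plain: str, known_cipher: str) -> int:
    """Derive the Caesar key from a single aligned plaintext/ciphertext letter pair."""
    for p, c in zip(known_plain, known_cipher):
        if "A" <= p <= "Z":
            return (ord(c) - ord(p)) % 26
    raise ValueError("crib contains no letters")


# The defender encrypts with a key the attacker does not know.
secret_shift = 7
message = "WETTER NORDSEE RUHIG"
intercepted = caesar(message, secret_shift)

# The attacker guesses that the message starts with the crib "WETTER".
crib = "WETTER"
recovered_shift = recover_shift(crib, intercepted[:len(crib)])
decrypted = caesar(intercepted, -recovered_shift)
```

One known letter pair is enough to recover the whole key here. A cipher that resists chosen-plaintext and chosen-ciphertext attacks must give nothing away even when the attacker holds many such pairs.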
Key takeaways:
- Plaintext means the message in its readable form — could be text, but also an image, a file, any data.
- Cryptographic notation: P or M for plaintext, C for ciphertext, K for the key.
- Normalization (uppercase, diacritic stripping, segmentation) is often unavoidable for historical ciphers. It changes how the final result reads.
- Antonym: ciphertext (French: chiffré) — the result of applying the cipher to the plaintext.
- Golden rule: a sensitive plaintext (password, puzzle solution, business secret) is never stored as plaintext.
- Attack classes by adversary access: ciphertext-only, known-plaintext, chosen-plaintext, chosen-ciphertext. Modern ciphers target CCA security.