Plaintext

Also known as: Clear text · Source text · Cleartext

The plaintext (French: clair) refers to the message in its initial form, directly readable by anyone who knows the language. It’s the “starting point” of an encryption operation and the “endpoint” of a successful decryption. All of cryptography revolves around that duality: transforming plaintext into ciphertext for transport, recovering it as plaintext at the destination.

Notation and vocabulary

In the cryptographic literature, plaintext is traditionally written P (plaintext) or M (message). Ciphertext is C. The key is K. An encryption operation reads C = E(K, P), its inverse P = D(K, C). This notation hides an important nuance: plaintext is not necessarily human language. It can be a binary image, an audio file, JSON, source code — any byte sequence. Modern cryptography treats everything as bits; the plaintext/ciphertext boundary is purely definitional, not linguistic.
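To make the notation concrete, here is a minimal sketch in which E and D operate on raw bytes rather than letters. The repeating-key XOR used here is only an illustrative stand-in chosen for brevity (it is not secure and is not a cipher discussed in this entry); the names E, D, K, P and C simply mirror the notation above.

```python
from itertools import cycle


def E(K: bytes, P: bytes) -> bytes:
    """Toy encryption: XOR each plaintext byte with a repeating key byte."""
    if not K:
        raise ValueError("key must not be empty")
    return bytes(p ^ k for p, k in zip(P, cycle(K)))


def D(K: bytes, C: bytes) -> bytes:
    """Toy decryption: XOR is its own inverse, so D reuses E."""
    return E(K, C)


K = b"key"
# The plaintext is a JSON document encoded as UTF-8 bytes, not A-Z letters.
P = '{"msg": "éclair"}'.encode("utf-8")
C = E(K, P)

assert D(K, C) == P  # P = D(K, E(K, P))
```

The round trip holds for any byte sequence, which is the point of the notation: neither function needs to know whether P is text, an image or source code.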

Normalization: why plaintext isn’t raw

On CipherChronicle, the plaintext is also called source text in the cipher workshop. Before it enters a historical cipher, it goes through several non-trivial transformations:

Stripping diacritics: “éclair” becomes ECLAIR. Classical ciphers work on the 26-letter Latin alphabet; an “é” has no slot in a Caesar table.
Conversion to uniform uppercase — the convention of nearly every paper cipher.
Keeping or stripping spaces and punctuation, depending on the cipher. Traditional Vigenère strips them; modern pedagogical versions keep them for legibility. Atbash and Caesar work fine either way.
Block segmentation into groups of N characters for Playfair (digrams) and Hill (n-grams sized to the matrix). ADFGVX works the other way around: each plaintext letter expands into a pair of cipher letters.
Padding when needed: if the plaintext isn’t a multiple of N, a filler letter (often X) completes the last block.
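The steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not CipherChronicle's actual pipeline: the function names `normalize` and `segment` and the `keep_spaces` flag are hypothetical, and cipher-specific rules (Playfair's handling of doubled letters or I/J, for instance) are left out.

```python
import unicodedata


def normalize(text: str, keep_spaces: bool = False) -> str:
    """Strip diacritics, uppercase, and keep only A-Z (optionally spaces)."""
    # NFD splits "é" into "e" plus a combining accent; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    upper = stripped.upper()
    return "".join(
        ch for ch in upper
        if "A" <= ch <= "Z" or (keep_spaces and ch == " ")
    )


def segment(text: str, n: int, filler: str = "X") -> list[str]:
    """Pad with the filler letter to a multiple of n, then cut into blocks."""
    padded = text + filler * (-len(text) % n)
    return [padded[i:i + n] for i in range(0, len(padded), n)]


print(normalize("éclair"))                   # ECLAIR
print(normalize("Attack at dawn!"))          # ATTACKATDAWN
print(segment(normalize("Attack at dawn!"), 5))  # ['ATTAC', 'KATDA', 'WNXXX']
```

Characters that do not decompose into a base letter plus an accent (such as "œ") are simply dropped by this sketch; a real implementation would need an explicit mapping for them.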

This normalization has an under-appreciated pedagogical cost: a solver who decodes ATTACKATDAWN has to re-segment mentally into ATTACK AT DAWN to understand. The solution stored on CipherChronicle is always the normalized form — that’s what gets hashed.

Why plaintext leaks into ciphertext

Here’s where cryptography gets interesting. Ideally, ciphertext should look like random noise: no information about the plaintext should be extractable without the key. In practice, many older ciphers preserve properties of the plaintext:

Caesar preserves length, spaces, and letter frequency (just shifted). A frequent letter stays frequent.
Vigenère preserves length but flattens frequency — that’s why it held for three centuries.
Transposition preserves every frequency identically (you permute positions, not letters).
Polygraphic substitution (Playfair, Hill) preserves frequencies at the n-gram level rather than the individual letter level.

Each preservation is a leak. Classical cryptanalysis consists precisely of exploiting those leaks. Modern ciphers (AES in a sound mode of operation, ChaCha20) preserve nothing visible apart from the message length: the ciphertext is computationally indistinguishable from a random sequence.
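A small experiment makes two of these leaks visible. The sketch below assumes already-normalized uppercase input; `caesar` and `columnar_transpose` are simplified illustrative helpers (the transposition reads columns in their natural order, without a keyword).

```python
from collections import Counter


def caesar(text: str, shift: int) -> str:
    """Shift every A-Z letter by `shift`; leave other characters untouched."""
    return "".join(
        chr((ord(c) - 65 + shift) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text
    )


def columnar_transpose(text: str, cols: int) -> str:
    """Write the text in rows of `cols` letters, then read it column by column."""
    return "".join(text[i::cols] for i in range(cols))


plain = "ATTACKATDAWN"
shifted = caesar(plain, 3)
permuted = columnar_transpose(plain, 4)

# Caesar: the letters change, but the shape of the histogram does not.
assert sorted(Counter(shifted).values()) == sorted(Counter(plain).values())

# Transposition: the histogram is identical, letter for letter.
assert Counter(permuted) == Counter(plain)

print(shifted)   # DWWDFNDWGDZQ
print(permuted)  # ACDTKATAWATN
```

In both cases the most frequent plaintext letter still produces the most frequent ciphertext letter, which is exactly the handle frequency analysis needs.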

The golden rule: plaintext is never stored

On CipherChronicle, plaintext is never stored in the database on the puzzle side. When an author publishes a puzzle, we compute a SHA-256 hash of the plaintext prefixed with the puzzle ID (which defeats universal rainbow tables), and we store only that hash. Solvers compare their attempt locally: if SHA-256(puzzleId + attempt) == storedHash, it’s accepted. Server-side, nobody, not even platform administrators, can read a puzzle’s solution out of the database after publication.
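The check described above might look like the following sketch. It assumes the puzzle ID is prepended to the normalized plaintext and that both are UTF-8 encoded before hashing; the function names and the example ID `puzzle-42` are hypothetical, and the platform's real encoding and concatenation details are not given here.

```python
import hashlib
import hmac


def solution_hash(puzzle_id: str, plaintext: str) -> str:
    """SHA-256 over the puzzle ID followed by the normalized plaintext."""
    return hashlib.sha256((puzzle_id + plaintext).encode("utf-8")).hexdigest()


def check_attempt(puzzle_id: str, attempt: str, stored_hash: str) -> bool:
    """Recompute the hash for an attempt and compare it to the stored value."""
    return hmac.compare_digest(solution_hash(puzzle_id, attempt), stored_hash)


# At publication time only the hash is kept; the plaintext is discarded.
stored = solution_hash("puzzle-42", "ATTACKATDAWN")

print(check_attempt("puzzle-42", "ATTACKATDAWN", stored))  # True
print(check_attempt("puzzle-42", "ATTACKATDUSK", stored))  # False
```

Because the puzzle ID is part of the hashed input, the same solution published under two different puzzles yields two unrelated hashes.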

The same rule applies broadly. Any system that stores passwords in plaintext is broken: database leaks happen, and a plaintext password is immediately compromised (no cracking needed). Serious systems store salted hashes produced by deliberately slow functions (Argon2, bcrypt, scrypt), a variation on the same idea.
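As an illustration of salted password hashing, here is a minimal sketch using scrypt from Python's standard library (`hashlib.scrypt`, which requires a Python build linked against OpenSSL 1.1 or later). The cost parameters shown are common illustrative values rather than a recommendation, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Only the salt and the digest are stored, never the password itself.
salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```

The salt plays the same role as the puzzle ID in the previous section: identical passwords end up with different stored values, so a precomputed table cannot be reused across accounts.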

Plaintext attacks: known, chosen, distinguishing

The “plaintext” concept also defines whole families of cryptanalytic attacks, ordered by how much access the adversary has:

Ciphertext-only: the attacker only sees ciphertexts. Hardest scenario for the attacker. Classical ciphers fall to this with frequency analysis.
Known-plaintext: the attacker knows some plaintext-ciphertext pairs. Bletchley Park’s German weather-bulletin cribs fall in this category and contributed to breaking Enigma.
Chosen-plaintext: the attacker can encrypt arbitrary plaintexts and observe the ciphertexts. Far stronger; modern ciphers must resist this (CPA security).
Chosen-ciphertext: the attacker can decrypt chosen ciphertexts and observe the plaintexts. Even stronger (CCA security). The gold standard for production crypto.
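The known-plaintext scenario from the list above can be shown on a toy scale with Caesar. This sketch is nowhere near the complexity of the Enigma work; the crib "WEATHER", the shift value and the helper names are illustrative assumptions, and the input is assumed to be normalized uppercase.

```python
def caesar(text: str, shift: int) -> str:
    """Shift every A-Z letter by `shift`; leave other characters untouched."""
    return "".join(
        chr((ord(c) - 65 + shift) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text
    )


def recover_shift(known_plain: str, known_cipher: str) -> int:
    """Known-plaintext attack: aligned letter pairs reveal the Caesar shift."""
    shifts = {
        (ord(c) - ord(p)) % 26 for p, c in zip(known_plain, known_cipher)
    }
    if len(shifts) != 1:
        raise ValueError("pairs are not consistent with a single Caesar shift")
    return shifts.pop()


secret_shift = 7  # known only to the sender in this scenario
intercepted = caesar("WEATHERREPORTATTACKATDAWN", secret_shift)

# The attacker guesses that the message opens with the word WEATHER.
crib = "WEATHER"
shift = recover_shift(crib, intercepted[:len(crib)])

print(shift)                        # 7
print(caesar(intercepted, -shift))  # WEATHERREPORTATTACKATDAWN
```

A single plaintext-ciphertext pair is enough here because the whole key is one number; stronger ciphers need more pairs and more work, but the principle is the same.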

A modern scheme like AES-GCM is designed to be CCA-secure, provided a nonce is never reused under the same key: even an attacker who can both encrypt and decrypt arbitrary inputs (except the secret target) learns nothing meaningful about the plaintext behind the target ciphertext.

Key takeaways:

Plaintext means the message in its readable form — could be text, but also an image, a file, any data.
Cryptographic notation: P or M for plaintext, C for ciphertext, K for the key.
Normalization (uppercase, diacritic stripping, segmentation) is often unavoidable for historical ciphers. It changes how the final result reads.
Antonym: ciphertext (French: chiffré) — the result of applying the cipher to the plaintext.
Golden rule: a sensitive plaintext (password, puzzle solution, business secret) is never stored as plaintext.
Attack classes by adversary access: ciphertext-only, known-plaintext, chosen-plaintext, chosen-ciphertext. Modern ciphers target CCA security.
