Not All Words Are Created Equal

When your child needs to verify that it's really Grandma on the phone, and not an AI voice clone, the safeword has to come instantly. No hesitation, no "was it crimson or scarlet?", no fumbling with a word they can't quite remember. What separates a word that works from one that fails has been studied for decades in cognitive science.

We rebuilt our word lists from the ground up using evidence from psycholinguistics, memory research, and security studies. Here's what the science says — and how it shaped every word in our generator.

The Two Groups That Matter Most

Safewords need to work for everyone, but two groups face the highest stakes: elderly adults (65+), who are the primary targets of voice-cloning scams, and children (ages 6–12), who need pickup verification codes. These groups have specific cognitive profiles that demand specific word properties.

  • Elderly adults experience tip-of-the-tongue (TOT) states more frequently — they know the word but can't retrieve it. Words learned early in life resist TOT failures.
  • Children have smaller vocabularies. A word like "cobalt" or "nebula" might not be in their mental dictionary at all.
  • Both groups have lower working memory capacity than young adults, making longer or more complex words harder to hold and repeat.
  • Phone-based verification adds noise, time pressure, and stress — all of which degrade recall for difficult words.

Criterion 1: Concreteness — Can You Picture It?

One of the strongest predictors of whether someone will remember a word is concreteness: how easily it evokes a mental image. "Dog" creates an instant mental picture. "Quantum" does not. This is explained by dual-coding theory (Paivio, 1971): concrete words are stored in both verbal and visual memory systems, giving your brain two retrieval paths instead of one.

Recall advantage: concrete words are remembered roughly twice as well as abstract words in free-recall experiments.

Brysbaert and colleagues (2014) rated 40,000 English words for concreteness on a 1–5 scale. We prioritized words scoring 4.0 or higher. Every animal, food, and household item in our lists creates a vivid mental image. We removed abstract terms like "quantum," "spectrum," "digital," and "cosmic" — they score below 3.0 on concreteness ratings.

Criterion 2: Age of Acquisition — The Earlier, The Stickier

Words learned early in life are stored more deeply and resist age-related retrieval failures. This is called the age-of-acquisition (AoA) effect, and it's one of the most robust findings in psycholinguistics. Kuperman and colleagues (2012) collected AoA ratings for 30,000 words.

For elderly adults, this is critical. Under the stress of a scam call, early-learned words tend to stay accessible while later-learned words are more likely to slip out of reach. A 78-year-old can almost always retrieve "dog" or "apple," but "mandolin" or "pavilion" might slip away under pressure.

We prioritized words with an age-of-acquisition rating of 6.0 or lower, meaning they are typically learned by age 6. This ensures both children and elderly adults share a common, deeply rooted vocabulary.
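The two thresholds described so far (concreteness of 4.0 or higher, age of acquisition of 6.0 or lower) can be sketched as a simple filter. This is a minimal illustration, not our actual pipeline: the `WordNorms` type and `passes_norms` function are hypothetical names, and the ratings in the sample data are placeholders rather than values from the published norms.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_CONCRETENESS = 4.0  # 1-5 concreteness scale
MAX_AOA = 6.0           # age-of-acquisition rating, in years


@dataclass(frozen=True)
class WordNorms:
    """One candidate word with its two ratings (hypothetical structure)."""
    word: str
    concreteness: float
    aoa: float


def passes_norms(entry: WordNorms) -> bool:
    """True if the word is concrete enough and learned early enough."""
    return entry.concreteness >= MIN_CONCRETENESS and entry.aoa <= MAX_AOA


# Placeholder ratings for illustration only, NOT values from the published norms.
candidates = [
    WordNorms("dog", 4.9, 2.8),
    WordNorms("apple", 5.0, 3.5),
    WordNorms("quantum", 1.9, 13.0),   # fails on concreteness and AoA
    WordNorms("mandolin", 4.8, 11.0),  # concrete, but learned too late
]

kept = [c.word for c in candidates if passes_norms(c)]
print(kept)
```

Note that a word must clear both thresholds: a concrete but late-learned word such as "mandolin" is still rejected.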

Criterion 3: The Animacy Advantage

Living things are remembered better than non-living things. Across multiple studies, animate words (animals, people) consistently outperform inanimate words in free recall, even when matched for frequency, concreteness, and imageability. Aka, Phan, and Kahana (2021) demonstrated this "animacy advantage" in large-scale memory experiments.

105 animal words in our noun list: the largest category, because animate words are the most memorable.

This is why our noun list leads with animals: dog, cat, horse, bear, penguin, dolphin, eagle, turtle. These aren't just familiar — they activate a deep evolutionary recognition system. Your brain evolved to notice and remember living things, and that advantage persists even under stress.

Criterion 4: Phone Safety — Say It Aloud

A safeword that looks fine on paper can fail completely over a phone call. Miller and Nicely's classic 1955 study mapped which consonants get confused in noise: b/d, m/n, p/t, f/s. Modern research on NATO phonetic alphabet design confirms that certain sound patterns are inherently clearer over degraded audio channels. We screen for four kinds of risk:

  • Homophones: "cymbal" sounds like "symbol," "palette" like "palate," "kernel" like "colonel"
  • Variable pronunciation: words like "depot" and "pecan" that different regions pronounce differently
  • Hard to spell from hearing: foreign-origin words like "focaccia," "brioche," and "lychee"
  • Minimal pairs: "knotty" too easily confused with "naughty" on the phone
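One way to flag candidate pairs for a listen-aloud review is to look for words that differ only in one of the confusable consonant pairs cited above. The sketch below is a rough approximation that compares spelling, not sound; a real check would work on phoneme transcriptions. The function names and sample words are hypothetical.

```python
from itertools import combinations

# Consonant pairs the text cites from Miller and Nicely (1955).
CONFUSABLE = {frozenset(pair) for pair in ("bd", "mn", "pt", "fs")}


def is_confusable_pair(a: str, b: str) -> bool:
    """True if a and b differ in exactly one letter and that letter pair is confusable."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and frozenset(diffs[0]) in CONFUSABLE


def find_confusable(words):
    """Return every pair of words in the list that could be misheard for each other."""
    return [
        (a, b)
        for a, b in combinations(sorted(words), 2)
        if is_confusable_pair(a, b)
    ]


# Sample words for illustration only.
pairs = find_confusable(["map", "nap", "bog", "dog", "pan", "tan", "cat"])
print(pairs)
```

Because this works on letters, it would miss sound-alikes that are spelled differently (such as "knotty" and "naughty"), so it can only supplement saying each word aloud.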

Criterion 5: No Synonym Traps

Cognitive research on recall errors shows that similar words interfere with each other. If your safeword uses "crimson" but your list also contains "scarlet," "maroon," and "ruby," the brain may retrieve the wrong synonym under pressure. This is called "recall substitution" — the right concept, wrong word.

We systematically de-duplicated synonym clusters. From four "dark red" words, we kept one. From three "brave" synonyms, we kept one. From four "running" verbs, we kept one. The rule: maximum one word per concept.
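The "maximum one word per concept" rule can be sketched as a de-duplication pass. This assumes a hand-built mapping from each word to its concept; the `CONCEPT_OF` table and `dedupe_by_concept` function are hypothetical names used for illustration.

```python
def dedupe_by_concept(words, concept_of):
    """Keep the first word seen for each concept.

    Words missing from concept_of are treated as their own concept.
    """
    seen = set()
    kept = []
    for word in words:
        concept = concept_of.get(word, word)
        if concept not in seen:
            seen.add(concept)
            kept.append(word)
    return kept


# Hypothetical concept table, using the "dark red" cluster from the text.
CONCEPT_OF = {
    "crimson": "dark red",
    "scarlet": "dark red",
    "maroon": "dark red",
    "ruby": "dark red",
}

kept = dedupe_by_concept(["crimson", "scarlet", "green", "maroon", "ruby"], CONCEPT_OF)
print(kept)
```

The sketch keeps whichever word appears first, so the input order decides the survivor; sorting candidates by preference (for example, earliest age of acquisition) before the pass would make that choice deliberate.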

Criterion 6: Basic-Level Categories

Cognitive psychologist Eleanor Rosch demonstrated in 1976 that humans naturally think in "basic-level" categories. You see a dog — not a "mammal" (too abstract) or a "golden retriever" (too specific). Basic-level words are recognized fastest, learned earliest, and used most frequently across cultures.

We replaced overly specific (subordinate-level) words with more basic equivalents. Instead of "condor" or "osprey," our list uses "hawk" and "eagle," the familiar, everyday birds. Instead of "parsnip" and "lentil," we use "carrot" and "potato." Basic-level words are the fastest path from concept to word.

Why Power-of-2 List Sizes?

Our lists contain exactly 256 adjectives, 512 nouns, and 128 verbs. These aren't arbitrary: they're powers of two, chosen for a technical reason. When our mobile app derives words from a time-based code (TOTP), it extracts bytes from a cryptographic hash. If you reduce a random byte modulo a list size that is not a power of two, some words become slightly more likely than others (modulo bias). With power-of-two sizes, each index is simply a fixed number of hash bits, so every word is equally likely and the bias disappears.
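The bias is easy to see by counting how many of the equally likely input values land on each list index. This is a minimal sketch, assuming indices are computed as `value % list_size` from uniformly random hash bits; the function names are illustrative and not the app's actual code, and 90 is used only as an example of a size that is not a power of two.

```python
from collections import Counter


def index_counts(list_size: int, bits: int = 8) -> Counter:
    """Count how many of the 2**bits equally likely values map to each index."""
    return Counter(value % list_size for value in range(2 ** bits))


def is_unbiased(list_size: int, bits: int = 8) -> bool:
    """True if every index is reachable and all indices are equally likely."""
    counts = index_counts(list_size, bits)
    return len(counts) == list_size and len(set(counts.values())) == 1


# One random byte (256 values) reduced modulo 90:
# 256 = 2 * 90 + 76, so indices 0..75 get three values and 76..89 get two.
biased = index_counts(90)
print(biased[0], biased[89])   # 3 2

# Power-of-two sizes split the input space evenly.
print(is_unbiased(128))        # True: each index gets exactly 2 byte values
print(is_unbiased(256))        # True: each index gets exactly 1 byte value
print(is_unbiased(512, 16))    # True: 512 needs more than 8 bits, e.g. two bytes
print(is_unbiased(90, 16))     # False: more input bits shrink the bias but do not remove it
```

With a size of 90, the first 76 words would each be 50% more likely than the last 14. A 512-word list cannot be indexed from a single byte at all, which is why the sketch uses 16 bits for it.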

11.8 million possible standard safeword combinations (256 × 512 × 90 = 11,796,480), roughly 23.5 bits of entropy.
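The arithmetic behind that figure can be checked in a few lines. The sketch below simply multiplies the three factors quoted in the callout and takes the base-2 logarithm; it assumes each word is picked independently and uniformly, and `combinations_and_bits` is an illustrative name.

```python
import math


def combinations_and_bits(*list_sizes: int):
    """Return (number of combinations, entropy in bits) for independent uniform picks."""
    total = math.prod(list_sizes)
    return total, math.log2(total)


total, bits = combinations_and_bits(256, 512, 90)
print(f"{total:,} combinations, {bits:.1f} bits")  # 11,796,480 combinations, 23.5 bits
```

Each power-of-two factor contributes a whole number of bits (8 for 256, 9 for 512); the factor of 90 contributes about 6.5.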

What We Removed — And Why

  • Science/space cluster (18 words): "pulsar," "quasar," "quantum," "isotope," "photon." These are unfamiliar to both children and elderly adults. We kept 9 concrete space words like "rocket," "comet," and "eclipse."
  • Archaic objects (19 words): "flagon," "rampart," "scepter," "spindle" — medieval objects that most people have never seen in real life.
  • Foreign-specialty foods (10 words): "focaccia," "brioche," "lychee" — hard to spell from hearing, unfamiliar to many.
  • Exotic animals (12 words): "condor," "gibbon," "osprey," "narwhal." We replaced these with universally known animals like "dog," "cat," and "horse."
  • Synonym clusters (30+ words): thinned to one word per concept across all three lists.

What We Added — The Missing Basics

The most surprising finding from our audit: the word lists were missing the most basic, universally known words in the English language. No "red" or "blue." No "dog" or "cat." No "apple" or "banana." No "spoon" or "chair." These are the words that every child learns first and that every elderly adult can retrieve effortlessly, and they weren't in the list.

  • Basic colors: red, blue, green, yellow, orange, pink, white, black — the colors every toddler knows
  • Common animals: dog, cat, horse, bear, frog, duck, owl, whale, monkey — universally recognized
  • Everyday foods: apple, banana, bread, cheese, pizza, cookie — things in every kitchen
  • Household items: spoon, fork, clock, chair, cup, bowl, door, key — objects you touch daily
  • Body actions: running, walking, eating, sleeping, laughing — things every person does

The 10-Point Validation Test

Every word in our lists passed a 10-point validation checklist. This isn't a scoring system — it's a pass/fail gate. A single failure removes the word.

  • Can a 7-year-old picture it?
  • Would a 6-year-old know this word?
  • Is it a common everyday word?
  • Say it aloud — any confusion risk over a phone call?
  • Can someone spell it after hearing it once?
  • No synonym already in the list?
  • No homophone already in the list?
  • Not culturally exclusive?
  • Positive or neutral — not scary, violent, or negative?
  • 1–3 syllables, 3–8 characters preferred?
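The pass/fail behavior of the checklist can be sketched as a gate that runs every check and rejects a word on the first failure. Only three mechanical checks are shown, most of the checklist needs human judgment, and treating the character-length preference as a hard rule is a simplification. All names and the small lookup sets (taken from examples earlier in this article) are hypothetical.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]

# Small illustrative lookup sets built from examples in the text.
KNOWN_HOMOPHONES = {"cymbal", "palette", "kernel"}
HARD_TO_SPELL = {"focaccia", "brioche", "lychee"}

# Each entry is a named predicate; a real gate would hold all ten checks.
CHECKS: Dict[str, Check] = {
    "3-8 characters": lambda w: 3 <= len(w) <= 8,
    "no homophone risk": lambda w: w not in KNOWN_HOMOPHONES,
    "easy to spell from hearing": lambda w: w not in HARD_TO_SPELL,
}


def failed_checks(word: str, checks: Dict[str, Check] = CHECKS) -> List[str]:
    """Return the names of every check the word fails, in checklist order."""
    return [name for name, check in checks.items() if not check(word)]


def passes_gate(word: str, checks: Dict[str, Check] = CHECKS) -> bool:
    """Pass/fail gate: a single failed check removes the word."""
    return not failed_checks(word, checks)


for word in ["apple", "cymbal", "focaccia", "ox"]:
    print(word, passes_gate(word), failed_checks(word))
```

Returning the list of failed checks, rather than a bare yes or no, makes it easy to record why a word was removed.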

Research We Built On

Our word selection draws on converging evidence from multiple fields. These aren't cherry-picked studies — they represent decades of replicated findings in cognitive psychology, security research, and applied linguistics.

  • Brysbaert, Warriner & Kuperman (2014) — Concreteness ratings for 40,000 English words. Published in Behavior Research Methods.
  • Kuperman, Stadthagen-Gonzalez & Brysbaert (2012) — Age-of-acquisition norms for 30,000 English words.
  • Aka, Phan & Kahana (2021) — Predicting word memorability with a focus on the animacy advantage.
  • Rosch (1976) — Basic-level categories: why "dog" is faster to recognize than "beagle" or "animal."
  • SUBTLEX-US — Word frequency norms derived from 51 million words of American film subtitles.
  • EFF Diceware (2016) — The Electronic Frontier Foundation's improved word list criteria for secure passphrases.
  • Miller & Nicely (1955) — Consonant confusion patterns in noisy conditions, foundational for phone-safe word design.
  • Shay et al. (2012, CMU SOUPS) — Passphrase memorability through scene construction.
  • NCMEC KidSmartz — National Center for Missing & Exploited Children's family code word guidelines.

Our word lists are versioned and frozen for the mobile app's time-based code system. Every safeword generated today will be verifiable years from now, even offline. The science behind the word choices is there to keep them memorable across generations.