I recently posted about Operation Codename, which generates fun codenames in a few styles. In this post we look at the scripts used to generate its wordlists.

Why Wordlists Matter

If you examine the codename script, you’ll notice that it’s deceptively simple. Given the wordlists, the logic is mostly argument handling plus a few lines that combine adjectives and nouns, with some random chance of single- versus double-word names. The goal was to spit out plausible codenames in a variety of styles. Pulling straight from the system dictionary (/usr/share/dict) won’t work, since it’s just a flat list of words with no part-of-speech information. I had a choice between hand-curating a list of cool-sounding words, which felt lame, or building the lists from the system dictionary with some added ability to parse parts of speech. That’s where NLTK comes in.

Enter NLTK

Luckily there is a Swiss Army knife for working with language data. NLTK is a powerful Python library that comes with a ton of built-in corpora, dictionaries, and tools for things like tokenizing text or tagging parts of speech. It’s ideal for building our word lists.

For Codename’s wordlists I used three key pieces:

  • words.words() — a flat dictionary of English words, hundreds of thousands of entries.
  • brown.words() — the classic Brown Corpus, which adds a sense of how words actually show up in real text.
  • pos_tag() — a simple but powerful part‑of‑speech tagger that can tell an adjective (silent) from a noun (lantern).

Once I could reliably separate adjectives from nouns, I could start shaping lists that made sense — trimming weird outliers, filtering for words of a certain length (no “antidisestablishmentarianism”), and skipping words that almost never appear in normal writing.

The Generator Scripts

I didn’t want Codename.py itself to do any of the heavy lifting. The main script should be fast — grab two words, mash them together, print. The heavy lifting lives in /scripts.
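The consuming side really is that small. Here’s a hypothetical sketch of what “grab two words, mash them together, print” looks like; the `codename()` helper and its `single_chance` parameter are my illustration, not the real script’s API:

```python
import random

def codename(adjectives, nouns, single_chance=0.2):
    """Return a codename: usually ADJECTIVE NOUN, sometimes a lone NOUN."""
    if random.random() < single_chance:
        return random.choice(nouns).upper()
    return f"{random.choice(adjectives).upper()} {random.choice(nouns).upper()}"

# In the real script the lists come from the generated files, e.g.:
#   adjectives = open('adjectives.txt').read().split()
adjectives = ['silent', 'crimson']
nouns = ['lantern', 'fortress']
print(codename(adjectives, nouns))
```

All the expensive NLTK work happens offline in the generators, so this hot path is just two file reads and a couple of `random.choice()` calls.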

There are two helpers there:

  • generate_wordlists.py builds adjectives.txt and nouns.txt. It pulls in words from NLTK, tags them, filters them, and writes out clean, lowercase lists.
  • generate_animals.py builds animals.txt using WordNet’s “animal” synset and its hyponyms — essentially every creature NLTK knows about.

generate_animals.py

Here’s the core of how generate_animals.py works (import added for context):

from nltk.corpus import wordnet as wn

# Step 1: Get all animal synsets from WordNet
animal_synsets = list(wn.synset('animal.n.01').closure(lambda s: s.hyponyms()))

# Step 2: Extract and clean lemma names (multi-word lemmas gain a
# space when '_' is replaced, so isalpha() keeps single words only)
animal_names = set()
for syn in animal_synsets:
    for lemma in syn.lemmas():
        name = lemma.name().replace('_', ' ')
        if name.isalpha() and 3 <= len(name) <= 12:
            animal_names.add(name.lower())

# Step 3: Save to file
with open("animals.txt", "w") as f:
    f.write('\n'.join(sorted(animal_names)) + '\n')

print(f"✅ Saved {len(animal_names)} animal names to animals.txt")

generate_wordlists.py

For the main English wordlists, the script uses NLTK’s part‑of‑speech tagging to separate adjectives from nouns.

Here’s the heart of that logic from generate_wordlists.py (imports and the source wordlist added for context):

from nltk import pos_tag
from nltk.corpus import words

wordlist = words.words()

# Break the big wordlist into 1,000-word chunks for tagging
adjectives, nouns = [], []
for i in range(0, len(wordlist), 1000):
    tagged = pos_tag(wordlist[i:i+1000])
    for word, tag in tagged:
        if tag.startswith('JJ'):   # Adjectives like "silent"
            adjectives.append(word)
        elif tag.startswith('NN'): # Nouns like "lantern"
            nouns.append(word)

This loop feeds thousands of words through pos_tag() and deposits the results into adjectives.txt and nouns.txt. A separate filtering pass trims out weird suffixes, apostrophes, and absurdly long words, so you end up with useful, relatively interesting lists instead of random noise.
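That filtering pass boils down to a predicate applied to every candidate word. This is a sketch of the kind of rules involved, not the exact checks in generate_wordlists.py:

```python
def keep(word, common):
    """Decide whether a word belongs in the final lists."""
    return (word.isalpha()             # drops "o'clock", hyphenated forms
            and word.islower()         # drops proper nouns like "Boston"
            and 3 <= len(word) <= 12   # no "antidisestablishmentarianism"
            and word in common)        # must appear in real text

# Tiny stand-in for the Brown-corpus word set
common = {'silent', 'lantern', 'the'}
sample = ['silent', "o'clock", 'Boston', 'antidisestablishmentarianism']
print([w for w in sample if keep(w, common)])
```

Each rule is cheap on its own, but together they shrink the raw dictionary down to words that actually read well in a codename.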

What’s Inside the Lists?

Currently, we have 5,267 nouns, 2,314 adjectives, and 2,533 animals in our files, all from the system dictionary.

The nouns range from grounded terms like fortress and summit to linguistic rarities like oubliette or cymbalom; the animals list includes expected entries like wolf and sparrow as well as delightfully rare ones like aardwolf.

Room to Grow

I’m currently working on adding Latin for a --legion mode that will generate plausibly cool Roman- and Warhammer 40k-sounding codenames. That should be possible with the Classical Language Toolkit, Whitaker’s Words, and possibly the Perseus Digital Library.

Conclusion

Questions, ideas, or wordlist suggestions? I’d love to hear from you: feedback@adminjitsu.com