Introduction
If you run a Hugo site, you’ve probably ended up with a messy tag list at some point.
Duplicate tags, one-off singletons, forgotten categories… it all builds up over time and clutters your taxonomies. Hugo won’t clean them for you.
I wanted a way to see exactly which tags and categories I’ve used, how often, and where, so I built a quick little scanner. It’s like spring cleaning for your front matter, but fast and without the sneezing.
Elevator pitch:
Tag Audit is a no-frills Python CLI that:
- Counts tag and category usage across your content tree
- Flags singletons (items used only once)
- Shows which posts use which tag or category
- Outputs in text, markdown, CSV, or JSON
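To make the counting and singleton steps concrete, here is a minimal sketch of the idea, not the actual tag-audit.py source. It assumes the front matter has already been parsed into plain dicts (for example with PyYAML); the names `as_list`, `count_terms`, and `singletons` and the sample posts are made up for illustration.

```python
from collections import Counter

def as_list(value):
    """Normalize a front matter value (None, a string, or a list) to a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]

def count_terms(posts, key):
    """Count how many posts use each term under `key` ('tags' or 'categories')."""
    counts = Counter()
    for front_matter in posts.values():
        # set() so a term repeated inside one post is only counted once
        counts.update(set(as_list(front_matter.get(key))))
    return counts

def singletons(counts):
    """Return the terms used exactly once, sorted alphabetically."""
    return sorted(name for name, n in counts.items() if n == 1)

# Hypothetical data: file path -> already-parsed front matter
posts = {
    "content/posts/a.md": {"tags": ["cli", "python"], "categories": ["tools"]},
    "content/posts/b.md": {"tags": ["cli", "bash"], "categories": "tools"},
    "content/posts/c.md": {"tags": ["cli", "zsh"]},
}

tag_counts = count_terms(posts, "tags")
print(tag_counts.most_common(1))  # [('cli', 3)]
print(singletons(tag_counts))     # ['bash', 'python', 'zsh']
```

Accepting either a single string or a list keeps the sketch tolerant of posts that write `categories: tools` instead of a YAML list.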
Quickstart
# Install dep
python3 -m pip install pyyaml
# Basic run (from your Hugo repo root)
python3 tag-audit.py
# One file only
python3 tag-audit.py --file content/posts/metaclean.md
# Show which files use each tag (markdown output)
python3 tag-audit.py --by-tag --format markdown
# Top 20 tags/categories by count, with mappings
python3 tag-audit.py --top 20 --by-tag --by-cat
# Ignore drafts, only include items used ≥ 2 times
python3 tag-audit.py --ignore-drafts --min-count 2
Features
- Scan an entire Hugo content/ tree or a single file
- Totals for tags and categories
- Singleton detection (used only once)
- Inverse mappings: files grouped by tag or category
- Multiple output formats: text, markdown, csv, json
- Filters: --min-count, --top, --ignore-drafts, --ext
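The draft, minimum-count, and top-N filters could be sketched roughly like this. This is an illustration under assumptions, not the tool's real code: `drop_drafts` and `apply_filters` are hypothetical names, drafts are assumed to be marked with Hugo's `draft: true` front matter key, and ties are assumed to be broken alphabetically.

```python
from collections import Counter

def drop_drafts(posts):
    """Keep only posts whose front matter does not set draft: true."""
    return {path: fm for path, fm in posts.items() if not fm.get("draft", False)}

def apply_filters(counts, min_count=1, top=None):
    """Return (name, count) pairs with count >= min_count, highest count first."""
    items = [(name, n) for name, n in counts.items() if n >= min_count]
    # Sort by count descending; ties are broken alphabetically (an assumption)
    items.sort(key=lambda item: (-item[1], item[0]))
    return items if top is None else items[:top]

# Hypothetical counts for illustration
counts = Counter({"cli": 21, "python": 14, "hugo": 2, "linux": 2, "zsh": 1})
print(apply_filters(counts, min_count=2))
# [('cli', 21), ('python', 14), ('hugo', 2), ('linux', 2)]
print(apply_filters(counts, top=2))
# [('cli', 21), ('python', 14)]

posts = {
    "content/posts/wip.md": {"draft": True, "tags": ["wip"]},
    "content/posts/done.md": {"tags": ["cli"]},
}
print(list(drop_drafts(posts)))  # ['content/posts/done.md']
```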
Usage
tag-audit.py [OPTIONS]
Options (common):
- --dir PATH: Path to Hugo content (default: ./content)
- --file FILE: Scan a single file
- --ignore-drafts: Skip drafts
- --per-file: Show per-file usage
- --by-tag: Show files grouped by tag
- --by-cat: Show files grouped by category
- --format: text (default), markdown, csv, json
- --sort: Sort by count (default) or alpha
- --min-count N: Only include items with count ≥ N
- --top N: Limit to top N items
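For readers curious how an option set like this maps onto code, here is one way it might be declared with the standard library's argparse. It is a sketch only: the flag names and the documented defaults come from the list above, while `build_parser`, the help strings, and the defaults for --min-count and --top are assumptions.

```python
import argparse

def build_parser():
    """Build a parser mirroring the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="tag-audit.py",
        description="Audit tag and category usage in a Hugo content tree.",
    )
    parser.add_argument("--dir", default="./content", metavar="PATH",
                        help="path to Hugo content")
    parser.add_argument("--file", metavar="FILE", help="scan a single file")
    parser.add_argument("--ignore-drafts", action="store_true", help="skip drafts")
    parser.add_argument("--per-file", action="store_true", help="show per-file usage")
    parser.add_argument("--by-tag", action="store_true", help="show files grouped by tag")
    parser.add_argument("--by-cat", action="store_true", help="show files grouped by category")
    parser.add_argument("--format", choices=["text", "markdown", "csv", "json"],
                        default="text", help="output format")
    parser.add_argument("--sort", choices=["count", "alpha"], default="count",
                        help="sort order")
    # Defaults below are assumptions; the post does not state them
    parser.add_argument("--min-count", type=int, default=1, metavar="N",
                        help="only include items with count >= N")
    parser.add_argument("--top", type=int, default=None, metavar="N",
                        help="limit to top N items")
    return parser

args = build_parser().parse_args(["--top", "20", "--by-tag", "--by-cat"])
print(args.top, args.by_tag, args.by_cat, args.format)  # 20 True True text
```

argparse turns dashes into underscores, so `--min-count` ends up as `args.min_count` and `--by-tag` as `args.by_tag`.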
Sample Output
Tags
====
count name
----- ----------------
21 cli
14 python
13 utilities
10 bash
5 unix
4 fun
4 sysadmin
3 cromulent
3 gibberish
3 homelab
3 productivity
3 scripting
2 hugo
2 linux
2 star-trek
1 adminjitsu
1 automation
1 commodore-64
1 zsh
[...]
Total tags: 171
Categories
==========
count name
----- ---------------
22 projects
13 tools
3 fun
3 trivia
2 guides
2 workflow
1 retrocomputing
1 sysadmin
1 tips
[...]
Total categories: 61
Singleton tags (used only once)
===============================
count name
----- ------------------------
1 adminjitsu
1 automation
1 commodore-64
1 cryptography
1 docker
1 espanso
1 fortune
1 vim
1 zsh
[...]
Total singleton tags: 69
Singleton categories (used only once)
=====================================
count name
----- ------------------------------
1 retrocomputing
1 sandbox
1 sysadmin
1 troubleshooting
[...]
Total singleton categories: 9
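As a rough idea of how the same counts could be emitted in more than one format, the sketch below renders (name, count) pairs as text, CSV, or JSON using only the standard library. The function name `render`, the `name`/`count` column names, and the exact text layout are assumptions; the markdown format is left out to keep it short.

```python
import csv
import io
import json

def render(items, fmt="text"):
    """Render (name, count) pairs as 'text', 'csv', or 'json'."""
    if fmt == "json":
        return json.dumps([{"name": name, "count": n} for name, n in items], indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["name", "count"])
        writer.writerows(items)
        return buf.getvalue()
    # Plain text: a right-aligned count column followed by the name
    lines = ["count name", "----- ----"]
    lines += [f"{n:>5} {name}" for name, n in items]
    return "\n".join(lines)

items = [("cli", 21), ("python", 14)]
print(render(items))
print(render(items, "csv"))
```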
Files by Tag (excerpt)
linux
  content/posts/kernel-tips.md
  content/posts/bash-wizardry.md
docker
  content/posts/compose-cleanup.md
  content/posts/build-secrets.md
hugo
  content/posts/theme-tweaks.md
  content/posts/tag-audit-release.md
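An inverse mapping like the excerpt above can be built by walking each post once and appending its path under every term it uses. The sketch below assumes front matter is already parsed into dicts; `files_by_term` is a hypothetical name and the sample front matter is invented for illustration.

```python
from collections import defaultdict

def files_by_term(posts, key):
    """Map each term under `key` ('tags' or 'categories') to the files that use it."""
    mapping = defaultdict(list)
    for path, front_matter in posts.items():
        value = front_matter.get(key) or []
        terms = [value] if isinstance(value, str) else value
        # dict.fromkeys drops duplicate terms within one post, keeping order
        for term in dict.fromkeys(terms):
            mapping[term].append(path)
    # Sort terms and paths so the output is stable between runs
    return {term: sorted(paths) for term, paths in sorted(mapping.items())}

# Hypothetical front matter for a few posts
posts = {
    "content/posts/kernel-tips.md": {"tags": ["linux"]},
    "content/posts/bash-wizardry.md": {"tags": ["linux", "bash"]},
    "content/posts/theme-tweaks.md": {"tags": "hugo"},
}

for tag, paths in files_by_term(posts, "tags").items():
    print(tag)
    for path in paths:
        print("  " + path)
```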
Conclusion
Whether you’re prepping a cleanup pass or just curious which tags are pulling their weight, tag-audit.py gives you a clear snapshot of your Hugo taxonomy.
PRs, issues, and suggestions welcome! feedback@adminjitsu.com