Find Missing Translations
Overview
Extract string resource keys from the default values/strings.xml that are absent in a target locale's strings.xml, excluding non-translatable entries. Outputs missing keys and offers to translate them.
When to Use
- Need to find untranslated strings for a specific locale
- Preparing a batch of strings for a translator
- Checking translation coverage after adding new features
Background: Crowdin strip-identical behavior
This repo syncs translations via Crowdin (branch l10n_crowdin_translations). Crowdin's default export behavior omits any translation that exactly equals the source, so a key that the translator deliberately kept as English (common for brand terms like "Nowhere Drop", single-word loanwords like "Apps" / "Feed" / "Issues", or version prefixes like "v%1$s") will not appear in the locale's strings.xml even though the Crowdin UI shows it as 100% translated.
What this means for this skill:
- The raw on-disk diff is the candidate set. A key missing from a locale file is either genuinely untranslated or a source-identical entry Crowdin stripped. Both are reported; the human decides which to skip. The Crowdin web UI ("N untranslated") is the ground truth for what genuinely needs work.
- Source-identical entries are a small, recognizable minority. Brand terms (
Nowhere X), single-word loanwords (Apps/Feed/Issues), and bare version/format strings (v%1$s) are the usual cases. Skip these by inspection rather than translating them to something identical. - Don't add source-identical fallbacks. Android falls back to
values/strings.xmlat runtime, so a key intentionally kept as English already renders correctly, and Crowdin's next sync would strip a local duplicate anyway.
Historical note: an earlier version of this skill tried to auto-filter the candidate list with a git "sync-timestamp" heuristic (skip any key added before the last
New Crowdin translationscommit). It was dropped because it produced false negatives: a key added shortly before an export that translators simply hadn't reached yet is genuinely missing, but the heuristic classified it as "Crowdin already decided." Trust the raw diff + the Crowdin UI instead.
Target Locales
The default set of locales (unless the user specifies otherwise):
| Locale | Language | Directory |
|--------|----------|-----------|
| cs-rCZ | Czech | values-cs-rCZ |
| pt-rBR | Brazilian Portuguese | values-pt-rBR |
| sv-rSE | Swedish | values-sv-rSE |
| de-rDE | German | values-de-rDE |
Technique
1. Identify files
Default: amethyst/src/main/res/values/strings.xml
Target: amethyst/src/main/res/values-<locale>/strings.xml
2. Find missing keys using cs-rCZ as reference
Always diff against cs-rCZ first — it is the most complete locale and serves as the reference. Any keys missing in cs-rCZ will also be missing in the other target locales.
You MUST diff both <string name= AND <plurals name= — these are independent resource types and a key that is a <plurals> in the source will never appear in a <string> diff. Forgetting <plurals> is the most common silent failure of this skill (it misses things like music_playlist_track_count, notification_count_more, etc.).
# Strings: extract translatable keys from default (exclude translatable="false")
echo "=== missing <string> ==="
comm -23 \
<(grep '<string name=' amethyst/src/main/res/values/strings.xml \
| grep -v 'translatable="false"' \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<string name=' amethyst/src/main/res/values-cs-rCZ/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort)
# Plurals: a separate resource type — MUST be diffed independently
echo "=== missing <plurals> ==="
comm -23 \
<(grep '<plurals name=' amethyst/src/main/res/values/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<plurals name=' amethyst/src/main/res/values-cs-rCZ/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort)
This gives two lists of missing key names — keep them separate; <plurals> translations need the per-locale CLDR category set (see Step 5 → "Plurals: handle with care").
Crowdin can asymmetrically strip keys across locales (each translator independently chose source-identical for different keys), so cs-rCZ is not a reliable upper bound. Diff every target locale and union the results — don't assume the cs-rCZ set covers the others. A quick per-locale count is a useful sanity check against the Crowdin UI's "N untranslated":
for locale in cs-rCZ de-rDE sv-rSE pt-rBR; do
ns=$(comm -23 \
<(grep '<string name=' amethyst/src/main/res/values/strings.xml \
| grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<string name=' amethyst/src/main/res/values-$locale/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) | wc -l)
np=$(comm -23 \
<(grep '<plurals name=' amethyst/src/main/res/values/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<plurals name=' amethyst/src/main/res/values-$locale/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) | wc -l)
echo "$locale: strings=$ns plurals=$np total=$((ns+np))"
done
The combined strings + plurals total should line up with the Crowdin web UI's untranslated count for that locale. If it does, the raw diff is your actionable set (minus any source-identical entries you skip by inspection — see Background).
3. Get English values for missing keys
For each missing key, extract its English value. <string> is a single line; <plurals> is a multi-line block — handle each appropriately.
# Missing <string>: full line from default strings.xml
while IFS= read -r key; do
grep "name=\"$key\"" amethyst/src/main/res/values/strings.xml
done < <(comm -23 \
<(grep '<string name=' amethyst/src/main/res/values/strings.xml \
| grep -v 'translatable="false"' \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<string name=' amethyst/src/main/res/values-cs-rCZ/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort))
# Missing <plurals>: extract the multi-line block (opening tag through </plurals>)
while IFS= read -r key; do
awk -v key="$key" '
$0 ~ "<plurals name=\"" key "\"" { in_p = 1 }
in_p { print }
in_p && /<\/plurals>/ { in_p = 0 }
' amethyst/src/main/res/values/strings.xml
done < <(comm -23 \
<(grep '<plurals name=' amethyst/src/main/res/values/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<plurals name=' amethyst/src/main/res/values-cs-rCZ/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort))
4. Audit missing strings for plural-shaped patterns
Before presenting results, scan the missing English strings for two red-flag patterns and warn the user about each match:
- Hardcoded
"1"next to a noun. A new English string like"1 reply","1 follower", or"1 minute ago"almost always belongs in a<plurals>resource — not a<string>. Hardcoding1in English forces every translator to either also hardcode1(breaking languages where theonecategory covers other numbers, e.g. some Slavic languages) or to silently change the meaning. - A
%d/%1$dplaceholder in a clearly singular/plural sentence (e.g."%1$d reply","%d follower"). Even though the placeholder is parameterised, English-onlyone/otheragreement won't survive translation into languages that needfew/many.
Also audit existing <plurals> resources for two anti-patterns:
quantity="one"items that hardcode the literal1(instead of using a%d/%1$dplaceholder) — broken for languages where theoneCLDR category covers more than justn=1(Russian, Ukrainian, Croatian, etc.).quantity="zero"items in any locale that doesn't natively use thezeroCLDR category — i.e. everything except Arabic (ar) and Welsh (cy). ICU/CLDR mapscount=0tootherfor English and all the locales we ship to (cs, de, pt-BR, sv, etc.), so<item quantity="zero">is dead code there:getQuantityString(id, 0)will pickother, never the zero entry, and the visible runtime string ends up"…0 items"instead of the intended"…no items".
If a UX genuinely wants special "no items" wording at count=0, that has to be a call-site if (count == 0) branch to a separate <string>, not a quantity="zero" plural item.
Flag and offer to fix:
# Scan every locale's strings.xml for <item quantity="one"> entries that
# hardcode "1" (or other literal digits) instead of using a placeholder.
# Looks at default + all values-* locales.
for f in amethyst/src/main/res/values/strings.xml amethyst/src/main/res/values-*/strings.xml; do
awk -v file="$f" '
/<plurals/ { in_plurals = 1; name = $0; sub(/.*name="/, "", name); sub(/".*/, "", name) }
in_plurals && /quantity="one"/ {
# Extract item text (between > and <)
text = $0; sub(/^[^>]*>/, "", text); sub(/<.*$/, "", text)
# Flag if it contains a digit AND no %d / %1$d placeholder
if (text ~ /[0-9]/ && text !~ /%[0-9]*\$?d/) {
print file ": <plurals name=\"" name "\"> one=\"" text "\""
}
}
/<\/plurals>/ { in_plurals = 0 }
' "$f"
done
Then scan for dead quantity="zero" entries. CLDR's zero category is integer-bearing only in Arabic (ar) and Welsh (cy). In every other locale, count=0 falls through to other, so a <item quantity="zero"> entry is dead and likely a translator/author bug (or it silently never fires):
for f in amethyst/src/main/res/values/strings.xml amethyst/src/main/res/values-*/strings.xml; do
# Skip Arabic and Welsh — they natively use the zero category.
case "$f" in
*values-ar*|*values-cy*) continue ;;
esac
awk -v file="$f" '
/<plurals/ { in_plurals = 1; name = $0; sub(/.*name="/, "", name); sub(/".*/, "", name) }
in_plurals && /quantity="zero"/ {
text = $0; sub(/^[^>]*>/, "", text); sub(/<.*$/, "", text)
print file ": <plurals name=\"" name "\"> zero=\"" text "\""
}
/<\/plurals>/ { in_plurals = 0 }
' "$f"
done
For each hit, warn the user that the entry is unreachable in that locale. The fix is to remove the <item quantity="zero"> and, if the UX wanted distinct wording for count=0, add a separate <string> plus an if (count == 0) branch at the call site (see "Plurals: handle with care" below).
Quick scan over the missing keys:
# Flag missing English values that look like they should be <plurals>
while IFS= read -r key; do
line=$(grep "name=\"$key\"" amethyst/src/main/res/values/strings.xml)
# Hardcoded standalone "1" (word-boundary), or a count placeholder followed by a likely-countable noun
if echo "$line" | grep -qE '>([^<]*\b1\b[^<]*|[^<]*%[0-9]*\$?d[^<]*)<'; then
echo "PLURAL CANDIDATE: $line"
fi
done < <(comm -23 \
<(grep '<string name=' amethyst/src/main/res/values/strings.xml \
| grep -v 'translatable="false"' \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
<(grep '<string name=' amethyst/src/main/res/values-cs-rCZ/strings.xml \
| sed 's/.*name="\([^"]*\)".*/\1/' | sort))
The regex is intentionally noisy — review each hit by hand. Many %d strings (e.g. "Limits for kind %1$d", "Max event size (bytes)") are not plural-bearing. Only flag the ones whose surrounding noun changes form with the count.
For each genuine match, stop and warn the user before translating, e.g.:
⚠️
notification_countis"1 new reply"— this hardcodes"1"and should likely be a<plurals>resource (e.g.quantity="one"→"%d new reply",quantity="other"→"%d new replies"). Convert before translating?
Do not silently translate plural-shaped <string> entries; the wrong shape will then need to be fixed in every locale.
5. Present results and ask to translate
Output the missing entries as raw XML resource lines (copy-paste ready):
<string name="attestation_valid">Valid</string>
<string name="attestation_valid_from">Valid from %1$s</string>
<string name="feed_group_lists">Lists</string>
Also check <string-array> and <plurals> tags using the same approach if the project uses them.
Plurals: handle with care
When adding or proposing <plurals> entries, follow these rules:
- Never hardcode
"1"in the English text of aquantity="one"item. Use the format placeholder (e.g.%1$d/%d) so the runtime substitutes the actual count. Hardcoding"1"breaks every language whoseonecategory covers numbers other than 1 (e.g. some Slavic languages). - Don't assume
one+otheris enough. CLDR plural categories vary by language:zero,one,two,few,many,other. Always include every category the target language uses, not just the categories present in English. Examples:- English (
en):one,other - Czech (
cs):one,few,many,other - Polish (
pl):one,few,many,other - Russian (
ru):one,few,many,other - Arabic (
ar):zero,one,two,few,many,other - German / Swedish / Brazilian Portuguese:
one,other
- English (
- When a missing string contains a count placeholder and is conceptually a singular/plural pair, flag it before translating — it may belong as a
<plurals>resource rather than a single<string>. Surface this to the user before proposing translations. - Do not use
quantity="zero"outside Arabic (ar) and Welsh (cy). CLDR'szerocategory is integer-bearing only in those two languages. Android callsPluralRules.select(0)for the device locale; in English/German/Czech/Polish/Russian/Swedish/Portuguese/etc. it returnsother, so the explicit<item quantity="zero">is never picked at runtime and the user sees"…0 items"instead of the intended wording. If the design calls for "no items" at count=0, model it as a separate<string>and anif (count == 0)branch at the call site:val label = if (count == 0) { stringRes(R.string.foo_no_items, dateLabel) } else { pluralStringResource(R.plurals.foo_items, count, dateLabel, count) } - Reference: Android
<plurals>docs and CLDR plural rules.
Then ask the user: "Would you like me to translate these missing strings into [list of target locales]?"
6. Adding translations (if approved)
When adding translated strings to locale files:
- Append new strings at the bottom of the file, just before the closing
</resources>tag. - Do NOT try to insert them in alphabetical or matching order — a separate process handles ordering.
Common Mistakes
- Forgetting
translatable="false"— these should never appear in locale files - Diffing only
<string name=—<plurals>is a separate resource type; a source<plurals>missing from a locale will never show up in a<string>diff. Always run the diff twice (once per resource type) as shown in Step 2. The same goes for<string-array>if the project uses it. - Trusting a git "sync-timestamp" heuristic to pre-filter the list — this skill used to skip keys added before the last
New Crowdin translationscommit, on the theory that Crowdin had already "decided" them. It was dropped: a key added shortly before an export that translators hadn't reached yet is genuinely missing, so the heuristic silently dropped real work. Use the raw on-disk diff and reconcile against the Crowdin web UI's untranslated count instead. - Adding source-identical fallbacks locally — they get overwritten on the next Crowdin sync. Android falls back to
values/strings.xmlat runtime anyway, so a key intentionally kept as English already renders correctly. Skip these by inspection (brand terms, loanwords,v%1$s-style strings); don't translate them to an identical value. - Skipping per-locale diffs when only diffing cs-rCZ — Crowdin can strip different keys in different locales (each translator's choice), so cs-rCZ is not a reliable upper bound. Diff each target locale and union the results.
- Inserting strings in a specific position — always append at the bottom; ordering is handled separately
- Hardcoding
"1"in a<plurals>quantity="one"item — always use the count placeholder; otherwise non-Englishonecategories produce wrong text - Copying English's
one/otherset into every locale — each language must include all CLDR plural categories it uses (e.g. Czech needsone,few,many,other) - Using
<item quantity="zero">to special-case count=0 — outside Arabic and Welsh, this entry is unreachable: ICU/CLDR maps 0 →other, so the runtime never picks the zero item and the user sees"…0 items". Special-case at the call site with a separate<string>instead.