News Articles Rename Skill

Purpose

Newspaper articles saved as PDFs or images typically arrive with unhelpful filenames like Image 2026-02-25 15-52-38.pdf. This skill uses OCR to extract the main headline from each file and renames the file after it (for example, Article Title.pdf), so the News folder can be browsed by headline.

Target Folder

The default target folder is:

Vivien (PA)/News/

Supported File Types

Process any file with these extensions: .pdf, .png, .jpg, .jpeg

Skip hidden files (starting with .) and any file that doesn't match these extensions.
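
The filtering rule above can be sketched in a few lines. This is only an illustration, not the bundled script's code: the names SUPPORTED_EXTENSIONS and find_supported_files are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed in this section; lower-cased for comparison (assumption).
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def find_supported_files(folder):
    """Return supported, non-hidden files in `folder`, sorted by path."""
    return sorted(
        p
        for p in Path(folder).iterdir()
        if p.is_file()
        and not p.name.startswith(".")  # skip hidden files
        and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```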

How It Works

Run the bundled script scripts/rename_articles.py, which handles the full pipeline:

python3 <skill-path>/scripts/rename_articles.py "<news-folder-path>"

The script will:

Scan the folder for all supported files
For each PDF, render the first page as an image at 300 DPI (image files are used as they are)
Run Tesseract OCR on the image
Identify the headline using heuristics (skip metadata, collect first substantial text block)
Apply common OCR corrections (e.g. "Al" → "AI")
Sanitise the headline for use as a filename
Rename the file, handling duplicates by appending a number
Print a summary table of old → new filenames
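
The sanitising and duplicate-handling steps might look roughly like the sketch below. The function names, the set of stripped characters, the 120-character cap and the " (2)" numbering style are all assumptions made for illustration; the bundled script may do this differently.

```python
import re
from pathlib import Path


def sanitise_headline(headline, max_length=120):
    """Turn an OCR'd headline into a filename-safe stem (hypothetical helper)."""
    # Drop characters that are invalid on common filesystems.
    cleaned = re.sub(r'[\\/:*?"<>|]', "", headline)
    # Collapse whitespace runs (including newlines) into single spaces.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Cap the length, then trim again in case the cut left a trailing space or dot.
    return cleaned[:max_length].rstrip(" .")


def unique_target(folder, stem, suffix):
    """Return a path in `folder` that does not exist yet, numbering duplicates."""
    candidate = Path(folder) / f"{stem}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = Path(folder) / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate
```

A rename would then be something like `path.rename(unique_target(path.parent, sanitise_headline(headline), path.suffix))`, which keeps the original extension.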

The script also handles mounted filesystem lock issues automatically by copying files to a temp directory for OCR processing when direct reads fail.
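
One way to picture that fallback is the sketch below. It assumes a failed direct read surfaces as an OSError and that a plain file copy still succeeds; read_with_temp_fallback and its reader argument are hypothetical names, not part of the bundled script.

```python
import shutil
import tempfile
from pathlib import Path


def read_with_temp_fallback(path, reader):
    """Call reader(path); if that raises OSError, retry on a temporary copy."""
    path = Path(path)
    try:
        return reader(path)
    except OSError:
        # Direct read failed (e.g. a lock on a mounted filesystem):
        # copy the file somewhere local and read the copy instead.
        with tempfile.TemporaryDirectory() as tmp_dir:
            tmp_copy = Path(tmp_dir) / path.name
            shutil.copyfile(path, tmp_copy)
            return reader(tmp_copy)
```

Here `reader` stands for whatever function performs the OCR on a path; the temporary directory is removed as soon as it returns.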

After Running

Present the results as a clear summary table showing what was renamed:

| # | Original Filename | New Filename | Status |
|---|---|---|---|
| 1 | Image 2026-02-25 15-52-38.pdf | Headline Goes Here.pdf | ✅ Renamed |
| 2 | Image 2026-02-25 15-53-03.pdf | Another Article.pdf | ✅ Renamed |
| 3 | some-file.png | some-file.png | ⚠️ No title found |

Flag any files that couldn't be processed and explain why. Minor OCR artefacts in headlines (e.g. misread characters) are expected from Tesseract; only flag files where no headline could be extracted at all.
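
If the rename results are available as pairs of old and new names, a table in this shape could be produced with a small helper like the one below. The function name and the convention of passing None for "no headline found" are assumptions for the sake of the example.

```python
def format_summary(results):
    """Render (old_name, new_name_or_None) pairs as a pipe table."""
    lines = [
        "| # | Original Filename | New Filename | Status |",
        "|---|---|---|---|",
    ]
    for index, (old_name, new_name) in enumerate(results, start=1):
        if new_name is None:
            # No headline extracted: the file keeps its original name.
            new_name, status = old_name, "⚠️ No title found"
        else:
            status = "✅ Renamed"
        lines.append(f"| {index} | {old_name} | {new_name} | {status} |")
    return "\n".join(lines)
```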

Important Notes

Always process every supported file in the folder. Do not leave any supported file out.
If OCR is uncertain about a headline, prefer keeping the original name over guessing wrong.
The script handles both single-page and multi-page PDFs — only the first page is used for title extraction.
For image files (.png, .jpg, .jpeg), OCR is run directly on the image.
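
The "first substantial text block" heuristic and the keep-the-original-name rule could be sketched as follows, working on OCR output that is already a plain string. Everything here is an illustrative assumption rather than the script's actual logic: the metadata pattern, the four-word threshold, and the names pick_headline and choose_stem.

```python
import re

# Lines that look like dates or URLs are treated as metadata (assumption).
METADATA = re.compile(r"\d{4}|https?://|www\.", re.IGNORECASE)


def pick_headline(ocr_text, min_words=4):
    """Return the first text block with at least `min_words` words, else None."""
    blocks, current = [], []
    # The trailing "" acts as a sentinel so the last block gets flushed.
    for raw_line in ocr_text.splitlines() + [""]:
        line = raw_line.strip()
        if line and not METADATA.search(line):
            current.append(line)
        elif current:
            # A blank or metadata line ends the current block.
            blocks.append(" ".join(current))
            current = []
    for block in blocks:
        if len(block.split()) >= min_words:
            # Common OCR correction mentioned above: "Al" -> "AI".
            return re.sub(r"\bAl\b", "AI", block)
    return None


def choose_stem(original_stem, ocr_text):
    """Prefer the original name when no confident headline is found."""
    headline = pick_headline(ocr_text)
    return headline if headline else original_stem
```

Note that the blanket "Al" correction would also rewrite the personal name "Al", which is one reason minor artefacts in the results are tolerated rather than flagged.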
