Article Extractor
Extract clean article content from URLs, stripping ads, navigation, and other clutter. If one extraction tool fails, the script falls back to another, which makes extraction more reliable.
Workflow
When the user provides a URL to download or extract:
- Call the extraction script directly with the URL. Do NOT fetch the URL first with web_fetch.
- The script handles fetching, extraction, and saving automatically.
- It saves the result as a clean Markdown file with frontmatter.
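To check what the script wrote, you can read the frontmatter back from the saved file. Below is a minimal Bash sketch. It assumes the frontmatter is a YAML-style block delimited by `---` lines at the very top of the file, and it does not assume any particular field names. print_frontmatter is a hypothetical helper name.

```shell
# Print the frontmatter of an extracted Markdown file: the lines between the
# opening '---' on line 1 and the next '---'. Returns non-zero when the file
# has no complete frontmatter block. Assumes YAML-style '---' delimiters.
print_frontmatter() {
  awk '
    NR == 1 { if ($0 != "---") exit 1; next }   # must open on line 1
    $0 == "---" { closed = 1; exit }            # closing delimiter found
    { buf = buf $0 "\n" }                       # collect frontmatter lines
    END { if (!closed) exit 1; printf "%s", buf }
  ' "$1"
}
```

For example, after `-o my-article.md -d ~/Documents`, run `print_frontmatter ~/Documents/my-article.md`. Nothing is printed when the block is missing or never closed, so a partial header is not mistaken for metadata.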
Usage
# Basic extraction
scripts/extract-article.sh "https://example.com/article"
# Specify output location
scripts/extract-article.sh "https://example.com/article" -o my-article.md -d ~/Documents
# Try Wayback Machine if original fails
scripts/extract-article.sh "https://example.com/article" --wayback
Make the script executable if needed: chmod +x scripts/extract-article.sh
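To extract several articles in one go, loop over a list of URLs and pass only the documented -d and -q options. The following is a minimal Bash sketch. The function name extract_all and the EXTRACT_CMD override are hypothetical, and the list file is assumed to hold one URL per line.

```shell
# Extract every URL in a list file (one per line) into a single directory.
# EXTRACT_CMD is a hypothetical override for the script path.
extract_all() {
  local list="$1" outdir="$2"
  local cmd="${EXTRACT_CMD:-scripts/extract-article.sh}"
  local url failed=0
  # '|| [ -n "$url" ]' also handles a last line with no trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and '#' comment lines.
    case "$url" in ''|'#'*) continue ;; esac
    # Redirect stdin so the script cannot consume the rest of the URL list.
    if ! "$cmd" "$url" -d "$outdir" -q < /dev/null; then
      echo "failed: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$list"
  # Return non-zero if any URL failed.
  [ "$failed" -eq 0 ]
}
```

Usage: `extract_all urls.txt ~/Documents`. Failures are reported and counted rather than aborting the loop, so one bad URL does not stop the rest of the batch.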
Key Options
- -o <file>: Output filename
- -d <dir>: Output directory
- -w, --wayback: Try the Wayback Machine if extraction fails
- -t <tool>: Force a specific tool: jina, trafilatura, readability, or fallback
- -q: Quiet mode
For complete options, exit codes, tool details, and examples, see references/tools-and-options.md.
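To illustrate the multi-tool fallback idea, here is a minimal Bash sketch that tries each extractor in order and stops at the first success. It is not the script's actual implementation. run_tool is a hypothetical per-tool hook that you would define yourself. The tool order in the usage line simply follows the order in which -t lists the tools, which may not match the order the script actually uses.

```shell
# Illustrative multi-tool fallback: try each extractor in turn and stop at
# the first one that succeeds. Not the script's actual implementation.
extract_with_fallback() {
  local url="$1"
  shift
  local tool
  for tool in "$@"; do
    # run_tool is a hypothetical hook: it should print the article to stdout
    # and return 0 only when the tool produced usable content.
    if run_tool "$tool" "$url"; then
      return 0
    fi
    echo "extractor '$tool' failed; trying the next one" >&2
  done
  # Every tool failed.
  return 1
}
```

Usage: `extract_with_fallback "https://example.com/article" jina trafilatura readability fallback > article.md`. Because the tool names are passed as arguments, the order stays configurable instead of being fixed inside the function.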
Common Failures
- Exit 3 (access denied): Paywall or login required. Try --wayback.
- Exit 4 (no content): The page relies heavily on JavaScript. Try a different --tool.
- Exit 2 (network): Connection issue. Check the URL.
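The exit codes above map onto a simple retry policy. Below is a minimal Bash sketch, assuming the script lives at scripts/extract-article.sh and that the exit codes behave as listed. The function name, the EXTRACT_CMD override, and the order in which alternative tools are tried are all hypothetical choices made for illustration.

```shell
# Retry policy based on the documented exit codes:
#   3 (access denied) -> retry once with --wayback
#   4 (no content)    -> retry with each tool forced via -t
#   anything else     -> give up (2 means a network problem: check the URL)
# EXTRACT_CMD is a hypothetical override for the script path.
extract_with_retries() {
  local url="$1"
  local cmd="${EXTRACT_CMD:-scripts/extract-article.sh}"
  local status=0 tool
  "$cmd" "$url" || status=$?
  case "$status" in
    0) return 0 ;;
    3) "$cmd" "$url" --wayback; return ;;   # return the Wayback attempt's status
    4)
      for tool in trafilatura readability jina fallback; do
        "$cmd" "$url" -t "$tool" && return 0
      done
      return 4
      ;;
    *)
      echo "exit $status for $url; check the URL and connection" >&2
      return "$status"
      ;;
  esac
}
```

Network errors (exit 2) are not retried, because a wrong URL or a dropped connection will not be fixed by switching tools or falling back to the Wayback Machine.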
Local Tools (Optional)
For offline extraction, install the local tools with scripts/install-deps.sh.