CSCTF — Chat Shared Conversation To File
A Bun-native CLI that turns public ChatGPT, Gemini, Grok, and Claude share links into clean Markdown + HTML transcripts with preserved code fences, stable filenames, and optional GitHub Pages publishing.
Why This Exists
Copy/pasting AI share links often:
- Breaks fenced code blocks — loses formatting and structure
- Loses language hints — no syntax highlighting
- Produces messy filenames — random or unreadable names
- Requires manual cleanup — inconsistent formatting
CSCTF fixes this with:
- Stable slugs — deterministic, collision-proof filenames
- Language-preserving fences — code blocks retain syntax hints
- Normalized whitespace — clean, consistent output
- Static HTML twin — no JS, ready for hosting/archiving
- One-command GitHub Pages — instant shareable microsite
Quick Start
# Install
curl -fsSL https://raw.githubusercontent.com/Dicklesworthstone/chat_shared_conversation_to_file/main/install.sh | bash
# Convert any share link
csctf https://chatgpt.com/share/69343092-91ac-800b-996c-7552461b9b70
csctf https://gemini.google.com/share/66d944b0e6b9
csctf https://grok.com/share/bGVnYWN5_d5329c61-f497-40b7-9472-c555fa71af9c
csctf https://claude.ai/share/549c846d-f6c8-411c-9039-a9a14db376cf
Output:
<conversation_title>.md— Clean Markdown with preserved code fences<conversation_title>.html— Styled static HTML (zero JavaScript)
Supported Providers
| Provider | URL Pattern | Method | Notes |
|----------|-------------|--------|-------|
| ChatGPT | chatgpt.com/share/* | Headless Chromium | Public shares only |
| Gemini | gemini.google.com/share/* | Headless Chromium | Public shares only |
| Grok | grok.com/share/* | Headless Chromium | Public shares only |
| Claude | claude.ai/share/* | Your Chrome session | Requires login |
Claude.ai Special Handling
Claude.ai uses Cloudflare protection that blocks standard browser automation. CSCTF handles this automatically:
- Copies your Chrome session cookies to a temporary profile
- Launches Chrome with remote debugging enabled
- Connects via Chrome DevTools Protocol to extract conversation
- If Chrome is running, offers to save tabs, restart, and restore afterward
Requirements: Chrome installed + logged into claude.ai in your regular Chrome session.
Design Principles
| Principle | Implementation | |-----------|----------------| | Determinism | Explicit slugging and collision handling | | Minimal network | Only share URL fetched (update checks/publish opt-in) | | Safety | Static HTML (inline CSS/HLJS), no scripts emitted | | Clarity | Colorized step-based logging, confirmation gates | | Atomicity | Temp+rename writes prevent partial files |
How It Works
ChatGPT, Gemini, Grok (End-to-End)
1. Launch headless Playwright Chromium with stealth config
(spoofed navigator properties, realistic headers)
2. Navigate twice (domcontentloaded → networkidle) for late-loading assets
3. Detect provider from URL hostname
4. Wait for provider-specific selectors with retry/fallback
5. Extract each role's inner HTML (assistant/user), traverse Shadow DOM
6. Clean pills/metadata, run Turndown with fenced-code rule
7. Normalize whitespace and newlines
8. Write Markdown to temp file, rename atomically
9. Render HTML twin with inline CSS/TOC/HLJS
Claude.ai
1. Copy Chrome session cookies to temporary profile
2. Launch Chrome with remote debugging
3. Connect via Chrome DevTools Protocol
4. Extract conversation HTML
5. Process through same Turndown/normalization pipeline
6. Clean up temporary profile
Processing Algorithms
Selector Strategy
Provider-specific selectors with fallback chains:
- ChatGPT:
article [data-message-author-role] - Gemini: Custom web components (
share-turn-viewer,response-container) - Grok: Flexible
data-testidpatterns - Claude:
[data-testid="user-message"]and streaming indicators
Each has multiple fallbacks tried with short timeouts.
Turndown Customization
- Injects fenced code blocks
- Detects language via
class="language-*" - Strips citation pills and
data-start/data-endattributes
Normalization
- Converts newlines to
\n - Removes Unicode LS/PS characters
- Collapses excessive blank lines
Slugging Algorithm
Title → lowercase → non-alphanumerics → "_" → trim → max 120 chars
→ Windows reserved-name suffix → collision suffix (_2, _3, ...)
HTML Rendering
- Markdown-it + highlight.js
- Heading slug de-dupe for TOC
- Inline CSS for light/dark/print
- Zero JavaScript
Command Reference
csctf <share-url> [options]
Output Options
| Flag | Default | Description |
|------|---------|-------------|
| --outfile <path> | auto | Override output path |
| --no-html / --md-only | off | Skip HTML output |
| --html-only | off | Skip Markdown output |
| --quiet | off | Minimal logging |
| --timeout-ms <ms> | 60000 | Navigation + selector timeout |
GitHub Pages Publishing
| Flag | Default | Description |
|------|---------|-------------|
| --publish-to-gh-pages | off | Publish to GitHub Pages |
| --gh-pages-repo <owner/name> | my_shared_conversations | Target repo |
| --gh-pages-branch <branch> | gh-pages | Target branch |
| --gh-pages-dir <dir> | csctf | Subdirectory in repo |
| --remember | off | Save GH settings |
| --forget-gh-pages | off | Clear saved settings |
| --dry-run | off | Simulate publish (build index, no push) |
| --yes / --no-confirm | off | Skip PROCEED confirmation prompt |
| --gh-install | off | Auto-install gh CLI |
Other
| Flag | Description |
|------|-------------|
| --check-updates | Print latest release tag |
| --version | Print version and exit |
Output Format
Markdown Structure
# Conversation: <Title>
**Source:** https://chatgpt.com/share/...
**Retrieved:** 2026-01-08T15:30:00Z
## User
How do I sort an array in Python?
## Assistant
Here's how to sort an array in Python:
```python
# Sort in place
my_list.sort()
# Return new sorted list
sorted_list = sorted(my_list)
### HTML Features
- **Standalone** — No external dependencies
- **Zero JavaScript** — Safe for any hosting
- **Inline CSS** — Light/dark mode via `prefers-color-scheme`
- **Syntax highlighting** — highlight.js themes inline
- **Table of contents** — Auto-generated from headings
- **Language badges** — Code block language indicators
- **Print-friendly** — Optimized print styles
## Filename Generation
"How to Build a REST API" → how_to_build_a_rest_api.md "Python Tips & Tricks!" → python_tips_tricks.md "File exists already" → file_exists_already_2.md
Rules:
- Lowercase
- Non-alphanumerics → `_`
- Trimmed leading/trailing `_`
- Max 120 characters
- Windows reserved names suffixed
- Collisions: `_2`, `_3`, ...
## GitHub Pages Publishing
### Quick Recipe
```bash
# Publish with defaults
csctf <url> --publish-to-gh-pages --yes
# Creates: <gh-username>/my_shared_conversations repo
# Branch: gh-pages
# Directory: csctf/
Remembered Settings
# First time: save settings
csctf <url> --publish-to-gh-pages --remember --yes
# Subsequent: just use --yes
csctf <url> --yes
# Clear remembered settings
csctf --forget-gh-pages
Custom Configuration
csctf <url> --publish-to-gh-pages \
--gh-pages-repo myuser/my-chats \
--gh-pages-branch main \
--gh-pages-dir exports \
--yes
Requirements
- GitHub CLI (
gh) installed and authenticated - Verify with:
gh auth status
Publish Flow
- Resolve repo/branch/dir (use remembered or defaults)
- Clone (or create via
gh) - Copy MD + HTML files
- Regenerate
manifest.jsonandindex.html - Commit + push
- Print viewer URL
Recipes
Quiet CI Scrape (MD only)
csctf <url> --md-only --quiet --outfile /tmp/chat.md
HTML-only for Embedding
csctf <url> --html-only --outfile site/chat.html
Slow/Large Conversations
csctf <url> --timeout-ms 90000
Custom Browser Cache
PLAYWRIGHT_BROWSERS_PATH=/opt/ms-playwright csctf <url>
Batch Archive
for url in $URLS; do
csctf "$url" --outfile ~/archive/ --quiet
done
Security & Privacy
Network Behavior
- Only fetches: The share URL itself
- Opt-in: Update checks, GitHub publish flows
- Auth: GitHub CLI (
gh) for publishing—no tokens stored
HTML Safety
- Zero JavaScript in output
- Inline styles only
- Citation pills and data attributes stripped
- highlight.js used statically
Filesystem
- Temp+rename write pattern (atomic)
- Collision-proof naming
- Config:
~/.config/csctf/config.json
Claude.ai Cookies
- Copied to temporary directory only
- Used for single scraping session
- Original Chrome profile never modified
Performance
| Phase | Time | |-------|------| | First run (Chromium download) | 30-60s | | Subsequent runs | 5-15s | | Claude.ai (uses local Chrome) | 5-10s |
Characteristics
- Playwright browsers cached after first run
- 60s default timeout, 3-attempt backoff
- Single page/context, linear processing
- Atomic writes prevent partial outputs
Failure Modes & Remedies
| Symptom | Fix |
|---------|-----|
| "No messages found" | Link is private or layout changed; verify public share, retry with --timeout-ms 90000 |
| Bot detection / challenge page | Stealth techniques used; retry or verify link in browser |
| Timeout or blank page | Raise --timeout-ms, verify connectivity |
| Publish fails (auth) | Ensure gh auth status passes |
| Publish fails (branch/dir) | Pass --gh-pages-branch / --gh-pages-dir; use --remember |
| Filename collisions | Expected; tool appends _2, _3, ... |
| Claude.ai Cloudflare challenge | Complete verification in Chrome window, press Enter |
| Claude.ai won't load | Ensure logged into claude.ai in Chrome; close Chrome if prompted |
Environment Variables
Runtime
| Variable | Description |
|----------|-------------|
| PLAYWRIGHT_BROWSERS_PATH | Reuse cached Chromium bundle |
Installer
| Variable | Description | Default |
|----------|-------------|---------|
| VERSION | Pin release tag | latest |
| DEST | Install directory | ~/.local/bin |
| CHECKSUM_URL | Override checksum location | — |
File Locations
| Path | Purpose |
|------|---------|
| ~/.local/bin/csctf | Binary |
| ~/.config/csctf/config.json | GitHub Pages settings |
| ~/.cache/ms-playwright/ | Playwright Chromium cache |
Installation
# One-liner (recommended)
curl -fsSL https://raw.githubusercontent.com/Dicklesworthstone/chat_shared_conversation_to_file/main/install.sh | bash
# Pin version
VERSION=v1.0.0 curl -fsSL .../install.sh | bash
# Custom directory
DEST=/opt/bin curl -fsSL .../install.sh | bash
# Verify checksum
curl -fsSL .../install.sh | bash -s -- --verify
From Source
bun install
bun run build
# Binary at dist/csctf
Comparison
| Feature | Copy/Paste | csctf | |---------|------------|-------| | Code blocks preserved | Often broken | Always preserved | | Language hints | Lost | Detected and kept | | Filenames | Random/messy | Deterministic slugs | | HTML output | None | Styled, no-JS twin | | GitHub Pages | Manual | One command | | Collision handling | Overwrite | Auto-suffix |
Limitations
- Requires public share links (except Claude.ai which uses your session)
- Provider layouts may change (selectors maintained with fallbacks)
- Markdown/HTML exports require share to be available at scrape time
- Claude.ai requires Chrome installed with active login session
- First run downloads Playwright Chromium (~200MB)
Integration with Flywheel
| Tool | Integration | |------|-------------| | CASS | Archive conversations for session search | | CM | Extract procedural memory from exported chats | | Agent Mail | Attach conversation exports to agent messages | | NTM | Export multi-agent session transcripts |