ArXiv Monitor Skill
Polls the ArXiv API every 6 hours for papers matching configured keywords. Deduplicates against previously seen papers, stores new summaries in named memory, and integrates with the morning briefing loop.
Setup
-
Set
ARXIV_KEYWORDSin.env:ARXIV_KEYWORDS=multi-agent systems,LLM reasoning,autonomous agents,RAG,tool use -
Start the monitor loop:
/loop 6h Skill({ skill: 'arxiv-monitor' })Or via CronCreate for programmatic control:
CronCreate({ schedule: '0 */6 * * *', task: "Invoke Skill({ skill: 'arxiv-monitor' }) to fetch new ArXiv papers", });
Core Logic
Step 1: Load Keywords and Seen Papers
const keywords = (process.env.ARXIV_KEYWORDS || 'multi-agent systems,autonomous agents')
.split(',')
.map(k => k.trim());
// Load previously seen paper IDs from memory
const seenRaw = await readMemory('arxiv-seen-ids');
const seenIds = new Set(seenRaw ? JSON.parse(seenRaw) : []);
Step 2: Search ArXiv API for Each Keyword
Use Bash to query the ArXiv API (no authentication required):
# Search for papers from last 7 days matching a keyword
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))" "$KEYWORD")
curl -s "https://export.arxiv.org/api/query?search_query=all:${ENCODED}&sortBy=submittedDate&sortOrder=descending&max_results=10"
Or use the HuggingFace MCP paper_search tool if available:
// Via mcp__claude_ai_Hugging_Face__paper_search if available
// Falls back to curl-based ArXiv API otherwise
Step 3: Filter and Store New Papers
const newPapers = [];
for (const paper of fetchedPapers) {
if (seenIds.has(paper.id)) continue; // Skip already processed
seenIds.add(paper.id);
newPapers.push({
id: paper.id,
title: paper.title,
authors: paper.authors.slice(0, 3).join(', '),
summary: paper.summary.slice(0, 300),
published: paper.published,
url: paper.url,
keyword: keyword,
});
}
// Persist seen IDs via named memory (cap at 1000 to prevent unbounded growth)
const seenArr = [...seenIds].slice(-1000);
await writeMemory('arxiv-seen-ids', JSON.stringify(seenArr));
Step 4: Append to Digest File
if (newPapers.length > 0) {
const digest = newPapers
.map(
p =>
`## ${p.title}\n**Authors:** ${p.authors}\n**Keyword:** ${p.keyword}\n**Published:** ${p.published}\n${p.summary}...\n[Read →](${p.url})\n`
)
.join('\n---\n');
// Append to named memory digest
const digestPath = '.claude/context/memory/named/arxiv-digest.md';
const existing = fs.existsSync(digestPath) ? fs.readFileSync(digestPath, 'utf8') : '';
fs.writeFileSync(
'.claude/context/memory/named/arxiv-digest.md',
`${existing}\n\n## ArXiv Update — ${new Date().toISOString().slice(0, 10)}\n\n${digest}`
);
}
Integration with Morning Briefing
The morning briefing loop reads arxiv-digest.md for recent papers:
/loop at 8:00am Read .claude/context/memory/named/arxiv-digest.md and issues.md. Summarize: (1) top 3 new papers relevant to agent-studio, (2) technical debt, (3) top 2 tasks for today.
Configuration Reference
| Variable | Default | Description |
| --------------------- | --------------------- | -------------------------------- |
| ARXIV_KEYWORDS | multi-agent systems | Comma-separated search keywords |
| ARXIV_MAX_RESULTS | 10 | Max papers per keyword per run |
| ARXIV_LOOKBACK_DAYS | 7 | Days to look back for new papers |
Deduplication
- Seen paper IDs stored via
writeMemory('arxiv-seen-ids', ...)(named memory API) - Capped at 1000 most recent IDs to prevent unbounded growth
- Papers with matching IDs are silently skipped on subsequent runs
- Reset by calling
await writeMemory('arxiv-seen-ids', '[]')
Related Skills
scheduled-tasks— CronCreate API for scheduling loopsexa-monitor— Exa web search companion monitorheartbeat— Start all 7 heartbeat loops including this onememory-search— Search the arxiv-digest for specific papers