Agent Skills: Screenpipe API

Query the user's screen recordings, audio, UI elements, and usage analytics via the local Screenpipe REST API at localhost:3030. Use when the user asks about their screen activity, meetings, apps, productivity, media export, retranscription, or connected services.

UncategorizedID: mediar-ai/screenpipe/screenpipe-api

Repository

screenpipeLicense: NOASSERTION
19,6411,899

Install this agent skill to your local

pnpm dlx add-skill https://github.com/screenpipe/screenpipe/tree/HEAD/.claude/skills/screenpipe-api

Skill Files

Browse the full folder contents for screenpipe-api.

Download Skill

Loading file tree…

.claude/skills/screenpipe-api/SKILL.md

Skill Metadata

Name
screenpipe-api
Description
Query the user's screen recordings, audio, UI elements, and usage analytics via the local Screenpipe REST API at localhost:3030. Use when the user asks about their screen activity, meetings, apps, productivity, media export, retranscription, or connected services.

Screenpipe API

Local REST API at http://localhost:3030. Full reference (60+ endpoints): https://docs.screenpi.pe/llms-full.txt

Authentication

ALL requests require authentication. Add the auth header to every curl call:

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/..."

The $SCREENPIPE_LOCAL_API_KEY env var is already set in your environment. Without it you get 403. The only exception is /health (no auth needed).

Context Window Protection

API responses can be large. Always write curl output to a file first (curl ... -o /tmp/sp_result.json), check size (wc -c /tmp/sp_result.json), and if over 5KB read only the first 50-100 lines. Extract what you need with jq. NEVER dump full large responses into context.

For the list endpoints (/search, /elements, /frames/{id}/elements) you can also cut tokens at the source: add &format=csv (or tsv) to get a columnar table that writes each column name once instead of repeating keys per row, and &fields=a,b,c to return only the columns you need (dotted paths like content.text). On a list of UI elements that is roughly a 70% token cut versus JSON. For the element endpoints specifically, &format=outline (alias tree) goes further still — a deduped, indented tree of just the text-bearing nodes (~91% fewer tokens, measured) — and is the best default for reading UI structure. Text-heavy ocr/audio barely benefit from any reshaping (the text blob dominates), so reach for fields + max_content_length there. With no format/fields the response is unchanged JSON.


1. Search — GET /search

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/search?q=QUERY&content_type=all&limit=10&start_time=1h%20ago"

Parameters

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | q | string | No | Keywords. Do NOT use for audio searches — transcriptions are noisy, q filters too aggressively. | | content_type | string | No | all (default), accessibility, audio, input, ocr, memory. Screen text is primarily captured via the OS accessibility tree (accessibility); OCR is a fallback for apps without accessibility support (games, remote desktops). Use all unless you need a specific modality. | | limit | integer | No | Max 1-20. Default: 10 | | offset | integer | No | Pagination. Default: 0 | | start_time | ISO 8601 or relative | Yes | Accepts 2024-01-15T10:00:00Z or 16h ago, 2d ago, 30m ago | | end_time | ISO 8601 or relative | No | Defaults to now. Accepts now, 1h ago | | app_name | string | No | e.g. "Google Chrome", "Slack", "zoom.us" | | window_name | string | No | Window title substring | | speaker_name | string | No | Filter audio by speaker (case-insensitive partial) | | focused | boolean | No | Only focused windows | | tags | string | No | Comma-separated; return only items carrying ALL of them (e.g. person:ada,project:atlas). Works for screen/audio and, with content_type=memory, memories. See Tags below. | | include_related | boolean | No | With tags, also return a related map of co-occurring tags (people/projects/workflows seen alongside yours), most-frequent first. One call for the surrounding context instead of several. See Tags below. | | max_content_length | integer | No | Truncate each result's text (middle-truncation) | | format | string | No | json (default), csv, tsv/table, or outline/tree (element endpoints only). CSV/TSV return a columnar table (column names written once) instead of one JSON object per row. outline returns a deduped indented text tree of the text-bearing UI nodes — the cheapest read for "what's on screen?" (~91% fewer tokens). CSV is lossless; TSV collapses newlines (worse for long ocr text). | | fields | string | No | Comma-separated column allowlist of dotted paths, e.g. type,content.app_name,content.text. Returns only those columns (handy for dropping the repeated absolute content.file_path). Works for json too (sparse objects). |

Progressive Disclosure

Don't jump to heavy /search calls. Escalate:

| Step | Endpoint | When | |------|----------|------| | 0 | GET /memories?q=... | Always query first/in parallel — highest signal, lowest cost | | 1 | GET /activity-summary?start_time=...&end_time=... | Broad questions ("what was I doing?", "which apps?") | | 2 | GET /search?... | Need specific content | | 3 | GET /elements?... or GET /frames/{id}/context | UI structure, buttons, links | | 4 | GET /frames/{frame_id} (PNG) | Visual context needed |

Decision tree:

  • "What was I doing?" → Step 1 only
  • "Summarize my meeting" → Step 2 with content_type=audio, NO q param. Add content_type=all for screen context.
  • "How long on X?" → Step 1 (/activity-summarytotal_active_minutes for the whole range, plus per-app/window minutes)
  • "Which apps today?" → Step 1 (do NOT use frame counts or raw SQLite)
  • "What button did I click?" → Step 3 (/elements with role=AXButton)
  • "Show me what I saw" → Step 2 (find frame_id) → Step 4

Tags — linking people, projects, topics

Tags are a shared label layer across screen, audio, and memories under one string namespace. Use namespaced tags: person:ada, project:atlas, topic:pricing. Two items sharing a tag are connected.

  • Add to a frame/audio: POST /tags/vision/{frame_id} or POST /tags/audio/{chunk_id} body {"tags":["person:ada"]}.
  • Add to a memory: include tags in POST /memories (or PUT /memories/{id}).
  • Retrieve by tag: GET /search?tags=person:ada&start_time=30d%20ago (screen+audio), or add content_type=memory for memories. Multiple tags AND together; matching is exact, not substring.

Frames are pruned by retention, so for a durable link tag a memory (memories also carry created_at and a frame_id back to the moment — jump there with GET /frames/{frame_id}). To pull everything about a person across time: one call for captures (content_type=all&tags=person:ada) plus one for facts (content_type=memory&tags=person:ada).

Add include_related=true to a tag query to get the surrounding context in the same response — the tags that co-occur with yours, grouped by namespace (prefix pluralized: person:people, project:projects) and ranked by frequency. Replaces the 2-3 follow-up "who/what else" calls with one:

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  "http://localhost:3030/search?tags=person:ada&include_related=true&limit=5"
# data: [...], related: { "people": ["connor","drew"], "projects": ["atlas"], "workflows": ["planning"] }

Critical Rules

  1. ALWAYS include start_time — queries without time bounds WILL timeout
  2. Start with 1-2 hour ranges — expand only if no results
  3. Use app_name when user mentions a specific app
  4. Keep limit low (5-10) initially
  5. "recent" = 30 min. "today" = since midnight. "yesterday" = yesterday's range
  6. If timeout, narrow the time range

Response Format

{
  "data": [
    {"type": "OCR", "content": {"frame_id": 12345, "text": "...", "timestamp": "...", "app_name": "Chrome", "window_name": "..."}},
    {"type": "Audio", "content": {"chunk_id": 678, "transcription": "...", "timestamp": "...", "speaker": {"name": "John"}}},
    {"type": "UI", "content": {"id": 999, "text": "Clicked 'Submit'", "timestamp": "...", "app_name": "Safari"}}
  ],
  "pagination": {"limit": 10, "offset": 0, "total": 42}
}

Note: The "OCR" type label is used for all screen text results, including text captured via the accessibility tree. Most screen text comes from accessibility, not OCR.


2. Activity Summary — GET /activity-summary

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/activity-summary?start_time=1h%20ago&end_time=now"

Returns a rich overview with:

  • total_active_minutes: authoritative total active screen time for the whole range (every app, idle gaps excluded). Use this as the grand total / denominator. Do NOT sum windows[].minutes (capped at 30) and do NOT open db.sqlite to recompute durations — this field already is the answer.
  • apps: per-app minutes (active time), first/last seen
  • windows: every distinct window/tab with title, browser_url, and minutes spent — the most valuable field for what the user worked on (top 30 by time)
  • key_texts: one representative text snippet per window context (user input fields prioritized over static page text)
  • audio_summary.top_transcriptions: actual transcription text with speaker and timestamp (not just counts)

This is usually enough to answer "what was I doing?" without further searches. Only drill into /search if you need verbatim quotes or specific content.

Building a pipe/automation? Same rule: call this endpoint for time math. The numbers are computed server-side from frame timestamps — never recompute durations from raw frames, and never ask an LLM to sum minutes (it will drift). Let the model label activities; let this endpoint own the durations.


3. Elements — GET /elements

Lightweight FTS search across UI elements (~100-500 bytes each vs 5-20KB from /search).

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/elements?q=Submit&start_time=1h%20ago&limit=10"

Parameters: q, frame_id, source (accessibility|ocr), role, start_time, end_time, app_name, limit, offset, plus format (json/csv/tsv/outline) and fields (dotted paths). Elements are uniform rows, so this is where compact formats pay off most.

format=outline (alias tree) is the cheapest read for "what's on screen?" — a deduped, indented text tree of just the text-bearing nodes (drops empty structural nodes + bounds, collapses repeats into ×N, #id is the ref, inlines (disabled)/(selected)/(focused)/(expanded)/(off-screen) state, body capped). Best on source=accessibility (the common UI case — structural noise, repeated rows, hierarchy, state): 85–99% fewer tokens than JSON (o200k_base). Flat OCR text blocks are the floor (~67%, nothing to dedup) — for pure OCR format=csv&fields=text is about as good.

# compact outline — best default for an LLM reading the UI
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/elements?q=Submit&format=outline&limit=30"
#   frame 12345 · accessibility · 8 text elements
#     AXButton "Submit Order" #4012
#     AXButton "Cancel" #4013 (disabled)
#     AXCell "Shipped" #4020 ×6

# columnar table when you need specific columns (e.g. bounds) instead
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/elements?frame_id=12345&format=csv&fields=role,text,bounds.left,bounds.top"

GET /frames/{id}/elements?format=outline gives the whole frame's tree the same way (and is capped, unlike the raw JSON dump).

Frame Context — GET /frames/{id}/context

Returns accessibility text, parsed nodes, and extracted URLs for a frame.

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/frames/6789/context"

Common Roles (platform-specific)

Roles are not normalized across platforms. Use the correct format for the user's OS:

| Concept | macOS | Windows | Linux | |---------|-------|---------|-------| | Button | AXButton | Button | Button | | Static text | AXStaticText | Text | Label | | Link | AXLink | Hyperlink | Link | | Text field | AXTextField | Edit | Entry | | Text area | AXTextArea | Document | Text | | Menu item | AXMenuItem | MenuItem | MenuItem | | Checkbox | AXCheckBox | CheckBox | CheckBox | | Group | AXGroup | Group | Group | | Web area | AXWebArea | Pane | DocumentWeb | | Heading | AXHeading | Header | Heading | | Tab | AXTab | TabItem | Tab | | List item | AXRow | ListItem | ListItem |

OCR-only roles (fallback when accessibility unavailable): line, word, block, paragraph, page


4. Frames (Screenshots) — GET /frames/{frame_id}

curl -o /tmp/frame.png "http://localhost:3030/frames/12345"

Returns raw PNG. Never fetch more than 2-3 frames per query (~1000-2000 tokens each).


5. Media Export — POST /export

Renders a real-time MP4 (screen frames at their true timestamps + synced microphone audio). The clip's duration matches the wall-clock span you ask for — it is NOT a sped-up timelapse.

curl -X POST http://localhost:3030/export \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -d '{"start": "5m ago", "end": "now"}'

Fields: start + end (ISO 8601 or relative like "2h ago", "now"; end defaults to now), OR meeting_id to export a whole meeting. Optional output_path writes the MP4 to a specific absolute path (e.g. ~/Downloads/clip.mp4); otherwise it lands in the data dir's exports/ folder.

Returns {"output_path": "...", "frame_count": N, "audio_chunk_count": N, "duration_secs": N, "file_size_bytes": N}. Show output_path as an inline code block for playback. Long ranges can take a few minutes.

Audio & ffmpeg

Audio files from search results (file_path). Common operations:

ffmpeg -y -i /path/to/audio.mp4 -q:a 2 ~/.screenpipe/exports/output.mp3          # convert
ffmpeg -y -i input.mp4 -ss 00:01:00 -to 00:05:00 -q:a 2 clip.mp3                 # trim
ffmpeg -y -i input.mp4 -filter:v "setpts=0.5*PTS" -an fast.mp4                    # speed 2x
ffmpeg -y -i input.mp4 -t 10 -vf "fps=10,scale=640:-1" output.gif                 # GIF

Always use -y, save to ~/.screenpipe/exports/.


6. Retranscribe — POST /audio/retranscribe

curl -X POST http://localhost:3030/audio/retranscribe \
  -H "Content-Type: application/json" \
  -d '{"start": "1h ago", "end": "now"}'

Optional: engine (whisper-large-v3-turbo|whisper-large-v3|deepgram|qwen3-asr), vocabulary (array of {"word": "...", "replacement": "..."} for bias/replacement), prompt (topic context for Whisper).

Keep ranges short (1h max). Show old vs new transcription.


7. Raw SQL — POST /raw_sql

curl -X POST http://localhost:3030/raw_sql \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT ... LIMIT 100"}'

Rules: Every SELECT needs LIMIT. Always filter by time. Read-only. For time math, see the timestamp caveat below.

Timestamp caveat: DB timestamps are stored as RFC3339 strings — usually 2026-06-26T18:01:14.214586+00:00 (frames / audio_transcriptions / ui_events), though some tables (e.g. meetings.meeting_start, memories) use a Z suffix with milliseconds: 2026-06-26T18:01:14.214Z. Do not compare either form directly to SQLite datetime() strings like timestamp > datetime('now', '-10 seconds'): the T vs space makes it a lexical string comparison and can include stale same-day rows. Use datetime(timestamp) > datetime('now', '-10 seconds') (works for both forms), or for indexed string comparisons use an RFC3339-shaped cutoff: timestamp > strftime('%Y-%m-%dT%H:%M:%f+00:00', 'now', '-10 seconds').

WARNING: Do NOT use frame counts for time estimates — frames are event-driven, not fixed-interval. Use /activity-summary for screen time.

Schema

| Table | Key Columns | Time Column | |-------|-------------|-------------| | frames | app_name, window_name, browser_url, focused | timestamp | | ocr_text | text, app_name, window_name | join via frame_id | | elements | source, role, text, bounds_* | join via frame_id | | audio_transcriptions | transcription, device, speaker_id, is_input_device | timestamp | | audio_chunks | file_path | timestamp | | speakers | name, metadata | — | | ui_events | event_type, app_name, window_title, browser_url | timestamp | | accessibility | app_name, window_name, text_content, browser_url | timestamp | | meetings | meeting_app, title, attendees, detection_source | meeting_start | | memories | content, source, tags, importance | created_at |

Example Queries

-- Most used apps (last 24h)
SELECT app_name, COUNT(*) as frames FROM frames
WHERE timestamp > strftime('%Y-%m-%dT%H:%M:%f+00:00', 'now', '-24 hours') AND app_name IS NOT NULL
GROUP BY app_name ORDER BY frames DESC LIMIT 20

-- Most visited domains
SELECT CASE WHEN INSTR(SUBSTR(browser_url, INSTR(browser_url, '://') + 3), '/') > 0
  THEN SUBSTR(SUBSTR(browser_url, INSTR(browser_url, '://') + 3), 1, INSTR(SUBSTR(browser_url, INSTR(browser_url, '://') + 3), '/') - 1)
  ELSE SUBSTR(browser_url, INSTR(browser_url, '://') + 3) END as domain,
COUNT(*) as visits FROM frames
WHERE timestamp > strftime('%Y-%m-%dT%H:%M:%f+00:00', 'now', '-24 hours') AND browser_url IS NOT NULL
GROUP BY domain ORDER BY visits DESC LIMIT 20

-- Speaker stats
SELECT COALESCE(NULLIF(s.name, ''), 'Unknown') as speaker, COUNT(*) as segments
FROM audio_transcriptions at LEFT JOIN speakers s ON at.speaker_id = s.id
WHERE at.timestamp > strftime('%Y-%m-%dT%H:%M:%f+00:00', 'now', '-24 hours')
GROUP BY at.speaker_id ORDER BY segments DESC LIMIT 20

-- Context switches per hour
SELECT strftime('%H:00', timestamp) as hour, COUNT(*) as switches
FROM ui_events WHERE event_type = 'app_switch' AND timestamp > strftime('%Y-%m-%dT%H:%M:%f+00:00', 'now', '-24 hours')
GROUP BY hour ORDER BY hour LIMIT 24

Common patterns: GROUP BY date(timestamp) (daily), GROUP BY strftime('%H:00', timestamp) (hourly), HAVING frames > 5 (filter noise).


8. Connections — GET /connections

# List all integrations (Telegram, Slack, Discord, Email, Todoist, Teams, 40+)
curl http://localhost:3030/connections

# Get saved credentials for a webhook/token integration
curl http://localhost:3030/connections/telegram

Credential integrationsGET /connections/<id> returns saved fields to use with the service API directly:

  • Telegram: bot_token + chat_idPOST https://api.telegram.org/bot{token}/sendMessage
  • Slack: webhook_urlPOST {webhook_url} with {"text": "..."}
  • Discord: webhook_urlPOST {webhook_url} with {"content": "..."}
  • Todoist: api_tokenPOST https://api.todoist.com/api/v1/tasks with Bearer auth
  • Teams: webhook_urlPOST {webhook_url} with {"text": "..."}
  • Email: smtp_host, smtp_port, smtp_user, smtp_pass, from_address

OAuth/proxy integrations — tokens are stored in SecretStore and are never exposed via GET /connections/<id>. Call the local proxy instead; it injects auth and forwards to the upstream API:

# GitHub — create an issue (repo owner/name from pipe settings)
curl -X POST http://localhost:3030/connections/github/proxy/repos/OWNER/REPO/issues \
  -H "Content-Type: application/json" \
  -d '{"title":"Found a bug","body":"Steps to reproduce..."}'

# GitHub — comment on an issue
curl -X POST http://localhost:3030/connections/github/proxy/repos/OWNER/REPO/issues/42/comments \
  -H "Content-Type: application/json" \
  -d '{"body":"Thanks for the report!"}'

# Generic OAuth proxy pattern (Zoom, Vercel, Google Docs, Microsoft 365, etc.)
curl -X POST http://localhost:3030/connections/<id>/proxy/<upstream-api-path> \
  -H "Content-Type: application/json" \
  -d '{...}'

Do not call https://api.github.com/... directly from a pipe — use /connections/github/proxy/... instead. There is no /connections/<id>/token endpoint.

If not connected, tell the user to set it up from the Connections page in the desktop app.


9. Meetings — GET /meetings, PUT /meetings/:id

curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings?start_time=1d%20ago&end_time=now&limit=10&offset=0"
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings?q=alice%40acme.com"
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/meetings/42"

# Update mutable fields. This is a partial update body: omitted fields stay as-is.
curl -X PUT http://localhost:3030/meetings/42 \
  -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"title":"Q3 planning", "note":"<existing note>\n\n## Summary\n<summary>"}'

Returns detected meetings (from calendar, app detection, window titles, UI elements, multi-speaker audio). q is a case-insensitive substring filter against title, attendees, and notes.

Meeting updates use PUT /meetings/:id, not PATCH. Before appending an AI-generated summary, read the current meeting first and include the existing note text in the new note body so user-written notes are preserved.

| Field | Type | Description | |-------|------|-------------| | id | integer | Meeting ID | | meeting_start | ISO 8601 | Start time | | meeting_end | ISO 8601? | End time (null if ongoing) | | meeting_app | string | App (zoom, teams, meet, etc.) | | title | string? | Meeting title | | attendees | string? | Attendees | | note | string? | User notes / appended AI summaries | | detection_source | string | How detected (app, calendar, ui, etc.) |

Also available via raw SQL: SELECT * FROM meetings WHERE meeting_start > strftime('%Y-%m-%dT%H:%M:%f+00:00', 'now', '-24 hours') LIMIT 20


10. Speakers — Management & Reassignment

# Search speakers by name
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/speakers/search?name=John"

# Get unnamed speakers (for labeling)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/speakers/unnamed?limit=20&offset=0"

# Get speakers similar to a given speaker (by voice embedding)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/speakers/similar?speaker_id=29&limit=5"

# Update speaker name/metadata
curl -X POST http://localhost:3030/speakers/update \
  -H "Content-Type: application/json" \
  -d '{"id": 29, "name": "Jordan"}'

# Reassign speaker for an audio chunk (propagates to similar chunks by default)
curl -X POST http://localhost:3030/speakers/reassign \
  -H "Content-Type: application/json" \
  -d '{"audio_chunk_id": 456, "new_speaker_name": "Jordan", "propagate_similar": true}'
# Returns: new_speaker_id, transcriptions_updated, old_assignments (for undo)

# Undo a speaker reassignment
curl -X POST http://localhost:3030/speakers/undo-reassign \
  -H "Content-Type: application/json" \
  -d '{"old_assignments": [{"transcription_id": 1, "old_speaker_id": 29}]}'

# Merge two speakers (keeps one, merges the other into it)
curl -X POST http://localhost:3030/speakers/merge \
  -H "Content-Type: application/json" \
  -d '{"speaker_to_keep_id": 5, "speaker_to_merge_id": 29}'

# Mark speaker as hallucination (false detection)
curl -X POST http://localhost:3030/speakers/hallucination \
  -H "Content-Type: application/json" \
  -d '{"speaker_id": 29}'

# Delete a speaker (also removes associated audio chunk files)
curl -X POST http://localhost:3030/speakers/delete \
  -H "Content-Type: application/json" \
  -d '{"id": 29}'

Speaker Reassignment Workflow

When the user says "that was actually Jordan, not Karishma":

  1. Search audio results to find the chunk_id for the misidentified audio
  2. Call POST /speakers/reassign with audio_chunk_id and new_speaker_name
  3. With propagate_similar: true (default), it also fixes similar-sounding chunks

11. Memories — High-Signal Persistent Knowledge

Memories are the highest-signal data source in screenpipe. They contain curated facts, user preferences, decisions, and project context — distilled from hours of screen/audio data. Always check memories when answering questions or building context.

When to Query Memories

Query memories FIRST (before or alongside /search) when:

  • The user asks about preferences, decisions, or past context
  • You need background on a project, person, or workflow
  • You're generating a summary, recommendation, or action plan
  • You're unsure about user preferences or past decisions
  • Any task where historical context would improve the output

Rule: If you're calling /search, also call /memories in parallel. Memories provide the "why" behind the raw screen data. Search gives you what happened; memories tell you what matters.

API

# Search memories (FTS) — do this often!
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?q=preference&limit=20"

# List recent memories (high importance first)
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?min_importance=0.5&limit=20"

# Filter by source or tags
curl -H "Authorization: Bearer $SCREENPIPE_LOCAL_API_KEY" "http://localhost:3030/memories?source=user&tags=project&limit=20"

# Create a memory
curl -X POST http://localhost:3030/memories \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers dark mode", "source": "user", "tags": ["preference", "ui"], "importance": 0.7}'

# Update a memory
curl -X PUT http://localhost:3030/memories/1 \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers dark mode in all apps", "importance": 0.8}'

# Delete a memory
curl -X DELETE http://localhost:3030/memories/1

Parameters for GET /memories: q (FTS search), source, tags, min_importance, start_time, end_time, limit, offset.

Memories also appear in /search?content_type=memory.

Creating Memories

When you learn something important about the user (preferences, decisions, project context), store it as a memory. Use importance 0.0-1.0 to rank signal. Only store genuinely useful long-lived facts, not transient observations.


12. Notifications — POST http://localhost:11435/notify

Send a notification to the screenpipe desktop UI. This uses the Tauri sidecar server (port 11435), not the main API (port 3030).

The notification body supports markdown: **bold**, `inline code`, and [link text](url). Links can be web URLs, file paths, or screenpipe deeplinks.

# Simple notification
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "3 new voice memos", "body": "found recordings from today"}'

# Markdown body with links
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Meeting summary", "body": "**Q3 Planning** notes saved\n\nopen [meeting notes](~/Documents/notes/q3.md) or view [recording](screenpipe://timeline)"}'

# Link to a local file (absolute path or ~ path)
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Export complete", "body": "saved to [report.csv](~/Downloads/report.csv)"}'

# With action buttons
# Use `type: "link"` for external URLs and `type: "deeplink"` for
# screenpipe:// in-app routes. `type: "dismiss"` closes the notification.
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Meeting summary", "body": "**Q3 Planning**\n- Budget approved", "actions": [{"id": "view", "label": "view", "type": "deeplink", "url": "screenpipe://timeline"}, {"id": "skip", "label": "skip", "type": "dismiss"}]}'

# External URL action (opens in browser)
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "PR ready for review", "body": "nice work", "actions": [{"id": "open", "label": "open pr", "type": "link", "url": "https://github.com/screenpipe/screenpipe/pull/1234"}]}'

# Ask permission, then run a pipe on approval — the opt-in / agent-gated flow.
# `type: "pipe"` runs the TARGET pipe when clicked (POST /pipes/<pipe>/run); the
# `context` is injected into that pipe's prompt as the notification action
# context. Set `pipe` EXPLICITLY — if omitted it falls back to the sending pipe,
# which usually does nothing. Add `"open_in_chat": true` to run it in the chat UI
# so the user sees the output live instead of in the background.
# Actions persist into the notification bell, so the user can still approve from
# the bell after the ~20s toast fades.
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "share meeting notes with the team?", "body": "approve to send the adriaan call notes", "actions": [{"id": "approve", "label": "approve", "type": "pipe", "primary": true, "pipe": "share-data", "context": {"meeting_id": 274}}, {"id": "decline", "label": "decline", "type": "dismiss"}]}'

# Run an inline prompt in a fresh chat session on click (`type: "chat"`).
# No pre-installed pipe needed — write the whole task in `prompt`, attach data
# in `context`. Add `"auto_send": false` to pre-fill chat for the user to
# review/edit before sending. This is the lightweight counterpart to a `pipe`
# action for one-off "approve → do this specific thing" flows.
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "summarize this call into a CRM note?", "body": "approve to draft it", "actions": [{"id": "go", "label": "draft it", "type": "chat", "primary": true, "prompt": "summarize meeting 274 into a short CRM follow-up note and save it to output/", "context": {"meeting_id": 274}}, {"id": "no", "label": "no", "type": "dismiss"}]}'

# Call a local API endpoint on click (`type: "api"`)
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "stop recording?", "body": "tap to stop", "actions": [{"id": "stop", "label": "stop", "type": "api", "url": "/recording/stop", "method": "POST"}]}'

# Custom auto-dismiss (5 seconds)
curl -X POST http://localhost:11435/notify \
  -H "Content-Type: application/json" \
  -d '{"title": "Saved", "body": "Note saved", "timeout": 5000}'

| Field | Type | Required | Description | |-------|------|----------|-------------| | title | string | Yes | Notification title | | body | string | Yes | Markdown body (**bold**, `code`, [text](url)) | | type | string | No | Category (default "pipe") | | timeout | integer | No | Auto-dismiss in ms (default 20000) | | autoDismissMs | integer | No | Alias for timeout | | actions | array | No | Action buttons (up to 5; each needs id, label, type) |

Action button types:

  • link — open a web URL in the browser (url)
  • deeplink — navigate within screenpipe (url = screenpipe://...)
  • pipe — run an installed pipe on click (pipe = target pipe name, optional context injected into its prompt, optional open_in_chat). The opt-in / agent-gated-sharing primitive.
  • chat — run an inline prompt in a fresh chat session (no installed pipe needed; optional context as background data, optional auto_send default true). Lightweight counterpart to pipe for one-off approve-and-do flows.
  • api — POST a local endpoint (url, optional method, optional body)
  • dismiss — close the notification, no side effect
  • primary: true renders the button filled (the recommended action). Actions persist into the notification bell, so a missed toast can still be acted on.

Supported link types in body markdown:

  • Web URLs: [docs](https://docs.screenpi.pe) — opens in browser
  • File paths: [notes](~/notes/file.md) or [log](/var/log/app.log) — opens in default app
  • Deeplinks: [timeline](screenpipe://timeline) — navigates within screenpipe

Returns {"success": true, "message": "Notification sent successfully"}.


13. Other Endpoints

curl http://localhost:3030/health              # Health check
curl http://localhost:3030/audio/list           # Audio devices
curl http://localhost:3030/vision/list          # Monitors

Deep Links

Reference specific moments with clickable links:

[10:30 AM — Chrome](screenpipe://frame/12345)           # screen text results (use frame_id)
[meeting at 3pm](screenpipe://timeline?timestamp=ISO8601) # Audio results (use timestamp)

Only use IDs/timestamps from actual search results. Never fabricate.

Showing Videos

Show file_path from search results as inline code for playable video:

`/Users/name/.screenpipe/data/monitor_1_2024-01-15_10-30-00.mp4`