Agent Skills: polli — Pollinations CLI

Generate images, text, audio, video, and transcribe speech via the Pollinations API using the polli CLI. Use when asked to generate media, call pollinations.ai, check pollen balance, list models, manage API keys, or run polli commands.

ID: pollinations/pollinations/polli

Install this agent skill to your local environment:

pnpm dlx add-skill https://github.com/pollinations/pollinations/tree/HEAD/.claude/skills/polli

Skill Files

Browse the full folder contents for polli.


.claude/skills/polli/SKILL.md

Skill Metadata

Name
polli
Description
Generate images, text, audio, video, and transcribe speech via the Pollinations API using the polli CLI. Use when asked to generate media, call pollinations.ai, check pollen balance, list models, manage API keys, or run polli commands.

polli — Pollinations CLI

Thin wrapper around gen.pollinations.ai. Generates images, text, audio, video; transcribes speech; manages API keys and usage.

When to use this skill

  • User asks to generate an image / text / audio / video via pollinations
  • User mentions polli, pollinations, pollen, pollinations.ai
  • User wants to transcribe speech or run TTS
  • User asks about their pollen balance, usage, or API keys
  • User wants to browse or filter available models

Quick reference

| Intent | Command |
|---|---|
| Log in once | polli auth login |
| Generate image | polli gen image "<prompt>" --output out.png |
| Generate text | polli gen text "<prompt>" |
| Text with stdin as context | echo "<ctx>" \| polli gen text "<question>" |
| Describe an image (vision) | polli gen text "what is this?" --image <url> |
| One-shot TTS | polli gen audio "<text>" --output speech.mp3 |
| Speak out loud | polli gen audio "<text>" --play (uses afplay on macOS; ffplay/mpv/mpg123 on Linux) |
| Generate video | polli gen video "<prompt>" --output out.mp4 |
| Transcribe audio | polli gen transcribe path/to.mp3 |
| Upload a local file | polli upload path/to.png (prints public URL) |
| List all models | polli models |
| Filter models by type | polli models --type image |
| Model health + latency | polli models --stats (default 60m, --window <min>) |
| Check balance | polli usage |
| Machine-readable output | append --json to any command |

Setup

One-time: polli auth login (device-flow). Verify with polli auth status. Override the stored key for a single command with --key <key>.

Recipes

Generate an image to a file

polli gen image "a fox reading a book, studio ghibli style" --output fox.png

Defaults: zimage, 1024x1024. Pick a different model with --model flux (see polli models --type image). For edits / img2img, pass one or more --image <url> flags; these must be public http(s) URLs, since local paths are rejected client-side. Only models that list "image" in input_modalities actually consume the flag: flux and zimage are text-only and will silently ignore --image. Find i2i-capable models with polli models --type image --json | jq -r '.[] | select(.input_modalities | contains(["image"])) | .name' (common choices: nanobanana, kontext, p-image-edit). To use a local file, upload it first with polli upload (see next recipe).
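The i2i discovery filter can be run as a single pipeline; a sketch using index("image") for an exact match (which models it returns depends on the live registry):

```shell
# List image models that actually consume --image (i.e. accept an image input)
polli models --type image --json \
  | jq -r '.[] | select(.input_modalities | index("image")) | .name'
```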

Upload a local file to get a public URL

URL=$(polli upload cat.png)
polli gen image "make the cat purple" --image "$URL" --output purple.png

polli upload <file> posts to media.pollinations.ai (10MB max, 14-day TTL, content-addressed so duplicates dedupe). Human mode: URL on stdout, id/size/contentType/duplicate on stderr. --json: full upload response on stdout. The returned URL is public (no auth to fetch) and works anywhere --image is accepted — gen image, gen video, etc.
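In --json mode you pull the fields out of the response yourself; a minimal sketch, assuming the JSON response mirrors the human-mode fields (url, size are assumed names; check polli upload --json output):

```shell
# Extract the public URL and size from the JSON upload response
polli upload cat.png --json | jq -r '"\(.url) (\(.size) bytes)"'
```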

Generate text

polli gen text "summarize the three laws of robotics"

Save to file: --output summary.txt. Use --system "<msg>" to set system prompt. For reasoning models, pass --reasoning low|medium|high to control reasoning effort. Only send --reasoning to models where reasoning: true in polli models --type text --json — the flag is not validated client-side, and non-reasoning models may silently accept it (openai), ignore it, or return a 400 (mistral).
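Before reaching for --reasoning, enumerate the safe targets; a sketch built on the reasoning: true field mentioned above:

```shell
# Models that advertise reasoning support; only these should receive --reasoning
polli models --type text --json \
  | jq -r '.[] | select(.reasoning == true) | .name'
```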

Describe an image with a vision model

URL=$(polli upload selfie.jpg)
polli gen text "turn this person into a cartoon pet in one playful sentence" --image "$URL"

gen text --image <url...> attaches one or more public https URLs as an OpenAI-style multimodal message — repeatable for multi-image prompts. Local paths aren't supported; run them through polli upload first (see the upload recipe above). Only text models with "image" in input_modalities actually read the image — filter with polli models --type text --json | jq -r '.[] | select(.input_modalities | index("image")) | .name'. Non-vision models silently ignore the attachment. Good defaults: openai, gemini, claude.

Pipe stdin as context into text generation

cat README.md | polli gen text "what does this project do?"

stdin becomes context; the positional argument is the question.

Interactive chat session

polli gen chat --model openai --system "you are a terse assistant"

Slash commands inside the session: /exit, /clear, /save <path>.

Text-to-speech

polli gen audio "hello world" --voice nova --output hello.mp3
echo "long script" | polli gen audio --voice nova --output out.mp3

Default voice is sage. To discover the full live voice list, use the model registry: polli models --type audio --json | jq -r '.[].voices[]?' — each audio model entry includes its voices[] array. Format defaults to mp3; --format opus|aac|flac|wav to change. Accepts stdin (same as gen text). Add --play to save and then play the audio back (handy for narration/demos). Playback starts after the file is fully written, and the command blocks until playback finishes — if you want fire-and-forget, wrap in a subshell: ( polli gen audio "..." --play & ). Player on macOS: afplay; on Linux it tries ffplay, then mpv, then mpg123 in that order.
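To audition voices quickly, the registry lookup above can drive a loop; a sketch assuming each audio model entry exposes voices[] as described:

```shell
# Render a short sample for every voice of the first audio model
for v in $(polli models --type audio --json | jq -r '.[0].voices[]?'); do
  polli gen audio "testing voice $v" --voice "$v" --output "voice-$v.mp3"
done
```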

Generate music (elevenmusic)

polli gen audio "lofi hip-hop beat" --model elevenmusic --duration 30 --instrumental --output track.mp3

Generate video

polli gen video "a spacecraft landing on mars" --model wan-fast --duration 5 --output mars.mp4

Cheapest path: --model wan-fast at ~$0.01/sec, fixed 5-second output (any --duration value is ignored — you always pay for and receive 5 sec). For image-to-video, pass --image <url> with a public HTTPS URL (local file paths and 404/rate-limited hosts will fail with a server error).

Flag support varies per model and is not enforced client-side. --duration, --aspect-ratio, --audio, --negative, and --enhance are forwarded to the server but may be silently ignored — verified on wan-fast where duration is locked to 5s, --aspect-ratio 9:16 still returns 16:9, and --audio produces no audio track. Always inspect the output (file, ffprobe) before trusting a flag worked. Check polli models --type video --json for per-model capabilities.
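A concrete verification pass, per the advice above (the parsing assumes ffprobe's csv=p=0 output, i.e. a single "width,height" line):

```shell
# Request portrait video, then check what the server actually returned
polli gen video "city timelapse" --model wan-fast --aspect-ratio 9:16 --output clip.mp4
wh=$(ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 clip.mp4)
echo "got ${wh%%,*}x${wh##*,}"   # wan-fast is expected to come back 16:9 regardless
```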

Video is not tracked by --stats. polli models --type video --stats returns empty — the stats pipe only records text/image/audio events. To compare video models, fall back to polli models --type video --json and look at price/description fields.

Transcribe audio to text

polli gen transcribe recording.mp3 --language en

Models: whisper (default), scribe. Accepts common audio formats (mp3, wav, m4a, flac, ogg); non-audio input (e.g. a .txt file) returns a clear 400 invalid_request_error: extension "txt" not supported — no need to pre-validate with file. Default output is the plain transcript on stdout as a single line (pipe-friendly). Use --json for structured output: whisper returns word-level timestamps, segments, and duration; scribe returns only {text: "..."} — use whisper if you need timing data. --language <ISO-639-1> (e.g. en, fr) is an optional hint that can improve accuracy for non-English or accented speech — whisper honors it and echoes the value in the JSON response; scribe silently ignores it (no error, no effect).

Discover models

polli models --type text              # text models only
polli models --type image --verbose   # with context length / pricing
polli models --stats                  # health + avg latency + err% (60m default)
polli models --stats --window 5       # last 5 minutes only

Use --stats before choosing a model. Caveat: the err% column counts 5xx only — a model can show 0.0% while having massive 4xx rates (auth, validation, etc.). For the full picture use --stats --json and read errors_4xx, errors_5xx, latency_p95_ms.
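The full-picture check can be scripted; a sketch using the errors_4xx/errors_5xx/latency_p95_ms fields named above (the name field is an assumption):

```shell
# Per-model health table including 4xx, which the human err% column hides
polli models --type text --stats --json \
  | jq -r '.[] | [.name, .errors_4xx, .errors_5xx, .latency_p95_ms] | @tsv'
```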

Pricing fields are per-token, not per-request. completionImageTokens: 0.000008 means each output image-token costs that much — a single 1024x1024 image from gptimage lands at ~$0.008, not $0.000008. Flat-priced image models (flux, zimage) expose completionImageTokens as the whole-image price because they emit exactly one "token" per image. When in doubt, make one call and read the true cost from polli usage --history --limit 5 --json.
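The arithmetic behind the gptimage figure: ~$0.008 per image at $0.000008 per image-token implies roughly 1,000 output image-tokens per 1024x1024 image (the token count is inferred from the doc's own numbers, not a published constant):

```shell
# per-token price x implied output image-tokens = whole-image cost
awk 'BEGIN { printf "%.4f\n", 0.000008 * 1000 }'   # 0.0080
```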

Check usage and balance

polli usage              # current pollen balance
polli usage --history    # recent individual requests
polli usage --daily      # daily cost summary

History is eventually consistent — a request you just made may not appear for 30–60s. When matching costs to freshly-generated media, use --limit 50 and filter by timestamp, and retry if the expected entry is missing. polli usage --json returns {"pollen": <number>} — the current balance only; use --history --json or --daily --json for cost breakdowns.
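The retry advice can be wrapped in a small poll loop; a sketch that assumes history entries carry a model field (adjust the jq filter to whatever field you actually match on):

```shell
# Poll until the fresh request surfaces; history can lag 30-60s
for i in $(seq 1 6); do
  polli usage --history --limit 50 --json \
    | jq -e 'map(select(.model == "wan-fast")) | length > 0' >/dev/null && break
  sleep 10
done
```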

Manage API keys

polli keys list                                                    # list all keys on the account
polli keys info                                                    # details about the CURRENTLY AUTHENTICATED key only (takes no id)
polli keys create --name "my-bot" --type secret --budget 1000 --permissions balance usage   # scoped key
polli keys revoke <id>                                             # id comes from `keys list --json`

--permissions <perms...> scopes what the new key can do on the account (e.g. balance usage lets it call polli --key <new> usage). Without --permissions, new scoped keys can generate media but cannot read account state; polli --key <new> usage will 403. "keys" is auto-stripped from the list so a scoped key can never mint further keys. To inspect a specific key other than the current one, use polli keys list --json | jq '.[] | select(.id == "<id>")'. keys info is intentionally scoped to the caller's own key.
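End-to-end, minting and exercising a scoped key looks roughly like this (the .key field name in the create response is an assumption; check polli keys create --json output before scripting against it):

```shell
# Create a budget-capped key that can only read balance/usage, then verify it
KEY=$(polli keys create --name "ci-bot" --type secret --budget 100 \
      --permissions balance usage --json | jq -r '.key')
polli --key "$KEY" usage    # succeeds: the usage permission was granted
```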

Read API docs

polli docs                          # full llm.txt reference
polli docs /v1/chat/completions     # filter to one endpoint
polli docs --open                   # open in browser

Output contract

  • Default (human mode): varies by command — most emit key: value pairs or tab-separated tables with a header row. Exceptions: gen text prints the full response to stdout; gen transcribe prints the transcript as plain text; gen chat runs an interactive REPL. Status/progress messages go to stderr, so pipes stay clean.
  • --json: every command emits machine-parseable JSON to stdout; all human messages go to stderr. Always prefer --json when piping into jq or parsing — it's the only shape with a stable contract.
  • Exit codes: 0 on success, non-zero on auth failure, rate limit, network error, or invalid args. Error messages go to stderr.

Agent operating rules

  1. Run polli auth status first if you don't know whether the user is logged in. Fail fast with a clear "run polli auth login" message if not.
  2. Prefer --json whenever you'll parse the output. Never grep human-formatted tables.
  3. Don't hardcode model IDs. Fetch the live list with polli models --type <type>. Model availability changes.
  4. Before picking a model for production use, check polli models --stats. Rule of thumb for "healthy": err% ≤ 5, avg latency in a reasonable range for the modality (standard text <5s, image <10s, video <60s), and requests high enough to be statistically meaningful (ignore rows with <10 requests — noise). Filter by capability first, then optimize by health — e.g. for a reasoning task, narrow to models where reasoning: true (via polli models --type text --json), then cross-reference against --stats output. The healthiest model overall may not support the capability you need. Reasoning models are inherently slower — expect 5–50s, not <5s; when picking among them, prioritize low err% and request count over raw latency, and compare latency only within the reasoning-capable subset.
  5. Always pass --output <path> for gen image, gen audio, gen video — otherwise the file lands in the current directory with a default name.
  6. For stdin-as-context on gen text, pipe the context and pass the question as the positional argument: cat file | polli gen text "question about the file".
  7. For exact flag lists, run polli <cmd> --help or polli gen <cmd> --help. This skill's recipes cover the common path; the CLI's own help is always the source of truth.
  8. Use polli docs [endpoint] over guessing API shapes. It prints the canonical llm.txt reference from the live API.

Common pitfalls

  • Forgetting --output on binary generators (image/audio/video) — the file goes to a default path, which may not be what the user wants.
  • Using polli gen text --json expecting OpenAI chat-completions shape — the CLI's --json wraps its own structure. Use polli docs /v1/chat/completions to see the raw API shape if you need it.
  • Running commands without auth — polli auth status tells you the tier and balance in one call.
  • gen text streams to a TTY but buffers when piped: the mode auto-detects, so a human at the terminal sees tokens tick in while a pipe/redirect gets the full response at once. Force either mode with --stream or --no-stream. For scripts and chains like polli gen text … | polli gen audio …, you don't need to do anything; buffering happens automatically.
  • Translating a polli workflow into a browser app. gen.pollinations.ai requires a bearer token, so a plain client-side fetch with no auth returns 401. The only anonymous escape hatch is the legacy https://text.pollinations.ai/openai endpoint, which accepts exactly openai and openai-fast — the healthy-model advice from --stats does not carry over. For anything beyond those two text models in the browser, mint a scoped key with polli keys create and proxy via your own backend.
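For completeness, the anonymous fallback mentioned above speaks plain OpenAI-style JSON over HTTPS; a sketch (the request body is assumed to match the standard chat-completions shape):

```shell
# Legacy anonymous endpoint: only "openai" and "openai-fast" are accepted
curl -s https://text.pollinations.ai/openai \
  -H 'Content-Type: application/json' \
  -d '{"model":"openai","messages":[{"role":"user","content":"say hi"}]}'
```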