
Ollama Skill

Practical guidance for integrating with Ollama's REST API and building agents that dynamically detect and use model capabilities.

When to Use

Building an application that calls Ollama locally or remotely.
Detecting which capabilities a model supports (completion, tools, vision, audio, embeddings, thinking).
Implementing chat with streaming, tool calling, structured outputs, or multimodal inputs.

Quick Reference

Base URLs

| Environment | URL |
|-------------|-----|
| Local API | http://localhost:11434/api |
| OpenAI-compatible | http://localhost:11434/v1 |

Core Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /api/version | GET | Runtime version |
| /api/tags | GET | Installed models |
| /api/show | POST | Model metadata + capabilities |
| /api/ps | GET | Running models + memory usage |
| /api/chat | POST | Chat completion (stream or sync) |
| /api/generate | POST | Single-prompt completion (stream or sync) |
| /api/embed | POST | Text embeddings |
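A minimal sketch of calling these endpoints from Python with only the standard library, assuming the default local base URL. The class name OllamaREST, the transport parameter, and the helper methods are hypothetical, not part of any Ollama SDK; the transport is injectable so the JSON handling can be exercised without a running server.

```python
import json
import urllib.request
from typing import Any, Callable, List, Optional

# A transport takes a full URL and an optional JSON body and returns raw bytes.
Transport = Callable[[str, Optional[bytes]], bytes]


def urllib_transport(url: str, body: Optional[bytes]) -> bytes:
    """Send a GET (body is None) or a JSON POST and return the response bytes."""
    headers = {"Content-Type": "application/json"} if body is not None else {}
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


class OllamaREST:
    """Hypothetical thin wrapper over the endpoints in the table above."""

    def __init__(self, base_url: str = "http://localhost:11434/api",
                 transport: Transport = urllib_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def get(self, path: str) -> Any:
        return json.loads(self.transport(f"{self.base_url}{path}", None))

    def post(self, path: str, payload: dict) -> Any:
        body = json.dumps(payload).encode("utf-8")
        return json.loads(self.transport(f"{self.base_url}{path}", body))

    def installed_models(self) -> List[str]:
        # /api/tags returns {"models": [{"name": ...}, ...]}
        return [m["name"] for m in self.get("/tags").get("models", [])]

    def show(self, model: str) -> dict:
        # /api/show takes the model name in a JSON body
        return self.post("/show", {"model": model})
```

Against a running server, OllamaREST().installed_models() should return the same names that ollama list prints.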

Capability States

Each capability is tracked in one of three confidence states:

confirmed: listed in the capabilities array returned by /api/show, or verified end-to-end.
inferred: detected via projector_info (e.g., clip.has_vision_encoder) but not listed in capabilities.
pending: neither confirmed nor inferred; a probe request is needed, or the model does not support it.
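The three states can be sketched as a pure function over a parsed /api/show response. This assumes the capability names completion, tools, vision, audio, embedding, and thinking (check them against your server's output), and it only infers vision from projector_info, since clip.has_vision_encoder is the one projector key named here. Upgrading a capability to confirmed after an end-to-end probe is left to the caller; classify_capabilities is a hypothetical helper.

```python
from typing import Dict

# Assumed capability names as they appear in the /api/show "capabilities" array.
KNOWN_CAPABILITIES = ("completion", "tools", "vision", "audio", "embedding", "thinking")


def classify_capabilities(show_response: dict) -> Dict[str, str]:
    """Map each known capability to "confirmed", "inferred", or "pending"."""
    listed = set(show_response.get("capabilities") or [])
    projector = show_response.get("projector_info") or {}
    states: Dict[str, str] = {}
    for cap in KNOWN_CAPABILITIES:
        if cap in listed:
            states[cap] = "confirmed"  # the API says so directly
        elif cap == "vision" and projector.get("clip.has_vision_encoder"):
            states[cap] = "inferred"  # encoder present, but not listed by the API
        else:
            states[cap] = "pending"  # needs a probe, or is unsupported
    return states
```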

Quick Reference

cURL Basics

# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'

# Simple generation
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:e2b",
  "prompt": "Why is the sky blue?"
}'

# List installed models
curl http://localhost:11434/api/tags

# Check running models + GPU usage
curl http://localhost:11434/api/ps
ollama ps
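With the default streaming behavior, /api/chat returns newline-delimited JSON: each line carries a fragment in message.content, and the last line has "done": true plus metadata such as done_reason. A minimal sketch of reassembling a reply, assuming that shape and assuming that errors arrive as an object with an error key; collect_chat_stream is a hypothetical name.

```python
import json
from typing import Iterable, Tuple, Union


def collect_chat_stream(lines: Iterable[Union[bytes, str]]) -> Tuple[str, dict]:
    """Join streamed /api/chat chunks into the full reply; return it with the final chunk."""
    parts = []
    final: dict = {}
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.strip():
            continue  # tolerate blank lines between JSON objects
        chunk = json.loads(line)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])  # assumed error shape
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final = chunk  # carries done_reason and timing/token counts
            break
    return "".join(parts), final
```

Because an http.client.HTTPResponse iterates line by line, the object returned by urllib.request.urlopen can be passed in directly.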

OpenAI SDK (Python / JS)

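from openai import OpenAI

# Ollama ignores the API key, but the OpenAI SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
completion = client.chat.completions.create(
    model="gemma4:e2b",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(completion.choices[0].message.content)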

Official Ollama SDKs

See references/examples.md for the official ollama Python and JavaScript libraries. They support native features such as passing Python functions directly as tools and streaming by iterating over the response (for chunk in ...).

Common Environment Variables

| Variable | Purpose | Example |
|----------|---------|---------|
| OLLAMA_HOST | Bind address (default 127.0.0.1:11434) | 0.0.0.0:11434 |
| OLLAMA_CONTEXT_LENGTH | Default context window size | 8192 |
| OLLAMA_MODELS | Model storage directory | /path/to/models |
| OLLAMA_ORIGINS | Allowed CORS origins | chrome-extension://*,moz-extension://* |
| HTTPS_PROXY | Proxy for model downloads | https://proxy.example.com |
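Client code often needs to turn OLLAMA_HOST into a URL it can connect to. A simplified, hypothetical helper that mirrors the default in the table above: it falls back to 127.0.0.1:11434, maps the 0.0.0.0 wildcard bind address to loopback, and assumes ports 80/443 when an explicit http:// or https:// scheme is given without a port. It does not handle IPv6 literals or URL paths; check the official SDKs for their exact parsing rules.

```python
import os
from typing import Mapping, Optional

DEFAULT_HOST = "127.0.0.1:11434"


def api_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Derive a client base URL from OLLAMA_HOST (simplified, hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_HOST
    scheme, sep, rest = host.partition("://")
    if not sep:
        # No scheme given: assume plain HTTP on Ollama's default port.
        scheme, rest, default_port = "http", host, "11434"
    else:
        default_port = "443" if scheme == "https" else "80"
    rest = rest.rstrip("/")
    name, colon, port = rest.rpartition(":")
    if not colon:
        name, port = rest, default_port
    if name in ("0.0.0.0", ""):
        name = "127.0.0.1"  # a wildcard bind address is not a useful connect target
    return f"{scheme}://{name}:{port}/api"
```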

Authentication

Local: No authentication is required for http://localhost:11434.
Cloud models: Run ollama signin to use them through the local server, or send an API key when calling https://ollama.com/api directly.
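A sketch of preparing the same chat request for either target with urllib, without sending it. It assumes cloud calls authenticate with an Authorization: Bearer header and that the key is kept in an OLLAMA_API_KEY environment variable; verify both against references/cloud.md. prepare_chat_request is a hypothetical helper.

```python
import os
import urllib.request
from typing import Optional

CLOUD_API = "https://ollama.com/api"
LOCAL_API = "http://localhost:11434/api"


def prepare_chat_request(payload_json: bytes, cloud: bool = False,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request for the local server or the cloud."""
    headers = {"Content-Type": "application/json"}
    if cloud:
        # Assumed convention: bearer token, read from OLLAMA_API_KEY if not passed in.
        key = api_key or os.environ.get("OLLAMA_API_KEY")
        if not key:
            raise ValueError("cloud requests need an API key")
        headers["Authorization"] = f"Bearer {key}"
    base = CLOUD_API if cloud else LOCAL_API  # local calls need no credentials
    return urllib.request.Request(f"{base}/chat", data=payload_json,
                                  headers=headers, method="POST")
```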

Reference Files

Load these as needed:

references/api-reference.md — Complete endpoint documentation: request/response schemas, streaming protocol, status codes, OpenAI compatibility.
references/capabilities.md — How to detect model capabilities from /api/show, model_info, and projector_info; status taxonomy.
references/examples.md — Code snippets in cURL, Go, Python, and JavaScript/TypeScript (REST, OpenAI SDK, and official Ollama SDKs).
references/cloud.md — Cloud models, web search API, authentication, and IDE integrations.

Troubleshooting

Model Not Loading on GPU

ollama ps

The PROCESSOR column shows where each loaded model lives:

100% GPU: fully loaded in GPU memory.
100% CPU: fully loaded in system memory.
48%/52% CPU/GPU: split between system memory and the GPU.

Solutions: Verify the CUDA/ROCm installation, check that the model fits in available VRAM, or try a smaller model variant.
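The same information is available programmatically from /api/ps, whose entries include a total size and a size_vram field. A rough sketch that turns those into the ollama ps style labels listed above, assuming the GPU share is simply size_vram divided by size; the CLI's own rounding may differ slightly. processor_split is a hypothetical name.

```python
from typing import Dict


def processor_split(ps_response: dict) -> Dict[str, str]:
    """Label each running model roughly like the PROCESSOR column of ollama ps."""
    summary: Dict[str, str] = {}
    for m in ps_response.get("models", []):
        name = m.get("name", "?")
        size, vram = m.get("size", 0), m.get("size_vram", 0)
        if size <= 0:
            summary[name] = "unknown"  # no size reported; cannot compute a share
            continue
        gpu = round(100 * vram / size)  # percent of the model resident in VRAM
        if gpu >= 100:
            summary[name] = "100% GPU"
        elif gpu <= 0:
            summary[name] = "100% CPU"
        else:
            summary[name] = f"{100 - gpu}%/{gpu}% CPU/GPU"
    return summary
```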

Cannot Access Ollama Remotely

Ollama binds to 127.0.0.1 by default. Set OLLAMA_HOST=0.0.0.0:11434 and restart the service.

Proxy Issues

export HTTPS_PROXY=https://proxy.example.com

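Set this in the environment of the Ollama server (not only the client shell), then restart Ollama. Ollama pulls models over HTTPS only, so HTTP_PROXY is not used for downloads; avoid setting it, since it can interfere with client connections to the server.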

CORS Errors in Browser

export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"

Then restart Ollama so the server picks up the allowed origins.

Tool Calling Performance

Tool calling and search agents work best with larger context windows. If a model struggles with multi-turn tool use, increase the context:

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [...],
  "options": {"num_ctx": 32000}
}'
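A sketch of building that request body in Python, with options.num_ctx set per request and an illustrative tool attached. The get_weather tool and the chat_payload helper are hypothetical; the tool follows the function-schema shape (type, function.name, function.description, function.parameters) used in the tools array of /api/chat.

```python
import json
from typing import Any, Dict, List, Optional


def chat_payload(model: str, messages: List[Dict[str, Any]],
                 tools: Optional[List[Dict[str, Any]]] = None,
                 num_ctx: int = 32000) -> Dict[str, Any]:
    """Build an /api/chat body that raises the context window for tool use."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "options": {"num_ctx": num_ctx},  # per-request context size, in tokens
    }
    if tools:
        payload["tools"] = tools
    return payload


# Hypothetical tool definition, for illustration only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = json.dumps(chat_payload("qwen3", [{"role": "user", "content": "Weather in Paris?"}],
                               [get_weather_tool]))
```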

Resources

Official docs: https://docs.ollama.com
API Reference: https://docs.ollama.com/api
Model Library: https://ollama.com/models
Python SDK: https://github.com/ollama/ollama-python
JavaScript SDK: https://github.com/ollama/ollama-js
GitHub: https://github.com/ollama/ollama

Quick Command Reference

# CLI
ollama signin                # Sign in to ollama.com
ollama run gemma4:e2b        # Run a model interactively
ollama pull gemma4:e2b       # Download a model
ollama ps                    # List running models
ollama list                  # List installed models

# API status
curl http://localhost:11434/api/version
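The /api/version response is a small JSON object with a version string. Comparing versions as strings goes wrong once a component reaches two digits ("0.12.0" sorts before "0.6.0"), so this sketch compares numeric tuples instead. parse_version and at_least are hypothetical helpers, and dropping any pre-release suffix is a simplifying assumption.

```python
import json
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '0.6.2' or '0.6.2-rc1' into (0, 6, 2); missing parts count as 0."""
    core = text.strip().lstrip("v").split("-", 1)[0]
    parts = [int(p) for p in core.split(".") if p.isdigit()]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return (major, minor, patch)


def at_least(version_body: str, minimum: str) -> bool:
    """True if a /api/version response body reports a version >= minimum."""
    reported = json.loads(version_body)["version"]
    return parse_version(reported) >= parse_version(minimum)
```

For example, pass the body returned by curl http://localhost:11434/api/version as version_body before relying on a feature that needs a newer server.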

Tips

Structured Outputs

For reliable JSON schema responses:

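Use Pydantic (Python) or Zod (JavaScript) to define and validate schemas.
Pass the schema in the request's format field; /api/chat accepts either "json" or a full JSON schema.
Set temperature to 0 (options.temperature in the native API) for more deterministic output.
Add "return as JSON" to the prompt to help the model understand the request.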
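A minimal sketch combining these tips with the standard library only. The schema is written by hand here; Pydantic's model_json_schema() can generate an equivalent dict. The request passes it in the format field, which /api/chat accepts as either "json" or a JSON schema object, sets temperature to 0, and disables streaming so the whole JSON reply arrives in message.content. COUNTRY_SCHEMA, structured_request, and parse_structured are hypothetical, and the key check is much weaker than full schema validation.

```python
import json
from typing import Any, Dict

# Hand-written JSON schema (hypothetical example).
COUNTRY_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
        "languages": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "capital", "languages"],
}


def structured_request(model: str, question: str) -> Dict[str, Any]:
    """Build an /api/chat body that constrains the reply to COUNTRY_SCHEMA."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"{question} Return as JSON."}],
        "format": COUNTRY_SCHEMA,       # schema-constrained output
        "options": {"temperature": 0},  # reduce sampling randomness
        "stream": False,                # one response with the complete JSON
    }


def parse_structured(content: str) -> Dict[str, Any]:
    """Decode message.content and check the required keys (minimal validation)."""
    data = json.loads(content)
    missing = [k for k in COUNTRY_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```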

Streaming with Tool Calls

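Tool calls are supported in streaming mode (Ollama 0.6+). In a stream, tool calls arrive in a chunk where message.tool_calls is set; message.content in that chunk is usually empty. Append the assistant message (including its tool_calls) to the conversation history, execute each tool, append each result as a tool role message, and call the model again with the updated history.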
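The loop described above can be sketched independently of transport. chat is a hypothetical callable that returns one assembled assistant message (from a non-streaming call, or after accumulating streamed chunks). The sketch assumes function.arguments arrives as a JSON object rather than a string, and it labels tool results with a tool_name field; treat that field name as an assumption to check against references/api-reference.md.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical: takes the message history, returns one assistant message dict.
ChatFn = Callable[[List[Dict[str, Any]]], Dict[str, Any]]


def run_tool_loop(chat: ChatFn, messages: List[Dict[str, Any]],
                  tools: Dict[str, Callable[..., Any]], max_rounds: int = 5) -> str:
    """Call the model, execute requested tools, and repeat until it answers in text."""
    for _ in range(max_rounds):
        message = chat(messages)
        messages.append(message)  # keep the assistant turn, including its tool_calls
        calls = message.get("tool_calls") or []
        if not calls:
            return message.get("content", "")  # plain answer: the loop is done
        for call in calls:
            fn = call["function"]
            result = tools[fn["name"]](**fn.get("arguments", {}))
            # Assumed field name tool_name; the result is serialized as JSON text.
            messages.append({"role": "tool", "tool_name": fn["name"],
                             "content": json.dumps(result)})
    raise RuntimeError("tool loop did not finish within max_rounds")
```

Capping the rounds with max_rounds stops a model that keeps requesting tools from looping forever.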
