Agent Skills: GrepAI Embeddings with Ollama

Configure Ollama as the embedding provider for GrepAI. Use this skill for local, private embedding generation.

ID: yoanbernabeu/grepai-skills/grepai-embeddings-ollama

Install this agent skill to your local machine:

pnpm dlx add-skill https://github.com/yoanbernabeu/grepai-skills/tree/HEAD/skills/embeddings/grepai-embeddings-ollama


skills/embeddings/grepai-embeddings-ollama/SKILL.md

Skill Metadata

Name: grepai-embeddings-ollama

Description: Configure Ollama as the embedding provider for GrepAI. Use this skill for local, private embedding generation.

GrepAI Embeddings with Ollama

This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search.

When to Use This Skill

  • Setting up private, local embeddings
  • Choosing the right Ollama model
  • Optimizing Ollama performance
  • Troubleshooting Ollama connection issues

Why Ollama?

| Advantage | Description |
|-----------|-------------|
| πŸ”’ Privacy | Code never leaves your machine |
| πŸ’° Free | No API costs or usage limits |
| ⚑ Speed | No network latency |
| πŸ”Œ Offline | Works without internet |
| πŸ”§ Control | Choose your model |

Prerequisites

  1. Ollama installed and running
  2. An embedding model downloaded
# Install Ollama
brew install ollama  # macOS
# or
curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Start Ollama
ollama serve

# Download model
ollama pull nomic-embed-text

Configuration

Basic Configuration

# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434

With Custom Endpoint

embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.100:11434  # Remote Ollama server

With Explicit Dimensions

embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768  # Usually auto-detected
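The `dimensions` value matters because vectors of different lengths cannot be compared: an index built with a 768-dimension model is incompatible with queries embedded by a 1024-dimension model. A minimal sketch of the cosine-similarity comparison a vector search performs (illustrative only, not GrepAI's actual implementation):

```python
def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

This is why an index must be rebuilt after switching to a model with a different dimension count.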

Available Models

Recommended: nomic-embed-text

ollama pull nomic-embed-text

| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~274 MB |
| Speed | Fast |
| Quality | Excellent for code |
| Language | English-optimized |

Configuration:

embedder:
  provider: ollama
  model: nomic-embed-text

Multilingual: nomic-embed-text-v2-moe

ollama pull nomic-embed-text-v2-moe

| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~500 MB |
| Speed | Medium |
| Quality | Excellent |
| Language | Multilingual |

Best for codebases with non-English comments/documentation.

Configuration:

embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe

High Quality: bge-m3

ollama pull bge-m3

| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~1.2 GB |
| Speed | Slower |
| Quality | Very high |
| Language | Multilingual |

Best for large, complex codebases where accuracy is critical.

Configuration:

embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024

Maximum Quality: mxbai-embed-large

ollama pull mxbai-embed-large

| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~670 MB |
| Speed | Medium |
| Quality | Highest |
| Language | English |

Configuration:

embedder:
  provider: ollama
  model: mxbai-embed-large
  dimensions: 1024

Model Comparison

| Model | Dims | Size | Speed | Quality | Use Case |
|-------|------|------|-------|---------|----------|
| nomic-embed-text | 768 | 274MB | ⚑⚑⚑ | ⭐⭐⭐ | General use |
| nomic-embed-text-v2-moe | 768 | 500MB | ⚑⚑ | ⭐⭐⭐⭐ | Multilingual |
| bge-m3 | 1024 | 1.2GB | ⚑ | ⭐⭐⭐⭐⭐ | Large codebases |
| mxbai-embed-large | 1024 | 670MB | ⚑⚑ | ⭐⭐⭐⭐⭐ | Maximum accuracy |
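Dimension count also drives index size. As a back-of-envelope estimate of raw vector storage only (assuming 4-byte floats; GrepAI's actual on-disk format will differ):

```python
def vector_storage_bytes(num_chunks: int, dims: int, bytes_per_float: int = 4) -> int:
    """Rough lower bound on raw embedding storage for an index."""
    return num_chunks * dims * bytes_per_float

# 10,000 indexed code chunks:
print(vector_storage_bytes(10_000, 768) / 1e6)   # 30.72 (MB)
print(vector_storage_bytes(10_000, 1024) / 1e6)  # 40.96 (MB)
```

A 1024-dimension model therefore costs roughly a third more storage (and search compute) than a 768-dimension one for the same codebase.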

Performance Optimization

Memory Management

Models load into RAM. Ensure sufficient memory:

| Model | RAM Required |
|-------|--------------|
| nomic-embed-text | ~500 MB |
| nomic-embed-text-v2-moe | ~800 MB |
| bge-m3 | ~1.5 GB |
| mxbai-embed-large | ~1 GB |

GPU Acceleration

Ollama uses GPU acceleration automatically when available:

  • macOS: Metal (Apple Silicon)
  • Linux/Windows: CUDA (NVIDIA GPUs)

Check GPU usage:

ollama ps

Keeping Model Loaded

By default, Ollama unloads a model after 5 minutes of inactivity. To keep it loaded, send a request with keep_alive set to -1 (note: use the embeddings endpoint, since embedding-only models cannot serve /api/generate):

# Keep model loaded indefinitely (keep_alive: -1 disables unloading)
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "warmup",
  "keep_alive": -1
}'

Alternatively, start the server with the OLLAMA_KEEP_ALIVE=-1 environment variable to change the default for all models.

Verifying Connection

Check Ollama is Running

curl http://localhost:11434/api/tags

List Available Models

ollama list

Test Embedding

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function authenticate(user, password)"
}'
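The response is JSON containing an `embedding` array of floats. A small check of the returned vector length (a simulated response body is used here; a real call to nomic-embed-text should return 768 values):

```python
import json

def embedding_dimensions(response_body: str) -> int:
    """Length of the vector in an Ollama /api/embeddings response."""
    return len(json.loads(response_body)["embedding"])

# Simulated response body with a 768-dimension vector:
simulated = json.dumps({"embedding": [0.0] * 768})
print(embedding_dimensions(simulated))  # 768
```

If the reported length does not match your config's `dimensions`, fix the config (or remove the field and let it be auto-detected).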

Running Ollama as a Service

macOS (launchd)

Ollama app runs automatically on login.

Linux (systemd)

# Enable service
sudo systemctl enable ollama

# Start service
sudo systemctl start ollama

# Check status
sudo systemctl status ollama

Manual Background

nohup ollama serve > /dev/null 2>&1 &

Remote Ollama Server

Run Ollama on a powerful server and connect remotely:

On the Server

# Allow remote connections (binds to all interfaces; restrict access with a firewall)
OLLAMA_HOST=0.0.0.0 ollama serve

On the Client

# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://server-ip:11434
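Before indexing against a remote endpoint, it helps to verify the server is reachable. A minimal standard-library sketch that probes Ollama's model-listing route, `/api/tags`:

```python
import urllib.request
import urllib.error

def ollama_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on GET /api/tags."""
    url = endpoint.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

# Example: ollama_reachable("http://192.168.1.100:11434")
```

A False result usually means the server is down, the port is blocked, or OLLAMA_HOST was not set on the server.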

Common Issues

❌ Problem: Connection refused

βœ… Solution:

# Start Ollama
ollama serve

❌ Problem: Model not found

βœ… Solution:

# Pull the model
ollama pull nomic-embed-text

❌ Problem: Slow embedding generation

βœ… Solutions:

  • Use a smaller model (nomic-embed-text)
  • Ensure GPU is being used (ollama ps)
  • Close memory-intensive applications
  • Consider a remote server with better hardware

❌ Problem: Out of memory

βœ… Solutions:

  • Use a smaller model
  • Close other applications
  • Upgrade RAM
  • Use remote Ollama server

❌ Problem: Embeddings differ after model update

βœ… Solution: Re-index after model updates:

rm .grepai/index.gob
grepai watch

Best Practices

  1. Start with nomic-embed-text: Best balance of speed/quality
  2. Keep Ollama running: Background service recommended
  3. Match dimensions: Don't mix models with different dimensions
  4. Re-index on model change: Delete index and re-run watch
  5. Monitor memory: Embedding models use significant RAM

Output Format

Successful Ollama configuration:

βœ… Ollama Embedding Provider Configured

   Provider: Ollama
   Model: nomic-embed-text
   Endpoint: http://localhost:11434
   Dimensions: 768 (auto-detected)
   Status: Connected

   Model Info:
   - Size: 274 MB
   - Loaded: Yes
   - GPU: Apple Metal