# GX10 Offload

Offload work to the local NVIDIA DGX Spark node, which runs Ollama with Devstral models on a GB10 Blackwell GPU (128GB unified memory).
## When to Use
- Long code generation tasks that benefit from a dedicated local model
- Batch processing of multiple prompts
- Draft generation for review (speculative decoding pattern)
- Tasks where latency to cloud APIs is a bottleneck
- Privacy-sensitive inference that must stay on-premises
## Connection
| Property | Value |
|----------|-------|
| Host (WiFi) | 10.0.0.234 / gx10-94e2.local (mDNS) |
| Host (Tailscale) | 100.67.53.87 (gx10-acee, different unit) |
| User | a |
| Password | aaaaaa |
| Ollama API | http://localhost:11434 on the device |
| SSH tunnel | ssh -L 11434:localhost:11434 a@10.0.0.234 |
**Note:** The WiFi-connected unit is gx10-94e2 (discovered via mDNS). The Tailscale-reachable unit is gx10-acee, a different physical Spark.
## Available Models
| Model | Size | Use Case |
|-------|------|----------|
| devstral | 14GB | Fast coding tasks, lightweight generation |
| devstral-2:123b | 74GB | Heavy reasoning, complex code generation |
| devstral2-4k | 74GB | Same as above, 4k context window |
## Quick Usage

### Single prompt via SSH

```bash
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  a@100.67.53.87 \
  "curl -s http://localhost:11434/api/generate -d '{\"model\":\"devstral\",\"prompt\":\"YOUR_PROMPT\",\"stream\":false}'"
```
### Via SSH tunnel (persistent)

```bash
# Open tunnel in background
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  -fNL 11434:localhost:11434 a@100.67.53.87

# Then use locally as if Ollama were running here
curl http://localhost:11434/api/generate \
  -d '{"model":"devstral","prompt":"Hello","stream":false}'
```
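With `"stream": false`, the whole `/api/generate` reply arrives as a single JSON object whose generated text sits in the `response` field. A minimal extraction sketch, shown here against a trimmed sample reply (for real replies, `jq -r '.response'` is more robust than `sed`, which only handles text without escaped quotes):

```shell
# Trimmed sample of what /api/generate returns with "stream": false.
RESP='{"model":"devstral","response":"Hello there!","done":true}'
# Pull out just the generated text.
TEXT=$(printf '%s' "$RESP" | sed -n 's/.*"response":"\([^"]*\)".*/\1/p')
echo "$TEXT"
```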
### OpenAI-compatible API

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "devstral",
    "messages": [{"role": "user", "content": "Write a Python function to sort a list"}]
  }'
```
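This endpoint returns the standard chat-completions shape, with the generated text under `choices[0].message.content`. A sketch against a trimmed sample reply (real replies carry more fields such as `usage`; prefer `jq` for anything non-trivial):

```shell
# Trimmed sample chat-completions reply.
RESP='{"choices":[{"message":{"role":"assistant","content":"def sort_list(xs): return sorted(xs)"}}]}'
# Crude extraction; works only while the content has no escaped quotes.
TEXT=$(printf '%s' "$RESP" | sed -n 's/.*"content":"\([^"]*\)".*/\1/p')
echo "$TEXT"
```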
### Using the offload script

```bash
# Simple prompt
~/.claude/skills/gx10-offload/scripts/offload.sh "Write a Rust function for binary search"

# With a specific model
~/.claude/skills/gx10-offload/scripts/offload.sh "Explain monads" devstral-2:123b

# Batch mode (one prompt per line)
~/.claude/skills/gx10-offload/scripts/offload.sh --batch prompts.txt
```
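The script's internals aren't reproduced here, but its core is presumably just payload construction plus a `curl` to the API. A hypothetical sketch of that core (the function name and default model are illustrative, not taken from the real script):

```shell
# Hypothetical core of offload.sh: build the /api/generate body for a prompt
# ($1) and an optional model ($2, defaulting to devstral), then POST it.
# Note: printf does no JSON escaping, so this is only safe for simple prompts.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "${2:-devstral}" "$1"
}
build_payload "Explain monads" "devstral-2:123b"; echo
# The real script would then do something like:
#   curl -s http://localhost:11434/api/generate -d "$(build_payload "$@")"
```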
## Offload Patterns

### 1. Draft-and-Review

Offload draft generation to the GX10, then review and refine with Claude:

```bash
# GX10 generates the draft
DRAFT=$(~/.claude/skills/gx10-offload/scripts/offload.sh "Implement a Redis cache wrapper in Python with TTL support")

# Claude then reviews and improves the draft
```
### 2. Batch Code Generation

Generate multiple implementations in parallel on the GX10:

```bash
for task in "sort" "search" "hash" "tree"; do
  ~/.claude/skills/gx10-offload/scripts/offload.sh "Implement $task in Rust" &
done
wait
```
### 3. Test Generation

Offload test writing to the local model:

```bash
~/.claude/skills/gx10-offload/scripts/offload.sh "Write pytest tests for: $(cat src/main.py)"
```
## Device Status Check

```bash
~/.claude/skills/gx10-offload/scripts/offload.sh --status
```
## Ensure Ollama Is Running

```bash
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  a@100.67.53.87 'pgrep ollama || nohup ollama serve > /tmp/ollama.log 2>&1 &'
```
## Hardware
- GPU: NVIDIA GB10 Blackwell (DGX Spark)
- Memory: 128GB unified (Grace-Blackwell architecture)
- CPU: 20-core Grace ARM64
- OS: Ubuntu 24.04 aarch64, kernel 6.14-nvidia
- PyTorch: 2.10.0 with CUDA
- Disk: 510GB free
## GF(3) Assignment

| Trit | Role | Description |
|------|------|-------------|
| +1 | PLUS | Generator - produces code/text offloaded from Claude |
Conservation triad: gx10-offload (+1) + tailscale (0) + skill-creator (-1) = 0