Agent Skills: GX10 Offload

Offload inference, code generation, and batch processing to local GX10 DGX Spark (GB10 Blackwell) running Ollama

ID: plurigrid/asi/gx10-offload

Install this agent skill into your local environment:

pnpm dlx add-skill https://github.com/plurigrid/asi/tree/HEAD/skills/gx10-offload



GX10 Offload

Offload work to the local NVIDIA DGX Spark cluster node running Ollama with Devstral models on GB10 Blackwell GPU (128GB unified memory).

When to Use

  • Long code generation tasks that benefit from a dedicated local model
  • Batch processing of multiple prompts
  • Draft generation for review (speculative decoding pattern)
  • Tasks where latency to cloud APIs is a bottleneck
  • Privacy-sensitive inference that must stay on-premises

Connection

| Property | Value |
|----------|-------|
| Host (WiFi) | 10.0.0.234 / gx10-94e2.local (mDNS) |
| Host (Tailscale) | 100.67.53.87 (gx10-acee, different unit) |
| User | a |
| Password | aaaaaa |
| Ollama API | http://localhost:11434 on the device |
| SSH tunnel | `ssh -L 11434:localhost:11434 a@10.0.0.234` |

Note: The WiFi-connected unit is gx10-94e2 (discovered via mDNS). The Tailscale-reachable unit is gx10-acee (different physical Spark).
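Since the two networks reach different physical units, it can help to make the host choice explicit. A minimal sketch (the helper name and structure are illustrative, not part of the skill; addresses come from the table above):

```python
# Hypothetical helper: build the SSH tunnel command for a chosen network.
# Note the two networks reach DIFFERENT physical Sparks (see note above).
HOSTS = {
    "wifi": "10.0.0.234",         # gx10-94e2 (mDNS: gx10-94e2.local)
    "tailscale": "100.67.53.87",  # gx10-acee (different unit)
}

def tunnel_command(network: str, user: str = "a", port: int = 11434) -> str:
    """Return the ssh command forwarding the remote Ollama port locally."""
    host = HOSTS[network]
    return f"ssh -fNL {port}:localhost:{port} {user}@{host}"
```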

Available Models

| Model | Size | Use Case |
|-------|------|----------|
| devstral | 14GB | Fast coding tasks, lightweight generation |
| devstral-2:123b | 74GB | Heavy reasoning, complex code generation |
| devstral2-4k | 74GB | Same as above, 4k context window |
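When routing work automatically, a simple heuristic can choose between the models above. A sketch, assuming prompt length is a rough proxy for task weight (the threshold and function name are assumptions, not part of the skill):

```python
# Sketch: route a prompt to one of the models from the table above.
# The length-based heuristic is an illustrative assumption.
def pick_model(prompt: str, heavy_threshold: int = 500) -> str:
    if len(prompt) > heavy_threshold:
        return "devstral-2:123b"  # 74GB, heavy reasoning
    return "devstral"             # 14GB, fast coding tasks
```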

Quick Usage

Single prompt via SSH

```shell
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  a@100.67.53.87 \
  "curl -s http://localhost:11434/api/generate -d '{\"model\":\"devstral\",\"prompt\":\"YOUR_PROMPT\",\"stream\":false}'"
```

Via SSH tunnel (persistent)

```shell
# Open tunnel in background
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  -fNL 11434:localhost:11434 a@100.67.53.87

# Then use locally as if Ollama were running here
curl http://localhost:11434/api/generate \
  -d '{"model":"devstral","prompt":"Hello","stream":false}'
```

OpenAI-compatible API

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "devstral",
    "messages": [{"role": "user", "content": "Write a Python function to sort a list"}]
  }'
```
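The same request can be issued from Python with only the standard library. A sketch of the payload and request construction; the actual call (commented out) assumes the SSH tunnel is open:

```python
import json
import urllib.request

# Build the same chat-completions payload as the curl call above.
payload = {
    "model": "devstral",
    "messages": [
        {"role": "user", "content": "Write a Python function to sort a list"}
    ],
}
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the tunnel open, this returns an OpenAI-style response:
# body = json.load(urllib.request.urlopen(req))
# print(body["choices"][0]["message"]["content"])
```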

Using the offload script

```shell
# Simple prompt
~/.claude/skills/gx10-offload/scripts/offload.sh "Write a Rust function for binary search"

# With specific model
~/.claude/skills/gx10-offload/scripts/offload.sh "Explain monads" devstral-2:123b

# Batch mode (one prompt per line)
~/.claude/skills/gx10-offload/scripts/offload.sh --batch prompts.txt
```

Offload Patterns

1. Draft-and-Review

Offload draft generation to GX10, then review/refine with Claude:

```shell
# GX10 generates draft
DRAFT=$(~/.claude/skills/gx10-offload/scripts/offload.sh "Implement a Redis cache wrapper in Python with TTL support")
# Claude reviews and improves the draft
```
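The draft-and-review loop can also be sketched in Python. Both functions below are stubs standing in for the real calls (`offload` for the offload.sh invocation, `review` for the Claude refinement pass); the names are illustrative:

```python
# Sketch of the draft-and-review pattern with stubbed endpoints.
def offload(prompt: str) -> str:
    """Stub for the GX10 call; a real version would hit the Ollama API."""
    return f"# draft for: {prompt}\n..."

def review(draft: str) -> str:
    """Stub for the reviewer pass that refines the draft."""
    return draft.replace("...", "pass  # refined by reviewer")

draft = offload("Implement a Redis cache wrapper in Python with TTL support")
final = review(draft)
```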

2. Batch Code Generation

Generate multiple implementations in parallel on GX10:

```shell
for task in "sort" "search" "hash" "tree"; do
  ~/.claude/skills/gx10-offload/scripts/offload.sh "Implement $task in Rust" &
done
wait
```
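A Python equivalent of the shell loop, using a thread pool to fan out prompts. The remote call is stubbed (`fake_offload` is illustrative); swap it for a real HTTP call to the Ollama API:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub for the per-task offload call; replace with a real request.
def fake_offload(task: str) -> str:
    return f"Implement {task} in Rust -> <generated code>"

tasks = ["sort", "search", "hash", "tree"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_offload, tasks))
```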

3. Test Generation

Offload test writing to the local model:

```shell
~/.claude/skills/gx10-offload/scripts/offload.sh "Write pytest tests for: $(cat src/main.py)"
```

Device Status Check

```shell
~/.claude/skills/gx10-offload/scripts/offload.sh --status
```

Ensure Ollama is Running

```shell
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  a@100.67.53.87 'pgrep ollama || nohup ollama serve > /tmp/ollama.log 2>&1 &'
```

Hardware

  • GPU: NVIDIA GB10 Blackwell (DGX Spark)
  • Memory: 128GB unified (Grace-Blackwell architecture)
  • CPU: 20-core Grace ARM64
  • OS: Ubuntu 24.04 aarch64, kernel 6.14-nvidia
  • PyTorch: 2.10.0 with CUDA
  • Disk: 510GB free

GF(3) Assignment

| Trit | Role | Description |
|------|------|-------------|
| +1 | PLUS | Generator - produces code/text offloaded from Claude |

Conservation triad: gx10-offload (+1) + tailscale (0) + skill-creator (-1) = 0
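The conservation claim above can be checked directly: the three trits cancel. A minimal sketch:

```python
# The conservation triad, checked numerically: the trits sum to zero.
triad = {"gx10-offload": +1, "tailscale": 0, "skill-creator": -1}
conserved = sum(triad.values()) == 0
```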