# GX10 Offload

Offload work to the local NVIDIA DGX Spark node, which runs Ollama with Devstral models on a GB10 Blackwell GPU (128GB unified memory).
## When to Use
- Long code generation tasks that benefit from a dedicated local model
- Batch processing of multiple prompts
- Draft generation for review (speculative decoding pattern)
- Tasks where latency to cloud APIs is a bottleneck
- Privacy-sensitive inference that must stay on-premises
## Connection
| Property | Value |
|----------|-------|
| Host (WiFi) | 10.0.0.234 / gx10-94e2.local (mDNS) |
| Host (Tailscale) | 100.67.53.87 (gx10-acee, different unit) |
| User | a |
| Password | aaaaaa |
| Ollama API | http://localhost:11434 on the device |
| SSH tunnel | ssh -L 11434:localhost:11434 a@10.0.0.234 |
**Note:** The WiFi-connected unit is gx10-94e2 (discovered via mDNS). The Tailscale-reachable unit is gx10-acee, a different physical Spark.
## Available Models
| Model | Size | Use Case |
|-------|------|----------|
| devstral | 14GB | Fast coding tasks, lightweight generation |
| devstral-2:123b | 74GB | Heavy reasoning, complex code generation |
| devstral2-4k | 74GB | Same as above, 4k context window |
## Quick Usage

### Single prompt via SSH

```bash
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  a@100.67.53.87 \
  "curl -s http://localhost:11434/api/generate -d '{\"model\":\"devstral\",\"prompt\":\"YOUR_PROMPT\",\"stream\":false}'"
```
### Via SSH tunnel (persistent)

```bash
# Open tunnel in background
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  -fNL 11434:localhost:11434 a@100.67.53.87

# Then use locally as if Ollama were running here
curl http://localhost:11434/api/generate \
  -d '{"model":"devstral","prompt":"Hello","stream":false}'
```
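With `"stream": false`, the whole `/api/generate` reply arrives as a single JSON object whose generated text sits in the `response` field. A minimal extraction sketch, shown here against a trimmed sample reply (for real replies, `jq -r '.response'` is more robust than `sed`, which only handles text without escaped quotes):

```shell
# Trimmed sample of what /api/generate returns with "stream": false.
RESP='{"model":"devstral","response":"Hello there!","done":true}'
# Pull out just the generated text.
TEXT=$(printf '%s' "$RESP" | sed -n 's/.*"response":"\([^"]*\)".*/\1/p')
echo "$TEXT"
```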
### OpenAI-compatible API

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "devstral",
    "messages": [{"role": "user", "content": "Write a Python function to sort a list"}]
  }'
```
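This endpoint returns the standard chat-completions shape, with the generated text under `choices[0].message.content`. A sketch against a trimmed sample reply (real replies carry more fields such as `usage`; prefer `jq` for anything non-trivial):

```shell
# Trimmed sample chat-completions reply.
RESP='{"choices":[{"message":{"role":"assistant","content":"def sort_list(xs): return sorted(xs)"}}]}'
# Crude extraction; works only while the content has no escaped quotes.
TEXT=$(printf '%s' "$RESP" | sed -n 's/.*"content":"\([^"]*\)".*/\1/p')
echo "$TEXT"
```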
### Using the offload script

```bash
# Simple prompt
~/.claude/skills/gx10-offload/scripts/offload.sh "Write a Rust function for binary search"

# With a specific model
~/.claude/skills/gx10-offload/scripts/offload.sh "Explain monads" devstral-2:123b

# Batch mode (one prompt per line)
~/.claude/skills/gx10-offload/scripts/offload.sh --batch prompts.txt
```
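The script's internals aren't reproduced here, but its core is presumably just payload construction plus a `curl` to the API. A hypothetical sketch of that core (the function name and default model are illustrative, not taken from the real script):

```shell
# Hypothetical core of offload.sh: build the /api/generate body for a prompt
# ($1) and an optional model ($2, defaulting to devstral), then POST it.
# Note: printf does no JSON escaping, so this is only safe for simple prompts.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "${2:-devstral}" "$1"
}
build_payload "Explain monads" "devstral-2:123b"; echo
# The real script would then do something like:
#   curl -s http://localhost:11434/api/generate -d "$(build_payload "$@")"
```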
## Offload Patterns

### 1. Draft-and-Review

Offload draft generation to the GX10, then review and refine with Claude:

```bash
# GX10 generates the draft
DRAFT=$(~/.claude/skills/gx10-offload/scripts/offload.sh "Implement a Redis cache wrapper in Python with TTL support")

# Claude then reviews and improves the draft
```
### 2. Batch Code Generation

Generate multiple implementations in parallel on the GX10:

```bash
for task in "sort" "search" "hash" "tree"; do
  ~/.claude/skills/gx10-offload/scripts/offload.sh "Implement $task in Rust" &
done
wait
```
### 3. Test Generation

Offload test writing to the local model:

```bash
~/.claude/skills/gx10-offload/scripts/offload.sh "Write pytest tests for: $(cat src/main.py)"
```
## Device Status Check

```bash
~/.claude/skills/gx10-offload/scripts/offload.sh --status
```
## Ensure Ollama Is Running

```bash
sshpass -p 'aaaaaa' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
  a@100.67.53.87 'pgrep ollama || nohup ollama serve > /tmp/ollama.log 2>&1 &'
```
## Hardware
- GPU: NVIDIA GB10 Blackwell (DGX Spark)
- Memory: 128GB unified (Grace-Blackwell architecture)
- CPU: 20-core Grace ARM64
- OS: Ubuntu 24.04 aarch64, kernel 6.14-nvidia
- PyTorch: 2.10.0 with CUDA
- Disk: 510GB free
## GF(3) Assignment

| Trit | Role | Description |
|------|------|-------------|
| +1 | PLUS | Generator - produces code/text offloaded from Claude |
Conservation triad: gx10-offload (+1) + tailscale (0) + skill-creator (-1) = 0