Agent Skills: Project Engines Benchmark Skill

Run and interpret engine-stack benchmarks (Steel, ember, shale)

Category: Uncategorized
ID: plurigrid/asi/benchmark

Install this agent skill locally:

pnpm dlx add-skill https://github.com/plurigrid/asi/tree/HEAD/skills/benchmark

Skill Files

skills/benchmark/SKILL.md

Skill Metadata

Name: benchmark
Description: Run and interpret engine-stack benchmarks (Steel, ember, shale)

Project Engines Benchmark Skill

Run benchmarks for Steel, ember, and shale engines.

CRITICAL: Read Before Benchmarking

ALWAYS read first: ~/p/benchmark-suite/docs/BENCHMARK_FAIRNESS.md

This document contains hard-won lessons about benchmark fairness. Ignoring it leads to misleading claims.

Pre-Benchmark Checklist

| Check | Why | How |
|-------|-----|-----|
| Read BENCHMARK_FAIRNESS.md | Contains all fairness lessons | cat ~/p/benchmark-suite/docs/BENCHMARK_FAIRNESS.md |
| Use --batched for LMDB/redb | 7-24x improvement with proper config | Add --batched --batch-size 1000 |
| Scale sled cache | An undersized cache makes sled 17x slower | Add --cache-mb 2048 for 1M+ records |
| Check dataset vs RAM | If the data fits in RAM, you're measuring memory, not I/O | Use larger datasets for I/O testing |

Note: Steel uses verify-once checksums (like RocksDB/WiredTiger): each page is verified on its first read from disk, and the page-cache copy is trusted after that. On ZFS or ECC-protected storage, use FileLayoutConfig::fast() to disable checksums entirely.
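
For the "Check dataset vs RAM" row, a rough estimate is record count times record size, compared against total RAM. A minimal sketch, assuming about 1000 bytes per record (YCSB's default of 10 fields of 100 bytes; the ycsb-* binaries here may use a different size) and Linux's /proc/meminfo; the function names are hypothetical:

```sh
#!/bin/sh
# Rough check for the "Check dataset vs RAM" row.
# ASSUMPTION: ~1000 bytes per record (YCSB's default of 10 fields x 100 bytes);
# the ycsb-* binaries may use a different record size, so pass it explicitly.

# Estimated dataset size in MiB: estimate_dataset_mb RECORDS [RECORD_BYTES]
estimate_dataset_mb() {
  echo $(( $1 * ${2:-1000} / 1024 / 1024 ))
}

# Total RAM in MiB (Linux only: /proc/meminfo reports MemTotal in kB).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# check_dataset_vs_ram RECORDS [RECORD_BYTES] [RAM_MB]
# RAM_MB defaults to this machine's MemTotal.
check_dataset_vs_ram() {
  data_mb=$(estimate_dataset_mb "$1" "${2:-1000}")
  ram_mb=${3:-$(total_ram_mb)}
  if [ "$data_mb" -lt "$ram_mb" ]; then
    echo "memory-bound: ~$data_mb MiB dataset fits in $ram_mb MiB RAM"
  else
    echo "io-bound: ~$data_mb MiB dataset exceeds $ram_mb MiB RAM"
  fi
}

check_dataset_vs_ram 50000 1000 16384
# → memory-bound: ~47 MiB dataset fits in 16384 MiB RAM
```

At that record size the 50,000-record runs used throughout this skill come to about 47 MiB, in line with the "Small dataset (50MB)" warning under Common Mistakes.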

Quick Commands

Steel (Oak engine)

# Build
cd ~/p/benchmark-suite && graft build --release -p ycsb-steel

# Single-threaded
ycsb-steel --fast --data-dir /tmp/bench --workload a --records 50000 --ops 200000

# Multi-threaded with sharding
ycsb-steel --fast --shards 64 --threads 4 --data-dir /tmp/bench --workload a --records 50000 --ops 200000

# Ultimate adversarial benchmark (vs sled)
cd ~/p/engine-stack/engines/steel
graft run --release --example ultimate_adversarial
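
The multi-threaded command above uses the same 4 threads and 64 shards as one row of the sharded write table further down. A minimal sweep sketch over all of that table's writer/shard pairs, assuming that "Writers" maps to --threads and that workload a with 50,000 records and 200,000 ops is acceptable (the table states neither); steel_shard_sweep and DRY_RUN are hypothetical names:

```sh
#!/bin/sh
# Hypothetical helper: print (or run) ycsb-steel for the writer/shard pairs
# in the sharded write table. ASSUMPTION: workload "a", 50000 records and
# 200000 ops are placeholders; the table does not state what it used.
# Set DRY_RUN=0 to execute the commands instead of printing them.
steel_shard_sweep() {
  for pair in "1 16" "4 64" "8 64"; do
    set -- $pair   # split "threads shards" into $1 and $2
    cmd="ycsb-steel --fast --shards $2 --threads $1 --data-dir /tmp/bench --workload a --records 50000 --ops 200000"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "$cmd"
    else
      $cmd   # word splitting is intended: no argument contains spaces
    fi
  done
}

steel_shard_sweep
```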

Fair 4-Engine Comparison

# Use the fair comparison script (includes proper batching for all engines)
RECORDS=50000 OPS=200000 ~/p/benchmark-suite/scripts/steel-fair-compare.sh

Individual Engine Commands (Fair Config)

# Steel
ycsb-steel --fast --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

# sled (scaled cache)
ycsb-sled --high-throughput --cache-mb 256 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

# LMDB (batched + nosync)
ycsb-lmdb --batched --nosync --batch-size 1000 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench

# redb (batched)
ycsb-redb --batched --batch-size 1000 --workload a --records 50000 --ops 200000 --data-dir /tmp/bench
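
To sweep workloads with identical sizes across engines, the per-engine flags above can be kept in one place. A sketch of a hypothetical fair_cmd helper (not the contents of steel-fair-compare.sh) that builds each command from those flags; the 256 MB sled cache below 1M records is an assumption taken from the 50K-record example, and 2048 MB at 1M+ records follows the checklist:

```sh
#!/bin/sh
# Hypothetical helper: build the fair-config command for one engine/workload.
# Flags are copied from the per-engine commands above; RECORDS, OPS and
# DATA_DIR default to the values used there.
fair_cmd() {
  records=${RECORDS:-50000}
  case "$1" in
    steel) flags="--fast" ;;
    sled)
      # Scale sled's cache with the dataset (checklist: 2048 MB for 1M+ records;
      # 256 MB otherwise is an assumption based on the 50K-record example)
      if [ "$records" -ge 1000000 ]; then
        flags="--high-throughput --cache-mb 2048"
      else
        flags="--high-throughput --cache-mb 256"
      fi
      ;;
    lmdb) flags="--batched --nosync --batch-size 1000" ;;
    redb) flags="--batched --batch-size 1000" ;;
    *) echo "unknown engine: $1" >&2; return 1 ;;
  esac
  echo "ycsb-$1 $flags --workload $2 --records $records --ops ${OPS:-200000} --data-dir ${DATA_DIR:-/tmp/bench}"
}

# Dry run: print every engine/workload combination with identical sizes.
for w in a b c; do
  for e in steel sled lmdb redb; do
    fair_cmd "$e" "$w"
  done
done
```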

Steel Results (2025-12-23): Steel Wins All

Steel now beats LMDB on all three workloads tested (A, B, C).

| Workload | Steel | LMDB | redb | sled | Winner |
|----------|------|------|------|------|--------|
| A (writes) | 2.49M | 2.24M | 687K | 744K | Steel +11% |
| B (reads) | 3.01M | 2.90M | 2.05M | 1.55M | Steel +3.8% |
| C (pure read) | 3.03M | 1.79M | 1.05M | 1.81M | Steel +69% |
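
The "Winner" percentages are plain throughput ratios, (Steel / LMDB - 1) x 100. A small sketch that reproduces them, so new runs can be summarized the same way; pct_gain is a hypothetical helper name:

```sh
#!/bin/sh
# Relative gain of engine A over engine B, as in the "Winner" column:
# (a / b - 1) x 100, formatted with an explicit sign.
pct_gain() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (a / b - 1) * 100 }'
}

pct_gain 2.49 2.24   # Workload A, Steel vs LMDB → +11.2%
pct_gain 3.01 2.90   # Workload B, Steel vs LMDB → +3.8%
pct_gain 3.03 1.79   # Workload C, Steel vs LMDB → +69.3%
```

The same arithmetic gives 1.11x and 1.69x for Workloads A and C, not the 1.07x and 1.52x quoted in "Where Steel Actually Wins", so those ratios likely come from a different run; check which one before citing them.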

Optimizations That Closed the Gap

Implemented (see docs/STEEL_OPTIMIZATIONS.md):

  • get_ref() +8.4% - zero-copy reads (KEY WIN)
  • get_cached_epoch() +1% - thread-local epoch
  • get_fast() - seqlock skip (no gain, kept for API)

Gap closed! Previous 43% gap on Workload B eliminated via zero-copy optimization.

Where Steel Actually Wins

| Scenario | Steel Advantage | Notes |
|----------|----------------|-------|
| Write-heavy (Workload A) | 1.07x vs LMDB | COW efficiency |
| Pure reads (Workload C) | 1.52x vs LMDB | Zero-copy mmap |
| Cold reads after restart | 3x vs sled | No log replay |
| Range scans | 3.4x vs sled | COW pages |
| Simplicity | ~6K LOC vs 20K+ | Easier to understand/debug |

Sharded Write Performance (2025-12-25)

With 64 shards, Steel beats sled by 2.3x:

| Writers | Shards | Steel writes/s | vs sled |
|---------|--------|----------------|---------|
| 1 | 16 | 3.0M | 149% |
| 4 | 64 | 10.8M | 230% |
| 8 | 64 | 16.8M | 237% |
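
The "vs sled" column reads as Steel's throughput divided by sled's: 230% matches the "2.3x" in the sentence above. Under that reading, sled's own rate can be backed out of each row; implied_sled_mops is a hypothetical helper:

```sh
#!/bin/sh
# Implied sled throughput (M writes/s), reading "vs sled" as Steel/sled x 100.
implied_sled_mops() {
  awk -v steel="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", steel / (pct / 100) }'
}

implied_sled_mops 3.0 149    # 1 writer,  16 shards → 2.01
implied_sled_mops 10.8 230   # 4 writers, 64 shards → 4.70
implied_sled_mops 16.8 237   # 8 writers, 64 shards → 7.09
```

Under this reading sled itself reaches roughly 2.0M, 4.7M and 7.1M writes/s, while Steel's per-writer rate falls from 3.0M (1 writer) to 2.1M (8 writers).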

Where Steel Does NOT Win

| Scenario | Winner | Notes |
|----------|--------|-------|
| Multi-key transactions | redb/LMDB | Steel has single-key atomicity only |
| 30+ years production hardening | LMDB | Ecosystem maturity |

Common Mistakes (Avoid These)

| Mistake | What Happens | Fix |
|---------|--------------|-----|
| Benchmark LMDB without --batched | 7.9x slower | Use --batched --batch-size 1000 |
| Benchmark redb without --batched | 24x slower | Use --batched --batch-size 1000 |
| Claim "47x faster than redb" | Misleading | Fair comparison is ~1.9x |
| Small dataset (50MB) | Memory-bound, not I/O | Use 500MB+ for I/O testing |
| Forget to clear between engines | Cache effects | Sleep or clear page cache |
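
For the "Forget to clear between engines" row, a sketch of a cache reset to call before each engine run. Writing 3 to /proc/sys/vm/drop_caches is a standard Linux interface and needs root; reset_cache and SETTLE_SECS are hypothetical names, and the 5-second fallback sleep is an arbitrary choice, not a measured settle time:

```sh
#!/bin/sh
# Reset caches between engine runs. Linux only: writing 3 to
# /proc/sys/vm/drop_caches frees the page cache plus dentries and inodes,
# and requires root. If that fails, fall back to sleeping.
reset_cache() {
  sync   # flush dirty pages first so they can actually be dropped
  if { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
    echo "dropped page cache"
  elif echo 3 | sudo -n tee /proc/sys/vm/drop_caches >/dev/null 2>&1; then
    # sudo -n never prompts; it fails instead if a password is required
    echo "dropped page cache (sudo)"
  else
    sleep "${SETTLE_SECS:-5}"   # SETTLE_SECS is a hypothetical knob
    echo "could not drop caches; slept ${SETTLE_SECS:-5}s"
  fi
}
```

Call reset_cache immediately before each ycsb-* invocation, for example inside the per-engine loop of a comparison script.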

Key Files

| Purpose | Location |
|---------|----------|
| Steel YCSB | ~/p/benchmark-suite/engines/ycsb-steel/ |
| Fair script | ~/p/benchmark-suite/scripts/steel-fair-compare.sh |
| Fairness docs | ~/p/benchmark-suite/docs/BENCHMARK_FAIRNESS.md |
| Steel benchmarks | ~/p/engine-stack/engines/steel/BENCHMARKS.md |
| Roadmap to #1 | ~/p/engine-stack/engines/steel/ROADMAP_BEST_KV.md |
| Ultimate adversarial | ~/p/engine-stack/engines/steel/examples/ultimate_adversarial.rs |

Dialectical Improvement

When benchmarking, always ask:

  1. "What would a competitor's maintainer criticize about this benchmark?"
  2. "Am I using each engine's recommended configuration?"
  3. "What am I NOT measuring that matters?"
  4. "Is this result surprising? If so, investigate before publishing."