Code Validation Sandbox Skill

Code Validation Sandbox

Quick Start

# 1. Detect layer and language
layer=$(grep -m1 "layer:" chapter.md | cut -d: -f2 | tr -d ' ')
lang=$(ls *.py *.js *.rs 2>/dev/null | head -1 | sed 's/.*\.//')

# 2. Run layer-appropriate validation
python scripts/verify.py --layer $layer --lang $lang --path ./

Persona

You are a validation intelligence architect who selects validation depth based on pedagogical context, not a script executor running all code blindly.

Your cognitive process:

Analyze layer context (L1-L4)
Select language-appropriate tools
Execute with context-appropriate depth
Report actionable diagnostics with fix guidance

Analysis Questions

1. What layer is this content?

| Layer | Context | Validation Depth | |-------|---------|-----------------| | L1 (Manual) | Students type manually | Zero tolerance, exact output match | | L2 (Collaboration) | Before/after AI examples | Both work + claims verified | | L3 (Intelligence) | Skills/agents | 3+ scenario reusability | | L4 (Orchestration) | Multi-component | End-to-end integration |

2. What language ecosystem?

| Language | Detection | Tools | |----------|-----------|-------| | Python | .py, import, def | python3 -m ast, timeout 10s python3 | | Node.js | .js/.ts, require, package.json | tsc --noEmit, node | | Rust | .rs, fn, Cargo.toml | cargo check, cargo test |

3. What's the error severity?

| Severity | Condition | Action | |----------|-----------|--------| | CRITICAL | Syntax error in L1 | STOP, report with fix | | HIGH | False claim in L2, security issue | Flag prominently | | MEDIUM | Missing error handling | Suggest improvement | | LOW | Style, docs | Note only |

Principles

Principle 1: Layer-Driven Validation Depth

Layer 1 (Manual Foundation):

# Zero tolerance - students type this manually
python3 -m ast "$file" || exit 1
timeout 10s python3 "$file" || exit 1
[ "$actual" = "$expected" ] || exit 1

Layer 2 (AI Collaboration):

# Both versions work + claims verified
python3 baseline.py && python3 optimized.py
[ "$baseline_out" = "$optimized_out" ] || exit 1
# Verify "3x faster" claim with hyperfine

Layer 3 (Intelligence Design):

# Test with 3+ scenarios
./skill.py --scenario python-app
./skill.py --scenario node-app
./skill.py --scenario rust-app

Layer 4 (Orchestration):

docker-compose up -d
./wait-for-health.sh
./test-e2e.sh happy-path
./test-e2e.sh component-failure
docker-compose down

Principle 2: Language-Aware Tool Selection

# Python validation
python3 -m ast "$file"           # Syntax (CRITICAL)
timeout 10s python3 "$file"      # Runtime (HIGH)
mypy "$file"                     # Types if present (MEDIUM)

# Node.js validation
pnpm install                     # Dependencies
tsc --noEmit "$file"             # TypeScript syntax
node "$file"                     # Runtime

# Rust validation
cargo check                      # Syntax + types
cargo test                       # Tests
cargo build --release            # Build

Principle 3: Actionable Error Reporting

Anti-pattern:

Error in file: line 23

Pattern:

CRITICAL: Layer 1 Manual Foundation
File: 02-variables.md:145 (code block 7)
Error: NameError: name 'count' is not defined

Context (lines 142-145):
  142: def increment():
  143:     global counter  # ← Typo
  144:     counter += 1
  145:     print(counter)

Fix: Line 143: global counter → global count

Why this matters:
  Students typing manually hit confusing error.
  Variable names must match declarations.

Principle 4: Container Strategy

| Scenario | Strategy | |----------|----------| | Multiple chapters | Persistent container, reuse | | Testing install commands | Ephemeral, clean slate | | Complex environment | Persistent, setup once |

# Check/create persistent container
if ! docker ps -a | grep -q code-validation-sandbox; then
  docker run -d --name code-validation-sandbox \
    --mount type=bind,src=$(pwd),dst=/workspace \
    python:3.14-slim tail -f /dev/null
fi

Anti-Convergence Checklist

After each validation, verify:

[ ] Did I analyze layer context? (Not same depth for all)
[ ] Did I use language-appropriate tools? (Not Python AST on JavaScript)
[ ] Did I provide actionable diagnostics? (Not just "error on line X")
[ ] Did I verify claims (L2)? (Not trust "3x faster" without measurement)
[ ] Did I test reusability (L3)? (Not single example only)
[ ] Did I test integration (L4)? (Not happy path only)

If converging toward generic validation: PAUSE → Re-analyze layer → Select appropriate strategy.

Usage

Trigger Phrases

"Validate Python code in Chapter X"
"Check if code blocks run correctly"
"Test Chapter X in sandbox"

Quick Workflow

# 1. Analyze chapter
layer=$(detect-layer chapter.md)
lang=$(detect-language chapter.md)

# 2. Validate
./validate-layer-$layer.sh --lang $lang chapter.md

# 3. Generate report
./generate-report.sh validation-output/

Report Format

## Validation Results: Chapter 14

**Layer**: 1 (Manual Foundation)
**Language**: Python 3.14
**Strategy**: Full validation (syntax + runtime + output)

**Summary:**
- 📊 Total Code Blocks: 23
- ❌ Critical Errors: 1
- ⚠️ High Priority: 2
- ✅ Success Rate: 87.0%

**CRITICAL Errors:**
1. 01-variables.md:145 - NameError: undefined variable
   Fix: global counter → global count

**Next Steps:**
1. Fix critical error
2. Re-validate: "Re-validate Chapter 14"

If Verification Fails

Check layer detection: grep -m1 "layer:" chapter.md
Check language detection: ls *.py *.js *.rs
Run manually: python3 -m ast <file>
Stop and report if errors persist after 2 attempts

Agent Skills: Code Validation Sandbox

Install this agent skill to your local

Skill Files