Agent Skills: Proof-of-Work

Proof artifact generation patterns for task validation. Covers screenshots, test results, deployments, and confidence scoring.

Category: Uncategorized
ID: madappgang/claude-code/proof-of-work

Install this agent skill locally:

pnpm dlx add-skill https://github.com/MadAppGang/claude-code/tree/HEAD/plugins/autopilot/skills/proof-of-work


plugins/autopilot/skills/proof-of-work/SKILL.md

Skill Metadata

Name
proof-of-work

plugin: autopilot
updated: 2026-01-20

Proof-of-Work

Version: 0.1.0
Purpose: Generate validation artifacts for autonomous task completion
Status: Phase 1

When to Use

Use this skill when you need to:

  • Generate proof artifacts after task completion
  • Capture screenshots for UI verification
  • Parse and report test results
  • Calculate confidence scores for task validation
  • Determine if a task can be auto-approved

Overview

Proof-of-work is the mechanism that validates task completion. Every finished task must include verifiable artifacts that demonstrate the work was done correctly.

Proof Types by Task

Bug Fix Proof

| Artifact | Required | Purpose |
|----------|----------|---------|
| Git diff | Yes | Show minimal, focused changes |
| Test results | Yes | All tests passing |
| Regression test | Yes | Specific test for the bug |
| Error log (before/after) | Optional | Visual evidence |
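The table asks for "minimal, focused changes" but does not say how size is measured. A minimal sketch, assuming the diff is produced with `git diff --numstat` (one `added<TAB>deleted<TAB>path` line per file, with `-` in place of counts for binary files), tallies files and changed lines so the proof can report them. `summarizeNumstat` and `DiffSummary` are hypothetical names, not part of the skill.

```typescript
// Hypothetical summary of a `git diff --numstat` run.
interface DiffSummary {
  files: number;
  added: number;
  deleted: number;
  binaryFiles: number;
}

// Parses default (non -z) numstat output: "<added>\t<deleted>\t<path>" per line.
function summarizeNumstat(numstat: string): DiffSummary {
  const summary: DiffSummary = { files: 0, added: 0, deleted: 0, binaryFiles: 0 };
  for (const line of numstat.split('\n')) {
    if (line.trim() === '') continue; // skip blank and trailing lines
    const [added, deleted] = line.split('\t');
    summary.files += 1;
    if (added === '-' || deleted === '-') {
      summary.binaryFiles += 1; // git reports no line counts for binary files
      continue;
    }
    summary.added += Number(added);
    summary.deleted += Number(deleted);
  }
  return summary;
}

// Example input: two text files and one binary file changed.
const example = summarizeNumstat('3\t1\tsrc/parser.ts\n12\t0\ttest/parser.test.ts\n-\t-\tdocs/logo.png\n');
console.log(example); // { files: 3, added: 15, deleted: 1, binaryFiles: 1 }
```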

Feature Proof

| Artifact | Required | Purpose |
|----------|----------|---------|
| Screenshots | Yes | Visual verification |
| Test results | Yes | Functionality works |
| Coverage report | Yes | >= 80% coverage |
| Build output | Yes | Builds successfully |
| Deployment URL | Optional | Live demo |
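The 80% coverage bar needs a number to compare against. This sketch assumes the coverage report comes from Istanbul/nyc's `json-summary` reporter, whose `coverage-summary.json` has a `total` entry with a `pct` value per metric. It also assumes line coverage is the intended metric, which the table does not specify. `coveragePct` and the interfaces are illustrative, and they declare only a subset of the report's fields.

```typescript
type CoverageMetricName = 'lines' | 'statements' | 'functions' | 'branches';

// Subset of one metric entry in an Istanbul json-summary report (assumed shape).
interface CoverageMetric {
  total: number;
  covered: number;
  pct: number | string; // guard against a non-numeric pct rather than assuming its type
}

interface CoverageSummary {
  total: Partial<Record<CoverageMetricName, CoverageMetric>>;
}

// Hypothetical helper: returns the chosen metric's percentage, or undefined if unusable.
function coveragePct(summary: CoverageSummary, metric: CoverageMetricName = 'lines'): number | undefined {
  const pct = summary.total[metric]?.pct;
  return typeof pct === 'number' && Number.isFinite(pct) ? pct : undefined;
}

// Illustrative report values, not real project data.
const sample: CoverageSummary = {
  total: {
    lines: { total: 200, covered: 170, pct: 85 },
    branches: { total: 60, covered: 42, pct: 70 },
  },
};
console.log(coveragePct(sample)); // 85
```

The result can be passed as `testCoverage` when building the proof artifacts.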

UI Change Proof

| Artifact | Required | Purpose |
|----------|----------|---------|
| Desktop screenshot | Yes | 1920x1080 view |
| Mobile screenshot | Yes | 375x667 view |
| Tablet screenshot | Yes | 768x1024 view |
| Accessibility score | Yes | Lighthouse accessibility score >= 80 |
| Visual regression | Optional | BackstopJS diff |
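One way to make the "Required" columns of the three tables checkable is to encode them as data. The artifact identifiers and `missingArtifacts` below are hypothetical. The Optional rows are left out because they never block validation.

```typescript
type TaskType = 'bugfix' | 'feature' | 'ui';

// Hypothetical identifiers for the required artifacts in the tables above.
type Artifact =
  | 'gitDiff' | 'testResults' | 'regressionTest'
  | 'screenshots' | 'coverageReport' | 'buildOutput'
  | 'desktopScreenshot' | 'mobileScreenshot' | 'tabletScreenshot'
  | 'accessibilityScore';

const REQUIRED: Record<TaskType, Artifact[]> = {
  bugfix: ['gitDiff', 'testResults', 'regressionTest'],
  feature: ['screenshots', 'testResults', 'coverageReport', 'buildOutput'],
  ui: ['desktopScreenshot', 'mobileScreenshot', 'tabletScreenshot', 'accessibilityScore'],
};

// Returns the required artifacts for this task type that were not provided.
function missingArtifacts(type: TaskType, provided: Set<Artifact>): Artifact[] {
  return REQUIRED[type].filter((artifact) => !provided.has(artifact));
}

console.log(missingArtifacts('bugfix', new Set<Artifact>(['gitDiff', 'testResults'])));
// [ 'regressionTest' ]
```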

Screenshot Capture

Playwright Pattern:

```typescript
import { chromium } from 'playwright';

// Viewports from the UI Change Proof table.
const VIEWPORTS = [
  { name: 'desktop', width: 1920, height: 1080 },
  { name: 'mobile', width: 375, height: 667 },
  { name: 'tablet', width: 768, height: 1024 },
] as const;

// Captures one full-page screenshot per viewport and returns the file paths
// (suitable for ProofArtifacts.screenshots).
async function captureScreenshots(url: string, outputDir: string): Promise<string[]> {
  const browser = await chromium.launch({ headless: true });
  const paths: string[] = [];
  try {
    const context = await browser.newContext();
    const page = await context.newPage();

    for (const { name, width, height } of VIEWPORTS) {
      await page.setViewportSize({ width, height });
      // Reload at each size so viewport-dependent content renders correctly.
      await page.goto(url, { waitUntil: 'networkidle' });
      const path = `${outputDir}/${name}.png`;
      await page.screenshot({ path, fullPage: true });
      paths.push(path);
    }
  } finally {
    // Close the browser even if navigation or a screenshot throws.
    await browser.close();
  }
  return paths;
}
```

Confidence Scoring

Algorithm:

```typescript
interface ProofArtifacts {
  testResults?: { passed: number; total: number };
  buildSuccessful?: boolean;
  lintErrors?: number;
  screenshots?: string[];
  testCoverage?: number; // percent, 0-100
  performanceScore?: number; // collected but not scored by this algorithm
}

// Returns 0-100; the weights sum to 100, so the score reads as a percentage.
function calculateConfidence(artifacts: ProofArtifacts): number {
  let score = 0;

  // Tests (40 points): every test must pass, and an empty suite earns nothing.
  const tests = artifacts.testResults;
  if (tests && tests.total > 0 && tests.passed === tests.total) {
    score += 40;
  }

  // Build (20 points)
  if (artifacts.buildSuccessful) {
    score += 20;
  }

  // Coverage (20 points): tiered; missing or 0% coverage earns nothing.
  const coverage = artifacts.testCoverage ?? 0;
  if (coverage >= 80) score += 20;
  else if (coverage >= 60) score += 15;
  else if (coverage >= 40) score += 10;
  else if (coverage > 0) score += 5;

  // Screenshots (10 points): 3 or more covers desktop, mobile, and tablet.
  const shots = artifacts.screenshots?.length ?? 0;
  if (shots >= 3) score += 10;
  else if (shots >= 1) score += 5;

  // Lint (10 points): only an explicitly reported count of zero earns points.
  if (artifacts.lintErrors === 0) {
    score += 10;
  }

  return score;
}
```

Confidence Thresholds

| Confidence | Action |
|------------|--------|
| >= 95% | Auto-approve (In Review -> Done) |
| 80-94% | Manual review required |
| < 80% | Validation failed, iterate |
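The thresholds translate directly into a lookup. Here is a minimal sketch with hypothetical names (`decideAction`, `ProofAction`). Because the checks are chained `>=` comparisons, a fractional score such as 94.5 still falls into a band, although `calculateConfidence` only ever produces multiples of 5.

```typescript
type ProofAction = 'auto-approve' | 'manual-review' | 'iterate';

// Maps a 0-100 confidence score to the action in the thresholds table.
function decideAction(score: number): ProofAction {
  if (score >= 95) return 'auto-approve'; // In Review -> Done
  if (score >= 80) return 'manual-review';
  return 'iterate'; // validation failed
}

console.log(decideAction(100), decideAction(85), decideAction(40));
// auto-approve manual-review iterate
```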

Proof Summary Template

```markdown
# Proof of Work

**Task**: {issue_id}
**Type**: {task_type}
**Confidence**: {score}%

## Test Results
- Total: {total}
- Passed: {passed}
- Failed: {failed}
- Coverage: {coverage}%

## Build
- Status: {status}
- Duration: {duration}

## Screenshots
- Desktop: proof/desktop.png
- Mobile: proof/mobile.png
- Tablet: proof/tablet.png

## Artifacts
- test-results.txt
- coverage.json
- build-output.txt
```
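Filling the summary template is plain string assembly. The sketch below assumes the caller already has a value for every placeholder. `renderProofSummary`, `ProofSummaryInput`, and the sample values (including the `ENG-123` issue ID) are made up for illustration. The sketch takes `failed` as an explicit input instead of computing `total - passed`, which avoids miscounting when a test runner reports skipped tests.

```typescript
// Hypothetical input; each field mirrors a placeholder in the template.
interface ProofSummaryInput {
  issueId: string;
  taskType: string;
  score: number;
  total: number;
  passed: number;
  failed: number;
  coverage: number;
  buildStatus: string;
  buildDuration: string;
  screenshots: Record<string, string>; // label -> path, e.g. Desktop -> proof/desktop.png
  artifacts: string[];
}

function renderProofSummary(p: ProofSummaryInput): string {
  const lines = [
    '# Proof of Work',
    '',
    `**Task**: ${p.issueId}`,
    `**Type**: ${p.taskType}`,
    `**Confidence**: ${p.score}%`,
    '',
    '## Test Results',
    `- Total: ${p.total}`,
    `- Passed: ${p.passed}`,
    `- Failed: ${p.failed}`,
    `- Coverage: ${p.coverage}%`,
    '',
    '## Build',
    `- Status: ${p.buildStatus}`,
    `- Duration: ${p.buildDuration}`,
    '',
    '## Screenshots',
    ...Object.entries(p.screenshots).map(([label, path]) => `- ${label}: ${path}`),
    '',
    '## Artifacts',
    ...p.artifacts.map((name) => `- ${name}`),
  ];
  return lines.join('\n');
}

// Example values only; "ENG-123" and "42s" are invented.
const markdown = renderProofSummary({
  issueId: 'ENG-123',
  taskType: 'feature',
  score: 100,
  total: 15,
  passed: 15,
  failed: 0,
  coverage: 85,
  buildStatus: 'success',
  buildDuration: '42s',
  screenshots: { Desktop: 'proof/desktop.png', Mobile: 'proof/mobile.png', Tablet: 'proof/tablet.png' },
  artifacts: ['test-results.txt', 'coverage.json', 'build-output.txt'],
});
console.log(markdown);
```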

Examples

Example 1: Feature Proof Generation

```typescript
const proof: ProofArtifacts = {
  testResults: { passed: 15, total: 15 },
  buildSuccessful: true,
  lintErrors: 0,
  screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
  testCoverage: 85,
};

const confidence = calculateConfidence(proof);
// 40 (tests) + 20 (build) + 20 (coverage) + 10 (screenshots) + 10 (lint) = 100%
// Result: auto-approve (>= 95%)
```

Example 2: Partial Proof

```typescript
const proof: ProofArtifacts = {
  testResults: { passed: 12, total: 15 }, // 3 failing
  buildSuccessful: true,
  lintErrors: 2,
  screenshots: ['desktop.png'],
  testCoverage: 65,
};

const confidence = calculateConfidence(proof);
// 0 (tests failing) + 20 (build) + 15 (coverage 65%) + 5 (1 screenshot) + 0 (lint errors) = 40%
// Result: validation failed (< 80%), must iterate
```

Best Practices

  • Always capture screenshots for UI work
  • Run the full test suite, not just the affected tests
  • Include a coverage report for features
  • The build must pass before any proof is valid
  • Store proofs in the session directory for debugging
  • Generate the proof summary in Markdown for Linear comments
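As written, `calculateConfidence` does not enforce the rule that the build must pass before any proof is valid. A failed build with everything else perfect still scores 80 (40 + 20 + 10 + 10), which lands in manual review rather than failing. A hypothetical pre-check (`proofViolations`, with a trimmed-down input type) could turn some of these practices into hard failures before the score is consulted. Which checks apply to which task types is an assumption here. "Run the full test suite" cannot be verified from pass/total counts alone, so the sketch does not check it.

```typescript
// Subset of the ProofArtifacts fields these checks read.
interface ProofCheckInput {
  buildSuccessful?: boolean;
  testResults?: { passed: number; total: number };
  screenshots?: string[];
  testCoverage?: number;
}

// Hypothetical gate: returns human-readable violations; an empty list means the
// proof may proceed to confidence scoring.
function proofViolations(a: ProofCheckInput, opts: { uiWork: boolean; feature: boolean }): string[] {
  const violations: string[] = [];
  if (a.buildSuccessful !== true) violations.push('build did not pass');
  if (!a.testResults || a.testResults.total === 0) violations.push('no test results');
  if (opts.uiWork && (a.screenshots?.length ?? 0) === 0) violations.push('UI work without screenshots');
  if (opts.feature && a.testCoverage === undefined) violations.push('feature without coverage report');
  return violations;
}

const violations = proofViolations(
  {
    buildSuccessful: false,
    testResults: { passed: 15, total: 15 },
    screenshots: ['desktop.png', 'mobile.png', 'tablet.png'],
    testCoverage: 85,
  },
  { uiWork: true, feature: true },
);
console.log(violations); // [ 'build did not pass' ]
```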