Agent Skills: HTML Structure Validate Skill

Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality.

Install this agent skill locally:

pnpm dlx add-skill https://github.com/aiskillstore/marketplace/tree/HEAD/skills/abejitsu/html-structure-validate

HTML Structure Validate Skill

Purpose

This skill is a BLOCKING quality gate that ensures generated HTML meets minimum structural requirements. It is the first deterministic validation of probabilistic AI-generated output.

The skill checks:

  • HTML5 compliance - Proper DOCTYPE, tags
  • Tag closure - All tags properly closed
  • Required elements - Meta tags, stylesheet links
  • Well-formedness - Valid structure

If validation fails, the pipeline STOPS and triggers a hook to notify the user.

This enforces the principle that a deterministic Python check, not the AI that generated the page, decides whether the output is structurally acceptable.

What to Do

  1. Load HTML file to validate

    • Read 04_page_XX.html generated by AI skill
    • Verify file exists and is readable
    • Confirm file is text (not binary)
  2. Run validation checks

    • Check HTML5 structure compliance
    • Verify tag closure
    • Validate head section
    • Check required CSS link
    • Validate page container structure
  3. Generate validation report

    • Document all checks performed
    • List any errors found
    • Note warnings (non-blocking)
    • Record informational findings
  4. Save validation report as JSON

    • Save to: output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json
    • Include timestamp
    • Include all check results
  5. Exit with appropriate code

    • Return 0 if VALID (continue pipeline)
    • Return 1 if INVALID (STOP pipeline, trigger hook)
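The five steps above can be sketched as a single driver function. This is a minimal illustration, not the actual validate_html.py: the name `run_gate`, the shape of the `checks` mapping, and the "FAIL" status string are assumptions made for the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def run_gate(html_file, report_file, checks):
    """Run each check on the HTML text, save a JSON report, return an exit code.

    `checks` (hypothetical) maps a check name to a function that takes the
    HTML text and returns a list of error strings (empty list = passed).
    """
    path = Path(html_file)
    errors = []
    results = []
    try:
        # Raises on missing files and on content that is not valid UTF-8 text.
        html = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        html = None
        errors.append(f"Cannot read {path.name}: {exc}")

    if html is not None:
        for name, check in checks.items():
            found = check(html)
            results.append({"check_name": name, "status": "FAIL" if found else "PASS"})
            errors.extend(found)

    report = {
        "validation_type": "structure",
        "validation_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "overall_status": "FAIL" if errors else "PASS",
        "error_count": len(errors),
        "checks_performed": results,
        "errors": errors,
    }
    out = Path(report_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return 1 if errors else 0  # 0 = continue pipeline, 1 = stop
```

A caller would pass the return value to `sys.exit()` so the pipeline sees the exit code.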

Input Parameters

html_file: <str>         - Path to 04_page_XX.html
output_dir: <str>        - Directory for validation report
strict_mode: <bool>      - If true, warnings also fail (default: false)
page_number: <int>       - Page number (for reporting)
chapter: <int>           - Chapter number (for reporting)
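One way to carry these parameters in code is a small dataclass. The class name `ValidateParams` and the `report_path` helper are hypothetical; only the field names, types, and the default for strict_mode come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ValidateParams:
    """Input parameters for the structure validation skill (illustrative)."""

    html_file: str            # path to 04_page_XX.html
    output_dir: str           # directory for the validation report
    page_number: int          # used for reporting only
    chapter: int              # used for reporting only
    strict_mode: bool = False  # if true, warnings also fail

    def report_path(self) -> Path:
        # The report file name is fixed by the skill description.
        return Path(self.output_dir) / "06_validation_structure.json"
```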

Validation Checks

Check 1: DOCTYPE Declaration

Requirement: File must start with proper DOCTYPE

<!DOCTYPE html>

Check:

  • [ ] File contains <!DOCTYPE html> (case-insensitive)
  • [ ] DOCTYPE appears before any tags
  • [ ] DOCTYPE is on first line or near beginning

Error if: Missing or incorrect DOCTYPE
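A minimal sketch of this check using regular expressions. The function name `check_doctype` and the error strings are assumptions; note that a tag inside a comment placed before the DOCTYPE would be flagged by this simple version.

```python
import re

DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[a-zA-Z]")  # start of any element tag


def check_doctype(html: str) -> list[str]:
    """Return error strings for a missing or misplaced HTML5 DOCTYPE."""
    doctype = DOCTYPE_RE.search(html)
    if doctype is None:
        return ["Missing or incorrect DOCTYPE"]
    first_tag = TAG_RE.search(html)
    if first_tag is not None and first_tag.start() < doctype.start():
        return ["DOCTYPE must appear before any tags"]
    return []
```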

Check 2: HTML Tags

Requirement: Proper <html> opening and closing tags

<html lang="en">
    ...
</html>

Checks:

  • [ ] <html> tag present
  • [ ] </html> closing tag present
  • [ ] Tags are properly paired
  • [ ] No unclosed <html> tags

Error if: Missing either tag or improperly paired
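As a rough illustration, the pairing can be verified by counting opening and closing tags. The function name and messages are hypothetical, and a real implementation may prefer a parser over regular expressions.

```python
import re

OPEN_HTML = re.compile(r"<html(?:\s[^>]*)?>", re.IGNORECASE)
CLOSE_HTML = re.compile(r"</html\s*>", re.IGNORECASE)


def check_html_tags(html: str) -> list[str]:
    """Return error strings if <html> and </html> are missing or unpaired."""
    opens = len(OPEN_HTML.findall(html))
    closes = len(CLOSE_HTML.findall(html))
    errors = []
    if opens == 0:
        errors.append("Missing <html> tag")
    if closes == 0:
        errors.append("Missing </html> closing tag")
    if opens and closes and opens != closes:
        errors.append(f"Unpaired <html> tags: {opens} opening, {closes} closing")
    return errors
```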

Check 3: Head Section

Requirement: Complete <head> section with metadata

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>...</title>
    <link rel="stylesheet" href="../../styles/main.css">
</head>

Checks:

  • [ ] <head> and </head> tags present
  • [ ] <meta charset="UTF-8"> present
  • [ ] <meta name="viewport"> present (warning if missing)
  • [ ] <title> tag with content present
  • [ ] CSS <link> tag present with href attribute

Error if: Missing charset, title, or CSS link

Warning if: Missing viewport meta tag
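The head checks can be sketched with the standard library's `html.parser`. The class and function names are assumptions for illustration; the split between errors and warnings follows the rules stated for this check.

```python
from html.parser import HTMLParser


class HeadInspector(HTMLParser):
    """Record which required elements appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.saw_head = self.in_head = self.in_title = False
        self.has_charset = self.has_viewport = self.has_stylesheet = False
        self.title_text = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.saw_head = self.in_head = True
        elif not self.in_head:
            return
        elif tag == "meta":
            if (a.get("charset") or "").lower() == "utf-8":
                self.has_charset = True
            if (a.get("name") or "").lower() == "viewport":
                self.has_viewport = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link":
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel and a.get("href"):
                self.has_stylesheet = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_text += data


def check_head(html):
    """Return (errors, warnings) for the head section."""
    p = HeadInspector()
    p.feed(html)
    p.close()
    errors, warnings = [], []
    if not p.saw_head:
        errors.append("Missing <head> section")
    if not p.has_charset:
        errors.append('Missing <meta charset="UTF-8">')
    if not p.title_text.strip():
        errors.append("Missing or empty <title>")
    if not p.has_stylesheet:
        errors.append("Missing CSS <link> with href")
    if not p.has_viewport:
        warnings.append("Missing viewport meta tag")
    return errors, warnings
```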

Check 4: Body Section

Requirement: Proper <body> tags with content

<body>
    <div class="page-container">
        <main class="page-content">
            ...
        </main>
    </div>
</body>

Checks:

  • [ ] <body> and </body> tags present
  • [ ] <div class="page-container"> present
  • [ ] <main class="page-content"> present inside container
  • [ ] Body contains substantial content (> 100 bytes)

Error if: Missing tags or required container elements
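A sketch of the container checks, assuming that "inside container" means the main element opens while the page-container div is still open. The names are hypothetical, and the 100-byte size check is left out because the text does not say whether it counts markup or text.

```python
from html.parser import HTMLParser


class BodyInspector(HTMLParser):
    """Track <body>, div.page-container, and main.page-content nesting."""

    def __init__(self):
        super().__init__()
        self.has_body = False
        self.has_container = False
        self.has_main_in_container = False
        self.container_depth = 0  # > 0 while inside div.page-container

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "body":
            self.has_body = True
        elif tag == "div":
            if self.container_depth:
                self.container_depth += 1  # nested div inside the container
            elif "page-container" in classes:
                self.has_container = True
                self.container_depth = 1
        elif tag == "main" and "page-content" in classes and self.container_depth:
            self.has_main_in_container = True

    def handle_endtag(self, tag):
        if tag == "div" and self.container_depth:
            self.container_depth -= 1


def check_body(html):
    """Return error strings for missing body or container elements."""
    p = BodyInspector()
    p.feed(html)
    p.close()
    errors = []
    if not p.has_body:
        errors.append("Missing <body> tag")
    if not p.has_container:
        errors.append('Missing <div class="page-container">')
    if not p.has_main_in_container:
        errors.append('<main class="page-content"> not found inside page container')
    return errors
```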

Check 5: Tag Closure Validation

Requirement: All tags must be properly closed

Checks for:

  • Unmatched opening tags (e.g., <p> without </p>)
  • Improper nesting (e.g., <p><h2>text</h2></p>)
  • Void elements used correctly (e.g., <br>, <img>; the trailing slash as in <br/> is optional in HTML5)
  • Comment blocks properly formatted (<!-- -->)

Validation method:

  • Parse HTML into tree structure
  • Verify all nodes properly matched
  • Check nesting doesn't violate HTML5 rules

Error if: Any unmatched or improperly nested tags
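The matching step can be illustrated with a stack of open tags. This sketch only verifies that every non-void opening tag has a matching close in the right order; it does not enforce HTML5 content-model rules, so a case like a heading inside a paragraph would need a separate rule. All names are assumptions.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML5.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) for each currently open element
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # <br/> style tags open and close in one step

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.stack]:
            self.errors.append(f"Line {line}: unexpected </{tag}>")
            return
        # Pop until the matching opener, reporting anything left open.
        while self.stack:
            open_tag, opened_at = self.stack.pop()
            if open_tag == tag:
                break
            self.errors.append(f"Line {opened_at}: <{open_tag}> not closed before </{tag}>")


def find_unbalanced_tags(html):
    """Return error strings for unmatched or badly ordered tags."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    leftovers = [f"Line {line}: <{tag}> never closed" for tag, line in checker.stack]
    return checker.errors + leftovers
```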

Check 6: Heading Tags (h1-h6)

Requirement: Valid heading hierarchy

<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>

Checks:

  • [ ] All heading tags properly closed
  • [ ] First heading should be h1 (warning if not)
  • [ ] Heading levels don't skip dramatically (h1 → h4 is suspicious)
  • [ ] All headings have text content (not empty)

Error if: Heading tags improperly closed

Warning if: Suspicious hierarchy
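A sketch of the hierarchy warnings. The text does not define how large a jump counts as dramatic, so the `max_step` parameter (default 1, meaning h1 to h3 is already flagged) is an assumption, as is treating empty headings as warnings rather than errors.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"h[1-6]")


class HeadingCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.headings = []  # [level, text] in document order
        self.open = False

    def handle_starttag(self, tag, attrs):
        if HEADING.fullmatch(tag):
            self.headings.append([int(tag[1]), ""])
            self.open = True

    def handle_endtag(self, tag):
        if HEADING.fullmatch(tag):
            self.open = False

    def handle_data(self, data):
        if self.open:
            self.headings[-1][1] += data


def heading_warnings(html, max_step=1):
    """Return warning strings for suspicious heading structure."""
    c = HeadingCollector()
    c.feed(html)
    c.close()
    levels = [level for level, _ in c.headings]
    warnings = []
    if levels and levels[0] != 1:
        warnings.append(f"First heading is h{levels[0]}, expected h1")
    for prev, cur in zip(levels, levels[1:]):
        if cur - prev > max_step:
            warnings.append(f"Heading jumps from h{prev} to h{cur}")
    for level, text in c.headings:
        if not text.strip():
            warnings.append(f"Empty h{level} heading")
    return warnings
```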

Check 7: Content Structure

Requirement: Meaningful content in page container

Checks:

  • [ ] <main class="page-content"> contains elements
  • [ ] Content includes headings or paragraphs
  • [ ] No completely empty content area
  • [ ] Text nodes or elements present (> 100 words total)

Error if: No content or empty structure
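Counting elements and words inside the main content area could look like the sketch below. The function name and the returned keys are hypothetical; the caller would compare `words` against the 100-word threshold and treat zero elements and zero words as empty content.

```python
from html.parser import HTMLParser


class MainContentCounter(HTMLParser):
    """Count child elements and words inside <main class="page-content">."""

    def __init__(self):
        super().__init__()
        self.in_main = False
        self.elements = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if self.in_main:
            self.elements += 1
        elif tag == "main" and "page-content" in (dict(attrs).get("class") or "").split():
            self.in_main = True

    def handle_endtag(self, tag):
        if tag == "main":
            self.in_main = False

    def handle_data(self, data):
        if self.in_main:
            self.words += len(data.split())


def content_stats(html):
    """Return element and word counts for the main content area."""
    counter = MainContentCounter()
    counter.feed(html)
    counter.close()
    return {"elements": counter.elements, "words": counter.words}
```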

Check 8: List Integrity

Requirement: All lists properly structured

Checks for each <ul> or <ol>:

  • [ ] List opening and closing tags matched
  • [ ] List contains <li> elements
  • [ ] All <li> tags properly closed
  • [ ] <li> count matches opening/closing pairs
  • [ ] No nested <ul> or <ol> improperly closed

Error if: Empty lists or unmatched <li> tags
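The list checks can be sketched with a stack of open lists, so nested lists are counted separately. Names and messages are assumptions. HTML5 itself allows the closing </li> to be omitted, but this sketch follows the stricter rule stated above.

```python
from html.parser import HTMLParser


class ListInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_lists = []  # [tag, line opened, <li> count] per open list
        self.li_opened = 0
        self.li_closed = 0
        self.errors = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag in ("ul", "ol"):
            self.open_lists.append([tag, line, 0])
        elif tag == "li":
            self.li_opened += 1
            if self.open_lists:
                self.open_lists[-1][2] += 1
            else:
                self.errors.append(f"Line {line}: <li> outside of a list")

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag == "li":
            self.li_closed += 1
        elif tag in ("ul", "ol"):
            if not self.open_lists or self.open_lists[-1][0] != tag:
                self.errors.append(f"Line {line}: unexpected </{tag}>")
                return
            _, opened_at, items = self.open_lists.pop()
            if items == 0:
                self.errors.append(f"Line {opened_at}: empty <{tag}>")


def check_lists(html):
    """Return error strings for empty, unclosed, or unbalanced lists."""
    p = ListInspector()
    p.feed(html)
    p.close()
    errors = list(p.errors)
    errors += [f"Line {line}: <{tag}> never closed" for tag, line, _ in p.open_lists]
    if p.li_opened != p.li_closed:
        errors.append(f"{p.li_opened} <li> opened but {p.li_closed} closed")
    return errors
```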

Check 9: Image and Link Tags

Requirement: Image and link tags carry their required attributes and are properly formatted

Checks:

  • [ ] All <img> tags have src and alt attributes
  • [ ] All <a> tags have valid href attributes
  • [ ] Image paths don't have obvious errors (no broken syntax)
  • [ ] Void elements such as <img> use proper syntax

Warning if: Images missing alt text or links missing href
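A short sketch of the attribute warnings. The names are hypothetical. An empty `alt=""` is accepted here because it is the conventional way to mark a decorative image; only a missing alt attribute is reported.

```python
from html.parser import HTMLParser


class AttributeInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # Also called for <img ... /> via the default handle_startendtag.
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "img":
            if not a.get("src"):
                self.warnings.append(f"Line {line}: <img> missing src")
            if "alt" not in a:
                self.warnings.append(f"Line {line}: <img> missing alt")
        elif tag == "a" and not a.get("href"):
            self.warnings.append(f"Line {line}: <a> missing href")


def image_link_warnings(html):
    """Return warning strings for images and links lacking attributes."""
    p = AttributeInspector()
    p.feed(html)
    p.close()
    return p.warnings
```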

Check 10: Table Tags (if present)

Requirement: Proper table structure

Checks:

  • [ ] <table>, <tr>, <td>, <th> tags properly nested
  • [ ] All rows have consistent column counts
  • [ ] Table headers and body properly structured

Error if: Malformed table structure
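The column-count check might be sketched as follows. The names are assumptions; `colspan` is honored, but `rowspan` is ignored, so tables that rely on row spans would need a more careful count.

```python
from html.parser import HTMLParser


class TableInspector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tables = []  # stack of open tables, each a list of row widths
        self.errors = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            if self.tables:
                self.tables[-1].append(0)
            else:
                self.errors.append(f"Line {line}: <tr> outside of a table")
        elif tag in ("td", "th"):
            if self.tables and self.tables[-1]:
                span = dict(attrs).get("colspan") or "1"
                self.tables[-1][-1] += int(span) if span.isdigit() else 1
            else:
                self.errors.append(f"Line {line}: <{tag}> outside of a row")

    def handle_endtag(self, tag):
        if tag == "table" and self.tables:
            widths = self.tables.pop()
            if len(set(widths)) > 1:
                self.errors.append(f"Inconsistent column counts per row: {widths}")


def check_tables(html):
    """Return error strings for malformed table structure."""
    p = TableInspector()
    p.feed(html)
    p.close()
    return p.errors
```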

Validation Report Format

Output: 06_validation_structure.json

{
  "page": 16,
  "book_page": 17,
  "chapter": 2,
  "validation_type": "structure",
  "validation_timestamp": "2025-11-08T14:34:00Z",
  "overall_status": "PASS",
  "error_count": 0,
  "warning_count": 1,
  "checks_performed": [
    {
      "check_name": "DOCTYPE Declaration",
      "status": "PASS",
      "details": "Valid HTML5 DOCTYPE found"
    },
    {
      "check_name": "HTML Tags",
      "status": "PASS",
      "details": "Proper <html> opening and closing tags"
    },
    {
      "check_name": "Head Section",
      "status": "PASS",
      "details": "All required meta tags and title present"
    },
    {
      "check_name": "Body Section",
      "status": "PASS",
      "details": "Body and content structure valid"
    },
    {
      "check_name": "Tag Closure",
      "status": "PASS",
      "details": "All tags properly matched and closed"
    },
    {
      "check_name": "Heading Hierarchy",
      "status": "WARNING",
      "details": "4 headings found; first heading is h2"
    },
    {
      "check_name": "Content Structure",
      "status": "PASS",
      "details": "Main content area contains 245 words across 3 paragraphs"
    },
    {
      "check_name": "List Integrity",
      "status": "PASS",
      "details": "1 list with 3 items, all properly formed"
    },
    {
      "check_name": "Image Tags",
      "status": "PASS",
      "details": "No images on this page"
    },
    {
      "check_name": "Table Tags",
      "status": "PASS",
      "details": "No tables on this page"
    }
  ],
  "errors": [],
  "warnings": [
    {
      "check": "Heading Hierarchy",
      "message": "First heading is h2, typically should be h1 for page opening",
      "severity": "LOW"
    }
  ],
  "summary": {
    "total_checks": 10,
    "passed": 9,
    "failed": 0,
    "warnings": 1,
    "html_valid": true,
    "tags_matched": true,
    "content_substantial": true
  }
}
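The count fields of the summary block could be derived from the check results rather than written by hand. This is a sketch: the function name is hypothetical, and the "FAIL" status string is assumed by analogy with "PASS".

```python
from collections import Counter


def build_summary(checks_performed, warnings):
    """Derive the count fields of the report's summary block."""
    counts = Counter(check["status"] for check in checks_performed)
    return {
        "total_checks": len(checks_performed),
        "passed": counts["PASS"],
        "failed": counts["FAIL"],  # assumed status name
        "warnings": len(warnings),
    }
```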

Validation Rules

PASS Criteria

  • DOCTYPE present and valid
  • All required tags (html, head, body, main, div.page-container) present
  • All tags properly closed and matched
  • Title tag with content
  • CSS stylesheet link present
  • Content structure valid
  • No structural errors

FAIL Criteria (BLOCKS PIPELINE)

  • Missing DOCTYPE
  • Missing required tags
  • Unmatched or improperly nested tags
  • Missing title or CSS link
  • Empty content
  • Malformed lists or tables

WARNING (Logged but doesn't block)

  • Missing viewport meta tag
  • First heading is not h1
  • Large heading jumps (h1 → h4)
  • Missing alt text on images
  • Missing href on links
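These rules, combined with the strict_mode parameter, reduce to a small decision. A minimal sketch, with a hypothetical function name and an assumed "FAIL" status string:

```python
def overall_status(errors, warnings, strict_mode=False):
    """Return (status, exit_code) from collected errors and warnings."""
    # In strict mode, warnings block the pipeline just like errors.
    blocking = len(errors) + (len(warnings) if strict_mode else 0)
    if blocking:
        return "FAIL", 1
    return "PASS", 0
```

With the default settings, a page that only lacks a viewport meta tag still passes; with `strict_mode=True` the same page stops the pipeline.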

Implementation: Using Python Script

This validation is performed by the existing validate_html.py tool, run in structure validation mode:

cd Calypso/tools

# Validate single page HTML
python3 validate_html.py \
  ../output/chapter_02/page_artifacts/page_16/04_page_16.html \
  --output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
  --strict-structure

# Exit code:
# 0 = VALID (continue to next skill)
# 1 = INVALID (STOP pipeline)
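A pipeline step written in Python could wrap the same command and act on its exit code. The function names and the `tools_dir` default are assumptions for this sketch; the script name and flags are the ones shown in the command above.

```python
import subprocess


def build_command(html_file, report_file):
    """Assemble the validate_html.py invocation shown above."""
    return [
        "python3", "validate_html.py", html_file,
        "--output-json", report_file,
        "--strict-structure",
    ]


def run_structure_gate(html_file, report_file, tools_dir="Calypso/tools"):
    """Return True if the gate passed (exit code 0), False otherwise."""
    result = subprocess.run(build_command(html_file, report_file), cwd=tools_dir)
    return result.returncode == 0
```

The caller would stop processing further pages as soon as `run_structure_gate` returns False.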

Hook Integration

When validation FAILS:

# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
#   - Page number
#   - HTML file path
#   - Validation report path
#   - Error details

# Hook behavior:
# - Log failure with details
# - Save error report
# - Notify user
# - STOP pipeline (no further processing)

Error Recovery

If validation fails:

  1. User reviews validation report
  2. User identifies issue in AI-generated HTML
  3. Options:
    • Fix HTML manually and re-validate
    • Re-run AI generation with improved prompt
    • Review source extraction data for errors
    • Proceed with caution (expert override)

Quality Metrics

Validation provides metrics:

  • Percentage of checks passing
  • Error severity levels
  • Content size (word count, element count)
  • Structure complexity

These metrics feed into final quality reports.

Success Criteria

✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails

Next Steps After PASS

If validation passes:

  1. All pages of chapter processed through this gate
  2. Skill 4 (consolidate pages) merges individual page HTMLs
  3. Quality Gate 2 (semantic validate) checks semantic structure
  4. Continue through validation pipeline

Next Steps After FAIL

If validation fails:

  1. PIPELINE STOPS
  2. Hook validate-structure.sh triggered
  3. User receives error report with details
  4. User must fix issues and retry

Design Notes

  • This is the first deterministic quality gate
  • Uses proven validate_html.py tool
  • Catches structural issues before semantic analysis
  • Provides clear, actionable error messages
  • Essential for ensuring pipeline reliability

Testing

To test structure validation:

# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html

# Should show: ✓ VALID

# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html

# Should show: ✗ INVALID with specific errors