Agent Skills: adb-screen-detection

Screen understanding with OCR and template matching for Android device automation

Uncategorized
ID: rdmptv/adbautoplayer/adb-screen-detection

Install this agent skill locally:

pnpm dlx add-skill https://github.com/rdmptv/AdbAutoPlayer/tree/HEAD/.claude/skills/adb/adb-foundation/adb-screen-detection

.claude/skills/adb/adb-foundation/adb-screen-detection/SKILL.md

Skill Metadata

Name
adb-screen-detection
Description
Screen understanding with OCR and template matching for Android device automation

Scripts

1. adb-screen-capture.py

Capture the Android device screen and save it locally.

# Basic usage
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py

# Specify device
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --device 127.0.0.1:5555

# Custom output path
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --output /tmp/screen.png

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --json

Output:

{
  "device": "127.0.0.1:5555",
  "timestamp": "2025-12-01T10:30:45Z",
  "local_path": "/tmp/screenshot.png",
  "size": [1080, 2400],
  "success": true
}
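As a rough illustration of the capture step, the sketch below assumes the script shells out to `adb exec-out screencap -p` (a standard adb command that streams a PNG to stdout) and reads the image size from the PNG header. The function names are hypothetical and are not taken from adb-screen-capture.py.

```python
import struct
import subprocess
from datetime import datetime, timezone


def screencap_command(device=None):
    """Build the adb command that streams a PNG screenshot to stdout."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]  # target a specific device serial
    return cmd + ["exec-out", "screencap", "-p"]


def png_size(data):
    """Return [width, height] read from the IHDR chunk of PNG bytes."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG image")
    # Width and height are big-endian uint32 values at byte offsets 16..24.
    width, height = struct.unpack(">II", data[16:24])
    return [width, height]


def capture(device=None, output="/tmp/screenshot.png", timeout=15):
    """Capture the screen, save it locally, and return a JSON-ready dict."""
    proc = subprocess.run(
        screencap_command(device), capture_output=True, timeout=timeout
    )
    ok = proc.returncode == 0 and proc.stdout.startswith(b"\x89PNG")
    result = {
        "device": device,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "local_path": output,
        "success": ok,
    }
    if ok:
        with open(output, "wb") as fh:
            fh.write(proc.stdout)
        result["size"] = png_size(proc.stdout)
    return result
```

Streaming through `exec-out` avoids writing a temporary file on the device and then pulling it, which keeps each capture a single stateless adb call.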

2. adb-ocr-extract.py

Extract all visible text from the device screen using Tesseract OCR.

# Basic usage (uses the most recent screenshot)
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py

# Specify screenshot path
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --image /tmp/screen.png

# Search for specific text
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --search "Login"

# JSON output with coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --json

Output:

{
  "text": ["Login", "Username", "Password", "Submit"],
  "detected": true,
  "search_found": true,
  "search_term": "Login",
  "coordinates": {
    "Login": [[100, 200, 150, 230]]
  }
}
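pytesseract's `image_to_data(image, output_type=pytesseract.Output.DICT)` returns parallel lists keyed by `text`, `conf`, `left`, `top`, `width` and `height`. The sketch below turns such a dict into the output shape shown above. It assumes each box is `[left, top, right, bottom]`, which is an assumption because the source does not define the box format, and the helper names are hypothetical.

```python
def words_from_data(data, min_conf=0.0):
    """Turn an image_to_data-style dict into (text list, word -> boxes)."""
    texts, coords = [], {}
    for i, word in enumerate(data["text"]):
        word = word.strip()
        # Tesseract reports conf = -1 for non-word rows (blocks, lines).
        if not word or float(data["conf"][i]) < min_conf:
            continue
        left, top = int(data["left"][i]), int(data["top"][i])
        box = [left, top, left + int(data["width"][i]), top + int(data["height"][i])]
        texts.append(word)
        coords.setdefault(word, []).append(box)
    return texts, coords


def search(data, term, min_conf=0.0):
    """Build a result dict like the JSON above for a case-insensitive search."""
    texts, coords = words_from_data(data, min_conf)
    hits = {w: b for w, b in coords.items() if term.lower() in w.lower()}
    return {
        "text": texts,
        "detected": bool(texts),
        "search_found": bool(hits),
        "search_term": term,
        "coordinates": hits,
    }
```

Because Tesseract reports individual words, a multi-word target such as "Login Button" would need adjacent words joined before searching. This sketch only matches single words.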

3. adb-find-element.py

Find a UI element by template matching or OCR text search.

# Find by OCR text
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method ocr \
    --target "Login Button" \
    --threshold 0.8

# Find by template image
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method template \
    --template /path/to/template.png \
    --threshold 0.8

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method ocr \
    --target "Login" \
    --json

Output:

{
  "found": true,
  "method": "ocr",
  "target": "Login",
  "coordinates": {
    "x": 100,
    "y": 200,
    "width": 150,
    "height": 30
  },
  "confidence": 0.95,
  "message": "Element found at (100, 200)"
}
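Template matching with a 0.0-1.0 threshold is commonly done with zero-mean normalized cross-correlation. This is the score OpenCV computes with `cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)`, followed by `cv2.minMaxLoc` to locate the best match. The source lists opencv-python but does not say which mode the script uses, so treat this as an assumption. The sketch below is written in pure Python on small grayscale lists so the scoring is visible. It also assumes `x`/`y` in the result is the match's top-left corner.

```python
import math


def ncc_score(image, template, x, y):
    """Zero-mean normalized cross-correlation of template placed at (x, y)."""
    th, tw = len(template), len(template[0])
    window = [row[x:x + tw] for row in image[y:y + th]]
    n = th * tw
    t_mean = sum(map(sum, template)) / n
    w_mean = sum(map(sum, window)) / n
    num = t_sq = w_sq = 0.0
    for trow, wrow in zip(template, window):
        for t, w in zip(trow, wrow):
            dt, dw = t - t_mean, w - w_mean
            num += dt * dw
            t_sq += dt * dt
            w_sq += dw * dw
    denom = math.sqrt(t_sq * w_sq)
    # A flat window or template has no variance; score it 0.0 here.
    return num / denom if denom else 0.0


def match_template(image, template, threshold=0.8):
    """Slide the template over the image and report the best-scoring spot."""
    th, tw = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            score = ncc_score(image, template, x, y)
            if score > best[0]:
                best = (score, x, y)
    score, x, y = best
    return {
        "found": score >= threshold,
        "method": "template",
        "coordinates": {"x": x, "y": y, "width": tw, "height": th},
        "confidence": round(score, 4),
    }
```

Subtracting the means makes the score insensitive to uniform brightness changes, which helps when the same button is rendered slightly lighter or darker.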

4. adb-tap-coordinate.py

Tap the device screen at specific coordinates.

# Tap at coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --device 127.0.0.1:5555

# Tap with verification (checks that --verify-text appears on screen after the tap, waiting up to --timeout)
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --verify-text "Next Screen" \
    --timeout 5

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --json

Output:

{
  "device": "127.0.0.1:5555",
  "tap": {
    "x": 100,
    "y": 200
  },
  "success": true,
  "verified": true,
  "verify_text": "Next Screen",
  "verification_match": true
}
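A tap like this maps naturally onto the standard Android shell command `adb shell input tap X Y`. Verification can be sketched as polling a check until a deadline. In the sketch below, `check` is a hypothetical callable, for example one that re-captures the screen and searches the OCR text for the `--verify-text` value. This is an illustration, not the script's actual implementation.

```python
import time


def tap_command(x, y, device=None):
    """Build the adb command that injects a single tap at (x, y)."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]
    return cmd + ["shell", "input", "tap", str(int(x)), str(int(y))]


def wait_for(check, timeout=5.0, interval=0.5):
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `subprocess.run(tap_command(100, 200, "127.0.0.1:5555"), check=True, timeout=10)` followed by `wait_for(lambda: text_visible("Next Screen"), timeout=5)` taps and then verifies. Here `text_visible` is a hypothetical OCR helper. Using `time.monotonic()` keeps the deadline correct even if the wall clock changes.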

Usage Patterns

Pattern 1: Verify Screen State Before Action

# 1. Capture current screen
adb-screen-capture.py

# 2. Check for expected element
adb-find-element.py --method ocr --target "Login Button"

# 3. If found, tap the coordinates it returned
adb-tap-coordinate.py --x 100 --y 200 --verify-text "Welcome"
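The capture, find and tap sequence above can be expressed as a small driver that only taps when the find step reports `"found": true`. In this sketch, `capture`, `find_element` and `tap` are hypothetical callables that would wrap the three scripts. `find_element` is assumed to return a dict shaped like the adb-find-element.py JSON output.

```python
def verify_then_tap(capture, find_element, tap, target, verify_text=None):
    """Capture, look for target, and tap only if it was found.

    capture(), find_element(target) and tap(x, y, verify_text) are
    hypothetical wrappers around the three scripts.
    """
    capture()  # refresh the screenshot so the search sees the current screen
    result = find_element(target)
    if not result.get("found"):
        return {"tapped": False, "reason": f"'{target}' not on screen"}
    c = result["coordinates"]
    return {"tapped": True, "tap": tap(c["x"], c["y"], verify_text)}
```

Injecting the three steps keeps the decision logic testable without a device attached.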

Pattern 2: OCR-Based Automation

# 1. Capture screen
adb-screen-capture.py

# 2. Extract all text
adb-ocr-extract.py --search "Settings"

# 3. Get coordinates and tap
adb-find-element.py --method ocr --target "Settings"
adb-tap-coordinate.py --x 150 --y 300

Pattern 3: Template-Based Element Detection

# 1. Store known UI template images in ./templates/
# 2. Capture screen
adb-screen-capture.py

# 3. Match against templates and keep the JSON result
RESULT=$(adb-find-element.py --method template --template ./templates/button.png --json)

# 4. Tap the matched location (jq reads the saved JSON)
adb-tap-coordinate.py --x "$(echo "$RESULT" | jq -r '.coordinates.x')" --y "$(echo "$RESULT" | jq -r '.coordinates.y')"
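The same chaining can be done from Python by running each script with `--json` and parsing stdout. This sketch assumes `coordinates.x`/`y` is the top-left corner of the matched box and taps its centre. That is an assumption, so if the script already reports the tap point, use `x`/`y` directly. `find_and_tap` is a hypothetical helper.

```python
import json
import subprocess

SCRIPTS = ".claude/skills/adb-screen-detection/scripts"  # path used above


def run_json(cmd, timeout=30):
    """Run a command that prints JSON and parse its stdout."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return json.loads(proc.stdout)


def tap_point(result):
    """Return the centre of the matched box as integer (x, y)."""
    c = result["coordinates"]
    return c["x"] + c["width"] // 2, c["y"] + c["height"] // 2


def find_and_tap(template):
    """Template-match, then tap the centre of the match (hypothetical helper)."""
    found = run_json(["uv", "run", f"{SCRIPTS}/adb-find-element.py",
                      "--method", "template", "--template", template, "--json"])
    if not found.get("found"):
        return found
    x, y = tap_point(found)
    return run_json(["uv", "run", f"{SCRIPTS}/adb-tap-coordinate.py",
                     "--x", str(x), "--y", str(y), "--json"])
```

Passing argument lists to `subprocess.run` avoids the shell quoting issues that the jq pipeline has to handle.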

Architecture

Design Principles:

  • Independent: Each script can run standalone
  • Chainable: Scripts output JSON for piping
  • Stateless: No dependencies between executions
  • Verifiable: Always verify screen state before proceeding
  • Timeout Protected: All network operations have timeouts
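The "Timeout Protected" and "Chainable" principles can be sketched together as one wrapper. It runs an external command (such as an adb call) with a hard timeout and always returns a JSON-ready dict instead of raising. The function name and result keys are hypothetical.

```python
import subprocess


def run_with_timeout(cmd, timeout=10.0):
    """Run one external command with a hard timeout; never raise."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return {"success": False, "error": f"timed out after {timeout}s"}
    except FileNotFoundError:
        return {"success": False, "error": f"command not found: {cmd[0]}"}
    return {
        "success": proc.returncode == 0,
        "stdout": proc.stdout.strip(),
        "error": proc.stderr.strip() or None,
    }
```

Returning a dict rather than raising lets each script print the result with `json.dumps` and exit, which keeps executions independent of one another.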

Dependency Relationship:

adb-screen-capture.py (foundation)
    ↓
adb-ocr-extract.py (uses capture)
adb-find-element.py (uses capture or templates)
    ↓
adb-tap-coordinate.py (uses find-element for verification)

Integration Points

Used By:

  • adb-navigation-base - Wait for elements between actions
  • adb-magisk - Verify Magisk UI state
  • adb-karrot - Verify app state during automation
  • adb-workflow-orchestrator - Screen verification in workflows

Dependencies:

  • System: adb command-line tool; Tesseract OCR engine (the binary that pytesseract wraps)
  • Python: pytesseract, opencv-python, pillow, numpy

Troubleshooting

OCR Not Working

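  • Install Tesseract: brew install tesseract (macOS) or apt-get install tesseract-ocr (Debian/Ubuntu)
  • If Tesseract cannot find its language data, point TESSDATA_PREFIX at the tessdata directory of your install, e.g. export TESSDATA_PREFIX=/usr/local/share/tessdata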
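A quick diagnostic for these two checks is sketched below. It only looks for a `tesseract` binary on PATH and checks whether `TESSDATA_PREFIX`, if set, points at an existing directory. The function name is hypothetical. From a shell, `tesseract --list-langs` gives a similar check.

```python
import os
import shutil


def tesseract_status():
    """Report whether the tesseract binary and tessdata path look usable."""
    binary = shutil.which("tesseract")
    prefix = os.environ.get("TESSDATA_PREFIX")
    return {
        "binary": binary,
        "installed": binary is not None,
        "tessdata_prefix": prefix,
        # False when the variable is unset, empty, or not a directory.
        "tessdata_exists": bool(prefix) and os.path.isdir(prefix),
    }
```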

Template Matching Too Strict/Loose

  • Adjust the --threshold parameter (0.0-1.0)
  • Higher threshold = stricter matching (fewer false positives, but more missed elements)
  • Recommended: 0.8-0.9 for reliable detection

Device Offline

  • Check ADB connection: adb devices
  • Reconnect: adb connect <device>
  • Restart ADB: adb kill-server && adb start-server
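`adb devices` prints one `serial<TAB>state` line per device under a "List of devices attached" header. The state is `device` when the device is ready, and can also be `offline` or `unauthorized`. A script can use this to check connectivity before capturing. The sketch below parses that output, and its helper names are hypothetical.

```python
def parse_adb_devices(output):
    """Map serial -> state from `adb devices` output."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip the header and daemon start-up lines such as "* daemon started successfully".
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def is_online(output, serial):
    """True only if the serial is listed in the ready "device" state."""
    return parse_adb_devices(output).get(serial) == "device"
```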

Workflows

This skill includes TOON-based workflow definitions for automation.

What is TOON?

TOON (Task-Oriented Orchestration Notation) is a structured workflow definition language that pairs with Markdown documentation. Each workflow consists of:

  • [name].toon - Orchestration logic and execution steps
  • [name].md - Complete documentation and usage guide

This TOON+MD pairing approach is inspired by the BMAD METHOD pattern, adapted to use TOON instead of YAML for better orchestration support.

Available Workflows

Workflow files are located in the workflow/ directory:

Example Workflows (adb-screen-detection):

  • workflow/screen-verification.toon - Capture and verify screen state
  • workflow/element-detection.toon - Find elements via OCR or template matching
  • workflow/screen-monitoring.toon - Continuous screen monitoring and analysis

Running a Workflow

Execute any workflow using the ADB workflow orchestrator:

uv run .claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py \
  --workflow .claude/skills/adb-screen-detection/workflow/screen-verification.toon \
  --param device="127.0.0.1:5555"
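To launch the same workflow from Python, the command can be built as an argument list with one `--param key=value` per parameter. The shell form `device="127.0.0.1:5555"` reaches the program as the single argument `device=127.0.0.1:5555` once the shell removes the quotes. `workflow_command` is a hypothetical helper that uses only the flags shown above.

```python
ORCHESTRATOR = ".claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py"


def workflow_command(workflow, **params):
    """Build the orchestrator argv with one --param key=value per parameter."""
    cmd = ["uv", "run", ORCHESTRATOR, "--workflow", workflow]
    for key, value in params.items():
        cmd += ["--param", f"{key}={value}"]
    return cmd
```

The resulting list can be passed straight to `subprocess.run`, which needs no shell quoting.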

Workflow Documentation

Each workflow includes comprehensive documentation in the corresponding .md file:

  • Purpose and use case
  • Prerequisites and requirements
  • Available parameters
  • Execution phases and steps
  • Success criteria
  • Error handling and recovery
  • Example commands

See the workflow/ directory for complete TOON file definitions and documentation.

Creating New Workflows

To create custom workflows for this skill:

  1. Create a new .toon file in the workflow/ directory
  2. Define phases, steps, and parameters using TOON v4.0 syntax
  3. Create corresponding .md file with comprehensive documentation
  4. Test with the workflow orchestrator

For more information, refer to the TOON specification and the workflow orchestrator documentation.


Version: 1.0.0
Status: ✅ Foundation Tier
Scripts: 4 (all MCP-ready)
Last Updated: 2025-12-01
Tier: 2 (Foundation)