Basic usage Skill | Agent Skills

Scripts

1. adb-screen-capture.py

Capture Android device screen and save locally.

# Basic usage
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py

# Specify device
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --device 127.0.0.1:5555

# Custom output path
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --output /tmp/screen.png

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --json

Output:

{
  "device": "127.0.0.1:5555",
  "timestamp": "2025-12-01T10:30:45Z",
  "local_path": "/tmp/screenshot.png",
  "size": [1080, 2400],
  "success": true
}
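A calling script might consume the `--json` result along these lines. This is a minimal sketch, assuming the script prints exactly the object shown above to stdout and that `size` is ordered `[width, height]`; `parse_capture_result` is a hypothetical helper, not part of the skill.

```python
import json


def parse_capture_result(stdout: str) -> tuple[str, tuple[int, int]]:
    """Return (local_path, (width, height)) from the capture script's JSON.

    Assumes the field names from the sample output and a [width, height] size.
    """
    result = json.loads(stdout)
    if not result.get("success"):
        raise RuntimeError(f"capture failed on {result.get('device')}")
    width, height = result["size"]
    return result["local_path"], (width, height)


# Sample copied from the documented output.
sample = """
{
  "device": "127.0.0.1:5555",
  "timestamp": "2025-12-01T10:30:45Z",
  "local_path": "/tmp/screenshot.png",
  "size": [1080, 2400],
  "success": true
}
"""
path, (width, height) = parse_capture_result(sample)
print(path, width, height)  # /tmp/screenshot.png 1080 2400
```

Raising on `"success": false` keeps later steps from running OCR against a stale or missing screenshot.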

2. adb-ocr-extract.py

Extract all visible text from device screen using Tesseract OCR.

# Basic usage (uses most recent screenshot)
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py

# Specify screenshot path
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --image /tmp/screen.png

# Search for specific text
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --search "Login"

# JSON output with coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --json

Output:

{
  "text": ["Login", "Username", "Password", "Submit"],
  "detected": true,
  "search_found": true,
  "search_term": "Login",
  "coordinates": {
    "Login": [[100, 200, 150, 230]]
  }
}

3. adb-find-element.py

Find UI element by template matching or OCR text search.

# Find by OCR text
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method ocr \
    --target "Login Button" \
    --threshold 0.8

# Find by template image
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method template \
    --template /path/to/template.png \
    --threshold 0.8

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
    --method ocr \
    --target "Login" \
    --json

Output:

{
  "found": true,
  "method": "ocr",
  "target": "Login",
  "coordinates": {
    "x": 100,
    "y": 200,
    "width": 150,
    "height": 30
  },
  "confidence": 0.95,
  "message": "Element found at (100, 200)"
}
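When turning this result into a tap, the center of the bounding box is usually a safer target than its corner. A minimal sketch, assuming `x` and `y` are the top-left corner of the box in screen pixels (the output above does not say so explicitly); `tap_point` is a hypothetical helper.

```python
def tap_point(result: dict) -> tuple[int, int]:
    """Return the center of the matched element's bounding box.

    Assumes coordinates.x / coordinates.y are the box's top-left corner.
    """
    if not result.get("found"):
        raise LookupError(f"element not found: {result.get('target')}")
    box = result["coordinates"]
    return box["x"] + box["width"] // 2, box["y"] + box["height"] // 2


# Values copied from the documented sample output.
sample_result = {
    "found": True,
    "method": "ocr",
    "target": "Login",
    "coordinates": {"x": 100, "y": 200, "width": 150, "height": 30},
    "confidence": 0.95,
}
print(tap_point(sample_result))  # (175, 215)
```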

4. adb-tap-coordinate.py

Tap device screen at specific coordinates.

# Tap at coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --device 127.0.0.1:5555

# Tap with verification (check screen after tap)
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --verify-text "Next Screen" \
    --timeout 5

# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
    --x 100 \
    --y 200 \
    --json

Output:

{
  "device": "127.0.0.1:5555",
  "tap": {
    "x": 100,
    "y": 200
  },
  "success": true,
  "verified": true,
  "verify_text": "Next Screen",
  "verification_match": true
}
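On the device side, a tap is typically issued through adb's `input tap` shell command. The sketch below shows how such a call could be assembled; whether adb-tap-coordinate.py issues exactly this command is an assumption, and `build_tap_command` and `tap` are hypothetical names.

```python
import subprocess
from typing import Optional


def build_tap_command(x: int, y: int, device: Optional[str] = None) -> list[str]:
    """Build the adb argument list for a single tap."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]  # target a specific device serial or host:port
    cmd += ["shell", "input", "tap", str(x), str(y)]
    return cmd


def tap(x: int, y: int, device: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Run the tap and report whether adb exited cleanly (requires adb on PATH)."""
    completed = subprocess.run(
        build_tap_command(x, y, device),
        capture_output=True,
        timeout=timeout,
        check=False,
    )
    return completed.returncode == 0


print(build_tap_command(100, 200, "127.0.0.1:5555"))
```

Passing the arguments as a list avoids shell quoting problems with device names.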

Usage Patterns

Pattern 1: Verify Screen State Before Action

# 1. Capture current screen
adb-screen-capture.py

# 2. Check for expected element
adb-find-element.py --method ocr --target "Login Button"

# 3. If found, tap it at the coordinates reported in step 2
adb-tap-coordinate.py --x 100 --y 200 --verify-text "Welcome"
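The same pattern can be driven from Python so that the tap uses the coordinates the find step actually reported. This is a sketch under stated assumptions: every script accepts `--json` alongside its other flags, the JSON fields match the sample outputs earlier in this document, and `run_script` and `tap_if_present` are hypothetical helpers.

```python
import json
import subprocess
from typing import Callable

SCRIPTS = ".claude/skills/adb-screen-detection/scripts"


def run_script(name: str, *args: str) -> dict:
    """Run one skill script with --json and return its parsed output."""
    completed = subprocess.run(
        ["uv", "run", f"{SCRIPTS}/{name}", *args, "--json"],
        capture_output=True, text=True, timeout=30, check=True,
    )
    return json.loads(completed.stdout)


def tap_if_present(target: str, verify_text: str,
                   run: Callable[..., dict] = run_script) -> bool:
    """Capture, look for target via OCR, and tap it only if it was found."""
    run("adb-screen-capture.py")
    found = run("adb-find-element.py", "--method", "ocr", "--target", target)
    if not found.get("found"):
        return False  # screen is not in the expected state; do nothing
    box = found["coordinates"]
    tapped = run(
        "adb-tap-coordinate.py",
        "--x", str(box["x"]), "--y", str(box["y"]),
        "--verify-text", verify_text,
    )
    return bool(tapped.get("success") and tapped.get("verification_match"))
```

The `run` parameter exists so the flow can be exercised with a stub instead of a connected device.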

Pattern 2: OCR-Based Automation

# 1. Capture screen
adb-screen-capture.py

# 2. Extract text and search for "Settings"
adb-ocr-extract.py --search "Settings"

# 3. Get coordinates and tap
adb-find-element.py --method ocr --target "Settings"
adb-tap-coordinate.py --x 150 --y 300

Pattern 3: Template-Based Element Detection

# 1. Have known UI template images in ./templates/
# 2. Capture screen
adb-screen-capture.py

# 3. Match against templates and save the JSON result
adb-find-element.py --method template --template ./templates/button.png --json > /tmp/match.json

# 4. Tap matched location
adb-tap-coordinate.py --x "$(jq -r '.coordinates.x' /tmp/match.json)" --y "$(jq -r '.coordinates.y' /tmp/match.json)"

Architecture

Design Principles:

Independent: Each script can run standalone
Chainable: Scripts output JSON for piping
Stateless: No dependencies between executions
Verifiable: Always verify screen state before proceeding
Timeout Protected: All network operations have timeouts
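The "Timeout Protected" and "Chainable" principles could be combined in a small wrapper like the one below. It is only an illustration: how the skill's scripts actually enforce timeouts is not shown in this document, and `run_with_timeout` and its result fields are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout: float) -> dict:
    """Run a command and always return a JSON-friendly result dict."""
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=False
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return {"success": False, "error": f"timed out after {timeout}s"}
    return {"success": completed.returncode == 0, "stdout": completed.stdout.strip()}


# Demonstrated with the Python interpreter itself so no device is needed.
print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=10))
```

Returning a dict in both the success and timeout cases keeps the output shape stable for whatever consumes it next.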

Dependency Relationship:

adb-screen-capture.py (foundation)
    ↓
adb-ocr-extract.py (uses capture)
adb-find-element.py (uses capture or templates)
    ↓
adb-tap-coordinate.py (uses find-element for verification)

Integration Points

Used By:

adb-navigation-base - Wait for elements between actions
adb-magisk - Verify Magisk UI state
adb-karrot - Verify app state during automation
adb-workflow-orchestrator - Screen verification in workflows

Dependencies:

System: adb command-line tool
Python: pytesseract, opencv-python, pillow, numpy

Troubleshooting

OCR Not Working

Install Tesseract: brew install tesseract (macOS) or apt-get install tesseract-ocr (Debian/Ubuntu)
Set TESSDATA_PREFIX to your tessdata directory, for example: export TESSDATA_PREFIX=/usr/local/share/tessdata (the path varies by platform and install method)

Template Matching Too Strict/Loose

Adjust --threshold parameter (0.0-1.0)
Higher threshold = stricter matching
Recommended: 0.8-0.9 for reliable detection
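The effect of the threshold can be seen with a few made-up match scores. In this sketch the scores, the candidate names, and the `>=` comparison are illustrative assumptions rather than the script's documented behavior; `accept_matches` is a hypothetical helper.

```python
def accept_matches(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the candidates whose match score meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return sorted(name for name, score in scores.items() if score >= threshold)


# Hypothetical scores for three template candidates.
scores = {"button.png": 0.93, "button_dark.png": 0.84, "icon.png": 0.61}

print(accept_matches(scores, 0.8))  # ['button.png', 'button_dark.png']
print(accept_matches(scores, 0.9))  # ['button.png']
```

Raising the threshold from 0.8 to 0.9 drops the weaker candidate, which is the trade-off between false positives and missed matches described above.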

Device Offline

Check ADB connection: adb devices
Reconnect: adb connect <device>
Restart ADB: adb kill-server && adb start-server
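To check connection state from a script, the output of `adb devices` can be parsed into a serial-to-state mapping. A minimal sketch; `parse_adb_devices` is a hypothetical helper and the sample output is illustrative.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each device serial to its state ("device", "offline", "unauthorized", ...)."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


sample_output = (
    "List of devices attached\n"
    "127.0.0.1:5555\tdevice\n"
    "emulator-5554\toffline\n"
    "\n"
)
states = parse_adb_devices(sample_output)
print(states)  # {'127.0.0.1:5555': 'device', 'emulator-5554': 'offline'}
```

A device is usable only when its state is `device`; `offline` or `unauthorized` means the reconnect steps above are needed.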

Workflows

This skill includes TOON-based workflow definitions for automation.

What is TOON?

TOON (Task-Oriented Orchestration Notation) is a structured workflow definition language that pairs with Markdown documentation. Each workflow consists of:

[name].toon - Orchestration logic and execution steps
[name].md - Complete documentation and usage guide

This TOON+MD pairing approach is inspired by the BMAD METHOD pattern, adapted to use TOON instead of YAML for better orchestration support.

Available Workflows

Workflow files are located in the workflow/ directory:

Example Workflows (adb-screen-detection):

workflow/screen-verification.toon - Capture and verify screen state
workflow/element-detection.toon - Find elements via OCR or template matching
workflow/screen-monitoring.toon - Continuous screen monitoring and analysis

Running a Workflow

Execute any workflow using the ADB workflow orchestrator:

uv run .claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py \
  --workflow .claude/skills/adb-screen-detection/workflow/screen-verification.toon \
  --param device="127.0.0.1:5555"

Workflow Documentation

Each workflow includes comprehensive documentation in the corresponding .md file:

Purpose and use case
Prerequisites and requirements
Available parameters
Execution phases and steps
Success criteria
Error handling and recovery
Example commands

See the workflow/ directory for complete TOON file definitions and documentation.

Creating New Workflows

To create custom workflows for this skill:

Create a new .toon file in the workflow/ directory
Define phases, steps, and parameters using TOON v4.0 syntax
Create a corresponding .md file with comprehensive documentation
Test with the workflow orchestrator

For more information, refer to the TOON specification and the workflow orchestrator documentation.

Version: 1.0.0
Status: ✅ Foundation Tier
Scripts: 4 (all MCP-ready)
Last Updated: 2025-12-01
Tier: 2 (Foundation)
