UI Automation Workflows
Use the execute_idb_command MCP tool for all UI automation
The xclaude-plugin provides the execute_idb_command MCP tool which consolidates all IDB UI automation operations into a single, token-efficient dispatcher.
⚠️ CRITICAL: Always Use MCP Tools First
This is the most important rule: When automating UI interactions, you MUST use the execute_idb_command MCP tool.
- ✅ DO: Invoke execute_idb_command for all UI automation, element finding, and accessibility queries
- ✅ DO: If the MCP tool fails, adjust parameters and retry
- ✅ DO: Read error messages and debug the parameters
- ❌ NEVER: Fall back to bash idb commands
- ❌ NEVER: Use idb directly in bash
- ❌ NEVER: Run idb commands in a terminal
Why? The MCP tool provides:
- Structured error handling
- Token efficiency (consolidated into 1 tool vs. verbose bash output)
- Proper integration with the xclaude-plugin architecture
- Accessibility-first patterns built-in
If execute_idb_command fails, the cause is the parameters or the app state. A failure is never a reason to switch to bash.
Core Principle: Accessibility Before Screenshots
Always query the accessibility tree first. Only use screenshots as a fallback.
Use the execute_idb_command MCP tool with operation describe to access the accessibility tree.
Why Accessibility-First?
| Approach | Time | Tokens | Reliability |
|----------|------|--------|-------------|
| Accessibility tree | ~120ms | ~50 | Survives theme changes |
| Screenshot | ~2000ms | ~170 | Breaks on visual changes |
Result: 3-4x faster, 80% cheaper, more reliable
Standard Workflow
1. Check Accessibility Quality (Optional) - Use execute_idb_command
Before starting automation, check if the app has good accessibility support:
Invoke the execute_idb_command MCP tool:
{
"operation": "check-accessibility",
"target": "booted"
}
Interpreting the result:
- "excellent" or "good": Proceed with accessibility-first workflow
- "poor" or "insufficient": May need to rely more on screenshots
Note: Most modern iOS apps have good accessibility support. Skip this check if you're confident.
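The interpretation rules above can be sketched as a small decision helper. This is a minimal illustration, not part of the plugin: choose_strategy is a hypothetical name, and how the rating string is extracted from the check-accessibility response is not specified here, so the caller is assumed to supply it.

```python
def choose_strategy(rating: str) -> str:
    """Map a check-accessibility rating to a workflow (hypothetical helper).

    The rating strings are the ones listed in this document; extracting
    the rating from the tool response is left to the caller.
    """
    normalized = rating.strip().lower()
    if normalized in ("excellent", "good"):
        return "accessibility-first"
    if normalized in ("poor", "insufficient"):
        return "screenshot-assisted"
    # Unknown rating: assume the accessibility tree is usable and inspect it.
    return "accessibility-first"
```

Defaulting unknown ratings to the accessibility-first path follows the note that most modern iOS apps have good accessibility support.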
2. Query Accessibility Tree - Use execute_idb_command with operation: "describe"
This is your starting point for all UI automation:
Invoke the execute_idb_command MCP tool:
{
"operation": "describe",
"target": "booted",
"parameters": {
"operation": "all"
}
}
Returns:
{
"elements": [
{
"label": "Login",
"type": "Button",
"frame": { "x": 100, "y": 400, "width": 175, "height": 50 },
"centerX": 187,
"centerY": 425,
"enabled": true,
"visible": true
},
{
"label": "Email",
"type": "TextField",
"value": "",
"frame": { "x": 50, "y": 300, "width": 275, "height": 44 },
"centerX": 187,
"centerY": 322
}
]
}
Use centerX and centerY for tap coordinates.
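As a rough sketch of how an element from the response becomes a tap request, the helpers below read centerX and centerY and fall back to the frame when they are absent. The function names are hypothetical, and the fallback rounding (integer division) is an assumption chosen because it reproduces the sample values above.

```python
def element_center(element: dict) -> tuple:
    """Return (x, y) tap coordinates for one element of a describe response."""
    if "centerX" in element and "centerY" in element:
        return element["centerX"], element["centerY"]
    # Assumption: centers are the frame midpoint, rounded down.
    frame = element["frame"]
    return (frame["x"] + frame["width"] // 2,
            frame["y"] + frame["height"] // 2)


def tap_payload(element: dict, target: str = "booted") -> dict:
    """Build the tap request body for execute_idb_command."""
    x, y = element_center(element)
    return {"operation": "tap", "target": target,
            "parameters": {"x": x, "y": y}}
```

For the Login button in the sample response, tap_payload yields the same x: 187, y: 425 used in the Tap example later in this document.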
3. Find Your Element
Option A: Search by Label/Text (Preferred)
{
"operation": "find-element",
"target": "booted",
"parameters": {
"query": "Login"
}
}
Option B: Manual Search
From the accessibility tree response, find the element you want by:
- label: Button text, field labels
- type: Button, TextField, Cell, etc.
- value: Current input value
- visible: Only interact with visible elements
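A manual search over the describe response might look like the following sketch. It assumes the response shape shown in step 2 (an "elements" list of dictionaries); find_elements is a hypothetical helper, and treating a missing "visible" field as visible is an assumption based on the Email entry in that sample.

```python
def find_elements(tree: dict, label=None, element_type=None, visible_only=True):
    """Filter a describe response by label substring, type, and visibility."""
    matches = []
    for element in tree.get("elements", []):
        element_label = element.get("label") or ""
        if label is not None and label.lower() not in element_label.lower():
            continue
        if element_type is not None and element.get("type") != element_type:
            continue
        # Assumption: an element without a "visible" field is visible.
        if visible_only and not element.get("visible", True):
            continue
        matches.append(element)
    return matches
```

Usage: find_elements(tree, label="login", element_type="Button") returns the candidate buttons, whose centerX and centerY can then be passed to tap.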
4. Interact with Element
Tap:
{
"operation": "tap",
"target": "booted",
"parameters": {
"x": 187,
"y": 425
}
}
Input Text:
{
"operation": "input",
"target": "booted",
"parameters": {
"text": "user@example.com"
}
}
Keyboard Actions:
{
"operation": "input",
"target": "booted",
"parameters": {
"key": "return"
}
}
Available keys: return, home, delete, space, escape, tab, up, down, left, right
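One way to avoid sending an unsupported key is to validate the request before invoking the tool. The sketch below builds the two input payload shapes shown above; input_payload is a hypothetical helper, and the rule that a request carries either text or key (not both) is an assumption drawn from the two examples.

```python
ALLOWED_KEYS = {"return", "home", "delete", "space", "escape",
                "tab", "up", "down", "left", "right"}


def input_payload(text=None, key=None, target="booted") -> dict:
    """Build an input request carrying either text or a named key."""
    if (text is None) == (key is None):
        raise ValueError("provide exactly one of text or key")
    if key is not None:
        if key not in ALLOWED_KEYS:
            raise ValueError(f"unsupported key: {key!r}")
        parameters = {"key": key}
    else:
        parameters = {"text": text}
    return {"operation": "input", "target": target, "parameters": parameters}
```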
5. Verify State
After an interaction, query the accessibility tree again to verify the result:
{
"operation": "describe",
"target": "booted"
}
Common Patterns
Pattern: Login Flow
1. describe → Find "Email" text field
2. tap → Focus email field
3. input → Type email
4. describe → Find "Password" text field
5. tap → Focus password field
6. input → Type password
7. describe → Find "Login" button
8. tap → Submit form
9. describe → Verify next screen
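The nine steps above can be sketched as a single function. Everything here is illustrative: send is a hypothetical callable that forwards one request body to execute_idb_command and returns the parsed response, the labels "Email", "Password", and "Login" are taken from this document's example, and the describe response is assumed to have the shape shown in the Standard Workflow.

```python
def login(send, email: str, password: str, target: str = "booted") -> dict:
    """Run the login pattern; `send` is a hypothetical MCP tool wrapper."""

    def request(operation, parameters=None):
        payload = {"operation": operation, "target": target}
        if parameters is not None:
            payload["parameters"] = parameters
        return send(payload)

    def tap_labeled(label):
        # describe, then tap the first element whose label matches exactly.
        tree = request("describe", {"operation": "all"})
        for element in tree.get("elements", []):
            if element.get("label") == label:
                request("tap", {"x": element["centerX"],
                                "y": element["centerY"]})
                return
        raise LookupError(f"no element labeled {label!r}")

    tap_labeled("Email")                    # steps 1-2
    request("input", {"text": email})       # step 3
    tap_labeled("Password")                 # steps 4-5
    request("input", {"text": password})    # step 6
    tap_labeled("Login")                    # steps 7-8
    return request("describe", {"operation": "all"})  # step 9
```

The function returns the final tree so the caller can decide what "next screen" means for the app under test.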
Pattern: Navigate and Tap
1. describe → Get all buttons
2. find-element → Search for specific button
3. tap → Execute tap
4. describe → Verify navigation
Pattern: Fill Form
1. describe → Get all text fields
2. For each field:
- tap → Focus field
- input → Enter text
- input key:return → Next field
3. describe → Find submit button
4. tap → Submit
Pattern: Scroll and Find
1. describe → Check if element visible
2. If not visible:
- gesture (swipe up) → Scroll
- describe → Check again
3. find-element → Locate target
4. tap → Interact
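A bounded version of this loop is sketched below, so that a missing element cannot cause endless scrolling. As before, send is a hypothetical wrapper around execute_idb_command, and the max_swipes limit is an assumption added for safety rather than something the tool prescribes.

```python
def scroll_until_found(send, label: str, max_swipes: int = 5,
                       target: str = "booted"):
    """Swipe up until an element with `label` is visible, or give up."""
    for attempt in range(max_swipes + 1):
        tree = send({"operation": "describe", "target": target,
                     "parameters": {"operation": "all"}})
        for element in tree.get("elements", []):
            if element.get("label") == label and element.get("visible", True):
                return element
        if attempt < max_swipes:
            send({"operation": "gesture", "target": target,
                  "parameters": {"gesture_type": "swipe",
                                 "direction": "up", "duration": 200}})
    return None  # not found after max_swipes scrolls
```

The returned element carries centerX and centerY, so the final tap step needs no further lookup.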
Gestures
Swipe
{
"operation": "gesture",
"target": "booted",
"parameters": {
"gesture_type": "swipe",
"direction": "up",
"duration": 200
}
}
Directions: up, down, left, right
Button Presses
{
"operation": "gesture",
"target": "booted",
"parameters": {
"gesture_type": "button",
"button": "HOME"
}
}
Buttons: HOME, LOCK, SIRI, SIDE_BUTTON, APPLE_PAY, SCREENSHOT, APP_SWITCH
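Both gesture payloads can be built with small validating helpers, sketched here. The helper names are hypothetical; the allowed directions and buttons are the ones listed above, and the default duration of 200 is copied from the swipe example.

```python
DIRECTIONS = ("up", "down", "left", "right")
BUTTONS = ("HOME", "LOCK", "SIRI", "SIDE_BUTTON",
           "APPLE_PAY", "SCREENSHOT", "APP_SWITCH")


def swipe_payload(direction: str, duration: int = 200,
                  target: str = "booted") -> dict:
    """Build a swipe gesture request."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction!r}")
    return {"operation": "gesture", "target": target,
            "parameters": {"gesture_type": "swipe",
                           "direction": direction, "duration": duration}}


def button_payload(button: str, target: str = "booted") -> dict:
    """Build a hardware button press request."""
    if button not in BUTTONS:
        raise ValueError(f"unsupported button: {button!r}")
    return {"operation": "gesture", "target": target,
            "parameters": {"gesture_type": "button", "button": button}}
```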
When to Use Screenshots (Fallback Only)
Only use screenshots if:
- Accessibility quality is "poor"
  { "operation": "check-accessibility", "target": "booted" }
- Visual verification needed
  - Checking UI layout
  - Verifying colors/images
  - Debug visual issues
- Element not in accessibility tree
  - Custom drawn UI
  - Canvas/game elements
  - Some third-party components
For everything else, use the accessibility tree.
Troubleshooting
Element Not Found
Problem: find-element returns no results
Solutions:
- Query full tree with describe to see all elements
- Check if element is in a scroll view (may be off-screen)
- Verify app state (correct screen?)
- Check if element has accessibility label
Tap Not Working
Problem: Tap executes but nothing happens
Solutions:
- Verify element is enabled: true
- Check element is visible: true
- Confirm coordinates are correct (use centerX, centerY)
- Element might need double-tap or long-press
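The first three checks can be run against an element before tapping it. The sketch below is a hypothetical helper, assuming the element fields shown in the describe response; treating missing "enabled" or "visible" fields as true is an assumption.

```python
def tap_blockers(element: dict) -> list:
    """Return reasons a tap on this element is likely to do nothing."""
    problems = []
    # Assumption: missing "enabled"/"visible" fields mean true.
    if not element.get("enabled", True):
        problems.append("element is disabled")
    if not element.get("visible", True):
        problems.append("element is not visible")
    frame = element.get("frame")
    if frame and "centerX" in element and "centerY" in element:
        inside_x = frame["x"] <= element["centerX"] <= frame["x"] + frame["width"]
        inside_y = frame["y"] <= element["centerY"] <= frame["y"] + frame["height"]
        if not (inside_x and inside_y):
            problems.append("center lies outside the frame")
    return problems
```

An empty list means none of these checks failed; the remaining suspect is the gesture type (double-tap or long-press).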
Input Not Working
Problem: Text input not appearing
Solutions:
- Tap text field first to focus
- Wait for keyboard to appear
- Check field is not disabled
- Use keyboard-specific keys (return, delete)
Advanced: Coordinate Transformation
If tap coordinates come from a screenshot rather than the accessibility tree, they may need scaling. The tap operation accepts scale parameters for this:
{
"operation": "tap",
"target": "booted",
"parameters": {
"x": 187,
"y": 425,
"applyScreenshotScale": true,
"screenshotScaleX": 0.5,
"screenshotScaleY": 0.5
}
}
But with accessibility-first, this is rarely needed.
Performance Tips
- Batch Operations: Group describe queries to minimize round-trips
- Cache Tree: Reuse accessibility tree if UI hasn't changed
- Target Specific Areas: Use describe with point coordinates for specific regions
- Avoid Unnecessary Waits: Accessibility tree reflects real-time state
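The "Cache Tree" tip could be implemented along the lines of the sketch below. CachingClient and send are hypothetical; the set of operations treated as UI-changing is an assumption, and the cache is only safe when nothing outside these calls (animations, network responses) alters the screen.

```python
# Assumption: these operations may change the UI and invalidate the tree.
MUTATING_OPERATIONS = {"tap", "input", "gesture"}


class CachingClient:
    """Wrap a hypothetical `send` callable and reuse the last full tree."""

    def __init__(self, send):
        self._send = send
        self._tree = None

    def call(self, payload: dict) -> dict:
        operation = payload["operation"]
        parameters = payload.get("parameters", {})
        # Only full-tree describe calls are cached.
        if operation == "describe" and parameters.get("operation", "all") == "all":
            if self._tree is None:
                self._tree = self._send(payload)
            return self._tree
        if operation in MUTATING_OPERATIONS:
            self._tree = None  # the UI may have changed
        return self._send(payload)
```

Two consecutive describe calls then cost one round-trip, while any tap, input, or gesture forces a fresh query on the next describe.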
Integration with MCP Tools
This Skill works with the execute_idb_command tool:
- All operations use the execute_idb_command tool
- Tool handles IDB connection and execution
- Tool returns structured accessibility data
- This Skill teaches WHEN and HOW to use operations
Related Skills
- accessibility-testing: WCAG compliance and quality assessment
- ios-testing-patterns: Test automation strategies
- simulator-workflows: Device and app management
Related Resources
- xc://operations/idb: Complete IDB operations reference
- xc://reference/accessibility: Accessibility tree structure guide
- xc://workflows/accessibility-first: This workflow pattern
Remember: Accessibility tree first, screenshots last. 3-4x faster, 80% cheaper.