Macro Agent
Desktop automation and UI control skill with image recognition.
🚨 CRITICAL: How to Handle User Requests
BEFORE doing ANY action, ALWAYS check if a sequence exists for it:
- FIRST run seq-list to see available sequences
- LOOK for sequences that match the user's intent (e.g., whatsapp_send_marco for "send message to Marco")
- IF a sequence exists: use seq-run <sequence_name>, then add your custom actions (write message, press enter)
- IF NO sequence exists: use individual commands
Common Workflow: Send Message to Contact
When the user says "send message to X" or "envía mensaje a X":
1. seq-list # Check available sequences
2. seq-run whatsapp_send_<contact> # Run the messaging sequence
3. write "<message>" # Type the message
4. press enter # Send it
NEVER use hotkey super or manual navigation when a sequence exists!
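The sequence-first workflow above can be sketched in Python. This is an illustration only, not part of the skill: plan_send_message is a hypothetical helper, and it assumes the sequence names have already been read from seq-list into a list of strings (the output format of seq-list is not specified here).

```python
def plan_send_message(available_sequences, contact, message):
    """Return the command steps for sending `message` to `contact`.

    Hypothetical helper: assumes sequence names follow the
    whatsapp_send_<contact> pattern used in the workflow above.
    """
    wanted = f"whatsapp_send_{contact.strip().lower()}"
    if wanted not in available_sequences:
        # No pre-configured sequence: the caller falls back to individual commands.
        return None
    return [
        ["seq-run", wanted],   # open the chat via the sequence
        ["write", message],    # type the message
        ["press", "enter"],    # send it
    ]


sequences = ["whatsapp_send_ross", "whatsapp_send_marco"]
steps = plan_send_message(sequences, "Marco", "hola")
print(steps)
```

Returning None when no sequence matches keeps the fallback decision explicit, so the caller only reaches for individual commands when nothing pre-configured applies.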
Available Sequences (check with seq-list)
The user has pre-configured sequences for common tasks. Always check them first!
- whatsapp_send_ross - Opens WhatsApp and selects the Ross contact
- whatsapp_send_marco - Opens WhatsApp and selects the Marco contact
- Other sequences may exist - always run seq-list first!
🎯 How Element Detection Works
When using click-on or move-to, the agent ALWAYS uses image recognition:
- Searches for element image on screen (template matching)
- If not found → FAILS (no fallback to coordinates)
This ensures elements are found dynamically based on their actual position.
The output includes a method field:
- image = Found by template matching ✅
- not_found = Image not visible on screen ❌
If element not found: You need to capture it first with region-capture.
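A caller might branch on the method field roughly as follows. This is a sketch under assumptions: next_step is a hypothetical helper, the sample JSON string is made up for illustration, and it assumes method appears as a top-level key of the JSON output.

```python
import json


def next_step(raw_output):
    """Decide what to do after click-on or move-to, based on the method field."""
    result = json.loads(raw_output)
    method = result.get("method")
    if method == "image":
        # The element was located by template matching; carry on.
        return "continue"
    if method == "not_found":
        # There is no coordinate fallback, so the template must be captured first.
        return "region-capture"
    # Any other value is outside what this document describes.
    return "unknown"


# Illustrative input only, not real tool output.
print(next_step('{"success": false, "method": "not_found"}'))
```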
⚠️ Important
NO "navigate" command exists. To navigate:
- find <name> - Search for element info
- click-on <name> - Click using image recognition (ALWAYS)
Usage
python ~/.copilot/skills/macro-agent/macro_agent.py <command> [args]
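If the agent is driven from Python rather than a shell, the same invocation could be wrapped as below. This is a minimal sketch: build_argv and run are hypothetical helper names, and it assumes python on the PATH is the interpreter the script expects.

```python
import os
import subprocess

# Script path as given in the usage line, with ~ expanded.
SCRIPT = os.path.expanduser("~/.copilot/skills/macro-agent/macro_agent.py")


def build_argv(command, *args):
    """Build the argument list for one macro_agent.py invocation."""
    return ["python", SCRIPT, command, *[str(a) for a in args]]


def run(command, *args):
    """Run one command and return its stdout (defined here, not called)."""
    completed = subprocess.run(
        build_argv(command, *args), capture_output=True, text=True
    )
    return completed.stdout


# Show only the command and its arguments.
print(build_argv("click", 500, 300)[2:])
```

Passing a list to subprocess.run avoids shell quoting problems, so text such as write "hello world" travels as a single argument.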
Commands Reference
| Action | Command | Example |
|--------|---------|---------|
| Search element | find <name> | find brave |
| Search text | search <text> | search save |
| Click element | click-on <name> | click-on brave |
| Click coords | click X Y | click 500 300 |
| Move to element | move-to <name> | move-to button |
| Move to coords | move X Y | move 500 300 |
| Write text | write <text> | write "hello" |
| Press key | press <key> | press enter |
| Hotkey | hotkey <keys> | hotkey ctrl c |
| Scroll | scroll N | scroll -3 |
| Screenshot | screenshot <name> | screenshot test |
| Region capture | region-capture | region-capture |
Sequence Commands
| Command | Description |
|---------|-------------|
| seq-create <name> | Create new sequence |
| seq-add <name> "<action>" | Add action to sequence |
| seq-show <name> | View sequence |
| seq-run <name> | Execute sequence |
| seq-list | List all sequences |
| seq-delete <name> | Delete sequence |
Output
JSON with:
- success: true/false
- action: Command executed
- target: Element name (if applicable)
- coordinates: {x, y} position
- message: Result description
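One way to consume these fields from Python is to parse them into a small typed record. The sketch below is illustrative: MacroResult and parse_result are hypothetical names, the sample string is invented to match the field list above, and treating missing fields as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MacroResult:
    success: bool
    action: str
    target: Optional[str]
    coordinates: Optional[Tuple[int, int]]
    message: str


def parse_result(raw):
    """Parse the JSON text printed by a command into a MacroResult."""
    data = json.loads(raw)
    coords = data.get("coordinates")
    return MacroResult(
        success=bool(data.get("success", False)),
        action=data.get("action", ""),
        target=data.get("target"),
        # Flatten {x, y} into a tuple; keep None when no position is reported.
        coordinates=(coords["x"], coords["y"]) if coords else None,
        message=data.get("message", ""),
    )


# Illustrative input only, not real tool output.
sample = (
    '{"success": true, "action": "click-on", "target": "brave", '
    '"coordinates": {"x": 500, "y": 300}, "message": "Clicked brave"}'
)
result = parse_result(sample)
print(result.success, result.coordinates)
```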
Data Locations
- Elements: ~/.copilot/skills/macro-agent/data/elements.json (element definitions)
- Captures: ~/.copilot/skills/macro-agent/data/captures/ (template images)
- Sequences: ~/.copilot/skills/macro-agent/data/sequences/ (action sequences)
Examples
Find and Click App
python ~/.copilot/skills/macro-agent/macro_agent.py find chrome
python ~/.copilot/skills/macro-agent/macro_agent.py click-on chrome
Type and Submit
python ~/.copilot/skills/macro-agent/macro_agent.py write "search query"
python ~/.copilot/skills/macro-agent/macro_agent.py press enter
Keyboard Shortcut
python ~/.copilot/skills/macro-agent/macro_agent.py hotkey ctrl shift s
Create and Run Sequence
python ~/.copilot/skills/macro-agent/macro_agent.py seq-create my_macro
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "click-on file_menu"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "wait 0.5"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "click-on save_option"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-run my_macro
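The create, add, run pattern above generalizes to any list of actions. The following sketch only builds the argument lists and does not execute anything; sequence_commands is a hypothetical helper, not part of the skill.

```python
def sequence_commands(name, actions):
    """Return the argument lists that create, fill, and run a sequence."""
    commands = [["seq-create", name]]
    # Each action is passed as one argument, matching the quoted form above.
    commands += [["seq-add", name, action] for action in actions]
    commands.append(["seq-run", name])
    return commands


commands = sequence_commands(
    "my_macro",
    ["click-on file_menu", "wait 0.5", "click-on save_option"],
)
for cmd in commands:
    print(cmd)
```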
Capture New Elements
python ~/.copilot/skills/macro-agent/macro_agent.py region-capture
Keys: f=freeze, c/Space=capture, +/-=resize, q/ESC=quit
📱 Example: Send WhatsApp Message
User says: "Envía mensaje a Marco diciendo hola" ("Send a message to Marco saying hello")
CORRECT approach:
# 1. First check sequences
seq-list
# 2. Found whatsapp_send_marco! Run it
seq-run whatsapp_send_marco
# 3. Type and send
write "hola"
press enter
WRONG approach (NEVER do this):
# ❌ WRONG - Don't manually navigate!
hotkey super
wait 500
# Unnecessary when a sequence exists; use seq-run instead.
🔄 Decision Flow
User Request
  ↓
Run seq-list
  ↓
Sequence exists? ──YES──→ seq-run <name> → Additional actions (write, press)
  ↓ NO
Use individual commands (click-on, write, press, etc.)