Agent Communication Debugger Skill

Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.

Prerequisites

A2A agent system located in a2a_communicating_agents/
Python 3.10+ environment
Access to agent logs in logs/ directory
Agent configurations in respective agent.json files

Instructions

1. Check Agent Status

First, determine which agents are running:

```
# Check all agent processes
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep
```

Look for:

orchestrator_agent/main.py
coder_agent/main.py
tester_agent/main.py
websocket_server.py

Common issues:

Agent process not found → the agent isn't running and needs to be started
Multiple instances → duplicate processes are running and can conflict with each other
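The same check can be scripted so that missing and duplicate processes are reported explicitly, rather than read off the `ps` output by eye. Below is a minimal Python sketch. It assumes each agent's command line contains one of the script paths listed above. The helper names (`running_command_lines`, `count_agents`, `diagnose`) are hypothetical and are not part of the skill's scripts. Some setups do not use the tester agent or the WebSocket server, so a "not running" report for those may be expected.

```python
# Count running A2A agent processes and flag missing or duplicate ones.
# Assumes each agent's command line contains one of the script paths below.
from __future__ import annotations

import subprocess

# Script paths from the "Look for" list above.
EXPECTED = {
    "orchestrator": "orchestrator_agent/main.py",
    "coder": "coder_agent/main.py",
    "tester": "tester_agent/main.py",
    "websocket": "websocket_server.py",
}


def running_command_lines() -> list[str]:
    """Full command line of every process, via POSIX `ps -eo args=` (no header)."""
    result = subprocess.run(["ps", "-eo", "args="], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def count_agents(command_lines: list[str]) -> dict[str, int]:
    """How many processes mention each expected script."""
    counts = {name: 0 for name in EXPECTED}
    for line in command_lines:
        for name, script in EXPECTED.items():
            if script in line:
                counts[name] += 1
    return counts


def diagnose(counts: dict[str, int]) -> list[str]:
    """Map counts to the common issues listed above."""
    problems = []
    for name, n in counts.items():
        if n == 0:
            problems.append(f"{name}: not running, needs to be started")
        elif n > 1:
            problems.append(f"{name}: {n} instances, duplicate processes")
    return problems
```

`diagnose(count_agents(running_command_lines()))` returns an empty list when each expected process is running exactly once.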

2. Inspect Agent Configurations

Read the agent configuration files to verify capabilities and topics:

```
# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json

# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json

# View tester agent config (if it exists)
cat a2a_communicating_agents/tester_agent/agent.json
```

Verify:

Agent names match expected values
Topics are correctly defined
Capabilities describe what the agent does
No JSON syntax errors
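These checks can also be automated. The sketch below assumes the fields above are stored under the keys `name`, `topics` and `capabilities`. Those key names are a guess, so adjust `REQUIRED_KEYS` to match the real files. `check_agent_config` is a hypothetical helper.

```python
# Validate agent.json files: JSON syntax plus the fields listed above.
# The key names in REQUIRED_KEYS are assumptions; match them to the real files.
from __future__ import annotations

import json
from pathlib import Path

REQUIRED_KEYS = ("name", "topics", "capabilities")


def check_agent_config(path: Path, expected_name: str | None = None) -> list[str]:
    """Return the problems found in one agent.json; an empty list means it looks fine."""
    if not path.is_file():
        return [f"{path}: file not found"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # lineno/colno point at the syntax error, which cat alone does not
        return [f"{path}: invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return [f"{path}: top-level value must be a JSON object"]
    problems = [f"{path}: missing or empty '{key}'" for key in REQUIRED_KEYS if not config.get(key)]
    if expected_name is not None and config.get("name") != expected_name:
        problems.append(f"{path}: name is {config.get('name')!r}, expected {expected_name!r}")
    return problems
```

For example, `check_agent_config(Path("a2a_communicating_agents/coder_agent/agent.json"), "coder-agent")` returns an empty list for a valid file. A JSON syntax error is reported together with its line and column.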

3. Check Agent Logs

Examine logs for errors and message flow:

```
# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log

# Follow all logs live (Ctrl+C to stop)
tail -f logs/*.log

# Search for specific errors
grep -iE "error|exception|failed" logs/*.log

# Check for routing decisions
grep -iE "routing to|routed to" logs/orchestrator.log
```

Look for:

Connection errors
Routing decisions showing wrong agent selection
JSON parsing errors
Message processing failures
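To see these patterns at a glance across all log files, a short script can count error lines per file and tally routing targets. This is a sketch. It assumes routing lines contain "routing to <agent>" or "routed to <agent>", mirroring the grep patterns above, and the real log wording may differ. `scan_logs` is a hypothetical helper.

```python
# Summarise logs/*.log: error-line counts per file and routing targets.
# Assumes routing lines read "routing to <agent>" or "routed to <agent>".
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path

ERROR_RE = re.compile(r"error|exception|failed", re.IGNORECASE)
ROUTE_RE = re.compile(r"rout(?:ing|ed) to:?\s*([\w-]+)", re.IGNORECASE)


def scan_logs(log_dir: Path) -> tuple[Counter, Counter]:
    """Return (error lines per log file, count of each routing target)."""
    errors: Counter = Counter()
    routes: Counter = Counter()
    for log in sorted(log_dir.glob("*.log")):
        with log.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if ERROR_RE.search(line):
                    errors[log.name] += 1
                match = ROUTE_RE.search(line)
                if match:
                    routes[match.group(1)] += 1
    return errors, routes
```

Run `scan_logs(Path("logs"))` from the repository root. If one unexpected agent dominates the routing tally, go straight to step 6.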

4. Verify Message Transport

Check if the message transport (WebSocket or RAG board) is working:

```
# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765

# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/

# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"
```

Expected:

WebSocket server on port 8765 (if using WebSocket transport)
Recent messages in storage/message_board.jsonl (if using RAG transport)
No permission errors accessing storage
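The port check and the message board check can also be done from Python, which avoids depending on whether `netstat` or `ss` is installed. This is a minimal sketch. It assumes the server listens on localhost port 8765 and that `message_board.jsonl` holds one JSON object per line. `port_open` and `tail_jsonl` are hypothetical helper names.

```python
# Check the WebSocket port and sanity-check the tail of the RAG message board.
# Assumes localhost:8765 and one JSON object per line in message_board.jsonl.
from __future__ import annotations

import json
import socket
from collections import deque
from pathlib import Path


def port_open(host: str = "127.0.0.1", port: int = 8765, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def tail_jsonl(path: Path, n: int = 20) -> tuple[list[dict], list[int]]:
    """Parse the last n lines; return (parsed messages, line numbers that are not JSON)."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        last = deque(enumerate(fh, start=1), maxlen=n)  # keeps only the final n lines
    messages: list[dict] = []
    bad: list[int] = []
    for lineno, line in last:
        if not line.strip():
            continue  # ignore blank lines
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)
    return messages, bad
```

If the last lines of the board fail to parse, a writer may have crashed mid-write, or several processes may be appending at once.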

5. Test Message Sending

Use the provided test script to send a message and verify delivery:

```
# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py
```

This script will:

Send a test message to the orchestrator topic
Wait for response
Show message delivery status
Display any responses received
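The "wait for response" step comes down to polling for a reply until a deadline passes. The sketch below shows that loop against the RAG message board. It is not how `test_message.py` works. The `in_reply_to` field is a hypothetical message format chosen for illustration, and `wait_for_reply` is a hypothetical helper.

```python
# Poll the RAG message board for a reply: a sketch of the "wait for response"
# step. Hypothetical format: one JSON object per line, and a reply carries an
# "in_reply_to" field holding the id of the message it answers.
from __future__ import annotations

import json
import time
from pathlib import Path


def wait_for_reply(board: Path, message_id: str, timeout: float = 30.0,
                   interval: float = 0.5, start_offset: int = 0) -> dict | None:
    """Return the first reply to message_id, or None once timeout expires."""
    deadline = time.monotonic() + timeout
    offset = start_offset  # byte offset: skip lines that existed before the send
    while True:
        if board.exists():
            with board.open("rb") as fh:
                fh.seek(offset)
                for raw in fh:
                    if not raw.endswith(b"\n"):
                        break  # line still being written; re-read it next poll
                    offset += len(raw)
                    try:
                        msg = json.loads(raw)
                    except ValueError:  # bad JSON or bad encoding: skip the line
                        continue
                    if isinstance(msg, dict) and msg.get("in_reply_to") == message_id:
                        return msg
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Record `board.stat().st_size` before sending and pass it as `start_offset`. That way, replies from earlier test runs are ignored.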

6. Diagnose Routing Issues

If messages reach the orchestrator but are routed to the wrong agent:

Check the orchestrator's routing logic:

```
# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py
```

Check priority keyword mappings:

```
# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
```

Verify agent discovery:

```
# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5
```

Common routing issues:

Agent not discovered → Check agent.json exists and is valid
Wrong agent selected → Keywords don't match, update priority_mappings
Null target → No suitable agent found, check agent topics/capabilities
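A toy version of a keyword fallback makes the "wrong agent selected" case easier to understand. This is a hypothetical illustration, not the orchestrator's `decide_route` code. It assumes `priority_mappings` is an ordered mapping from agent name to keywords, and that the first discovered agent with a matching keyword wins.

```python
# Toy keyword fallback router (hypothetical; not the orchestrator's code).
from __future__ import annotations


def fallback_route(message: str, priority_mappings: dict[str, list[str]],
                   discovered: set[str]) -> str | None:
    """Return the first discovered agent whose keyword appears in message, else None."""
    text = message.lower()
    for agent, keywords in priority_mappings.items():  # dicts preserve insertion order
        if agent not in discovered:
            continue  # an undiscovered agent can never be chosen
        if any(keyword in text for keyword in keywords):
            return agent
    return None  # the "null target" case
```

Suppose `dashboard-agent` is checked first with the keyword "show". Then "Please write code to show a chart" goes to `dashboard-agent`, because "show" matches before "code" is ever considered. Moving `coder-agent` first routes the message correctly. Substring matching also fires on words like "showcase", so short keywords deserve extra scrutiny.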

7. Check Environment Variables

Verify API keys and configuration:

```
# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'

# Check model configuration
grep -E "(model|MODEL)" .env 2>/dev/null | sed 's/=.*/=***HIDDEN***/'
[ -f .env ] || echo "No .env file"
```

Required environment variables:

OPENAI_API_KEY - For LLM-based routing and code generation
ORCHESTRATOR_MODEL or OPENAI_MODEL - Model to use (default: gpt-5-mini)
CODER_MODEL - Model for coder agent (optional, defaults to OPENAI_MODEL)
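A small check can confirm which variables are present, and which model each agent would end up with, without ever printing the key. The fallback order below follows the list above. Whether the agents resolve models in exactly this order is an assumption worth confirming in their source. `check_env` is a hypothetical helper.

```python
# Report env readiness without printing secrets. Fallback order follows the
# list above; that the agents resolve models exactly this way is an assumption.
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_MODEL = "gpt-5-mini"  # default stated above


def check_env(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Summarise key presence and the effective model per agent."""
    openai_model = env.get("OPENAI_MODEL") or DEFAULT_MODEL
    return {
        "OPENAI_API_KEY set": bool(env.get("OPENAI_API_KEY")),  # never the value itself
        "orchestrator model": env.get("ORCHESTRATOR_MODEL") or openai_model,
        "coder model": env.get("CODER_MODEL") or openai_model,
    }
```

`check_env()` reads `os.environ`, so run it from the same shell environment that starts the agents.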

8. Restart Agents (if needed)

If agents are stuck or not responding:

```
# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"

# Wait a moment
sleep 2

# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &

# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &

# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &

# Return to the repository root so the logs/ paths used elsewhere resolve
cd ..

# Verify they started
sleep 3
ps aux | grep -E "(orchestrator|coder|websocket)" | grep -v grep
```
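To restart agents from Python instead, for example inside a diagnostic script, `subprocess.Popen` can do what `nohup ... > log 2>&1 &` does. This is a sketch with hypothetical helper names. Unlike the commands above, it appends to the log instead of overwriting it, so the traceback from a previous crash survives the restart. It launches the interpreter that runs the script (`sys.executable`). `start_new_session=True` detaches the child from the terminal's session, which is roughly what nohup achieves, and it is POSIX-only.

```python
# Start an agent in the background with output in a log file, roughly like
# `nohup python <script> > <log> 2>&1 &`, then check it survived startup.
# Helper names are hypothetical; start_new_session is POSIX-only.
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def start_agent(script: str, log_path: Path, cwd: Path) -> subprocess.Popen:
    """Launch `python script` in cwd, appending stdout and stderr to log_path."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("ab") as log:  # append, so an earlier crash trace is kept
        return subprocess.Popen(
            [sys.executable, script],
            cwd=cwd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            start_new_session=True,    # detach from the terminal's session
        )


def still_running(proc: subprocess.Popen, grace: float = 3.0) -> bool:
    """Wait up to `grace` seconds (like `sleep 3` above); False if it already exited."""
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        return True
    return False
```

Run it from the repository root, e.g. `start_agent("orchestrator_agent/main.py", Path("logs/orchestrator.log"), Path("a2a_communicating_agents"))`. The log path is resolved relative to the caller, and the script path relative to `cwd`. If `still_running` returns False, the agent crashed during startup and the end of its log shows why.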

9. Common Issues and Solutions

See common_issues.md for a detailed troubleshooting guide covering:

Messages not being delivered
Routing to wrong agent
Agent not generating responses
Duplicate message processing
Transport connectivity problems

Quick Diagnostic Checklist

Run through this checklist systematically:

[ ] All required agents are running (orchestrator, coder, tester)
[ ] WebSocket server is running (if using WebSocket transport)
[ ] Agent configuration files are valid JSON
[ ] Orchestrator discovered all agents (check logs)
[ ] OPENAI_API_KEY is set in environment
[ ] Recent log entries show activity
[ ] No Python exceptions in logs
[ ] Test message sends and receives successfully
[ ] Routing decisions select correct agent
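To run the checklist repeatedly, express each item as a label plus a function that returns True when the check passes. This is a minimal sketch. The individual checks still need to be wired to the commands from steps 1 to 8, and `run_checklist` is a hypothetical helper.

```python
# Run checklist items and render them in the [x] / [ ] style used above.
from __future__ import annotations

from collections.abc import Callable


def run_checklist(items: list[tuple[str, Callable[[], bool]]]) -> tuple[list[str], int]:
    """Return (formatted lines, number of failed items)."""
    lines: list[str] = []
    failures = 0
    for label, check in items:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure, not a stop
            ok = False
            label = f"{label} (check raised {type(exc).__name__})"
        if not ok:
            failures += 1
        lines.append(f"[{'x' if ok else ' '}] {label}")
    return lines, failures
```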

Examples

Example 1: Agent Not Responding to Messages

User problem:

I'm sending messages to the orchestrator but getting no response

Debug workflow:

Check if orchestrator is running:
```
ps aux | grep orchestrator_agent | grep -v grep
```
Result: No process found → Orchestrator isn't running
Check logs for crash:
```
tail -50 logs/orchestrator.log
```
Result: ImportError for OpenAI package
Solution: Install missing dependency
```
python -m pip install openai
```

Restart the orchestrator:

```
cd a2a_communicating_agents
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &
cd ..
```

Verify it's running:

```
ps aux | grep orchestrator_agent | grep -v grep
tail -10 logs/orchestrator.log
```

Example 2: Messages Routing to Wrong Agent

User problem:

I asked for code but it routed to dashboard-agent instead of coder-agent

Debug workflow:

Check orchestrator discovered coder-agent:
```
grep "Discovered.*agents" logs/orchestrator.log | tail -1
```
Result: Shows coder-agent in list ✓
Check routing decision in logs:
```
grep -A 5 "please write.*code" logs/orchestrator.log
```
Result: Shows routing to dashboard-agent

Check routing logic:

```
grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
```

Result: Keywords look correct

Check LLM routing decision:
```
grep "Error in decision making" logs/orchestrator.log
```
Result: LLM routing failed, falling back to heuristic

Check API key:

```
env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'
```

Result: Variable not set

Solution: Set API key and restart orchestrator:

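```
export OPENAI_API_KEY="your-key-here"
# Or add it to the .env file
echo "OPENAI_API_KEY=your-key-here" >> .env
```

Restart the orchestrator from the same shell so it inherits the exported variable. A .env entry only takes effect if the agent loads .env at startup.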
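If the key was added to `.env` rather than exported, you can confirm it landed without echoing the secret. This minimal sketch handles simple `KEY=VALUE` lines: comments, an optional `export ` prefix and surrounding quotes. It is not a complete dotenv parser, and `env_file_has` is a hypothetical helper. Running the `echo ... >> .env` line twice leaves duplicate entries, which this check does not flag.

```python
# Check that .env defines a non-empty value for a key, without printing it.
# Handles simple KEY=VALUE lines only; not a full dotenv implementation.
from pathlib import Path


def env_file_has(path: Path, key: str) -> bool:
    """True if path contains a non-empty assignment to key."""
    if not path.is_file():
        return False
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-assignments
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, value = line.partition("=")
        if name.strip() == key and value.strip().strip("'\""):
            return True
    return False
```

For example, `env_file_has(Path(".env"), "OPENAI_API_KEY")` returns True once the key is set.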

Example 3: Coder Agent Acknowledges But Doesn't Generate Code

User problem:

Coder agent receives the message but only acknowledges, doesn't generate code

Debug workflow:

Check coder agent logs:
```
grep -i "generate\|code" logs/coder.log | tail -20
```
Result: "OpenAI package not available. Code generation will be limited."

Check if OpenAI is installed:

```
python -c "import openai; print(openai.__version__)" 2>&1
```

Result: ModuleNotFoundError

Install OpenAI package:
```
python -m pip install openai
```

Restart the coder agent:

```
pkill -f "coder_agent/main.py"
cd a2a_communicating_agents
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &
cd ..
```

Verify initialization:
```
grep "Initialized with model" logs/coder.log | tail -1
```
Result: Should show model name (e.g., gpt-5-mini)
Send a test message and verify that code is generated
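A common trap in this scenario is installing the package into one Python interpreter while the agent runs under another. The sketch below reports whether a package is importable and which version is installed. It is only meaningful when run with the same `python` that starts the agents. It also assumes that the import name and the pip distribution name match, which holds for `openai`. `package_status` is a hypothetical helper.

```python
# Report whether a package is importable and its installed version.
# Only meaningful when run with the same interpreter that starts the agents.
from __future__ import annotations

import importlib.util
from importlib import metadata


def package_status(name: str) -> tuple[bool, str | None]:
    """Return (importable, installed distribution version or None)."""
    if importlib.util.find_spec(name) is None:
        return False, None
    try:
        return True, metadata.version(name)
    except metadata.PackageNotFoundError:
        return True, None  # importable but without distribution metadata (e.g. stdlib)
```

`package_status("openai")` returns `(False, None)` in exactly the situation where the coder agent logs "OpenAI package not available".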

Example 4: Complete System Health Check

User request:

Run a complete diagnostic on the agent system

Complete diagnostic workflow:

Check all agents running:

```
echo "=== Agent Processes ==="
ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep
```

Check agent configs:

```
echo "=== Agent Configurations ==="
for agent in orchestrator_agent coder_agent tester_agent; do
  if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
    echo "--- $agent ---"
    python -m json.tool "a2a_communicating_agents/$agent/agent.json"
  fi
done
```

Check environment:

```
echo "=== Environment Variables ==="
env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'
```

Check recent logs:

```
echo "=== Recent Log Activity ==="
tail -5 logs/*.log 2>/dev/null
```

Check for errors:

```
echo "=== Recent Errors ==="
grep -i "error\|exception" logs/*.log | tail -10
```

Test message sending:

```
echo "=== Message Transport Test ==="
python .claude/skills/agent-debug/scripts/test_message.py
```

Provide summary report with:
- Agent status (running/stopped)
- Configuration validity
- Environment completeness
- Recent error count
- Transport test result
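The summary can be assembled from the results of the commands above. This is a minimal sketch. The argument names and the report layout are illustrative, not a format the skill defines, and `build_report` is a hypothetical helper.

```python
# Assemble the health-check summary from already collected results.
# Argument names and layout are illustrative only.
from __future__ import annotations


def build_report(agents: dict[str, bool], configs: dict[str, bool],
                 missing_env: list[str], error_count: int, transport_ok: bool) -> str:
    """Render one line per summary item listed above."""
    lines = ["=== Agent System Health ==="]
    for name, running in sorted(agents.items()):
        lines.append(f"Agent {name}: {'running' if running else 'STOPPED'}")
    invalid = sorted(name for name, ok in configs.items() if not ok)
    lines.append("Configs: all valid" if not invalid else f"Configs: invalid -> {', '.join(invalid)}")
    lines.append("Environment: complete" if not missing_env
                 else f"Environment: missing {', '.join(missing_env)}")
    lines.append(f"Recent errors: {error_count}")
    lines.append(f"Transport test: {'passed' if transport_ok else 'FAILED'}")
    return "\n".join(lines)
```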

Related Tools

orchestrator_chat.py - Interactive chat interface for testing
send_agent_message.py - Send messages programmatically
Agent start/stop scripts in a2a_communicating_agents/

Summary

This skill provides systematic debugging for the A2A agent communication system. Use it whenever:

Agents aren't communicating
Messages aren't being delivered
Routing is incorrect
System behavior is unexpected

Follow the diagnostic steps in order, checking status → configuration → logs → transport → routing. Most issues come down to one of these:

Agent not running
Missing dependencies
Missing API keys
Invalid configurations
Routing logic issues

Start with the Quick Diagnostic Checklist and drill down based on what fails.
