Agent Communication Debugger
Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.
Prerequisites
- A2A agent system located in a2a_communicating_agents/
- Python 3.10+ environment
- Access to agent logs in the logs/ directory
- Agent configurations in the respective agent.json files
Instructions
1. Check Agent Status
First, determine which agents are running:
# Check all agent processes
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep
Look for:
- orchestrator_agent/main.py
- coder_agent/main.py
- tester_agent/main.py
- websocket_server.py
Common issues:
- Agent process not found → Agent isn't running, needs to be started
- Multiple instances → Duplicate processes causing conflicts
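You can script the same check to get a missing/ok/duplicate verdict per agent instead of reading ps output by eye. This is a minimal sketch and is not part of the repo. It assumes the four script paths listed above and plain substring matching on ps aux lines, so a process started under a different path will be reported as missing.

```python
import subprocess

# Script paths from the "Look for" list; adjust if your layout differs.
EXPECTED_SCRIPTS = (
    "orchestrator_agent/main.py",
    "coder_agent/main.py",
    "tester_agent/main.py",
    "websocket_server.py",
)


def read_ps() -> str:
    """Return the raw output of `ps aux` (POSIX systems only)."""
    return subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout


def count_instances(ps_output: str, scripts=EXPECTED_SCRIPTS) -> dict[str, int]:
    """Count how many process lines mention each expected script."""
    counts = {script: 0 for script in scripts}
    for line in ps_output.splitlines():
        if "grep" in line:
            continue  # same idea as `grep -v grep`
        for script in scripts:
            if script in line:
                counts[script] += 1
    return counts


def classify(counts: dict[str, int]) -> dict[str, str]:
    """Label each script 'missing' (0), 'ok' (1) or 'duplicate' (2+)."""
    return {
        script: "missing" if n == 0 else "ok" if n == 1 else "duplicate"
        for script, n in counts.items()
    }
```

Usage: for script, status in classify(count_instances(read_ps())).items(): print(status, script).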
2. Inspect Agent Configurations
Read the agent configuration files to verify capabilities and topics:
# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json
# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json
# View tester agent config (if it exists)
cat a2a_communicating_agents/tester_agent/agent.json
Verify:
- Agent names match expected values
- Topics are correctly defined
- Capabilities describe what the agent does
- No JSON syntax errors
3. Check Agent Logs
Examine logs for errors and message flow:
# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log
# Follow all logs live (Ctrl+C to stop)
tail -f logs/*.log
# Search for specific errors
grep -i "error\|exception\|failed" logs/*.log
# Check for routing decisions
grep -i "routing to\|routed to" logs/orchestrator.log
Look for:
- Connection errors
- Routing decisions showing wrong agent selection
- JSON parsing errors
- Message processing failures
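The grep commands above can also be run from Python, for example to get an error count per log file for a summary report. This sketch uses the same case-insensitive patterns as the greps. The logs/ location comes from this guide, and the helper names are illustrative.

```python
import re
from collections import Counter
from pathlib import Path

# Same patterns as the grep commands above, case-insensitive.
ERROR_RE = re.compile(r"error|exception|failed", re.IGNORECASE)
ROUTING_RE = re.compile(r"routing to|routed to", re.IGNORECASE)


def scan_log_lines(lines):
    """Return (error_line_count, routing_lines) for an iterable of log lines."""
    errors = 0
    routing = []
    for line in lines:
        if ERROR_RE.search(line):
            errors += 1
        if ROUTING_RE.search(line):
            routing.append(line.rstrip("\n"))
    return errors, routing


def scan_log_dir(log_dir: Path = Path("logs")) -> Counter:
    """Count error-looking lines in every *.log file under log_dir."""
    counts = Counter()
    for log_file in sorted(log_dir.glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as fh:
            errors, _ = scan_log_lines(fh)
        counts[log_file.name] = errors
    return counts
```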
4. Verify Message Transport
Check if the message transport (WebSocket or RAG board) is working:
# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765
# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/
# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"
Expected:
- WebSocket server on port 8765 (if using WebSocket transport)
- Recent messages in storage/message_board.jsonl (if using RAG transport)
- No permission errors accessing storage
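The same two transport checks can be sketched in Python without relying on netstat or ss. The port 8765 and the storage/message_board.jsonl path are taken from this guide. A successful TCP connect only shows that something is listening on the port, not that it is the WebSocket server. Lines that fail to parse are counted separately because a corrupt board is itself a useful finding.

```python
import json
import socket
from collections import deque
from pathlib import Path


def port_is_open(host: str = "localhost", port: int = 8765, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def recent_messages(path: Path = Path("storage/message_board.jsonl"), n: int = 20):
    """Return (last n parseable JSON records, count of unparseable lines among them)."""
    if not path.is_file():
        return [], 0
    with path.open(encoding="utf-8") as fh:
        last = deque(fh, maxlen=n)  # keeps only the final n lines, like tail -n
    records, bad = [], 0
    for line in last:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1
    return records, bad
```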
5. Test Message Sending
Use the provided test script to send a message and verify delivery:
# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py
This script will:
- Send a test message to the orchestrator topic
- Wait for response
- Show message delivery status
- Display any responses received
6. Diagnose Routing Issues
If messages reach orchestrator but route to wrong agent:
Check orchestrator's routing logic:
# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py
Check priority keyword mappings:
# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
Verify agent discovery:
# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5
Common routing issues:
- Agent not discovered → Check agent.json exists and is valid
- Wrong agent selected → Keywords don't match, update priority_mappings
- Null target → No suitable agent found, check agent topics/capabilities
7. Check Environment Variables
Verify API keys and configuration:
# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'
# Check model configuration
if [ -f .env ]; then grep -E "(model|MODEL)" .env | sed 's/=.*/=***HIDDEN***/'; else echo "No .env file"; fi
Environment variables:
- OPENAI_API_KEY - Required for LLM-based routing and code generation
- ORCHESTRATOR_MODEL or OPENAI_MODEL - Model to use (default: gpt-5-mini)
- CODER_MODEL - Model for the coder agent (optional, defaults to OPENAI_MODEL)
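The defaults above can be expressed as a small resolver, which is handy for checking what a given environment will produce. Two details are assumptions to confirm against the agents' startup code: that ORCHESTRATOR_MODEL takes precedence over OPENAI_MODEL, and that the coder agent falls back to gpt-5-mini when OPENAI_MODEL is also unset. The key check reports only set or MISSING and never prints the value.

```python
import os
from collections.abc import Mapping

DEFAULT_MODEL = "gpt-5-mini"  # default named in this guide


def resolve_models(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve which model each agent would use (assumed precedence)."""
    base = env.get("OPENAI_MODEL") or DEFAULT_MODEL
    return {
        "orchestrator": env.get("ORCHESTRATOR_MODEL") or base,
        "coder": env.get("CODER_MODEL") or base,
    }


def key_status(env: Mapping[str, str] = os.environ, names=("OPENAI_API_KEY",)) -> dict[str, str]:
    """Report whether each secret is set without ever printing its value."""
    return {name: "set" if env.get(name) else "MISSING" for name in names}
```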
8. Restart Agents (if needed)
If agents are stuck or not responding:
# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"
# Wait a moment
sleep 2
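# Note: '>' overwrites each agent's previous log; copy it first if you still need the crash output
# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &
# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &
# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &
# Start tester agent (if it exists)
[ -f tester_agent/main.py ] && nohup python tester_agent/main.py > ../logs/tester.log 2>&1 &
# Verify they started
sleep 3
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep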
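If you restart agents often, a Python launcher can replace the nohup lines. This is a sketch, not a script from the repo; start_agent and still_running are hypothetical helpers. The launcher appends to the log instead of overwriting it, so crash output from the previous run is kept. start_new_session=True puts the child in its own session, so closing your terminal does not stop it, roughly like nohup (POSIX only). It uses sys.executable so the agent runs under the same interpreter (and installed packages) as the launcher.

```python
import subprocess
import sys
import time
from pathlib import Path


def start_agent(script: str, log_path: Path, cwd: Path) -> subprocess.Popen:
    """Start `python <script>` in the background, appending stdout+stderr to log_path."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log = log_path.open("ab")  # append, so earlier crash output survives a restart
    try:
        proc = subprocess.Popen(
            [sys.executable, script],
            cwd=cwd,  # script path is resolved relative to this directory
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # detach from this terminal's session
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return proc


def still_running(proc: subprocess.Popen, grace: float = 3.0) -> bool:
    """Wait `grace` seconds, then report whether the process is still alive."""
    time.sleep(grace)
    return proc.poll() is None
```

Usage from the repository root: start_agent("orchestrator_agent/main.py", Path("logs/orchestrator.log"), Path("a2a_communicating_agents")). The log path is opened by the launcher, so it is relative to the launcher's working directory, not the agent's.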
9. Common Issues and Solutions
See common_issues.md for a detailed troubleshooting guide covering:
- Messages not being delivered
- Routing to wrong agent
- Agent not generating responses
- Duplicate message processing
- Transport connectivity problems
Quick Diagnostic Checklist
Run through this checklist systematically:
- [ ] All required agents are running (orchestrator, coder, tester)
- [ ] WebSocket server is running (if using WebSocket transport)
- [ ] Agent configuration files are valid JSON
- [ ] Orchestrator discovered all agents (check logs)
- [ ] OPENAI_API_KEY is set in environment
- [ ] Recent log entries show activity
- [ ] No Python exceptions in logs
- [ ] Test message sends and receives successfully
- [ ] Routing decisions select correct agent
Examples
Example 1: Agent Not Responding to Messages
User problem:
I'm sending messages to the orchestrator but getting no response
Debug workflow:
- Check if the orchestrator is running:
ps aux | grep orchestrator_agent | grep -v grep
Result: No process found → Orchestrator isn't running
- Check the logs for a crash:
tail -50 logs/orchestrator.log
Result: ImportError for the OpenAI package
- Solution: Install the missing dependency into the interpreter that runs the agents:
python -m pip install openai
- Restart the orchestrator:
cd a2a_communicating_agents
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &
- Verify it's running (still inside a2a_communicating_agents/):
ps aux | grep orchestrator_agent | grep -v grep
tail -10 ../logs/orchestrator.log
Example 2: Messages Routing to Wrong Agent
User problem:
I asked for code but it routed to dashboard-agent instead of coder-agent
Debug workflow:
-
Check orchestrator discovered coder-agent:
grep "Discovered.*agents" logs/orchestrator.log | tail -1Result: Shows coder-agent in list ✓
-
Check routing decision in logs:
grep -A 5 "please write.*code" logs/orchestrator.logResult: Shows routing to dashboard-agent
-
Check routing logic:
grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.pyResult: Keywords look correct
-
Check LLM routing decision:
grep "Error in decision making" logs/orchestrator.logResult: LLM routing failed, falling back to heuristic
-
Check API key:
env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'Result: Variable not set
-
Solution: Set API key and restart orchestrator:
export OPENAI_API_KEY="your-key-here" # Or add to .env file echo "OPENAI_API_KEY=your-key-here" >> .env -
Restart orchestrator to pick up new environment
Example 3: Coder Agent Acknowledges But Doesn't Generate Code
User problem:
Coder agent receives the message but only acknowledges, doesn't generate code
Debug workflow:
- Check the coder agent logs:
grep -i "generate\|code" logs/coder.log | tail -20
Result: "OpenAI package not available. Code generation will be limited."
- Check if OpenAI is installed:
python -c "import openai; print(openai.__version__)" 2>&1
Result: ModuleNotFoundError
- Install the OpenAI package into the interpreter that runs the agents:
python -m pip install openai
- Restart the coder agent:
pkill -f "coder_agent/main.py"
cd a2a_communicating_agents
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &
- Verify initialization (still inside a2a_communicating_agents/):
grep "Initialized with model" ../logs/coder.log | tail -1
Result: Should show the model name (e.g., gpt-5-mini)
- Send a test message and verify code generation
Example 4: Complete System Health Check
User request:
Run a complete diagnostic on the agent system
Complete diagnostic workflow:
- Check that all agents are running:
echo "=== Agent Processes ==="
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep
- Check agent configs:
echo "=== Agent Configurations ==="
for agent in orchestrator_agent coder_agent tester_agent; do
  if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
    echo "--- $agent ---"
    python -m json.tool "a2a_communicating_agents/$agent/agent.json"
  fi
done
- Check the environment:
echo "=== Environment Variables ==="
env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'
- Check recent logs:
echo "=== Recent Log Activity ==="
tail -5 logs/*.log 2>/dev/null
- Check for errors:
echo "=== Recent Errors ==="
grep -i "error\|exception" logs/*.log | tail -10
- Test message sending:
echo "=== Message Transport Test ==="
python .claude/skills/agent-debug/scripts/test_message.py
- Provide a summary report with:
- Agent status (running/stopped)
- Configuration validity
- Environment completeness
- Recent error count
- Transport test result
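If the checks above are scripted, collecting their results into one report keeps the summary consistent between runs. This is a minimal sketch. CheckResult and summarize are illustrative names, and the check names in the usage line simply mirror the bullets above.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def summarize(results: list[CheckResult]) -> str:
    """Render one PASS/FAIL line per check plus a final tally."""
    lines = []
    for r in results:
        mark = "PASS" if r.passed else "FAIL"
        lines.append(f"[{mark}] {r.name}" + (f": {r.detail}" if r.detail else ""))
    failed = sum(not r.passed for r in results)
    lines.append(f"{len(results) - failed}/{len(results)} checks passed")
    return "\n".join(lines)
```

Usage: print(summarize([CheckResult("Agent status", True, "3 running"), CheckResult("Transport test", False, "no reply")])).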
Related Tools
- orchestrator_chat.py - Interactive chat interface for testing
- send_agent_message.py - Send messages programmatically
- Agent start/stop scripts in a2a_communicating_agents/
Summary
This skill provides systematic debugging for the A2A agent communication system. Use it whenever:
- Agents aren't communicating
- Messages aren't being delivered
- Routing is incorrect
- System behavior is unexpected
Follow the diagnostic steps in order: status → configuration → logs → transport → routing. Most issues come down to one of:
- Agent not running
- Missing dependencies
- Missing API keys
- Invalid configurations
- Routing logic issues
Start with the Quick Diagnostic Checklist and drill down based on what fails.