Long-Running Agent Framework
Framework for enabling AI agents to work effectively across many context windows on complex tasks.
Core Problem
Long-running agents must work in discrete sessions where each new session begins with no memory of previous work. Without proper scaffolding, agents tend to:
- One-shot attempts: try to complete everything at once, running out of context mid-implementation
- Premature completion: see partial progress and declare the job done
- Undocumented states: leave code in broken or undocumented states between sessions
Two-Agent Solution
1. Initializer Agent (First Session Only)
Sets up the environment with all context future agents need:
- Create an init.sh script for environment setup
- Generate a comprehensive feature_list.json with all requirements
- Initialize claude-progress.txt for session logging
- Make an initial git commit
See references/initializer-prompt.md for the full prompt template.
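As a rough illustration of what the initializer leaves behind, the sketch below writes the three files into a project directory. The helper name `scaffold_project`, the placeholder body of init.sh, and the progress-file header are assumptions made for this example; the real contents come from the prompt template and the project's requirements. The initial git commit is left out.

```python
import json
import stat
from pathlib import Path


def scaffold_project(root, features):
    """Write init.sh, feature_list.json, and claude-progress.txt under root.

    Hypothetical helper: the file contents are placeholders, not the
    framework's actual templates.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # Placeholder setup script; a real one installs dependencies and starts servers.
    init_script = root / "init.sh"
    init_script.write_text(
        "#!/usr/bin/env bash\nset -euo pipefail\necho 'TODO: environment setup'\n"
    )
    init_script.chmod(init_script.stat().st_mode | stat.S_IXUSR)

    # Every feature starts out as failing.
    feature_list = {"features": [dict(f, passes=False) for f in features]}
    (root / "feature_list.json").write_text(json.dumps(feature_list, indent=2) + "\n")

    # Empty log that later sessions append to.
    (root / "claude-progress.txt").write_text("Progress log\n")

    return sorted(p.name for p in root.iterdir())
```

Forcing `passes` to false for every entry reflects the idea that nothing counts as done until a later session verifies it end to end.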
2. Coding Agent (Every Subsequent Session)
Makes incremental progress while maintaining clean state:
- Read progress files and git logs to get bearings
- Run basic tests to verify working state
- Work on ONE feature at a time
- Test end-to-end before marking complete
- Commit progress with descriptive messages
- Update progress file
See references/coding-prompt.md for the full prompt template.
Session Startup Sequence
Every coding agent session should begin with the following sequence:
1. pwd # Understand working directory
2. cat claude-progress.txt # Read recent progress
3. cat feature_list.json # Check feature status
4. git log --oneline -20 # Review recent commits
5. ./init.sh # Start dev environment
6. <run basic test> # Verify app works
7. <select next feature> # Choose one failing feature
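Step 7 can be sketched in a few lines of Python. The function name `select_next_feature` is hypothetical, and picking the first failing entry in file order is an assumption; the sequence above only requires that one failing feature be chosen.

```python
import json
from pathlib import Path


def select_next_feature(path="feature_list.json"):
    """Return the first feature whose "passes" flag is false, or None if all pass."""
    data = json.loads(Path(path).read_text())
    for feature in data["features"]:
        if not feature["passes"]:
            return feature
    return None
```

A `None` result is the only signal that the project is complete, which also guards against declaring victory while failing features remain.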
Key Files
feature_list.json
Comprehensive list of all features, each with a pass/fail status. Keeping the list in JSON helps prevent agents from inappropriately editing or overwriting it.
{
  "features": [
    {
      "category": "functional",
      "description": "User can create new chat",
      "steps": ["Navigate to main", "Click New Chat", "Verify creation"],
      "passes": false
    }
  ]
}
Template: assets/feature_list_template.json
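A structural check along these lines could catch a malformed feature list early. It is a minimal sketch that assumes the four fields in the example above are all required and have the types shown; the name `validate_feature_list` is hypothetical and the actual template may define more fields.

```python
# Field names and types taken from the example entry above (assumed to be required).
REQUIRED_FIELDS = {"category": str, "description": str, "steps": list, "passes": bool}


def validate_feature_list(data):
    """Return a list of problems found; an empty list means the structure looks right."""
    features = data.get("features") if isinstance(data, dict) else None
    if not isinstance(features, list):
        return ['top-level "features" must be a list']

    problems = []
    for i, feature in enumerate(features):
        if not isinstance(feature, dict):
            problems.append(f"feature {i}: must be an object")
            continue
        for name, expected in REQUIRED_FIELDS.items():
            if name not in feature:
                problems.append(f"feature {i}: missing {name!r}")
            elif not isinstance(feature[name], expected):
                problems.append(f"feature {i}: {name!r} should be {expected.__name__}")
    return problems
```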
claude-progress.txt
Session-by-session log of work completed. Each entry includes:
- Session timestamp
- Features worked on
- Changes made
- Current state
- Next steps
Template: assets/progress_template.md
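One way an entry with these five parts might be produced is sketched below. The layout of the entry and the names `format_progress_entry` and `append_progress_entry` are assumptions for illustration; the template file defines the actual format.

```python
from datetime import datetime, timezone


def format_progress_entry(features, changes, current_state, next_steps, timestamp=None):
    """Build one log entry covering the five items listed above (layout is assumed)."""
    timestamp = timestamp or datetime.now(timezone.utc)
    lines = [f"Session: {timestamp.isoformat(timespec='seconds')}"]
    lines.append("Features worked on:")
    lines += [f"  - {item}" for item in features]
    lines.append("Changes made:")
    lines += [f"  - {item}" for item in changes]
    lines.append(f"Current state: {current_state}")
    lines.append("Next steps:")
    lines += [f"  - {item}" for item in next_steps]
    return "\n".join(lines) + "\n\n"


def append_progress_entry(path, entry):
    """Append the entry so that earlier sessions' notes are never overwritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
```

Opening the file in append mode keeps the log strictly additive, so each new session can read the full history.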
init.sh
Environment setup script that:
- Installs dependencies
- Starts development servers
- Sets up any required services
Critical Rules
For Feature List
- Never remove or edit test descriptions
- Only change the passes field status
- Mark as passing ONLY after end-to-end verification
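The first two rules can be checked mechanically by comparing the feature list before and after a session. This is a minimal sketch: the name `illegal_feature_list_changes` is hypothetical, and it assumes both versions are already parsed and that features keep their order.

```python
def illegal_feature_list_changes(before, after):
    """Compare two parsed feature lists and describe any disallowed edits."""
    old, new = before["features"], after["features"]
    if len(old) != len(new):
        return [f"feature count changed from {len(old)} to {len(new)}"]

    violations = []
    for i, (a, b) in enumerate(zip(old, new)):
        # Everything except the "passes" flag must be identical.
        a_rest = {k: v for k, v in a.items() if k != "passes"}
        b_rest = {k: v for k, v in b.items() if k != "passes"}
        if a_rest != b_rest:
            violations.append(f"feature {i}: fields other than 'passes' were modified")
    return violations
```

The "before" version could, for example, be read from the last commit with `git show HEAD:feature_list.json` and the "after" version from the working copy.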
For Progress Tracking
- Always commit before session end
- Write descriptive commit messages
- Update progress file with summary
- Leave environment in mergeable state
For Testing
- Use browser automation for web apps (Puppeteer MCP)
- Test as a human user would
- Verify end-to-end, not just unit tests
- Document any known limitations
Common Failure Modes & Solutions
| Problem | Solution |
|---------|----------|
| Agent one-shots entire project | Create detailed feature list, work one at a time |
| Declares victory too early | Check feature_list.json for failing tests |
| Leaves broken state | Run basic test at session start, fix first |
| Marks features done prematurely | Require end-to-end browser testing |
| Wastes time figuring out setup | Read init.sh, use established patterns |
Adapting to Other Domains
This framework generalizes beyond web development. Key principles:
- Comprehensive task decomposition: break work into testable units
- Progress persistence: maintain state across sessions
- Incremental verification: test after each change
- Clean handoffs: leave work in a resumable state