Agent Skills: Multi-AI Code Review

Multi-perspective code review using Claude, Gemini, and Codex as specialized agents. 5-dimensional analysis (security, performance, maintainability, correctness, style) with LLM-as-judge consensus, quality scoring, and CI/CD integration. Use when reviewing PRs, auditing code quality, preparing production releases, or establishing code review workflows.

ID: adaptationio/skrillz/multi-ai-code-review

Install this agent skill to your local machine:

```bash
pnpm dlx add-skill https://github.com/adaptationio/Skrillz/tree/HEAD/.claude/skills/multi-ai-code-review
```

Skill Files


.claude/skills/multi-ai-code-review/SKILL.md

Skill Metadata

Name
multi-ai-code-review
Description
Multi-perspective code review using Claude, Gemini, and Codex as specialized agents. 5-dimensional analysis (security, performance, maintainability, correctness, style) with LLM-as-judge consensus, quality scoring, and CI/CD integration. Use when reviewing PRs, auditing code quality, preparing production releases, or establishing code review workflows.

Multi-AI Code Review

Overview

multi-ai-code-review provides comprehensive code review using multiple AI models as specialized agents, each analyzing code from a different perspective, based on 2024-2025 best practices for AI-assisted code review.

Purpose: Multi-perspective code quality assessment using AI ensemble with human oversight

Pattern: Task-based (5 independent review dimensions + orchestration)

Key Principles (validated by tri-AI research):

  1. Multi-Agent Architecture - Specialized agents for each review dimension
  2. LLM-as-Judge Consensus - Flag issues only when 2+ models agree
  3. Progressive Severity - Critical → High → Medium → Low prioritization
  4. Human-in-Loop - AI suggests, human decides
  5. Quality Gates - Block merges for critical unresolved issues
  6. Actionable Feedback - Every comment has What/Where/Why/How

Quality Targets:

  • False Positive Rate: <15%
  • Fix Acceptance Rate: >40%
  • Review Turnaround: <5 minutes
  • Bug Catch Rate: >30% pre-production

When to Use

Use multi-ai-code-review when:

  • Reviewing pull requests (any size)
  • Auditing code quality before release
  • Establishing consistent code review standards
  • Security auditing code changes
  • Performance profiling changes
  • Technical debt assessment
  • Onboarding reviews (mentorship mode)

When NOT to Use:

  • Trivial changes (typos, comments only)
  • Automated dependency updates (use dependabot labels)
  • Generated code (migrations, scaffolds)

Prerequisites

Required

  • Code to review (diff, file, or directory)
  • At least one AI available (Claude required, Gemini/Codex optional)

Recommended

  • Gemini CLI for web research and fast analysis
  • Codex CLI for deep code reasoning
  • Git repository context

Integration

  • GitHub Actions (optional, for CI/CD)
  • Pre-commit hooks (optional, for local checks)

Review Dimensions

5-Dimensional Analysis

| Dimension | Agent | Focus | Weight |
|-----------|-------|-------|--------|
| Security | Security Specialist | OWASP Top 10, secrets, injection | 25% |
| Performance | Performance Engineer | Complexity, memory, latency | 20% |
| Maintainability | Architect | Patterns, modularity, DRY | 25% |
| Correctness | QA Engineer | Logic, edge cases, tests | 20% |
| Style | Nitpicker | Naming, formatting, conventions | 10% |

Severity Levels

| Level | Action | Examples |
|-------|--------|----------|
| Critical | Block merge | SQL injection, exposed secrets, data loss |
| High | Require fix | Race conditions, missing auth, memory leaks |
| Medium | Suggest fix | Code duplication, missing tests, complexity |
| Low | Optional | Style issues, naming, minor refactors |


Operations

Operation 1: Quick Security Scan

Time: 2-5 minutes | Automation: 80% | Purpose: Fast security-focused review

Process:

  1. Scan for Critical Issues:

```
Review this code for security vulnerabilities:
- SQL injection
- XSS vulnerabilities
- Hardcoded secrets/API keys
- Authentication bypasses
- Authorization flaws
- Input validation gaps
- Insecure dependencies

Code:
[PASTE CODE OR DIFF]

For each issue found, provide:
- Severity (Critical/High/Medium)
- Location (file:line)
- Description (what's wrong)
- Fix (specific code change)
```

  2. Validate with Gemini (optional):

```bash
gemini -p "Verify these security findings. Are any false positives?
[PASTE CLAUDE FINDINGS]

Code context:
[PASTE RELEVANT CODE]"
```

  3. Output: Security report with consensus findings

Operation 2: Comprehensive PR Review

Time: 10-30 minutes | Automation: 60% | Purpose: Full multi-dimensional review

Process:

Step 1: Gather Context

```bash
# Get PR diff
git diff main...HEAD > /tmp/pr_diff.txt

# Identify affected areas
grep -E "^(\+\+\+|---)" /tmp/pr_diff.txt | head -20
```
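The affected-file list from Step 1 can also be extracted programmatically; here is a minimal Python sketch (the helper name `changed_files` is illustrative, not part of the skill) that parses the `+++ b/<path>` lines of a unified diff:

```python
# List files touched by a unified diff (e.g. the /tmp/pr_diff.txt above).
def changed_files(diff_text: str) -> list[str]:
    files = []
    for line in diff_text.splitlines():
        # "+++ b/<path>" lines name the post-change version of each file
        if line.startswith("+++ b/"):
            files.append(line[len("+++ b/"):])
    return files

diff = """--- a/auth.py
+++ b/auth.py
@@ -1 +1 @@
-old
+new
"""
print(changed_files(diff))  # ['auth.py']
```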

Step 2: Run Parallel Agent Reviews

Use the Task tool to launch parallel agents:

```
Launch 3 parallel review agents:

Agent 1 (Security):
"Review this diff for security issues. Focus on:
- OWASP Top 10 vulnerabilities
- Authentication/authorization
- Input validation
- Secrets exposure
Diff: [DIFF]"

Agent 2 (Maintainability):
"Review this diff for maintainability. Focus on:
- Design patterns used correctly
- Code duplication (DRY)
- Modularity and cohesion
- Documentation quality
Diff: [DIFF]"

Agent 3 (Correctness):
"Review this diff for correctness. Focus on:
- Logic errors
- Edge cases not handled
- Test coverage gaps
- Error handling
Diff: [DIFF]"
```

Step 3: Orchestrate & Deduplicate

```
Synthesize findings from all agents:
[PASTE ALL AGENT OUTPUTS]

Tasks:
1. Remove duplicate findings
2. Rank by severity (Critical > High > Medium > Low)
3. Group by file
4. Generate summary table
5. Create final report with consensus issues only
```

Step 4: Generate Report

Output format:

```markdown
## PR Review Summary

| File | Risk | Issues | Critical | High | Medium |
|------|------|--------|----------|------|--------|
| auth.py | High | 3 | 1 | 2 | 0 |
| api.py | Medium | 2 | 0 | 1 | 1 |

### Critical Issues (Block Merge)
1. **[auth.py:45]** SQL Injection vulnerability
   - Why: User input directly in query
   - Fix: Use parameterized queries

### High Issues (Require Fix)
...

### Consensus Score: 72/100
- Security: 65/100
- Performance: 80/100
- Maintainability: 70/100
- Correctness: 75/100
- Style: 85/100
```

Operation 3: LLM-as-Judge Tribunal

Time: 5-15 minutes | Automation: 70% | Purpose: High-confidence findings through consensus

Process:

  1. Run Code Through Multiple Models:

Claude Analysis:

```
Analyze this code for issues. Rate severity 1-10 for each:
[CODE]
```

Gemini Analysis (via CLI):

```bash
gemini -p "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"
```

Codex Analysis (via CLI):

```bash
codex "Analyze this code for issues. Rate severity 1-10 for each:
[CODE]"
```

  2. Calculate Consensus:

```
Given these analyses from 3 AI models:

Claude: [FINDINGS]
Gemini: [FINDINGS]
Codex: [FINDINGS]

Identify issues where at least 2 models agree:
1. List consensus findings
2. Average severity scores
3. Note any disagreements
4. Final verdict for each issue
```

  3. Output: High-confidence issue list (≥67% agreement)
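The 2-of-3 consensus step can be sketched in Python; the issue IDs and severity scores below are invented for illustration:

```python
from collections import defaultdict

# Each model reports findings as {issue_id: severity 1-10}.
claude = {"sql-injection-auth45": 9, "naming-api12": 3}
gemini = {"sql-injection-auth45": 8, "dead-code-util7": 2}
codex  = {"sql-injection-auth45": 10, "naming-api12": 4}

votes = defaultdict(list)
for findings in (claude, gemini, codex):
    for issue, severity in findings.items():
        votes[issue].append(severity)

# Keep only issues flagged by at least 2 of 3 models (>=67% agreement),
# averaging their severity scores.
consensus = {
    issue: sum(scores) / len(scores)
    for issue, scores in votes.items()
    if len(scores) >= 2
}
print(consensus)  # {'sql-injection-auth45': 9.0, 'naming-api12': 3.5}
```

Single-model findings (like the dead-code flag above) drop out, which is what keeps the false-positive rate low.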

Operation 4: Mentorship Review

Time: 15-30 minutes | Automation: 40% | Purpose: Educational code review for learning

Process:

```
Review this code in mentorship mode. For a developer learning [LANGUAGE/FRAMEWORK]:

Code: [CODE]

For each finding:
1. **What's the issue** (be encouraging, not critical)
2. **Why it matters** (explain the underlying concept)
3. **How to improve** (show before/after with explanation)
4. **Learn more** (link to relevant documentation)

Also highlight:
- What was done well
- Good patterns to continue using
- Growth opportunities

Tone: Supportive and educational, never condescending.
```

Operation 5: Pre-Release Audit

Time: 30-60 minutes | Automation: 50% | Purpose: Comprehensive review before production

Process:

  1. Full Codebase Scan:

```bash
# Identify all changes since last release
git diff v1.0.0...HEAD --stat
git log v1.0.0...HEAD --oneline
```

  2. Security Deep Dive:
  • Run all security checks
  • Verify no new vulnerabilities
  • Check dependency updates
  • Audit secrets management
  3. Performance Review:
  • Identify potential bottlenecks
  • Review database queries
  • Check for N+1 problems
  • Validate caching strategies
  4. Test Coverage:
  • Verify test coverage targets
  • Check critical path coverage
  • Validate edge case tests
  5. Generate Release Report:

```markdown
## Pre-Release Audit: v1.1.0

### Security Clearance: PASS ✓
- No critical vulnerabilities
- All high issues resolved
- Secrets audit: Clean

### Performance Assessment: PASS ✓
- No new N+1 queries
- Response time within SLA
- Memory usage stable

### Test Coverage: 82% (target: 80%)
- Critical paths: 95%
- Edge cases: 78%

### Release Recommendation: APPROVED
```

Multi-AI Coordination

Agent Assignment Strategy

| Task | Primary | Verification | Speed |
|------|---------|--------------|-------|
| Security scan | Claude | Gemini | Fast |
| Architecture review | Claude | Codex | Medium |
| Logic validation | Codex | Claude | Medium |
| Style checking | Gemini | Claude | Fast |
| Performance analysis | Claude | Codex | Medium |

Coordination Commands

Launch Multi-Agent Review:

```
# Using Task tool for parallel execution
# Each agent reviews independently, orchestrator synthesizes
```

Gemini Quick Check:

```bash
gemini -p "Quick security scan of this code: [CODE]"
```

Codex Deep Analysis:

```bash
codex "Analyze this code architecture and suggest improvements: [CODE]"
```

CI/CD Integration

GitHub Actions Workflow

```yaml
# .github/workflows/ai-review.yml
name: Multi-AI Code Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR Diff
        run: |
          git diff origin/main...HEAD > pr_diff.txt

      - name: Claude Review
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          model: "claude-sonnet-4-5-20250929"
          review_level: "detailed"

      - name: Post Summary
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## AI Review Summary\n${process.env.REVIEW_SUMMARY}`
            })
```

Quality Gate Configuration

```yaml
# Block merge for critical issues
quality_gates:
  critical_issues: 0      # Must be zero
  high_issues: 3          # Max allowed
  coverage_minimum: 80    # Percent
  score_minimum: 70       # Out of 100
```
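The same gates can be enforced with a small check; this is a sketch, and the function and argument names are illustrative rather than part of any CI API:

```python
# Mirror of the quality_gates thresholds: merge is allowed only when
# every gate passes.
def passes_gates(critical: int, high: int, coverage: float, score: float) -> bool:
    return (
        critical == 0       # critical_issues: must be zero
        and high <= 3       # high_issues: max allowed
        and coverage >= 80  # coverage_minimum (percent)
        and score >= 70     # score_minimum (out of 100)
    )

print(passes_gates(critical=0, high=2, coverage=82, score=72))  # True
print(passes_gates(critical=1, high=0, coverage=95, score=90))  # False
```

Note that a single critical issue blocks the merge regardless of how strong the other metrics are.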

Quality Scoring

Scoring Formula

```
Overall = (Security × 0.25) + (Performance × 0.20) +
          (Maintainability × 0.25) + (Correctness × 0.20) +
          (Style × 0.10)
```
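The formula is a straightforward weighted sum; a Python sketch (the sample scores are arbitrary values chosen for the demo):

```python
# Dimension weights from the scoring formula; they sum to 1.0.
WEIGHTS = {
    "security": 0.25,
    "performance": 0.20,
    "maintainability": 0.25,
    "correctness": 0.20,
    "style": 0.10,
}

def overall_score(scores: dict[str, float]) -> float:
    # Weighted sum of the five per-dimension scores (each 0-100).
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

sample = {
    "security": 90, "performance": 80,
    "maintainability": 85, "correctness": 75, "style": 95,
}
print(round(overall_score(sample), 2))  # 84.25
```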

Grade Mapping

| Score | Grade | Status |
|-------|-------|--------|
| ≥90 | A | Excellent - Ship it |
| 80-89 | B | Good - Minor fixes |
| 70-79 | C | Acceptable - Address issues |
| 60-69 | D | Needs work - Significant fixes |
| <60 | F | Failing - Major revision needed |


Anti-Patterns to Detect

  1. Hardcoded Secrets - API keys, passwords in code
  2. SQL Injection - Unparameterized queries
  3. XSS Vulnerabilities - Unsanitized output
  4. Race Conditions - Unprotected shared state
  5. Memory Leaks - Unclosed resources
  6. N+1 Queries - Loop database calls
  7. Dead Code - Unreachable branches
  8. God Objects - Classes doing too much
  9. Copy-Paste Code - Duplicated logic
  10. Missing Error Handling - Unhandled exceptions
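As a concrete before/after for anti-pattern 2, here is a sketch using Python's `sqlite3` (the table and values are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

name = "alice' OR '1'='1"  # attacker-controlled input

# Anti-pattern: string-built query -- the injected OR clause matches every row
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{name}'"
).fetchall()

# Fix: parameterized query -- the driver treats the value as data only
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (name,)
).fetchall()

print(unsafe)  # [('alice',)]  -- injection bypassed the filter
print(safe)    # []            -- no user literally named "alice' OR '1'='1"
```

This is exactly the class of finding the security agent should flag as Critical with a parameterized-query fix.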

Example Review Session

User: Review this PR for my authentication module

Claude: I'll perform a comprehensive multi-dimensional review.

[Launches parallel agents for security, maintainability, correctness]

## PR Review: Authentication Module

### Critical Issues (1)
1. **[auth.py:67]** Password stored in plaintext
   - Severity: Critical
   - Consensus: 3/3 models agree
   - Fix: Use bcrypt hashing
   ```python
   # Before
   user.password = request.password

   # After
   import bcrypt
   user.password = bcrypt.hashpw(request.password.encode(), bcrypt.gensalt())
   ```

### High Issues (2)

  1. [auth.py:45] No rate limiting on login endpoint
  2. [auth.py:89] JWT secret hardcoded

### Quality Score: 58/100 (Grade: F)

  • Security: 35/100 (Critical issues)
  • Performance: 70/100
  • Maintainability: 65/100
  • Correctness: 60/100
  • Style: 80/100

### Recommendation: BLOCK MERGE

Resolve critical security issues before merging.


---

## Related Skills

- **multi-ai-testing**: Generate tests for reviewed code
- **multi-ai-verification**: Validate fixes
- **multi-ai-implementation**: Implement suggested fixes
- **codex-review**: Codex-specific review patterns
- **review-multi**: Skill-specific reviews

---

## References

- `references/security-checklist.md` - OWASP Top 10 checklist
- `references/performance-patterns.md` - Performance anti-patterns
- `references/ci-cd-integration.md` - Full CI/CD setup guide