Agent Skills: AI UT Code Review

Use when reviewing or scoring AI-generated unit tests/UT code, especially when coverage, assertion effectiveness, or test quality is in question and a numeric score, risk level, or must-fix checklist is needed

Category: Uncategorized
ID: openharmonyinsight/openharmony-skills/ai-generated-ut-code-review

Install this agent skill to your local machine:

pnpm dlx add-skill https://github.com/openharmonyinsight/openharmony-skills/tree/HEAD/skills/ai-generated-ut-code-review

skills/ai-generated-ut-code-review/SKILL.md

Skill Metadata

Name
ai-generated-ut-code-review
Description
Use when reviewing or scoring AI-generated unit tests/UT code, especially when coverage, assertion effectiveness, or test quality is in question and a numeric score, risk level, or must-fix checklist is needed

AI UT Code Review

Overview

Review AI-generated unit tests for effectiveness, coverage, assertions, negative cases, determinism, and maintainability. Output a 0-10 score, a risk level, and a must-fix checklist. Overall line coverage must be >= 80%; otherwise risk is at least High.

When to Use

  • AI-generated UT/test code review or quality evaluation
  • Need scoring, risk level, or must-fix checklist
  • Questions about coverage or assertion validity

Workflow

  1. Confirm tests target the intended business code and key paths.
  2. Check overall line coverage (>= 80% required).
  3. Inspect assertions for behavioral validity; flag missing/ineffective assertions.
  4. Verify negative/edge cases and determinism (no env/time dependency).
  5. Score by rubric, assign risk, list must-fix items with evidence.

Scoring (0-10)

Each dimension 0-2 points. Sum = total score.

| Dimension | 0 | 1 | 2 |
| --- | --- | --- | --- |
| Coverage | < 80% | 80%+ but shallow | 80%+ and meaningful |
| Assertion Quality | No/invalid assertions | Some weak assertions | Behavior-anchored assertions |
| Negative & Edge | Missing | Partial | Comprehensive |
| Data & Isolation | Flaky/env-dependent | Mixed | Deterministic, isolated |
| Maintainability | Hard to read/modify | Mixed quality | Clear structure & naming |
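The rubric above can be sketched as a plain sum of the five 0-2 dimension scores. The function and dimension names below are illustrative assumptions, not part of the skill itself:

```python
# Sketch of the 0-10 rubric: five dimensions, each scored 0-2.
# DIMENSIONS and total_score are illustrative names, not defined by the skill.

DIMENSIONS = ("coverage", "assertion_quality", "negative_edge",
              "data_isolation", "maintainability")

def total_score(scores: dict) -> int:
    """Sum the five 0-2 dimension scores into a 0-10 total."""
    for dim in DIMENSIONS:
        if not 0 <= scores[dim] <= 2:
            raise ValueError(f"{dim} must be 0-2, got {scores[dim]}")
    return sum(scores[dim] for dim in DIMENSIONS)

# The concise example later in this doc scores 1, 0, 1, 2, 1:
print(total_score({"coverage": 1, "assertion_quality": 0,
                   "negative_edge": 1, "data_isolation": 2,
                   "maintainability": 1}))  # prints 5
```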

Risk Levels

  • Blocker: Coverage < 80% AND key paths untested, or tests have no meaningful assertions
  • High: Coverage < 80% OR assertions largely ineffective
  • Medium: Coverage OK but weak edge cases or fragile design
  • Low: Minor improvements
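The four risk rules above can be encoded as a small decision ladder; this is a hedged sketch, and the parameter names (including the `assertion_quality` levels) are assumptions for illustration:

```python
def risk_level(coverage_pct: float, key_paths_tested: bool,
               assertion_quality: str, edge_cases_strong: bool) -> str:
    """Map review findings to a risk level per the rules above.

    assertion_quality: "none" (no meaningful assertions),
    "weak" (largely ineffective), or "strong".
    """
    # Blocker: coverage < 80% AND key paths untested, or no meaningful assertions.
    if (coverage_pct < 80 and not key_paths_tested) or assertion_quality == "none":
        return "Blocker"
    # High: coverage < 80% OR assertions largely ineffective.
    if coverage_pct < 80 or assertion_quality == "weak":
        return "High"
    # Medium: coverage OK but weak edge cases or fragile design.
    if not edge_cases_strong:
        return "Medium"
    return "Low"

print(risk_level(62, True, "weak", False))  # prints High
```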

Must-Fix Checklist

  • Overall line coverage >= 80%
  • Each test has at least one behavior-relevant assertion
  • Negative/exception cases exist for core logic
  • Tests are deterministic and repeatable
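For the determinism item, a common fix is to pass time in rather than read the system clock. A minimal sketch, where `Session` is a hypothetical class invented for illustration:

```python
import datetime

class Session:
    """Hypothetical class whose expiry depends on time."""
    def __init__(self, created_at: datetime.datetime, ttl_seconds: int = 3600):
        self.created_at = created_at
        self.ttl_seconds = ttl_seconds

    def is_expired(self, now: datetime.datetime) -> bool:
        # "now" is injected, not read from the system clock,
        # so tests are deterministic and repeatable.
        return (now - self.created_at).total_seconds() > self.ttl_seconds

def test_session_expires_after_ttl():
    t0 = datetime.datetime(2024, 1, 1, 12, 0, 0)
    s = Session(created_at=t0, ttl_seconds=60)
    assert not s.is_expired(t0 + datetime.timedelta(seconds=60))
    assert s.is_expired(t0 + datetime.timedelta(seconds=61))

test_session_expires_after_ttl()
```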

AI-Generated Test Pitfalls (Check Explicitly)

  • No assertions or assertions unrelated to behavior (e.g., only not-null)
  • Over-mocking hides real behavior
  • Only happy-path coverage
  • Tests depend on time/network/env
  • Missing verification of side effects
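To make the first pitfall concrete, compare a not-null-only assertion with behavior-anchored ones. `parse_port` is a hypothetical function used only for illustration:

```python
def parse_port(config: dict) -> int:
    """Hypothetical function: read a port from config with a default."""
    port = int(config.get("port", 8080))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

# Weak: passes even if parse_port returns the wrong value.
result = parse_port({"port": "9090"})
assert result is not None

# Behavior-anchored: pins the actual contract.
assert parse_port({"port": "9090"}) == 9090
assert parse_port({}) == 8080  # default applies
try:
    parse_port({"port": "70000"})
    raise AssertionError("expected ValueError for out-of-range port")
except ValueError:
    pass  # negative case verified
```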

Output Format (Required, Semi-fixed)

  • Score: x/10 — Coverage x, Assertion Quality x, Negative & Edge x, Data & Isolation x, Maintainability x
  • Risk: Low/Medium/High/Blocker — brief reason (1 line)
  • Must-fix:
    • [action + evidence]
    • [action + evidence]
  • Key Evidence:
    • Cite specific test case names or a coverage report summary (1-2 items)
  • Notes:
    • Minimal fix suggestion or alternative approach (1-2 lines)

Rules:

  • Coverage < 80% means risk is at least High, and it must appear in Must-fix
  • Missing or ineffective assertions directly raise the risk level and must appear in Must-fix
  • Provide at least 2 pieces of evidence; if evidence is insufficient, say so and lower the score

Common Mistakes

  • Reporting coverage only, without evaluating assertion effectiveness
  • Treating log output as assertions
  • Ignoring failure/exception paths

Example (Concise)

Score: 5/10 (Coverage 1, Assertion 0, Negative 1, Data 2, Maintainability 1)
Risk: High
Must-fix:

  • Tests for parseConfig() contain no behavior assertions (only logs)
  • No negative cases for malformed input

Key Evidence:

  • parseConfig() tests only assert no crash
  • Coverage report shows 62% lines

Notes:

  • Add assertions on outputs and side effects; add invalid input tests.