E2E Behavior Validation for Frontend Modifications

Core Principle: Test Product Behavior, Not UI States

CRITICAL: Tests must verify that product features WORK correctly, not just that UI elements render.

What NOT to test (UI States):

  • ❌ "Dropdown opens when clicked"
  • ❌ "Modal appears after button click"
  • ❌ "Loading spinner shows during request"
  • ❌ "Form fields are visible"
  • ❌ "Sidebar collapses"

What TO test (Product Behavior):

  • ✅ "Selecting an LLM provider configures the agent to use that provider"
  • ✅ "Creating a new agent persists it and shows in the agents list"
  • ✅ "Running a tool with parameters returns the expected output"
  • ✅ "Chat messages stream correctly and maintain conversation context"
  • ✅ "Workflow execution triggers tools in the correct order"

Prerequisites

Requires the Playwright MCP server. If the browser_navigate tool is unavailable, instruct the user to add it:

claude mcp add playwright -- npx @playwright/mcp@latest

Step 1: Understand the Feature Intent

Before writing ANY test, answer these questions:

  1. What user problem does this feature solve?
  2. What is the expected outcome when the feature works correctly?
  3. What data flows through the system? (user input → API → state → UI)
  4. What should persist after page reload?
  5. What downstream effects should this action have?

Document these answers as comments in your test file.
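For example, a hypothetical provider-selection test might capture the answers like this (illustrative only; adapt the wording to your feature):

/**
 * FEATURE INTENT (Step 1 answers — illustrative example)
 * 1. User problem: users need to choose which LLM provider the agent calls.
 * 2. Expected outcome: after selecting a provider, agent responses come from that provider.
 * 3. Data flow: provider selection (UI) → agent config API → stored settings → chat request payload.
 * 4. Persistence: the selected provider survives a page reload.
 * 5. Downstream effects: subsequent chat requests include the new provider.
 */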

Step 2: Build and Start

pnpm build:cli
cd packages/playground/e2e/kitchen-sink && pnpm dev

Verify the server is running at http://localhost:4111
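If you want Playwright to start and health-check the dev server itself, a webServer entry in playwright.config.ts is one option (a sketch only; the repository's actual config may already handle this, and the command/paths below are assumptions):

// playwright.config.ts — sketch; command and paths are assumptions, adjust to the repo layout
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: { baseURL: 'http://localhost:4111' },
  webServer: {
    command: 'cd packages/playground/e2e/kitchen-sink && pnpm dev',
    url: 'http://localhost:4111',
    reuseExistingServer: true,
    timeout: 120_000,
  },
});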

Step 3: Map Feature to Behavior Tests

Feature-to-Test Mapping Guide

| Feature Category       | What to Test                                       | Example Assertion                                             |
| ---------------------- | -------------------------------------------------- | ------------------------------------------------------------- |
| Agent Configuration    | Config changes affect agent behavior               | Send message → verify response uses selected model            |
| LLM Provider Selection | Selected provider is used in requests              | Intercept API call → verify provider in request payload       |
| Tool Execution         | Tool runs with correct params & returns result     | Execute tool → verify output matches expected transformation  |
| Workflow Execution     | Steps execute in order, data flows between steps   | Run workflow → verify each step's output feeds next step      |
| Chat/Streaming         | Messages persist, context maintained across turns  | Multi-turn conversation → verify context awareness            |
| MCP Server Tools       | Server tools are callable and return data          | Call MCP tool → verify response structure and content         |
| Memory/Persistence     | Data survives page reload                          | Create item → reload → verify item exists                     |
| Error Handling         | Errors surface correctly to user                   | Trigger error condition → verify error message + recovery     |

Step 4: Write Behavior-Focused Tests

Test Structure Template

import { test, expect, Page } from '@playwright/test';
import { resetStorage } from '../__utils__/reset-storage';
import { selectFixture } from '../__utils__/select-fixture';
import { nanoid } from 'nanoid';

/**
 * FEATURE: [Name of feature]
 * USER STORY: As a user, I want to [action] so that [outcome]
 * BEHAVIOR UNDER TEST: [Specific behavior being validated]
 */

test.describe('[Feature Name] - Behavior Tests', () => {
  let page: Page;

  test.beforeEach(async ({ browser }) => {
    const context = await browser.newContext();
    page = await context.newPage();
  });

  test.afterEach(async () => {
    await resetStorage(page);
  });

  test('should [verb describing behavior] when [trigger condition]', async () => {
    // ARRANGE: Set up preconditions
    // - Navigate to the feature
    // - Configure any required state
    // ACT: Perform the user action that triggers the behavior
    // ASSERT: Verify the OUTCOME, not the UI state
    // - Check data persistence
    // - Verify downstream effects
    // - Confirm API calls made correctly
  });
});

Behavior Test Patterns

Pattern 1: Configuration Affects Behavior

test('selecting LLM provider should use that provider for agent responses', async () => {
  // ARRANGE
  await page.goto('/agents/my-agent/chat');

  // Intercept API to verify provider
  let capturedProvider: string | null = null;
  await page.route('**/api/chat', async route => {
    const body = JSON.parse(route.request().postData() || '{}');
    capturedProvider = body.provider;
    await route.continue();
  });

  // ACT: Select a different provider
  await page.getByTestId('provider-selector').click();
  await page.getByRole('option', { name: 'OpenAI' }).click();

  // Send a message to trigger the agent
  await page.getByTestId('chat-input').fill('Hello');
  await page.getByTestId('send-button').click();

  // ASSERT: Verify the selected provider was used
  await expect.poll(() => capturedProvider).toBe('openai');
});
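Note: expect.poll is used because the route handler records the provider asynchronously; polling avoids a race between clicking send and the request being intercepted. Adjust the '**/api/chat' pattern and the provider field if the app's actual chat endpoint or payload shape differs.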

Pattern 2: Data Persistence

test('created agent should persist after page reload', async () => {
  // ARRANGE
  await page.goto('/agents');
  const agentName = `Test Agent ${nanoid()}`;

  // ACT: Create new agent
  await page.getByTestId('create-agent-button').click();
  await page.getByTestId('agent-name-input').fill(agentName);
  await page.getByTestId('save-agent-button').click();

  // Wait for creation to complete
  await expect(page.getByText(agentName)).toBeVisible();

  // ASSERT: Verify persistence
  await page.reload();
  await expect(page.getByText(agentName)).toBeVisible({ timeout: 10000 });
});

Pattern 3: Tool Execution Produces Correct Output

test('weather tool should return formatted weather data', async () => {
  // ARRANGE
  await selectFixture(page, 'weather-success');
  await page.goto('/tools/weather-tool');

  // ACT: Execute tool with parameters
  await page.getByTestId('param-city').fill('San Francisco');
  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Verify OUTPUT content, not just that output appears
  const output = page.getByTestId('tool-output');
  await expect(output).toContainText('temperature');
  await expect(output).toContainText('San Francisco');

  // Verify structured data if applicable
  const outputText = await output.textContent();
  const outputData = JSON.parse(outputText || '{}');
  expect(outputData).toHaveProperty('temperature');
  expect(outputData).toHaveProperty('conditions');
});

Pattern 4: Workflow Step Chaining

test('workflow should pass data between steps correctly', async () => {
  // ARRANGE
  await selectFixture(page, 'workflow-multi-step');
  const sessionId = nanoid();
  await page.goto(`/workflows/data-pipeline?session=${sessionId}`);

  // ACT: Trigger workflow execution
  await page.getByTestId('workflow-input').fill('test input data');
  await page.getByTestId('run-workflow-button').click();

  // ASSERT: Verify each step received correct input from previous step
  // Wait for completion
  await expect(page.getByTestId('workflow-status')).toHaveText('completed', { timeout: 30000 });

  // Check step outputs show data transformation chain
  const step1Output = await page.getByTestId('step-1-output').textContent();
  const step2Output = await page.getByTestId('step-2-output').textContent();

  // Verify step 2 received step 1's output as input
  expect(step1Output).not.toBeNull();
  expect(step2Output).toContain(step1Output!);
});

Pattern 5: Streaming Chat with Context

test('chat should maintain conversation context across messages', async () => {
  // ARRANGE
  await selectFixture(page, 'contextual-chat');
  const chatId = nanoid();
  await page.goto(`/agents/assistant/chat/${chatId}`);

  // ACT: Multi-turn conversation
  await page.getByTestId('chat-input').fill('My name is Alice');
  await page.getByTestId('send-button').click();
  await expect(page.getByTestId('assistant-message').last()).toBeVisible({ timeout: 20000 });

  await page.getByTestId('chat-input').fill('What is my name?');
  await page.getByTestId('send-button').click();

  // ASSERT: Verify context was maintained
  const response = page.getByTestId('assistant-message').last();
  await expect(response).toContainText('Alice', { timeout: 20000 });
});
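Note: asserting the literal text 'Alice' assumes the contextual-chat fixture returns deterministic, context-aware responses. Against a live LLM, a more robust check is to intercept the second request and assert that the first turn ('My name is Alice') is included in the messages payload.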

Pattern 6: Error Recovery

test('should show actionable error and allow retry when API fails', async () => {
  // ARRANGE: Set up failure fixture
  await selectFixture(page, 'api-failure');
  await page.goto('/tools/flaky-tool');

  // ACT: Trigger the error
  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Error is shown with recovery option
  await expect(page.getByTestId('error-message')).toContainText('failed');
  await expect(page.getByTestId('retry-button')).toBeVisible();

  // Switch to success fixture and retry
  await selectFixture(page, 'api-success');
  await page.getByTestId('retry-button').click();

  // Verify recovery worked
  await expect(page.getByTestId('tool-output')).toBeVisible({ timeout: 10000 });
  await expect(page.getByTestId('error-message')).not.toBeVisible();
});

Step 5: Update Existing Tests

When a test file already exists:

  1. Read the existing tests to understand current coverage
  2. Identify if tests are UI-focused or behavior-focused
  3. Refactor UI-focused tests to verify behavior instead:

Refactoring Example

BEFORE (UI-focused):

test('dropdown opens when clicked', async () => {
  await page.getByTestId('model-dropdown').click();
  await expect(page.getByRole('listbox')).toBeVisible();
});

AFTER (Behavior-focused):

test('selecting model from dropdown updates agent configuration', async () => {
  // Open dropdown and select model
  await page.getByTestId('model-dropdown').click();
  await page.getByRole('option', { name: 'GPT-4' }).click();

  // Verify the selection persists and affects behavior
  await page.reload();
  await expect(page.getByTestId('model-dropdown')).toHaveText('GPT-4');

  // Optionally: verify the model is used in actual requests
  // (via request interception or checking response metadata)
});
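A sketch of that optional check, assuming the chat endpoint is /api/chat and the request payload carries a model field (adjust both to match the real API):

// Capture the model sent with the next chat request (endpoint and field name are assumptions)
let capturedModel: string | null = null;
await page.route('**/api/chat', async route => {
  capturedModel = JSON.parse(route.request().postData() || '{}').model;
  await route.continue();
});

await page.getByTestId('chat-input').fill('Hello');
await page.getByTestId('send-button').click();
await expect.poll(() => capturedModel).toBe('gpt-4');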

Step 6: Kitchen-Sink Fixtures for Behavior Testing

Fixtures should represent realistic scenarios, not just mock data:

Fixture Naming Convention

<feature>-<scenario>.fixture.ts

Examples:
- agent-with-tools.fixture.ts
- chat-multi-turn-context.fixture.ts
- workflow-parallel-execution.fixture.ts
- tool-validation-error.fixture.ts
- mcp-server-timeout.fixture.ts

Fixture Content Requirements

Each fixture must define:

  1. Scenario description (what behavior it enables testing)
  2. Expected outcomes (what assertions should pass)
  3. Edge cases covered (error states, empty states, etc.)

// fixtures/agent-provider-switch.fixture.ts
export const agentProviderSwitch = {
  name: 'agent-provider-switch',
  description: 'Tests that switching LLM providers changes agent behavior',

  // Mock responses for different providers
  responses: {
    openai: { content: 'Response from OpenAI', model: 'gpt-4' },
    anthropic: { content: 'Response from Anthropic', model: 'claude-3' },
  },

  expectedBehavior: {
    // When provider is switched, subsequent messages use new provider
    providerSwitchAffectsNextMessage: true,
    // Provider selection persists across page reload
    providerPersistsOnReload: true,
  },
};

Step 7: Run and Validate

cd packages/playground && pnpm test:e2e

Test Quality Checklist

Before considering tests complete, verify:

  • [ ] Each test has a clear user story comment
  • [ ] Tests verify OUTCOMES, not intermediate UI states
  • [ ] Tests would FAIL if the feature broke (not just if UI changed)
  • [ ] Persistence is verified via page.reload() where applicable
  • [ ] Error scenarios are covered
  • [ ] Tests use appropriate timeouts for async operations
  • [ ] Fixtures represent realistic usage scenarios

Quick Reference

| Step      | Command/Action                                        |
| --------- | ----------------------------------------------------- |
| Build     | pnpm build:cli                                         |
| Start     | cd packages/playground/e2e/kitchen-sink && pnpm dev    |
| App URL   | http://localhost:4111                                  |
| Routes    | @packages/playground/src/App.tsx                       |
| Run tests | cd packages/playground && pnpm test:e2e                |
| Test dir  | packages/playground/e2e/tests/                         |
| Fixtures  | packages/playground/e2e/kitchen-sink/fixtures/         |

Anti-Patterns to Avoid

| ❌ Don't                            | ✅ Do Instead                                                  |
| ----------------------------------- | -------------------------------------------------------------- |
| Test that modal opens               | Test that modal action completes and persists                  |
| Test that button is clickable       | Test that clicking button produces expected result             |
| Test loading spinner appears        | Test that loaded data is correct                               |
| Test form validation message shows  | Test that invalid form cannot submit AND valid form succeeds   |
| Test dropdown has options           | Test that selecting option changes system behavior             |
| Test sidebar navigation works       | Test that navigated page has correct data/functionality        |
| Assert element is visible           | Assert element contains expected data/state                    |