Agent Skills: Prompt Engineering — Operational Skill

Prompt engineering for production LLMs — structured outputs, RAG, tool workflows, and safety. Use when designing or debugging prompts for LLM APIs.

Skill ID: vasilyu1983/ai-agents-public/ai-prompt-engineering

Install this agent skill locally:

pnpm dlx add-skill https://github.com/vasilyu1983/AI-Agents-public/tree/HEAD/frameworks/shared-skills/skills/ai-prompt-engineering

Skill Files

Browse the full folder contents for ai-prompt-engineering.

frameworks/shared-skills/skills/ai-prompt-engineering/SKILL.md

Skill Metadata

Name
ai-prompt-engineering
Description
"Prompt engineering for production LLMs — structured outputs, RAG, tool workflows, and safety. Use when designing or debugging prompts for LLM APIs."

Prompt Engineering — Operational Skill

Modern Best Practices (January 2026): versioned prompts, explicit output contracts, regression tests, and safety threat modeling for tool/RAG prompts (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

This skill provides operational guidance for building production-ready prompts across standard tasks, RAG workflows, agent orchestration, structured outputs, hidden reasoning, and multi-step planning.

All content is operational, not theoretical. Focus on patterns, checklists, and copy-paste templates.

Quick Start (60 seconds)

  1. Pick a pattern from the decision tree (structured output, extractor, RAG, tools/agent, rewrite, classification).
  2. Start from a template in assets/ and fill in TASK, INPUT, RULES, and OUTPUT FORMAT.
  3. Add guardrails: instruction/data separation, “no invented details”, missing → null/explicit missing.
  4. Add validation: JSON parse check, schema check, citations check, post-tool checks.
  5. Add evals: 10–20 cases while iterating, 50–200 before release, plus adversarial injection cases.
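Step 4's parse and schema checks can be sketched as a small validator. `REQUIRED_FIELDS` below is a hypothetical output contract for illustration, not part of the skill:

```python
import json

# Hypothetical contract: field name -> expected Python type after json.loads.
REQUIRED_FIELDS = {"category": str, "confidence": float}

def validate_output(raw: str) -> tuple[bool, str]:
    """Run the parse check, then the schema check, from the Quick Start."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"parse error: {e}"
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            return False, f"missing field: {field}"
        if not isinstance(data[field], ftype):
            return False, f"wrong type for {field}"
    return True, "ok"

ok, msg = validate_output('{"category": "billing", "confidence": 0.92}')
```

In production you would typically replace the hand-rolled type check with a real schema validator, but the two-stage shape (parse, then validate) stays the same.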

Model Notes (2026)

This skill includes Claude Code + Codex CLI optimizations:

  • Action directives: Frame for implementation, not suggestions
  • Parallel tool execution: Independent tool calls can run simultaneously
  • Long-horizon task management: State tracking, incremental progress, context compaction resilience
  • Positive framing: Describe desired behavior rather than prohibitions
  • Style matching: Prompt formatting influences output style
  • Domain-specific patterns: Specialized guidance for frontend, research, and agentic coding
  • Style-adversarial resilience: Stress-test refusals with poetic/role-play rewrites; normalize or decline stylized harmful asks before tool use

Prefer “brief justification” over requesting chain-of-thought. When using private reasoning patterns, instruct: think internally; output only the final answer.

Quick Reference

| Task | Pattern to Use | Key Components | When to Use |
|------|----------------|----------------|-------------|
| Machine-parseable output | Structured Output | JSON schema, "JSON-only" directive, no prose | API integrations, data extraction |
| Field extraction | Deterministic Extractor | Exact schema, missing -> null, no transformations | Form data, invoice parsing |
| Use retrieved context | RAG Workflow | Context relevance check, chunk citations, explicit missing info | Knowledge bases, documentation search |
| Internal reasoning | Hidden Chain-of-Thought | Internal reasoning, final answer only | Classification, complex decisions |
| Tool-using agent | Tool/Agent Planner | Plan-then-act, one tool per turn | Multi-step workflows, API calls |
| Text transformation | Rewrite + Constrain | Style rules, meaning preservation, format spec | Content adaptation, summarization |
| Classification | Decision Tree | Ordered branches, mutually exclusive, JSON result | Routing, categorization, triage |


Decision Tree: Choosing the Right Pattern

User needs: [Prompt Type]
  |-- Output must be machine-readable?
  |     |-- Extract specific fields only? -> **Deterministic Extractor Pattern**
  |     `-- Generate structured data? -> **Structured Output Pattern (JSON)**
  |
  |-- Use external knowledge?
  |     `-- Retrieved context must be cited? -> **RAG Workflow Pattern**
  |
  |-- Requires reasoning but hide process?
  |     `-- Classification or decision task? -> **Hidden Chain-of-Thought Pattern**
  |
  |-- Needs to call external tools/APIs?
  |     `-- Multi-step workflow? -> **Tool/Agent Planner Pattern**
  |
  |-- Transform existing text?
  |     `-- Style/format constraints? -> **Rewrite + Constrain Pattern**
  |
  `-- Classify or route to categories?
        `-- Mutually exclusive rules? -> **Decision Tree Pattern**
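To make the Decision Tree pattern concrete, here is a deterministic Python stand-in for the ordered, mutually exclusive, first-match-wins branches a routing prompt should encode (the category names and keywords are hypothetical):

```python
import json

def route(ticket: str) -> str:
    """Ordered, mutually exclusive branches; the first match wins."""
    text = ticket.lower()
    if "refund" in text:
        label = "billing"
    elif "password" in text or "login" in text:
        label = "account"
    elif "crash" in text or "error" in text:
        label = "bug"
    else:
        label = "other"          # explicit fallback branch, never omitted
    return json.dumps({"category": label})

result = route("I need a refund for my last order")
```

The prompt version states the same rules in prose and asks for the same JSON result; keeping branches ordered and exhaustive is what makes the outputs auditable.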

Copy/Paste: Minimal Prompt Skeletons

1) Generic "output contract" skeleton

TASK:
{{one_sentence_task}}

INPUT:
{{input_data}}

RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.

OUTPUT FORMAT:
{{schema_or_format_spec}}
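A minimal renderer for the skeleton above, assuming simple replace-based filling of the mustache-style `{{...}}` placeholders is acceptable:

```python
SKELETON = """TASK:
{{one_sentence_task}}

INPUT:
{{input_data}}

OUTPUT FORMAT:
{{schema_or_format_spec}}"""

def render(template: str, values: dict[str, str]) -> str:
    # Naive substitution; fails loudly if any placeholder is left unfilled.
    for key, val in values.items():
        template = template.replace("{{" + key + "}}", val)
    if "{{" in template:
        raise ValueError("unfilled placeholder in prompt")
    return template

prompt = render(SKELETON, {
    "one_sentence_task": "Classify the support ticket.",
    "input_data": "My invoice is wrong.",
    "schema_or_format_spec": '{"category": "<string>"}',
})
```

The unfilled-placeholder check is the important part: a half-rendered prompt shipping to production is a common and silent failure mode.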

2) Tool/agent skeleton (deterministic)

AVAILABLE TOOLS:
{{tool_signatures_or_names}}

WORKFLOW:
- Make a short plan.
- Call tools only when required to complete the task.
- Validate tool outputs before using them.
- If the environment supports parallel tool calls, run independent calls in parallel.
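The parallel-call rule above can be sketched with `asyncio.gather`; both tools here are hypothetical read-only stubs standing in for real tool calls:

```python
import asyncio

async def search_docs(query: str) -> str:       # hypothetical read-only tool
    await asyncio.sleep(0.01)
    return f"docs for {query}"

async def fetch_ticket(ticket_id: str) -> str:  # hypothetical read-only tool
    await asyncio.sleep(0.01)
    return f"ticket {ticket_id}"

async def gather_context() -> list[str]:
    # Independent, read-only calls run concurrently;
    # any write/edit step should stay serialized.
    return list(await asyncio.gather(
        search_docs("refund policy"),
        fetch_ticket("T-42"),
    ))

results = asyncio.run(gather_context())
```

`asyncio.gather` preserves argument order in its results, which keeps downstream validation deterministic.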

3) RAG skeleton (grounded)

RETRIEVED CONTEXT:
{{chunks_with_ids}}

RULES:
- Use only retrieved context for factual claims.
- Cite chunk ids for each claim.
- If evidence is missing, say what is missing.
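One way to enforce the citation rule downstream is a regex check over the model's answer. The `[chunk-N]` citation format is an assumption here; adapt the pattern to whatever chunk ids your retriever emits:

```python
import re

def check_citations(answer: str, chunk_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the grounding check passed."""
    problems = []
    cited = set(re.findall(r"\[(chunk-\d+)\]", answer))  # assumed citation format
    if not cited:
        problems.append("no citations found")
    for cid in cited - chunk_ids:
        problems.append(f"cites unknown chunk: {cid}")
    return problems

issues = check_citations("Refunds take 5 days [chunk-2].", {"chunk-1", "chunk-2"})
```

This catches both uncited answers and hallucinated chunk ids, the two most common grounding failures.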

Operational Checklists

Use these references when validating or debugging prompts:

  • frameworks/shared-skills/skills/ai-prompt-engineering/references/quality-checklists.md
  • frameworks/shared-skills/skills/ai-prompt-engineering/references/production-guidelines.md

Context Engineering (2026)

True expertise in prompting extends beyond writing instructions to shaping the entire context in which the model operates. Context engineering encompasses:

  • Conversation history: What prior turns inform the current response
  • Retrieved context (RAG): External knowledge injected into the prompt
  • Structured inputs: JSON schemas, system/user message separation
  • Tool outputs: Results from previous tool calls that shape next steps

Context Engineering vs Prompt Engineering

| Aspect | Prompt Engineering | Context Engineering |
|--------|--------------------|---------------------|
| Focus | Instruction text | Full input pipeline |
| Scope | Single prompt | RAG + history + tools |
| Optimization | Word choice, structure | Information architecture |
| Goal | Clear instructions | Optimal context window |

Key Context Engineering Patterns

1. Context Prioritization: Place the most relevant information first; models typically attend more strongly to the beginning and end of the context window than to the middle.

2. Context Compression: Summarize history, truncate tool outputs, select most relevant RAG chunks.

3. Context Separation: Use clear delimiters (<system>, <user>, <context>) to separate instruction types.

4. Dynamic Context: Adjust context to task complexity; simple tasks need less context, complex tasks need more.
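Pattern 2 (compression) often starts with a simple recency window over conversation history. A sketch using a crude whitespace token count (swap in a real tokenizer in production):

```python
def compress_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns that fit the budget (crude whitespace token count)."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):      # walk newest first
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))       # restore chronological order

history = ["u: hi", "a: hello there", "u: my invoice is wrong", "a: which invoice number"]
window = compress_history(history, budget=8)
```

Dropping whole turns from the oldest end (rather than truncating mid-turn) keeps each retained turn coherent; a summarization pass over the dropped turns is the usual next refinement.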


Core Concepts vs Implementation Practices

Core Concepts (Vendor-Agnostic)

  • Prompt contract: inputs, allowed tools, output schema, max tokens, and refusal rules.
  • Context engineering: conversation history, RAG context, tool outputs, and structured inputs shape model behavior.
  • Determinism controls: temperature/top_p, constrained decoding/structured outputs, and strict formatting.
  • Cost & latency budgets: prompt length and max output drive tokens and tail latency; enforce hard limits and measure p95/p99.
  • Evaluation: golden sets + regression gates + A/B + post-deploy monitoring.
  • Security: prompt injection, data exfiltration, and tool misuse are primary threats (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
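For the cost and latency budget item, a nearest-rank percentile over recorded request latencies is enough to gate a release; the sample values below are illustrative:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile; sufficient for a latency-budget gate."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]

latencies_ms = [120, 95, 310, 180, 110, 240, 130, 105, 990, 150]
p95 = percentile(latencies_ms, 95)   # one slow outlier dominates the tail
```

A CI gate then becomes a single comparison against the budget, e.g. fail the build when `p95` exceeds the agreed limit.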

Implementation Practices (Model/Platform-Specific)

  • Use model-specific structured output features when available; keep a schema validator as the source of truth.
  • Align tracing/metrics with OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/).

Do / Avoid

Do

  • Do keep prompts small and modular; centralize shared fragments (policies, schemas, style).
  • Do add a prompt eval harness and block merges on regressions.
  • Do prefer "brief justification" over requesting chain-of-thought; treat hidden reasoning as model-internal.

Avoid

  • Avoid prompt sprawl (many near-duplicates with no owner or tests).
  • Avoid brittle multi-step chains without intermediate validation.
  • Avoid mixing policy and product copy in the same prompt (harder to audit and update).

Navigation: Core Patterns

  • Core Patterns - 7 production-grade prompt patterns
    • Structured Output (JSON), Deterministic Extractor, RAG Workflow
    • Hidden Chain-of-Thought, Tool/Agent Planner, Rewrite + Constrain, Decision Tree
    • Each pattern includes structure template and validation checklist

Navigation: Best Practices

  • Best Practices (Core) - Foundation rules for production-grade prompts

    • System instruction design, output contract specification, action directives
    • Context handling, error recovery, positive framing, style matching, style-adversarial red teaming
    • Anti-patterns, Claude 4+ specific optimizations
  • Production Guidelines - Deployment and operational guidance

    • Evaluation & testing (Prompt CI/CD), model parameters, few-shot selection
    • Safety & guardrails, conversation memory, context compaction resilience
    • Answer engineering, decomposition, multilingual/multimodal, benchmarking
    • CI/CD Tools (2026): Promptfoo, DeepEval integration patterns
    • Security (2026): PromptGuard 4-layer defense, Microsoft Prompt Shields, taint tracking
  • Quality Checklists - Validation checklists before deployment

    • Prompt QA, JSON validation, agent workflow checks
    • RAG workflow, safety & security, performance optimization
    • Testing coverage, anti-patterns, quality score rubric
  • Domain-Specific Patterns - Claude 4+ optimized patterns for specialized domains

    • Frontend/visual code: Creativity encouragement, design variations, micro-interactions
    • Research tasks: Success criteria, verification, hypothesis tracking
    • Agentic coding: No speculation rule, principled implementation, investigation patterns
    • Cross-domain best practices and quality modifiers

Navigation: Specialized Patterns

  • RAG Patterns - Retrieval-augmented generation workflows

    • Context grounding, chunk citation, missing information handling
  • Agent and Tool Patterns - Tool use and agent orchestration

    • Plan-then-act workflows, tool calling, multi-step reasoning, generate-verify-revise chains
    • Multi-Agent Orchestration (2026): centralized, handoff, federated patterns; plan-and-execute (90% cost reduction)
  • Extraction Patterns - Deterministic field extraction

    • Schema-based extraction, null handling, no hallucinations
  • Reasoning Patterns (Hidden CoT) - Internal reasoning without visible output

    • Hidden reasoning, final answer only, classification workflows
    • Extended Thinking API (Claude 4+): budget management, think tool, multishot patterns
  • Additional Patterns - Extended prompt engineering techniques

    • Advanced patterns, edge cases, optimization strategies
  • Prompt Testing & CI/CD - Automated prompt evaluation pipelines

    • Promptfoo, DeepEval integration, regression detection, A/B testing, quality gates
  • Multimodal Prompt Patterns - Vision, audio, and document input patterns

    • Image description, OCR+LLM, bounding box prompts, Whisper conditioning, video frame analysis
  • Prompt Security & Defense - Securing LLM applications against adversarial attacks

    • Injection detection (PromptGuard, Prompt Shields), defense-in-depth, taint tracking, red team testing

Navigation: Templates

Templates are copy-paste ready and organized by complexity:

Quick Templates

Standard Templates


External Resources

External references are listed in data/sources.json:

  • Official documentation (OpenAI, Anthropic, Google)
  • LLM frameworks (LangChain, LlamaIndex)
  • Vector databases (Pinecone, Weaviate, FAISS)
  • Evaluation tools (OpenAI Evals, HELM)
  • Safety guides and standards
  • RAG and retrieval resources

Freshness Rule (2026)

When asked for “latest” prompting recommendations, prefer provider docs and standards from data/sources.json. If web search is unavailable, state the constraint and avoid overconfident “current best” claims.


Related Skills

This skill provides foundational prompt engineering patterns. For specialized implementations:

AI/LLM Skills:

Software Development Skills:


Usage Notes

For Claude Code:

  • Reference this skill when building prompts for agents, commands, or integrations
  • Use Quick Reference table for fast pattern lookup
  • Follow Decision Tree to select appropriate pattern
  • Validate outputs with Quality Checklists before deployment
  • Use templates as starting points, customize for specific use cases

For Codex CLI:

  • Use the same patterns and templates; adapt tool-use wording to the local tool interface
  • For long-horizon tasks, track progress explicitly (a step list/plan) and update it as work completes
  • Run independent reads/searches in parallel when the environment supports it; keep writes/edits serialized
  • AGENTS.md Integration: Place project-specific prompt guidance in AGENTS.md files at global (~/.codex/AGENTS.md), project-level (./AGENTS.md), or subdirectory scope for layered instructions
  • Reasoning Effort: Use medium for interactive coding (default), high/xhigh for complex autonomous multi-hour tasks

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
  • Prefer primary sources; report source links and dates for volatile information.
  • If web access is unavailable, state the limitation and mark guidance as unverified.