Agent Skills: Structured Outputs

Techniques for ensuring LLM responses adhere to strict JSON schemas, utilizing Pydantic models, JSON mode, and schema-based refusals. Triggers: structured-output, pydantic, json-schema, json-mode, llm-response-parsing.

UncategorizedID: cuba6112/skillfactory/structured-outputs

Install this agent skill to your local

pnpm dlx add-skill https://github.com/cuba6112/skillfactory/tree/HEAD/skills/structured-outputs

Skill Files

Browse the full folder contents for structured-outputs.

Download Skill

Loading file tree…

skills/structured-outputs/SKILL.md

Skill Metadata

Name
structured-outputs
Description
Techniques for ensuring LLM responses adhere to strict JSON schemas, utilizing Pydantic models, JSON mode, and schema-based refusals. Triggers: structured-output, pydantic, json-schema, json-mode, llm-response-parsing.

Structured Outputs

Overview

Structured Outputs ensure that an LLM's response always conforms to a predefined JSON schema. This moves beyond simple prompt engineering to guaranteed syntactic and structural correctness, essential for programmatic integration.

When to Use

  • API Integration: When LLM output must be parsed by a machine (e.g., generating database records).
  • UI Components: When the LLM generates data to populate a specific frontend interface.
  • Data Extraction: Converting unstructured text into standardized JSON objects.

Decision Tree

  1. Do you need 100% guarantee of schema adherence?
    • YES: Use Structured Output mode (e.g., OpenAI's strict: true or Gemini's response_mime_type).
    • NO: JSON Mode may suffice.
  2. Is the structure complex or nested?
    • YES: Use Pydantic models and model_rebuild if recursive.
  3. Do you need to handle safety refusals programmatically?
    • YES: Check for the refusal field in the response.

Workflows

1. Defining a Pydantic-based Schema

  1. Define a Python class inheriting from BaseModel.
  2. Use specific types (e.g., int, List[str]) and Field descriptions to guide the model.
  3. Pass the model directly to the SDK's parse method (e.g., client.beta.chat.completions.parse).
  4. Access the parsed result via response.choices[0].message.parsed.

2. Handling Optional and Nullable Fields

  1. In the JSON schema, define types as an array: ["string", "null"].
  2. In Pydantic, use Optional[str] = None.
  3. Instruct the model in the field description when to use null vs. an empty value.

3. Streaming Structured JSON

  1. Initiate a generate_content_stream call with the JSON schema configuration.
  2. Iterate over chunks as they arrive from the model.
  3. Concatenate the partial JSON strings; note that the full object is only valid JSON once the stream finishes.

Non-Obvious Insights

  • Order Matters: Models typically produce outputs in the same order as the keys defined in the schema.
  • Refusals as First-Class Citizens: Safety-based model refusals are now programmatically detectable as a separate field, preventing the parser from failing on malformed JSON when the model won't answer.
  • Beyond JSON Mode: Structured Outputs guarantee schema adherence, whereas JSON Mode only guarantees valid JSON syntax without specific structure enforcement.

Evidence

  • "Structured Outputs is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema." - OpenAI
  • "The model will produce outputs in the same order as the keys in the schema." - Google AI
  • "Safety-based model refusals are now programmatically detectable." - OpenAI

Scripts

  • scripts/structured-outputs_tool.py: Pydantic model definition and parsing logic.
  • scripts/structured-outputs_tool.js: Equivalent JSON schema definition for Node.js.

Dependencies

  • pydantic
  • openai >= 1.40.0 or google-generativeai

References

Structured Outputs Skill | Agent Skills