AWS Step Functions
AWS Step Functions is a serverless orchestration service that lets you build and run workflows using state machines. Coordinate multiple AWS services into business-critical applications.
Table of Contents
Core Concepts
Workflow Types
| Type | Description | Pricing | |------|-------------|---------| | Standard | Long-running, durable, exactly-once | Per state transition | | Express | High-volume, short-duration | Per execution (time + memory) |
State Types
| State | Description | |-------|-------------| | Task | Execute work (Lambda, API call) | | Choice | Conditional branching | | Parallel | Execute branches concurrently | | Map | Iterate over array | | Wait | Delay execution | | Pass | Pass input to output | | Succeed | End successfully | | Fail | End with failure |
Amazon States Language (ASL)
JSON-based language for defining state machines.
Common Patterns
Simple Lambda Workflow
{
"Comment": "Process order workflow",
"StartAt": "ValidateOrder",
"States": {
"ValidateOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
"Next": "ProcessPayment"
},
"ProcessPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessPayment",
"Next": "FulfillOrder"
},
"FulfillOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:FulfillOrder",
"End": true
}
}
}
Create State Machine
AWS CLI:
aws stepfunctions create-state-machine \
--name OrderWorkflow \
--definition file://workflow.json \
--role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \
--type STANDARD
boto3:
import boto3
import json
sfn = boto3.client('stepfunctions')
definition = {
"Comment": "Order workflow",
"StartAt": "ProcessOrder",
"States": {
"ProcessOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...",
"End": True
}
}
}
response = sfn.create_state_machine(
name='OrderWorkflow',
definition=json.dumps(definition),
roleArn='arn:aws:iam::123456789012:role/StepFunctionsRole',
type='STANDARD'
)
Start Execution
import boto3
import json
sfn = boto3.client('stepfunctions')
response = sfn.start_execution(
stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow',
name='order-12345',
input=json.dumps({
'order_id': '12345',
'customer_id': 'cust-789',
'items': [{'product_id': 'prod-1', 'quantity': 2}]
})
)
execution_arn = response['executionArn']
Choice State (Conditional Logic)
{
"StartAt": "CheckOrderValue",
"States": {
"CheckOrderValue": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.total",
"NumericGreaterThan": 1000,
"Next": "HighValueOrder"
},
{
"Variable": "$.priority",
"StringEquals": "rush",
"Next": "RushOrder"
}
],
"Default": "StandardOrder"
},
"HighValueOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessHighValue",
"End": true
},
"RushOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessRush",
"End": true
},
"StandardOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessStandard",
"End": true
}
}
}
Parallel Execution
{
"StartAt": "ProcessInParallel",
"States": {
"ProcessInParallel": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "UpdateInventory",
"States": {
"UpdateInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:UpdateInventory",
"End": true
}
}
},
{
"StartAt": "SendNotification",
"States": {
"SendNotification": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:SendNotification",
"End": true
}
}
},
{
"StartAt": "UpdateAnalytics",
"States": {
"UpdateAnalytics": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:UpdateAnalytics",
"End": true
}
}
}
],
"Next": "Complete"
},
"Complete": {
"Type": "Succeed"
}
}
}
Map State (Iteration)
{
"StartAt": "ProcessItems",
"States": {
"ProcessItems": {
"Type": "Map",
"ItemsPath": "$.items",
"MaxConcurrency": 10,
"Iterator": {
"StartAt": "ProcessItem",
"States": {
"ProcessItem": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:ProcessItem",
"End": true
}
}
},
"ResultPath": "$.processedItems",
"End": true
}
}
}
Error Handling
{
"StartAt": "ProcessWithRetry",
"States": {
"ProcessWithRetry": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:Process",
"Retry": [
{
"ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
},
{
"ErrorEquals": ["States.Timeout"],
"IntervalSeconds": 5,
"MaxAttempts": 3,
"BackoffRate": 1.5
}
],
"Catch": [
{
"ErrorEquals": ["CustomError"],
"ResultPath": "$.error",
"Next": "HandleCustomError"
},
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.error",
"Next": "HandleAllErrors"
}
],
"End": true
},
"HandleCustomError": {
"Type": "Task",
"Resource": "arn:aws:lambda:...:function:HandleCustom",
"End": true
},
"HandleAllErrors": {
"Type": "Fail",
"Error": "ProcessingFailed",
"Cause": "An error occurred during processing"
}
}
}
CLI Reference
State Machine Management
| Command | Description |
|---------|-------------|
| aws stepfunctions create-state-machine | Create state machine |
| aws stepfunctions update-state-machine | Update definition |
| aws stepfunctions delete-state-machine | Delete state machine |
| aws stepfunctions list-state-machines | List state machines |
| aws stepfunctions describe-state-machine | Get details |
Executions
| Command | Description |
|---------|-------------|
| aws stepfunctions start-execution | Start execution |
| aws stepfunctions stop-execution | Stop execution |
| aws stepfunctions describe-execution | Get execution details |
| aws stepfunctions list-executions | List executions |
| aws stepfunctions get-execution-history | Get execution history |
Best Practices
Design
- Keep states focused — one purpose per state
- Use meaningful state names
- Implement comprehensive error handling
- Use Parallel for independent tasks
- Use Map for batch processing
Performance
- Use Express workflows for high-volume, short tasks
- Set appropriate timeouts
- Limit Map concurrency to avoid throttling
- Use SDK integrations when possible (avoid Lambda wrapper)
Reliability
- Retry transient errors
- Catch and handle specific errors
- Use idempotent operations
- Enable X-Ray tracing
Cost Optimization
- Use Express for short workflows (< 5 minutes)
- Combine related operations to reduce transitions
- Use Wait states instead of Lambda delays
Troubleshooting
Execution Failed
# Get execution history
aws stepfunctions get-execution-history \
--execution-arn arn:aws:states:us-east-1:123456789012:execution:MyWorkflow:exec-123 \
--query 'events[?type==`TaskFailed` || type==`ExecutionFailed`]'
Lambda Timeout
Causes:
- Lambda running too long
- Task timeout too short
Fix:
{
"Type": "Task",
"Resource": "arn:aws:lambda:...",
"TimeoutSeconds": 300,
"HeartbeatSeconds": 60
}
State Stuck
Check:
- Task state waiting for callback
- Wait state not yet elapsed
- Activity worker not responding
Invalid State Machine
# Validate definition
aws stepfunctions validate-state-machine-definition \
--definition file://workflow.json