Palantir Production Checklist Skill

Overview

Complete go-live checklist for deploying Foundry-integrated applications to production. Covers credential management, health checks, monitoring, and rollback procedures.

Prerequisites

Staging environment tested and verified
Production OAuth2 credentials from Developer Console
Deployment pipeline configured
Monitoring infrastructure ready

Instructions

Pre-Deployment: Credentials & Config

[ ] OAuth2 client credentials in secrets manager (not personal tokens)
[ ] Scopes are minimal: only what the app actually needs
[ ] FOUNDRY_HOSTNAME points to production enrollment
[ ] Production credentials are separate from staging (not shared)
[ ] Credential rotation schedule documented (90-day max)
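The credential items above can be enforced at startup rather than discovered at the first failed request. A minimal sketch, assuming the secrets manager injects the client ID and secret as environment variables; the names `FOUNDRY_CLIENT_ID` and `FOUNDRY_CLIENT_SECRET` and the `load_foundry_config` helper are hypothetical, only `FOUNDRY_HOSTNAME` comes from this checklist:

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class FoundryConfig:
    hostname: str
    client_id: str
    # Keep the secret out of repr() so it cannot leak into logs by accident.
    client_secret: str = field(repr=False)


def load_foundry_config(env: Optional[Mapping[str, str]] = None) -> FoundryConfig:
    """Read Foundry settings from the environment and fail fast if any is missing."""
    env = os.environ if env is None else env
    # FOUNDRY_HOSTNAME is from the checklist; the other two names are assumptions.
    required = ("FOUNDRY_HOSTNAME", "FOUNDRY_CLIENT_ID", "FOUNDRY_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report variable names only, never their values.
        raise RuntimeError("Missing Foundry settings: " + ", ".join(missing))
    return FoundryConfig(
        hostname=env["FOUNDRY_HOSTNAME"],
        client_id=env["FOUNDRY_CLIENT_ID"],
        client_secret=env["FOUNDRY_CLIENT_SECRET"],
    )
```

Calling `load_foundry_config()` once during application startup makes a missing or empty setting stop the deploy immediately instead of surfacing later as a 401.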

Code Quality

[ ] All tests passing including Foundry integration tests
[ ] No hardcoded hostnames, tokens, or RIDs
[ ] Error handling covers all Foundry ApiError status codes
[ ] Rate limiting with exponential backoff implemented
[ ] Logging uses structured format (JSON) with request IDs
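The backoff item above can be sketched as a small retry wrapper. This is an illustration under stated assumptions, not the SDK's own retry mechanism: `FoundryHttpError` is a hypothetical stand-in for whatever exception your client raises with an HTTP status code, and the attempt count and delays are placeholder values to tune.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FoundryHttpError(Exception):
    """Hypothetical stand-in for a Foundry error carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on 429 and 5xx with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except FoundryHttpError as e:
            retryable = e.status_code == 429 or e.status_code >= 500
            if not retryable or attempt == max_attempts - 1:
                raise
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random amount up to the ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable: the loop returns or raises")
```

Other 4xx responses are re-raised immediately, since retrying a 400 or 404 only adds load. The `sleep` parameter exists so tests can run without real delays.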

Infrastructure

[ ] Health check endpoint verifies Foundry connectivity

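@app.get("/health")
def health():
    # Plain def, assuming FastAPI: sync handlers run in a worker thread,
    # so the blocking SDK call does not stall the event loop.
    try:
        client.ontologies.Ontology.list()
        return {"status": "healthy", "foundry": "connected"}
    except foundry.ApiError as e:
        # Foundry responded, but with an error status.
        return {"status": "degraded", "foundry": f"error_{e.status_code}"}
    except Exception:
        # Network failures (DNS, TLS, timeout) may not surface as ApiError.
        return {"status": "degraded", "foundry": "unreachable"}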

[ ] Circuit breaker pattern for Foundry API calls
[ ] Graceful degradation when Foundry is unreachable
[ ] Timeout configuration: 30s for reads, 60s for writes
[ ] Connection pooling configured
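The circuit breaker item above can be sketched as follows. This is a minimal single-threaded illustration, not a production library: the `CircuitBreaker` and `CircuitOpenError` names and the default thresholds are hypothetical, and a multi-threaded service would need a lock around the state changes.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: reject without touching Foundry.
                raise CircuitOpenError("Foundry circuit is open")
            # Reset timeout elapsed: half-open, let this trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Trip the circuit, or re-trip it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

For graceful degradation, callers can catch `CircuitOpenError` and fall back to cached data or a reduced response instead of waiting on a timeout for every request.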

Monitoring & Alerting

[ ] Metrics: request count, latency p50/p99, error rate by status code
[ ] Alert: 5xx error rate > 5% for 5 minutes → P1
[ ] Alert: p99 latency > 10s for 10 minutes → P2
[ ] Alert: 429 rate > 10/min → P2 (tune rate limiter)
[ ] Alert: 401/403 errors → P1 (credential issue)
[ ] Dashboard with Foundry API health summary
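The thresholds above would normally live in your monitoring system's rule language, but the logic can be sketched in plain Python to make the conditions concrete. The `Sample` type and `evaluate_alerts` function are hypothetical, and the sketch evaluates a single window of samples: it assumes the caller passes a window of the right length (5 or 10 minutes) rather than tracking the "for N minutes" duration itself.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    status_code: int
    latency_s: float


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(samples: Sequence[Sample], window_minutes: float) -> List[str]:
    """Return the checklist alerts that fire for one window of request samples."""
    if not samples:
        return []
    total = len(samples)
    alerts: List[str] = []
    if sum(s.status_code >= 500 for s in samples) / total > 0.05:
        alerts.append("P1: 5xx error rate above 5%")
    if any(s.status_code in (401, 403) for s in samples):
        alerts.append("P1: 401/403 seen (credential issue)")
    if percentile([s.latency_s for s in samples], 99) > 10.0:
        alerts.append("P2: p99 latency above 10s")
    if sum(s.status_code == 429 for s in samples) / window_minutes > 10:
        alerts.append("P2: 429 rate above 10/min")
    return alerts
```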

Documentation

[ ] Incident runbook: palantir-incident-runbook
[ ] Credential rotation procedure documented
[ ] Rollback procedure documented and tested
[ ] On-call escalation path defined
[ ] Foundry support contact info available

Deploy

set -euo pipefail
# Pre-flight: abort the deploy if the Foundry API is unreachable
if curl -sf "https://$FOUNDRY_HOSTNAME/api/v2/ontologies" \
  -H "Authorization: Bearer $FOUNDRY_TOKEN" > /dev/null; then
  echo "Foundry API reachable"
else
  echo "BLOCKED: Foundry unreachable" >&2
  exit 1
fi

# Rolling update (this is not a canary; --record is deprecated and omitted)
kubectl set image deployment/my-app app=myimage:v2.0.0
kubectl rollout status deployment/my-app --timeout=300s

Rollback

kubectl rollout undo deployment/my-app
kubectl rollout status deployment/my-app

Output

Production deployment with verified Foundry connectivity
Health checks passing
Monitoring and alerting active
Rollback procedure tested

Error Handling

| Alert | Condition | Severity |
|-------|-----------|----------|
| Foundry Unreachable | Health check fails 3x | P1 |
| Auth Failure | Any 401/403 | P1 |
| Rate Limited | 429 > 10/min | P2 |
| High Latency | p99 > 10s | P2 |

Next Steps

For version upgrades, see palantir-upgrade-migration.