Mistral AI Deploy Integration Skill

Overview

Deploy Mistral AI-powered applications to production with secure API key management. This guide covers Vercel (Edge and Serverless), Docker, Cloud Run, and self-hosted vLLM deployments. Each target talks either to api.mistral.ai or to your own inference endpoint.

Prerequisites

Mistral AI production API key
Platform CLI installed (vercel, docker, or gcloud)
Application using @mistralai/mistralai SDK

Instructions

Step 1: Platform Secret Configuration

set -euo pipefail
# Vercel
vercel env add MISTRAL_API_KEY production
vercel env add MISTRAL_MODEL production  # optional: default model

# Cloud Run
echo -n "your-key" | gcloud secrets create mistral-api-key --data-file=-

# Docker
echo "MISTRAL_API_KEY=your-key" > .env.production
echo ".env.production" >> .gitignore
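
Whichever platform stores the secret, the application sees it as an environment variable, so it can help to validate configuration once at startup instead of on the first request. The following is a minimal sketch under that assumption; `loadMistralConfig` and `MistralConfig` are hypothetical names for illustration, not part of the Mistral SDK.

```typescript
// Illustrative shape for the two variables configured in Step 1.
interface MistralConfig {
  apiKey: string;
  model: string;
}

// Reads configuration from an env-like object (for example process.env)
// and fails fast when the API key is missing or blank.
function loadMistralConfig(
  env: Record<string, string | undefined>,
): MistralConfig {
  const apiKey = env['MISTRAL_API_KEY']?.trim();
  if (!apiKey) {
    throw new Error(
      'MISTRAL_API_KEY is not set; check the platform secret configuration',
    );
  }
  // Same fallback model the handlers in this guide use.
  const model = env['MISTRAL_MODEL']?.trim() || 'mistral-small-latest';
  return { apiKey, model };
}
```

Calling `loadMistralConfig(process.env)` at process start turns a missing secret into an immediate, readable startup failure rather than a 401 on the first chat request.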

Step 2: Vercel Edge Function

// api/chat.ts — Vercel Edge Function with streaming
import { Mistral } from '@mistralai/mistralai';

export const config = { runtime: 'edge' };

export default async function handler(req: Request) {
  const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY! });
  const { messages, stream = false } = await req.json();

  if (stream) {
    const streamResponse = await client.chat.stream({
      model: process.env.MISTRAL_MODEL ?? 'mistral-small-latest',
      messages,
    });

    const encoder = new TextEncoder();
    const readable = new ReadableStream({
      async start(controller) {
        try {
          for await (const event of streamResponse) {
            const content = event.data?.choices?.[0]?.delta?.content;
            if (content) {
              controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
            }
          }
          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          controller.close();
        } catch (err) {
          // Propagate upstream failures instead of leaving the stream open
          controller.error(err);
        }
      },
    });

    return new Response(readable, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
      },
    });
  }

  const response = await client.chat.complete({
    model: process.env.MISTRAL_MODEL ?? 'mistral-small-latest',
    messages,
  });

  return Response.json(response);
}
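
The streaming branch above emits events of the form `data: {"content":"..."}` separated by a blank line and ends with `data: [DONE]`. A client has to reassemble these from network chunks that can split an event anywhere. The sketch below shows one way to do that; `parseSseBuffer` and `SseParseResult` are hypothetical names, and the parser assumes exactly the event format produced by the handler in this step.

```typescript
interface SseParseResult {
  contents: string[]; // text deltas decoded from complete events
  done: boolean;      // true once the [DONE] sentinel was seen
  rest: string;       // trailing partial event to prepend to the next chunk
}

// Splits a text buffer into complete SSE events and extracts their content.
// JSON.stringify escapes newlines, so '\n\n' only occurs at event boundaries.
function parseSseBuffer(buffer: string): SseParseResult {
  const contents: string[] = [];
  let done = false;
  const events = buffer.split('\n\n');
  // The last element is '' (buffer ended on a boundary) or a partial event.
  const rest = events.pop() ?? '';
  for (const event of events) {
    if (!event.startsWith('data: ')) continue; // skip non-data lines
    const payload = event.slice('data: '.length);
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { content?: unknown };
    if (typeof parsed.content === 'string') contents.push(parsed.content);
  }
  return { contents, done, rest };
}
```

In a browser, a caller would read `response.body` with a reader, decode each chunk with `TextDecoder`, append it to `rest` from the previous call, and pass the result to `parseSseBuffer` until `done` is true.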

Step 3: Docker Deployment

FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
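# Install dev dependencies too; the build step needs them
RUN npm ci --include=dev
COPY . .
# Build, then drop dev dependencies before node_modules is copied to the runtime stage
RUN npm run build && npm prune --omit=dev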

FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

ENV NODE_ENV=production
EXPOSE 3000
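# node:20-slim does not ship curl, so probe with Node's built-in fetch
HEALTHCHECK --interval=30s --timeout=5s \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"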
CMD ["node", "dist/index.js"]

set -euo pipefail
docker build -t mistral-app .
# Load MISTRAL_API_KEY from the .env.production file created in Step 1
docker run -d --name mistral-app \
  -p 3000:3000 \
  --env-file .env.production \
  -e MISTRAL_MODEL="mistral-small-latest" \
  mistral-app

Step 4: Cloud Run Deployment

set -euo pipefail
# Build and push
gcloud builds submit --tag gcr.io/$PROJECT_ID/mistral-app

# Deploy with secret injection. --port matches the container's listening port (3000);
# without it, Cloud Run routes requests to 8080.
gcloud run deploy mistral-service \
  --image gcr.io/$PROJECT_ID/mistral-app \
  --region us-central1 \
  --platform managed \
  --port=3000 \
  --set-secrets=MISTRAL_API_KEY=mistral-api-key:latest \
  --set-env-vars=MISTRAL_MODEL=mistral-small-latest \
  --min-instances=1 \
  --max-instances=10 \
  --memory=512Mi \
  --timeout=60s
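
Cloud Run tells the container which port to listen on through the `PORT` environment variable, so a server that hard-codes its port can fail to start there. One hedged way to handle this is to read `PORT` with a fallback that matches the Dockerfile's `EXPOSE 3000`. `resolvePort` is a hypothetical helper name for this sketch.

```typescript
// Returns the port to listen on: the PORT variable when set and valid,
// otherwise the fallback (3000, matching the Dockerfile in Step 3).
function resolvePort(
  env: Record<string, string | undefined>,
  fallback = 3000,
): number {
  const raw = env['PORT'];
  if (raw === undefined || raw.trim() === '') return fallback;
  const port = Number(raw);
  // Reject non-numeric values and anything outside the TCP port range.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}
```

The server entry point would then listen on `resolvePort(process.env)`, which keeps the same image usable under plain Docker and on Cloud Run.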

Step 5: Self-Hosted with vLLM

For data sovereignty or latency requirements, self-host open-weight Mistral models:

set -euo pipefail
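# Serve Mistral with vLLM (OpenAI-compatible API)
# --ipc=host lets the container use host shared memory, which vLLM relies on for tensor-parallel inference
docker run --runtime nvidia --gpus all \
  --ipc=host \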
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  -e HF_TOKEN="$HF_TOKEN" \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-Small-24B-Instruct-2501 \
  --dtype auto \
  --api-key "your-local-key"

Point the SDK at your local endpoint:

import { Mistral } from '@mistralai/mistralai';

const client = new Mistral({
  apiKey: 'your-local-key',
  serverURL: 'http://localhost:8000', // vLLM endpoint
});
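
To switch between the hosted API and a self-hosted endpoint without code changes, the server URL can come from configuration. The sketch below assumes a hypothetical `MISTRAL_SERVER_URL` variable, which is not defined by the SDK or by any platform in this guide. When it is unset, the helper returns `undefined` so the SDK falls back to its default endpoint.

```typescript
// Resolves an optional self-hosted endpoint from an env-like object.
// MISTRAL_SERVER_URL is a made-up variable name used only in this sketch.
function resolveServerURL(
  env: Record<string, string | undefined>,
): string | undefined {
  const raw = env['MISTRAL_SERVER_URL']?.trim();
  if (!raw) return undefined; // use the SDK's default hosted endpoint
  if (!/^https?:\/\//.test(raw)) {
    throw new Error(`MISTRAL_SERVER_URL must start with http:// or https://, got: ${raw}`);
  }
  // Strip trailing slashes so request paths are appended cleanly.
  return raw.replace(/\/+$/, '');
}
```

The result can be passed as the `serverURL` option shown above, for example `new Mistral({ apiKey, serverURL: resolveServerURL(process.env) })`.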

Step 6: Health Check Endpoint

import { Mistral } from '@mistralai/mistralai';

export async function GET() {
  const start = performance.now();
  try {
    const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY! });
    await client.models.list();
    return Response.json({
      status: 'healthy',
      provider: 'mistral',
      latencyMs: Math.round(performance.now() - start),
    });
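  } catch (error: unknown) {
    // Thrown values are not guaranteed to be Error instances
    const message = error instanceof Error ? error.message : String(error);
    return Response.json(
      { status: 'unhealthy', error: message },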
      { status: 503 },
    );
  }
}
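
Because the handler above calls the Mistral API on every probe, a 30-second Docker health check or a load balancer probing many instances can add steady API traffic. One option is to cache the last result for a short time. This is a sketch only: `HealthCache` and `HealthResult` are hypothetical names, and the clock is passed in explicitly to keep the class easy to test.

```typescript
interface HealthResult {
  status: 'healthy' | 'unhealthy';
  latencyMs?: number;
  error?: string;
}

// Holds the most recent health result for ttlMs milliseconds.
class HealthCache {
  private readonly ttlMs: number;
  private entry: { value: HealthResult; expiresAt: number } | undefined;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached result, or undefined when it is missing or expired.
  get(now: number): HealthResult | undefined {
    if (this.entry && now < this.entry.expiresAt) return this.entry.value;
    return undefined;
  }

  // Stores a fresh result that expires ttlMs after `now`.
  set(value: HealthResult, now: number): void {
    this.entry = { value, expiresAt: now + this.ttlMs };
  }
}
```

A handler would call `cache.get(Date.now())` first and only fall through to `client.models.list()` on a miss, storing the outcome with `cache.set(result, Date.now())`.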

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| API key not found | Missing env/secret | Verify secret config on platform |
| Function timeout | Long completion | Increase timeout, use streaming |
| Cold start latency | Serverless spin-up | Set min-instances=1 or use edge |
| vLLM OOM | Model too large for GPU | Use quantized model or smaller variant |

Output

Platform-specific deployment configurations
Secure API key management per platform
Streaming support for Edge/Serverless
Health check endpoint
Self-hosted option with vLLM
