
Agent Skills with tag: model-deployment

11 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

senior-ml-engineer

World-class ML engineering skill for productionizing ML models, MLOps, and building scalable ML systems. Expertise in PyTorch, TensorFlow, model deployment, feature stores, model monitoring, and ML infrastructure. Includes LLM integration, fine-tuning, RAG systems, and agentic AI. Use when deploying ML models, building ML platforms, implementing MLOps, or integrating LLMs into production systems.

mlops · model-deployment · feature-store · llm-integration
ovachiever
81

speculative-decoding

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

llm-optimization · inference-acceleration · speculative-decoding · parallel-token-generation
ovachiever
81
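The draft-then-verify loop at the heart of the speculative-decoding skill can be sketched in a few lines. This is a toy illustration: `draft_model` and `target_model` are hypothetical stand-ins over integer "tokens", and a real implementation verifies all k drafted tokens in a single batched target forward pass (which is where the speedup comes from), rather than calling the target per token as done here for clarity.

```python
def speculative_step(prefix, draft_model, target_model, k=4):
    """One round of (greedy) speculative decoding: the cheap draft
    model proposes k tokens, the target model checks them, and we
    keep the longest prefix the target agrees with, plus one token
    the target generates itself (so every round makes progress)."""
    # Phase 1: draft model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # Phase 2: target verifies; accept until the first disagreement.
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        if target_model(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    # Bonus token from the target at the first rejected position.
    accepted.append(target_model(ctx))
    return accepted

# Toy models: draft emits last token + 1; target agrees but caps at 3.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: min(ctx[-1] + 1, 3)

print(speculative_step([0], draft, target, k=4))  # [1, 2, 3, 3]
```

Sampling-based variants replace the exact-match check with rejection sampling against the target's distribution, which preserves the target model's output distribution exactly.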

llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

llm · model-deployment · quantization · cpu-inference
ovachiever
81
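The "1.5-8 bit" GGUF range above translates directly into weight memory; the back-of-envelope arithmetic below illustrates why quantization enables consumer-hardware deployment. The 7B parameter count is a hypothetical example, and real GGUF formats carry small per-block scale/zero-point overhead that this sketch ignores.

```python
def model_memory_gb(n_params, bits_per_weight):
    """Approximate weight-memory footprint in GiB, ignoring the
    small per-block metadata real GGUF quantization formats add."""
    return n_params * bits_per_weight / 8 / 1024**3

n = 7_000_000_000  # a hypothetical 7B-parameter model
for label, bits in [("fp16", 16), ("q8_0", 8), ("q4_0", 4)]:
    print(f"{label}: {model_memory_gb(n, bits):.1f} GiB")
# fp16 needs ~13 GiB; 4-bit fits in ~3.3 GiB, within reach of
# an 8 GB laptop or Apple Silicon unified memory.
```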

knowledge-distillation

Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.

model-compression · knowledge-distillation · large-language-models · mini-llm
ovachiever
81
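The temperature scaling and soft targets the knowledge-distillation skill mentions boil down to a short loss function. A minimal pure-Python sketch of the forward-KL variant follows (the skill also covers reverse KLD, as used in MiniLLM); the example logits are illustrative.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T softens the distribution,
    exposing the teacher's relative confidence in near-miss classes."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Forward KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 so the gradient magnitude stays
    comparable to the hard-label cross-entropy term."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Illustrative logits: student roughly tracks the teacher.
loss = distill_kl([4.0, 1.0, 0.2], [3.5, 1.5, 0.1])
print(round(loss, 4))
```

In training, this term is typically mixed with the ordinary cross-entropy on ground-truth labels, weighted by a hyperparameter.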

mlflow

Track ML experiments, manage a versioned model registry, deploy models to production, and reproduce experiments with MLflow, a framework-agnostic ML lifecycle platform

ml-pipelines · model-deployment · experiment-tracking · model-lifecycle
ovachiever
81

model-deployment

LLM deployment strategies including vLLM, TGI, and cloud inference endpoints.

model-deployment · vllm · tgi · cloud-inference
pluginagentmarketplace
1

ml-deployment

Deploy ML models to production: APIs, containerization, monitoring, and MLOps

mlops · model-deployment · containerization · monitoring
pluginagentmarketplace
11

secure-deployment

Security best practices for deploying AI/ML models to production environments

model-deployment · best-practices · aiml
pluginagentmarketplace
1

model-serving

Master model serving: inference optimization, scaling, deployment, and edge serving

model-deployment · inference-optimization · scaling · edge-computing
pluginagentmarketplace
1

ml-pipeline-workflow

Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.

ml-pipelines · model-training · model-deployment · workflow-automation
camoneart
4

funsloth-upload

Generate comprehensive model cards and upload fine-tuned models to Hugging Face Hub with professional documentation

huggingface-hub · model-cards · model-deployment · technical-writing
chrisvoncsefalvay
4