
Agent Skills with tag: llm

52 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

sentry-setup-ai-monitoring

Set up Sentry AI Agent Monitoring in any project. Use this when asked to add AI monitoring, track LLM calls, monitor AI agents, or instrument OpenAI/Anthropic/Vercel AI/LangChain/Google GenAI. Automatically detects installed AI SDKs and configures the appropriate Sentry integration.

sentry · ai-monitoring · integration · llm
getsentry · 1
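
For orientation, a minimal sketch of what the configured result typically looks like with Sentry's Python SDK; the DSN is a placeholder, and the OpenAI integration stands in for whichever AI SDK the skill detects:

```python
# Minimal sketch, assuming Sentry's Python SDK with its OpenAI integration;
# the DSN below is a placeholder, not a real project key.
import sentry_sdk
from sentry_sdk.integrations.openai import OpenAIIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=1.0,  # sample every transaction; lower this in production
    integrations=[OpenAIIntegration()],  # traces OpenAI client calls as LLM spans
)
```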

instruction-engineering

Use when: (1) constructing prompts for subagents, (2) invoking the Task tool, or (3) writing or improving skill instructions or any LLM prompts for maximum effectiveness.

prompt-engineering · task-tool · instruction-writing · llm
axiomantic · 0

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

llm · vllm · inference-optimization · gpu-memory-management
ovachiever · 81
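
A minimal sketch of offline batch inference with vLLM's Python API; the model name is a small placeholder, not a recommendation:

```python
# Minimal sketch of vLLM's offline inference API; PagedAttention and
# continuous batching are handled internally by the engine.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["Explain PagedAttention in one sentence."], params):
    print(out.outputs[0].text)
```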

openai-responses


openai · chatgpt · llm · ai-responses
ovachiever · 81

sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when prefix sharing can deliver up to 5× faster inference than vLLM. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.

llm · structured-output · prefix-caching · fast-inference
ovachiever · 81
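
A minimal sketch of SGLang's frontend DSL, assuming a local SGLang server is already running (here on port 30000):

```python
# Minimal sketch of SGLang's frontend DSL for structured generation;
# assumes a server was launched separately and listens on localhost:30000.
import sglang as sgl

@sgl.function
def qa(s, question):
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("answer", max_tokens=64, stop="\n")

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What does RadixAttention cache?")
print(state["answer"])
```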

tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100× faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

gpu · inference-optimization · tensorrt · llm
ovachiever · 81
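
A minimal sketch, assuming TensorRT-LLM's high-level LLM API from recent releases; the model name is a placeholder, and the engine is compiled for the local GPU on first load:

```python
# Minimal sketch of TensorRT-LLM's high-level LLM API (an assumption about
# version: this API ships in recent releases). Engine build happens on load.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
outputs = llm.generate(["Say hello."], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```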

google-gemini-api


google-gemini · api · integration · llm
ovachiever · 81

llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

llm · model-deployment · quantization · cpu-inference
ovachiever · 81
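
A minimal sketch using the llama-cpp-python bindings; the GGUF path is a hypothetical local file:

```python
# Minimal sketch of local GGUF inference via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers if a supported GPU is present
)
out = llm("Q: What is GGUF?\nA:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```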

google-gemini-embeddings


google-gemini · embeddings · vector-representation · llm
ovachiever · 81

langchain-architecture

Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.

langchain · llm · agent-framework · ai-agents
ovachiever · 81
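
A minimal sketch of a LangChain (LCEL) pipeline, assuming the langchain-openai package is installed and OPENAI_API_KEY is set in the environment:

```python
# Minimal sketch: composing prompt -> model -> parser with LCEL's pipe syntax.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
print(chain.invoke({"text": "LangChain composes prompts, models, and tools."}))
```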

model-pruning

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.

model-compression · pruning · llm · inference-optimization
ovachiever · 81
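
For the simplest of the listed techniques, a minimal magnitude-pruning sketch using PyTorch's built-in utilities; Wanda and SparseGPT ship their own one-shot implementations and are not shown here:

```python
# Minimal sketch: 50% unstructured magnitude pruning on a single layer.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero 50% by magnitude
prune.remove(layer, "weight")  # bake the mask into the weight tensor
print(f"sparsity: {(layer.weight == 0).float().mean().item():.2f}")  # ~0.50
```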

quantizing-models-bitsandbytes

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, you need to fit larger models, or you want faster inference. Supports INT8, NF4, FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.

model-compression · quantization · llm · huggingface
ovachiever · 81
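
A minimal sketch of 4-bit NF4 loading through HuggingFace Transformers; the model name is a placeholder, and a CUDA GPU with bitsandbytes installed is assumed:

```python
# Minimal sketch: 4-bit NF4 quantized loading with Transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=cfg, device_map="auto"
)
```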

claude-api


anthropic-sdk · llm · api-integration · backend
ovachiever · 81

prompt-engineer

Build, analyze, and optimize LLM prompts and technical documentation. Activates when the user wants to create, modify, review, or improve prompts, or when requests are ambiguous and need clarification before writing.

llm · prompt-generation · prompt-refinement · technical-documentation
Shavakan · 21

slipstream-finetune

Fine-tune LLMs to speak Slipstream natively: a complete guide using GLM-4-9B.

fine-tuning · llm · glm-4 · language-model
anthony-maio · 1
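
A minimal LoRA-with-PEFT sketch as a plausible starting point for this kind of finetune; the target module name is an assumption (it varies by architecture), and the Slipstream-specific data pipeline from the guide is not shown:

```python
# Minimal sketch: LoRA adapter setup with PEFT on GLM-4-9B.
# NOTE: "query_key_value" is an assumed target module name; check the
# actual module names of the checkpoint before training.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat", trust_remote_code=True
)
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                  target_modules=["query_key_value"])  # assumed module name
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```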

prompt-engineering

Prompt design, optimization, few-shot learning, and chain of thought techniques for LLM applications.

prompt-engineering · few-shot-learning · chain-of-thought · llm
pluginagentmarketplace · 1
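
A minimal, framework-free sketch of a few-shot prompt with a chain-of-thought cue:

```python
# Minimal sketch: one worked example (few-shot) plus a chain-of-thought cue.
FEW_SHOT = """\
Q: A bat and a ball cost $1.10 total; the bat costs $1.00 more than the ball. \
What does the ball cost?
A: Let the ball cost x. Then x + (x + 1.00) = 1.10, so 2x = 0.10 and x = 0.05. \
The ball costs $0.05.

Q: {question}
A: Let's think step by step."""

print(FEW_SHOT.format(question="If 3 pens cost $1.50, what do 7 pens cost?"))
```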

llm-basics

LLM architecture, tokenization, transformers, and inference optimization. Use for understanding and working with language models.

llm · transformers · tokenization · inference-optimization
pluginagentmarketplace · 1
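
A minimal tokenization sketch, using the GPT-2 tokenizer from HuggingFace as a stand-in:

```python
# Minimal sketch: inspecting how a tokenizer splits text into subword pieces.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok("LLMs operate on tokens, not characters.")["input_ids"]
print(ids)                             # integer token ids
print(tok.convert_ids_to_tokens(ids))  # the subword pieces behind them
```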

model-deployment

LLM deployment strategies including vLLM, TGI, and cloud inference endpoints.

model-deployment · vllm · tgi · cloud-inference
pluginagentmarketplace · 1
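
A minimal sketch of querying an OpenAI-compatible endpoint, which both vLLM and TGI can expose; the base URL, port, and model name are assumptions about how the server was launched:

```python
# Minimal sketch: client side of an OpenAI-compatible deployment; vLLM ignores
# the API key by default, so any non-empty string works here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```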

Page 1 of 3 · 52 results