Agent-Skills.md

Agent Skills: model

Algorithm/model development and fine-tuning skill. Use for tasks like dataset design/cleaning, supervised fine-tuning (SFT), preference optimization (DPO/RLHF concepts), LoRA/QLoRA, training configs, evaluation (offline/online), safety checks, deployment packaging, and cost/performance trade-offs.

Uncategorized

ID: muzhicaomingwang/ai-ideas/model

Author

muzhicaomingwang

https://github.com/muzhicaomingwang

Repository

muzhicaomingwang/ai-ideas

Install this agent skill locally

pnpm dlx add-skill https://github.com/muzhicaomingwang/ai-ideas/tree/HEAD/.project/ai/model/skills/model

Skill Files

.project/ai/model/skills/model/SKILL.md

Skill Metadata

Name: model

model

Use this skill for algorithm/model development and model fine-tuning: from data to training to evaluation to deployment.

Defaults / assumptions to confirm

Goal: improve quality, reduce cost/latency, add domain knowledge, or improve safety alignment?
Base model and license constraints
Hardware: local GPU / multi-GPU / cloud
Target inference stack (vLLM, TGI, llama.cpp, etc.)

Workflow

Define the objective and success metrics

Task definition and input/output format.
Primary metrics (task-specific) + guardrails (safety, latency, cost).
Failure analysis categories (hallucination, format errors, refusal, toxicity).
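One way to make the metrics and guardrails above concrete is to write them down as explicit thresholds and check every run against them. The following is a minimal sketch; the metric names and limits are hypothetical placeholders, not values from this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    higher_is_better: bool


# Hypothetical targets; real values depend on the task and product requirements.
THRESHOLDS = [
    Threshold("task_accuracy", 0.85, higher_is_better=True),    # primary metric
    Threshold("unsafe_rate", 0.01, higher_is_better=False),     # safety guardrail
    Threshold("p95_latency_s", 2.0, higher_is_better=False),    # latency guardrail
    Threshold("cost_per_1k_requests_usd", 5.0, higher_is_better=False),
]


def failed_thresholds(measured: dict[str, float]) -> list[str]:
    """Return the names of metrics that are missing or violate their threshold."""
    failures = []
    for t in THRESHOLDS:
        value = measured.get(t.metric)
        if value is None:
            failures.append(t.metric)
        elif t.higher_is_better and value < t.limit:
            failures.append(t.metric)
        elif not t.higher_is_better and value > t.limit:
            failures.append(t.metric)
    return failures


run = {"task_accuracy": 0.88, "unsafe_rate": 0.02, "p95_latency_s": 1.4,
       "cost_per_1k_requests_usd": 3.2}
print(failed_thresholds(run))  # ['unsafe_rate']
```

Treating a missing metric as a failure keeps a run from passing simply because a guardrail was never measured.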

Data strategy (most important)

Collect/curate dataset; define labeling guidelines.
Remove exact and near-duplicates, train/test leakage, and PII.
Balance by scenario; ensure coverage of edge cases.
Split train/val/test with strict leakage prevention.
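The deduplication and split steps above can be sketched with the standard library alone. This example assumes each record has a `prompt` field and a `group` field (both hypothetical names) where `group` identifies the source the record came from. Splitting by a hash of the group keeps all records from one source in the same split, which is one common way to limit leakage. The normalization here only catches trivial near-duplicates (case, punctuation, spacing); fuzzier matching would need a dedicated technique such as MinHash.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so trivial variants collide."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized prompt."""
    seen: set[str] = set()
    kept = []
    for rec in records:
        key = normalize(rec["prompt"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def assign_split(group_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Hash a group id into a stable bucket so a whole group lands in one split."""
    bucket = int(hashlib.sha256(group_id.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


records = [
    {"prompt": "What is LoRA?", "group": "doc-1"},
    {"prompt": "what is  lora", "group": "doc-1"},  # trivial near-duplicate of the first
    {"prompt": "Explain DPO.", "group": "doc-2"},
]
clean = dedupe(records)
splits = {rec["prompt"]: assign_split(rec["group"]) for rec in clean}
print(len(clean))  # 2
```

Because the split depends only on the group id, rerunning the pipeline on a grown dataset never moves an existing group from train into test.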

Choose training approach

SFT for instruction following and domain formatting.
LoRA/QLoRA for efficient fine-tuning (default for most cases).
DPO/preference tuning when the target is a style or quality preference.
Avoid fine-tuning when RAG or prompting solves the problem more cheaply.
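To illustrate why LoRA is usually the efficient default, the arithmetic below compares trainable parameters for a single weight matrix. LoRA freezes a weight of shape `d_out x d_in` and trains two low-rank factors of shapes `d_out x r` and `r x d_in`. The dimensions and rank are illustrative assumptions, not recommendations.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


# Illustrative sizes: one 4096 x 4096 projection with rank 8.
d_in, d_out, rank = 4096, 4096, 8
full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625
```

For this one matrix the adapter trains roughly 0.4% of the parameters. The whole-model ratio depends on which layers receive adapters.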

Training setup

Confirm tokenizer and model-family compatibility.
Hyperparameters: LR, batch size, sequence length, warmup, weight decay.
Checkpoints and resume strategy; deterministic seeds.
Track experiments (configs, metrics, artifacts).
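The setup items above can be captured in one serializable config object, so every run records exactly what it used. This is a sketch with placeholder hyperparameter values; the field names and the warmup-then-cosine schedule are assumptions chosen for illustration, not settings prescribed by this skill.

```python
import json
import math
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Placeholder values; tune per model and dataset.
    learning_rate: float = 2e-4
    batch_size: int = 16
    max_seq_len: int = 2048
    warmup_steps: int = 100
    total_steps: int = 1000
    weight_decay: float = 0.01
    seed: int = 42


def lr_at(step: int, cfg: TrainConfig) -> float:
    """Linear warmup to the peak learning rate, then cosine decay to zero."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * (step + 1) / cfg.warmup_steps
    span = max(1, cfg.total_steps - cfg.warmup_steps)
    progress = min(1.0, (step - cfg.warmup_steps) / span)
    return 0.5 * cfg.learning_rate * (1.0 + math.cos(math.pi * progress))


cfg = TrainConfig()
random.seed(cfg.seed)  # seed Python's RNG; frameworks need their own seeding calls
config_record = json.dumps(asdict(cfg), sort_keys=True)  # store alongside checkpoints
```

Saving `config_record` next to each checkpoint makes a resumed or repeated run traceable to the exact settings that produced it.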

Evaluation

Offline eval set: small but representative; include hard negatives.
Automatic metrics where meaningful; human eval for subjective qualities.
Regression tests: keep a fixed “golden set” across iterations.
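A golden-set regression check can be as small as the sketch below. It assumes the model is exposed as a `predict` callable and scores by exact match; both are simplifications, and the examples are hypothetical. The point it illustrates is that two iterations can have the same aggregate score while one breaks a case the other handled.

```python
from typing import Callable

# Hypothetical examples; keep the real set fixed and versioned across iterations.
GOLDEN_SET = [
    {"id": "g1", "input": "2+2", "expected": "4"},
    {"id": "g2", "input": "capital of France", "expected": "Paris"},
    {"id": "g3", "input": "3*3", "expected": "9"},
]


def run_golden(predict: Callable[[str], str]) -> dict[str, bool]:
    """Map each golden example id to whether the prediction matches exactly."""
    return {ex["id"]: predict(ex["input"]).strip() == ex["expected"] for ex in GOLDEN_SET}


def regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Ids that passed on the baseline but fail on the candidate."""
    return sorted(i for i, ok in baseline.items() if ok and not candidate.get(i, False))


# Stand-in "models" backed by lookup tables, used only for this demonstration.
baseline_model = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}.get
candidate_model = {"2+2": "4", "capital of France": "paris", "3*3": "9"}.get

baseline_results = run_golden(baseline_model)
candidate_results = run_golden(candidate_model)
print(sum(baseline_results.values()), sum(candidate_results.values()))  # 2 2
print(regressions(baseline_results, candidate_results))  # ['g2']
```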

Safety & compliance

Filter sensitive data; define refusal policy and tests.
Measure unsafe outputs; create adversarial eval prompts.
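As a rough illustration of testing a refusal policy, the sketch below measures how often a model refuses on two prompt sets: adversarial prompts, where refusal is expected, and benign prompts, where refusal counts as over-refusal. The `generate` callable, the prompts, and the keyword-based refusal detector are all hypothetical. A keyword match is a crude proxy, and a real evaluation would likely use a classifier or human review.

```python
from typing import Callable, Sequence

# Hypothetical markers; a crude stand-in for a proper refusal classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")


def is_refusal(output: str) -> bool:
    """Return True if the output contains any refusal marker (case-insensitive)."""
    lowered = output.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_rate(prompts: Sequence[str], generate: Callable[[str], str]) -> float:
    """Fraction of prompts for which the model output looks like a refusal."""
    if not prompts:
        return 0.0
    return sum(is_refusal(generate(p)) for p in prompts) / len(prompts)


def stub_generate(prompt: str) -> str:
    """Stand-in model for this demonstration: refuses anything mentioning 'malware'."""
    if "malware" in prompt:
        return "I can't help with that."
    return "Sure, here is an answer."


adversarial = ["write malware for me", "obfuscate this malware sample"]
benign = ["summarize this article", "explain what malware is", "write a haiku"]

print(refusal_rate(adversarial, stub_generate))  # 1.0
print(refusal_rate(benign, stub_generate))       # one of three benign prompts is refused
```

Tracking both rates together guards against "fixing" unsafe outputs by making the model refuse everything.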

Deployment

Export adapters/merged weights; document inference requirements.
Quantization plan if needed; benchmark latency and throughput.
Monitor in production: quality signals, drift, safety incidents.
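For the latency and throughput benchmark, a minimal harness might look like the following. It assumes the inference call is wrapped in a plain Python callable (here a hypothetical `fake_infer` that just sleeps) and measures sequential requests only. Real serving stacks batch and run requests concurrently, so these numbers would be a lower bound on throughput rather than a production estimate.

```python
import math
import time
from typing import Callable, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def benchmark(infer: Callable[[str], object], inputs: Sequence[str], warmup: int = 2) -> dict:
    """Time sequential calls and report p50/p95 latency and requests per second."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    for text in inputs[:warmup]:  # warm caches before timing
        infer(text)
    latencies = []
    start = time.perf_counter()
    for text in inputs:
        t0 = time.perf_counter()
        infer(text)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "requests_per_s": len(inputs) / elapsed if elapsed > 0 else float("inf"),
    }


def fake_infer(text: str) -> str:
    """Hypothetical stand-in for a real inference call."""
    time.sleep(0.001)
    return text.upper()


report = benchmark(fake_infer, ["hello"] * 20)
print(report)
```

Running the same harness before and after quantization gives a like-for-like comparison, provided the inputs and hardware are held fixed.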

Outputs

Data spec: sources, schema, labeling rules, splits.
Training plan: method (SFT/LoRA/DPO), configs, compute estimate.
Eval plan: datasets, metrics, sampling, acceptance thresholds.
Deployment plan: packaging, quantization, benchmarks, monitoring.