knowledge-distillation
Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies; a minimal loss sketch follows this entry.
model-compression, knowledge-distillation, large-language-models, mini-llm
ovachiever
81
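A minimal sketch of the two loss variants this entry names: temperature-scaled soft targets (forward KL over softened logits) and the reverse KLD objective associated with MiniLLM-style training. It assumes PyTorch; the function names and the default temperatures are illustrative choices, not part of the skill itself.

```python
import torch
import torch.nn.functional as F


def soft_target_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """Forward KL between temperature-softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)


def reverse_kld_loss(student_logits: torch.Tensor,
                     teacher_logits: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """Reverse KL, KL(student || teacher): mode-seeking, MiniLLM-style objective."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits / t, dim=-1)
    student_probs = student_log_probs.exp()
    return (student_probs * (student_log_probs - teacher_log_probs)).sum(-1).mean() * (t ** 2)
```

The forward-KL form spreads student mass over every mode the teacher covers, while the reverse-KL form concentrates on the teacher's dominant modes, which is why MiniLLM prefers it for generative language models.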
compound
Distill observations into universal wisdom. Use to promote project patterns to soul wisdom when they reach critical mass.
pattern-analysis, knowledge-distillation, wisdom-capture, project-patterns
genomewalker
0