Sync Models
Add descriptions for new models available in the HuggingFace router to chat-ui's prod.yaml and dev.yaml. Also flag models that support the OpenAI-compatible reasoning_effort parameter so chat-ui shows the thinking-effort selector for them, and enable artifacts for models with 32B or more total parameters. Finally, prune deprecated models — entries in the config whose ids the router no longer returns.
The comparison runs in two directions:
- Missing = in the router but not in the config → add them (steps 4–8).
- Deprecated = in the config but no longer in the router → remove them (step 9).
Both directions share the same source of truth: whatever https://router.huggingface.co/v1/models returns right now.
Workflow
1. Fetch models from router
WebFetch https://router.huggingface.co/v1/models
Extract all model IDs from the response.
2. Read current configuration
- Read chart/env/prod.yaml (and chart/env/dev.yaml; the two share the same model set)
- Extract model IDs from the MODELS JSON array in envVars
3. Identify missing and deprecated models
Compare the router list against the config in both directions:
- Missing = in router but not in the config → candidates to add (continue to step 4).
- Deprecated = in the config but not in the router → candidates to remove (jump to step 9).
Compute both sets with a script so the diff is exact rather than eyeballed; the list is long and near-duplicate ids (GLM-4.7 vs GLM-4.7-FP8, -Instruct vs -Thinking) are easy to miss:

    python3 - <<'EOF'
    import json, re, subprocess

    # Router side: every id the router currently serves.
    raw = subprocess.check_output(["curl", "-sS", "https://router.huggingface.co/v1/models"]).decode()
    router = {m["id"] for m in json.loads(raw)["data"]}

    # Config side: ids from the MODELS JSON array in prod.yaml.
    txt = open("chart/env/prod.yaml").read()
    block = re.search(r"MODELS:\s*>\s*\n(.*?)\n\S", txt, re.S).group(1)
    ids = [e["id"] for e in json.loads(block)]

    keep = {"omni"}  # never treat the router alias (PUBLIC_LLM_ROUTER_ALIAS_ID) as deprecated
    print("MISSING (add): ", sorted(router - set(ids)))
    print("DEPRECATED (rm): ", [i for i in ids if i not in router and i not in keep])
    EOF

Only operate on the missing set for the add/research steps (4–8). Never edit, re-flag, or re-describe entries that already exist in prod.yaml / dev.yaml, even if you think their reasoning capability or description could be improved. Existing entries are intentionally curated and may have been hand-tuned for known quirks. Re-auditing them is out of scope unless the user explicitly asks for it.
Exclude the router alias from the deprecated set. The omni entry (whatever PUBLIC_LLM_ROUTER_ALIAS_ID is set to) is a synthetic alias, not a router model, so it never appears in /v1/models. Never remove it. The same goes for any other intentionally-synthetic id that isn't meant to come from the router.
4. Research each missing model
For each missing model, search the web for its specifications:
- Model architecture (dense, MoE, parameters)
- Key capabilities (coding, reasoning, vision, multilingual, etc.)
- Target use cases
- Whether it's a reasoning model (see step 5)
5. Decide if the model is reasoning-capable
A model is "reasoning-capable" for chat-ui purposes if it accepts the OpenAI-style reasoning_effort: low|medium|high parameter via the HF router and meaningfully changes its chain-of-thought depth in response. Whether that holds depends on both the model and the providers serving it: the router is a transparent proxy, so behavior comes from each provider's implementation. Don't decide from the name alone.
Heuristic shortlist (candidates worth verifying):
- Name contains gpt-oss, -Thinking, -thinking, -Reasoning, -reasoning, QwQ, R1, MiniMax-M, Kimi-K2-Thinking, cogito
- Hybrid models with a thinking switch: DeepSeek V3.1+, GLM-4.5 / 4.6 / 4.7 / 5.x, Qwen3 thinking variants
- Model card mentions "thinking mode", "reasoning traces", "extended thinking", "test-time compute", or shows <think>...</think> examples
Skip without further checking:
- Generic "good at reasoning" marketing copy — every modern LLM claims this. Only flag when reasoning is the mode of operation.
- Non-thinking siblings (Qwen3-235B-A22B-Instruct-2507 ≠ Qwen3-235B-A22B-Thinking-2507).
- Translation / vision-only / guard / coder-only models with no documented thinking mode.
Verify each candidate via provider docs before flagging:
For each model on the heuristic shortlist, look up its live providers in the /v1/models payload, then check those providers' chat-completions documentation for reasoning_effort, reasoning_content, enable_thinking, or a thinking parameter. If at least one live provider documents it for this model (or for the model family in general), flag it as reasoning-capable. The HF router will proxy the parameter to whichever provider it picks.
Provider docs to consult (use WebFetch / WebSearch):
- fireworks-ai: https://docs.fireworks.ai/api-reference/post-chatcompletions
- groq: https://console.groq.com/docs/reasoning
- cerebras: https://inference-docs.cerebras.ai/capabilities/reasoning
- together: https://docs.together.ai/docs/gpt-oss
- novita: https://novita.ai/docs/guides/llm-interleaved-thinking
- sambanova: search "sambanova reasoning_effort"
- deepinfra: https://docs.deepinfra.com/chat/overview
- nscale, scaleway, ovhcloud, hyperbolic, zai-org, cohere, featherless-ai: search "<provider> reasoning_effort" or check their model catalog pages
If none of the live providers document reasoning support for the model, don't flag it — even if the name pattern-matches. If documentation is ambiguous, lean toward not flagging and mention it in the commit so it can be revisited.
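The shortlist-then-verify flow in this step can be sketched in Python. This is a minimal sketch under stated assumptions, not chat-ui code: both function names are hypothetical, the name markers are copied from the heuristic shortlist, and the `providers` / `provider` / `status` field names are an assumption about the /v1/models payload that should be checked against a live response. Hybrid models (DeepSeek V3.1+, GLM, Qwen3) are not caught by name markers and still need a manual look.

```python
from typing import Any

# Name markers copied from the heuristic shortlist in this step.
NAME_MARKERS = (
    "gpt-oss", "-Thinking", "-thinking", "-Reasoning", "-reasoning",
    "QwQ", "R1", "MiniMax-M", "Kimi-K2-Thinking", "cogito",
)


def is_shortlist_candidate(model_id: str) -> bool:
    """True if the id contains any marker; a hint to verify, not a verdict."""
    return any(marker in model_id for marker in NAME_MARKERS)


def live_providers(model: dict[str, Any]) -> list[str]:
    """Names of providers marked live for one /v1/models entry.

    ASSUMPTION: each entry carries a "providers" list whose items have
    "provider" and "status" keys. Verify against a real response.
    """
    return [
        p["provider"]
        for p in model.get("providers", [])
        if p.get("status") == "live"
    ]
```

A candidate only earns "supportsReasoning": true once one of the providers returned by live_providers documents the parameter; the name match alone decides nothing.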
6. Decide if the model gets artifacts
Enable artifacts for any new model with 32B or more total parameters by appending "supportsArtifacts": true to its entry. This makes chat-ui instruct the model to emit <artifact> blocks rendered in the side panel.
- Use the total parameter count, not active parameters. A 35B-A3B MoE qualifies (35B total ≥ 32B) even though only 3B are active.
- The count is usually in the model name (Qwen3.6-27B, 550B-A55B). When it isn't, use the parameter count found while researching the model in step 4.
- This is independent of reasoning capability: a model can have both flags, either one, or neither.
- Models under 32B don't get the flag; users can still enable artifacts per-model via settings overrides.
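The 32B rule can be sketched as a small helper. This is an illustrative sketch, assuming the size appears in the id as a token like 27B or 550B; the function names and the regular expression are hypothetical, and ids with other naming schemes fall back to the count found during research.

```python
import re
from typing import Optional

ARTIFACTS_MIN_TOTAL_B = 32.0  # threshold from this step, in billions

# A size token such as "27B" or "550B". The lookbehind rejects tokens glued to
# a letter, digit, or dot, so the active-parameter part ("A3B") is skipped.
_SIZE = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)B(?![A-Za-z])", re.IGNORECASE)


def total_params_from_name(model_id: str) -> Optional[float]:
    """Total parameters in billions parsed from the id, or None if absent."""
    name = model_id.split("/")[-1]
    match = _SIZE.search(name)
    return float(match.group(1)) if match else None


def gets_artifacts(model_id: str, researched_total_b: Optional[float] = None) -> bool:
    """Apply the 32B rule, falling back to a researched count (step 4)."""
    total = total_params_from_name(model_id)
    if total is None:
        total = researched_total_b
    return total is not None and total >= ARTIFACTS_MIN_TOTAL_B
```

For example, a hypothetical id ending in 35B-A3B yields 35.0 and qualifies, while Qwen3.6-27B yields 27.0 and does not.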
7. Write descriptions
Match existing style:
- 8-12 words
- Sentence fragments (no period needed)
- No articles ("a", "the") unless necessary
- Focus on: architecture, specialization, key capability
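The style rules above can be checked mechanically before committing. The helper below is a hypothetical sketch, not part of chat-ui; treating "an" as an article alongside "a" and "the" is an assumption, and the result is a prompt for a second look rather than a hard failure.

```python
ARTICLES = {"a", "an", "the"}  # "an" is an assumption; the rule names "a" and "the"


def description_warnings(description: str) -> list[str]:
    """Return style warnings for one description; an empty list means it fits."""
    words = description.split()
    warnings = []
    if not 8 <= len(words) <= 12:
        warnings.append(f"{len(words)} words (target 8-12)")
    # Strip surrounding punctuation before comparing against the article set.
    found = sorted({w.strip('.,;:"').lower() for w in words} & ARTICLES)
    if found:
        warnings.append("articles: " + ", ".join(found))
    return warnings
```

The word range is best read as a soft target: the last example listed for this step has seven words and is still an accepted description.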
Examples:
- "Flagship GLM MoE for coding, reasoning, and agentic tool use."
- "MoE agent model with multilingual coding and fast outputs."
- "Vision-language Qwen for documents, GUI agents, and visual reasoning."
- "Mobile agent for multilingual Android device automation."
8. Add new models to both files
Add new models at the TOP of the MODELS array in:
- chart/env/prod.yaml
- chart/env/dev.yaml
Base format:

    { "id": "org/model-name", "description": "Description here." }

Append "supportsReasoning": true for reasoning-capable models (step 5) and "supportsArtifacts": true for 32B+ models (step 6). A model can carry both:

    { "id": "org/model-name", "description": "Description here.", "supportsReasoning": true, "supportsArtifacts": true }

supportsReasoning is what makes chat-ui render the Thinking-effort dropdown in the chat footer for that model and forward reasoning_effort to the router. supportsArtifacts enables the artifacts side panel for the model.
9. Remove deprecated models from both files
Delete the full entry line for every id in the deprecated set (step 3) from both chart/env/prod.yaml and chart/env/dev.yaml. Match on the exact "id" value so near-duplicate ids aren't removed by accident, and keep the removal symmetric: the two files must end with the same model set.
Removing a deprecated entry is safe and low-risk: MODELS is an overrides map, not the model list. src/lib/server/models.ts builds the catalog from the router's /v1/models response and only applies a MODELS entry when its id is present in that response (it maps over the router models and looks each up in the override map). An entry whose id the router no longer serves is a dead override that never renders in the UI, so pruning it changes nothing at runtime; it just keeps the config honest and readable.
After editing, re-parse the MODELS block in each file as JSON to confirm it's still valid and that no deprecated id remains (reuse the script from step 3; the deprecated set should now be empty).
Do not touch models that are merely referenced by env vars but still present in the router (e.g. TASK_MODEL, LLM_ROUTER_TOOLS_MODEL, LLM_ROUTER_MULTIMODAL_MODEL). If a deprecated id is referenced by one of those env vars, stop and flag it to the user instead of silently removing it: that indicates a config that needs a replacement model, not just a pruned line.
10. Commit changes
In the commit message, mention how many models were added (and how many of those are reasoning-capable / get artifacts) and how many deprecated models were removed, so it's easy to review.

    git add chart/env/prod.yaml chart/env/dev.yaml
    git commit -m "feat: sync models from router (+N added, M reasoning-capable, K artifacts, -D removed)"
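The add, prune, and commit-message logic of the last three steps can be sketched on an already-parsed MODELS list. This is a hedged illustration: apply_sync and commit_message are hypothetical helpers, env_refs stands for whatever ids the env vars named in the removal step point at, and the workflow itself edits entry lines in the YAML files rather than re-serializing the array.

```python
from typing import Any

Entry = dict[str, Any]


def apply_sync(existing: list[Entry], new_entries: list[Entry],
               deprecated: set[str], env_refs: set[str]) -> list[Entry]:
    """New entries first, then existing entries minus the deprecated ids."""
    blocked = deprecated & env_refs
    if blocked:
        # A deprecated id still used by an env var needs a human decision.
        raise ValueError(f"deprecated ids referenced by env vars: {sorted(blocked)}")
    kept = [e for e in existing if e["id"] not in deprecated]  # exact-id match
    return new_entries + kept


def commit_message(new_entries: list[Entry], removed: int) -> str:
    """Fill in the N / M / K / D counts of the commit message template."""
    reasoning = sum(1 for e in new_entries if e.get("supportsReasoning"))
    artifacts = sum(1 for e in new_entries if e.get("supportsArtifacts"))
    return (f"feat: sync models from router (+{len(new_entries)} added, "
            f"{reasoning} reasoning-capable, {artifacts} artifacts, -{removed} removed)")
```

Running the same function over the parsed arrays of both files is one way to keep the result symmetric; the exact-id comparison is what keeps GLM-4.7 in place when only GLM-4.7-FP8 is deprecated.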
Notes
- Prune-only runs are normal: the router may drop models without adding any. If the missing set is empty but the deprecated set isn't, skip steps 4–8 entirely and just do the removal (step 9) plus commit.
- Add vs. remove are judged per-id: the router routinely keeps a base model while dropping its FP8/variant sibling (or the reverse). Treat GLM-4.7 and GLM-4.7-FP8, or -Instruct and -Thinking, as independent ids; presence of one says nothing about the other. Trust the script's diff, not the family name.
- FP8 variants: describe as "FP8 [base model] for efficient inference with [key capability]". If the base model is reasoning-capable, the FP8 variant is too, so flag both. Same for artifacts: quantization doesn't change the parameter count, so a 32B+ base means the FP8 variant gets supportsArtifacts too.
- Vision models: mention "vision-language" and key visual tasks. A vision model can still be reasoning-capable (e.g. Qwen3-VL-*-Thinking); judge by the same rules.
- Agent models: mention "agent" and automation capabilities.
- Regional models: mention language focus (e.g., "European multilingual", "Southeast Asian").
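The FP8 note above can be sketched as a helper that fills the description template and copies the base model's flags. The function name and its arguments are hypothetical, and the sketch assumes the base entry is already known (from the config or from the same sync run).

```python
from typing import Any


def fp8_entry(base: dict[str, Any], fp8_id: str,
              base_name: str, capability: str) -> dict[str, Any]:
    """Build an FP8 entry whose flags mirror those of its base model."""
    entry: dict[str, Any] = {
        "id": fp8_id,
        "description": f"FP8 {base_name} for efficient inference with {capability}.",
    }
    # Quantization changes neither reasoning support nor the parameter count.
    for flag in ("supportsReasoning", "supportsArtifacts"):
        if base.get(flag):
            entry[flag] = True
    return entry
```

The flags carry over, but presence in the router does not: the FP8 id is still added or pruned only if the step 3 diff says so.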