Model-ID glossary - every part in model identifiers

Jun 29, 2026 #ai #llm #reference

Model-ID Glossary: every “part” in the model identifiers across Artificial Analysis, Ollama, and OpenRouter

A reference that decodes every meaningful stem found in the model identifiers inside three datasets:

artificialanalysis_benchmark_data.json - Artificial Analysis benchmark catalog (the slug and name fields; the id field is an opaque UUID and is ignored).
ollama_models.json - the Ollama library (the full_tag field of every tags / detailed_tags entry, e.g. qwen3.6:35b-a3b-mtp-q4_K_M, plus the structured family / quantization / variant / parameters fields).
openrouter_models.json - the OpenRouter catalog (the id, canonical_slug, and hugging_face_id fields, e.g. qwen/qwen3.7-max-20260520).

Every claim about what a stem means was verified by web search; sources are linked inline and collected in the Bibliography. Where a stem could not be confidently decoded even after research, it is listed in Stems not confidently decoded.

Scope note: this is a glossary of identifier parts (the tokens you get by splitting IDs on / : - _ .). It is not a catalog of individual models.

How a model ID is structured
Inventory and method
Categorical dimension: model families (Ollama)
Categorical dimension: provider / namespace prefixes (OpenRouter)
Categorical dimension: model creators (Artificial Analysis)
Parameter-size and architecture notation
Quantization and numeric precision (the GGUF block)
Advanced block / microscaling formats: NVFP4, MXFP8, MXFP4
Runtime / packaging stems: MLX, GGUF
Training / architecture technique stems (rich)
Capability and variant suffixes
Context-window stems
Versioning and date-stamp conventions
Vendor model-series codenames
Scraping artifacts (not real stems)
Stems not confidently decoded
Bibliography

1. How a model ID is structured

Each source assembles IDs slightly differently, but they share the same building blocks: <namespace>/<family><version>-<size>-<variant/capability>-<quant/format>[-<date>].

Ollama full_tag = model_name:tag, where tag is a hyphen-joined stack of size + variant + format. Examples:

llama3.1:70b -> family llama3.1, size 70b.
gemma3:27b-it-qat -> size 27b, it (instruction-tuned), qat (quantization-aware training).
qwen3.6:35b-a3b-mtp-q4_K_M -> 35b total, a3b active (MoE), mtp (multi-token-prediction build), q4_K_M (GGUF quant).
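
The decomposition above can be sketched as a small parser. This is a minimal illustration, not Ollama's own logic: the function name `parse_ollama_tag` and the three regular expressions are assumptions made for this example, and any part that is not recognized is simply kept under `other`.

```python
import re

# Hypothetical patterns for this sketch (not taken from Ollama itself).
SIZE = re.compile(r"\d+(?:\.\d+)?[bmt]", re.IGNORECASE)    # 70b, 0.6b, 270m, 1t
ACTIVE = re.compile(r"a\d+(?:\.\d+)?b", re.IGNORECASE)     # a3b, a22b
QUANT = re.compile(
    r"(?:q\d\w*|f16|fp16|bf16|fp8|int[48]|nvfp4|mxfp[48])", re.IGNORECASE
)

def parse_ollama_tag(full_tag):
    """Split 'model_name:tag' and classify each hyphen-separated tag part."""
    model, _, tag = full_tag.partition(":")
    parsed = {"model": model, "size": None, "active": None,
              "quant": None, "other": []}
    for part in (tag.split("-") if tag else []):
        if SIZE.fullmatch(part):
            parsed["size"] = part
        elif ACTIVE.fullmatch(part):
            parsed["active"] = part
        elif QUANT.fullmatch(part):
            parsed["quant"] = part
        else:
            parsed["other"].append(part)   # it, qat, mtp, instruct, ...
    return parsed

example = parse_ollama_tag("qwen3.6:35b-a3b-mtp-q4_K_M")
```

Splitting only on `-` (and not on `_` or `.`) keeps `q4_K_M` and `qwen3.6` intact, which is why the tag is parsed before any finer tokenization.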

OpenRouter id = <provider>/<model>[:<suffix>], with a parallel canonical_slug that usually appends a YYYYMMDD date, plus an optional hugging_face_id. Examples:

anthropic/claude-opus-4.8 / canonical anthropic/claude-4.8-opus-20260528.
nvidia/nemotron-3-ultra-550b-a55b:free -> 550b total, a55b active, :free tier.
qwen/qwen3.7-max-20260520.

Artificial Analysis slug = a normalized handle (gpt-oss-120b-low), and name = the human label that exposes the reasoning-effort tier in parentheses (gpt-oss-120b (low), GPT-5.5 (xhigh)).

2. Inventory and method

Counts produced by the extraction (see the run dumps: run 01, run 02, run 03, run 04, run 05 - perf notes, run 06 - combination matrix, run 07 - verification):

Source	Identifier strings	Notable distinct fields
Ollama	7,388 `full_tag`s	236 families, 13 structured quant labels, 5 variants, 66 param sizes
OpenRouter	337 `id`s	324 canonical slugs, 149 HF ids, 57 provider prefixes
Artificial Analysis	537 slugs / 537 names	51 model creators

Tokenizing every identifier (split on / : - _ . whitespace) yields ~720 distinct tokens. The sections below group them by what kind of thing they encode.
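
A minimal sketch of that tokenization step follows. The function name `tokenize` and the lower-casing are assumptions for illustration; the source only states the separator set.

```python
import re
from collections import Counter

def tokenize(identifier):
    """Split an identifier on / : - _ . and whitespace; drop empty pieces."""
    # Lower-casing is an assumption of this sketch, not stated in the source.
    return [t for t in re.split(r"[/:\-_.\s]+", identifier.lower()) if t]

sample_ids = [
    "qwen3.6:35b-a3b-mtp-q4_K_M",
    "qwen/qwen3.7-max-20260520",
]
counts = Counter(tok for ident in sample_ids for tok in tokenize(ident))
```

Note that this split is deliberately aggressive: `qwen3.6` becomes `qwen3` and `6`, and `q4_K_M` becomes `q4`, `k`, `m`. That is useful for counting stems, but multi-part labels have to be reassembled when they are interpreted.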

3. Categorical dimension: model families (Ollama)

The Ollama family field is a small controlled vocabulary (236 values). These name the model lineage (base architecture / publisher series). Full list as found:

alfred, all-minilm, athene-v2, aya, aya-expanse, bakllava, bespoke-minicheck, bge-large, bge-m3,
codebooga, codegeex4, codegemma, codellama, codeqwen, codestral, codeup, cogito, cogito-2.1,
command-a, command-r, command-r-plus, command-r7b, command-r7b-arabic, dbrx, deepcoder,
deepscaler, deepseek-coder, deepseek-coder-v2, deepseek-llm, deepseek-ocr, deepseek-r1,
deepseek-v2, deepseek-v2.5, deepseek-v3, deepseek-v3.1, deepseek-v3.2, deepseek-v4-flash,
deepseek-v4-pro, devstral, devstral-2, devstral-small-2, dolphin-llama3, dolphin-mistral,
dolphin-mixtral, dolphin-phi, dolphin3, dolphincoder, duckdb-nsql, embeddinggemma, everythinglm,
exaone-deep, exaone3.5, falcon, falcon2, falcon3, firefunction-v2, functiongemma,
gemini-3-flash-preview, gemma, gemma2, gemma3, gemma3n, gemma4, glm-4.6, glm-4.7, glm-4.7-flash,
glm-5, glm-5.1, glm-ocr, glm4, goliath, gpt-oss, gpt-oss-safeguard, granite-code,
granite-embedding, granite3-dense, granite3-guardian, granite3-moe, granite3.1-dense,
granite3.1-moe, granite3.2, granite3.2-vision, granite3.3, granite4, granite4.1,
granite4.1-guardian, hermes3, internlm2, kimi-k2, kimi-k2-thinking, kimi-k2.5, kimi-k2.6,
kimi-k2.7-code, laguna-xs.2, lfm2, lfm2.5, lfm2.5-thinking, llama-guard3, llama-pro, llama2,
llama2-chinese, llama2-uncensored, llama3, llama3-chatqa, llama3-gradient, llama3-groq-tool-use,
llama3.1, llama3.2, llama3.2-vision, llama3.3, llama4, llava, llava-llama3, llava-phi3, magicoder,
magistral, marco-o1, mathstral, medgemma, medgemma1.5, meditron, medllama2, megadolphin,
minicpm-v, minicpm-v4.5, minicpm-v4.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7,
minimax-m3, ministral-3, mistral, mistral-large, mistral-large-3, mistral-medium-3.5,
mistral-nemo, mistral-openorca, mistral-small, mistral-small3.1, mistral-small3.2, mistrallite,
mixtral, moondream, mxbai-embed-large, nemotron, nemotron-3-nano, nemotron-3-super,
nemotron-3-ultra, nemotron-cascade-2, nemotron-mini, nemotron3, neural-chat, nexusraven,
nomic-embed-text, nomic-embed-text-v2-moe, notus, notux, nous-hermes, nous-hermes2,
nous-hermes2-mixtral, nuextract, olmo-3, olmo-3.1, olmo2, open-orca-platypus2, openchat,
opencoder, openhermes, openthinker, orca-mini, orca2, paraphrase-multilingual, phi, phi3, phi3.5,
phi4, phi4-mini, phi4-mini-reasoning, phi4-reasoning, phind-codellama, qwen, qwen2, qwen2-math,
qwen2.5, qwen2.5-coder, qwen2.5vl, qwen3, qwen3-coder, qwen3-coder-next, qwen3-embedding,
qwen3-next, qwen3-vl, qwen3.5, qwen3.6, qwq, r1-1776, reader-lm, reflection, rnj-1, sailor2,
samantha-mistral, shieldgemma, smallthinker, smollm, smollm2, snowflake-arctic-embed,
snowflake-arctic-embed2, solar, solar-pro, sqlcoder, stable-beluga, stable-code, stablelm-zephyr,
stablelm2, starcoder, starcoder2, starling-lm, tinydolphin, tinyllama, translategemma, tulu3,
vicuna, wizard-math, wizard-vicuna, wizard-vicuna-uncensored, wizardcoder, wizardlm,
wizardlm-uncensored, wizardlm2, xwinlm, yarn-llama2, yarn-mistral, yi, yi-coder, zephyr

Recurring family stems and what they signal:

Publisher base series: llama* (Meta), gemma* (Google), qwen* (Alibaba), mistral*/mixtral/ministral/codestral/devstral/magistral (Mistral), phi* (Microsoft), granite* (IBM), command* (Cohere), deepseek* (DeepSeek), glm* (Zhipu / Z.ai), falcon* (TII), olmo* (AI2), nemotron* (NVIDIA), kimi-k2* (Moonshot), minimax-m* (MiniMax), solar* (Upstage), yi* (01.AI), exaone* (LG), internlm* (Shanghai AI Lab), smollm* (HF), lfm* (Liquid AI).
Community fine-tune lineages: dolphin*, hermes*/openhermes/nous-hermes*, wizard*, vicuna, zephyr, orca*, notus/notux, samantha-*, starling-lm, xwinlm, stable-beluga, reflection. These are tunes of a base model (named in this glossary’s suffix sections) by independent groups.
Task-specialized families: *coder/codellama/codegemma/codeqwen/opencoder/magicoder/sqlcoder/duckdb-nsql (code), *math/mathstral (math), *-embed*/bge-*/nomic-embed-*/mxbai-embed-large/all-minilm/paraphrase-multilingual (embeddings), llava*/bakllava/moondream/minicpm-v/*-vision/*vl (vision), med* (medical), *guard*/shieldgemma/*-guardian/*safeguard (safety), reader-lm/nuextract (extraction).

4. Categorical dimension: provider / namespace prefixes (OpenRouter)

The token before the first / in an OpenRouter id is the routing namespace (publisher or hosting org), 57 distinct values:

ai21, aion-labs, allenai, amazon, anthracite-org, anthropic, arcee-ai, baidu, bytedance,
bytedance-seed, cognitivecomputations, cohere, deepcogito, deepseek, essentialai, google, gryphe,
ibm-granite, inception, inclusionai, inflection, kwaipilot, liquid, mancer, meta-llama, microsoft,
minimax, mistralai, moonshotai, morph, nex-agi, nousresearch, nvidia, openai, openrouter,
perceptron, perplexity, poolside, prime-intellect, qwen, rekaai, relace, sao10k, stepfun,
switchpoint, tencent, thedrummer, undi95, upstage, writer, x-ai, xiaomi, z-ai,
~anthropic, ~google, ~moonshotai, ~openai

A leading ~ (e.g. ~anthropic/claude-fable-latest) marks OpenRouter’s “floating” / alias namespace - a moving pointer (such as -latest) rather than a pinned dated build.
openrouter/ is OpenRouter’s own meta-models (openrouter/fusion, openrouter/owl-alpha, openrouter/pareto-code, the auto router).
Community/roleplay finetuners appear as namespaces too: sao10k, thedrummer, undi95, gryphe, anthracite-org, cognitivecomputations, mancer.
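
As a rough illustration of the namespace rules above, an OpenRouter id can be split with plain string operations. The function name `parse_openrouter_id` and the returned keys are hypothetical, chosen for this sketch only.

```python
def parse_openrouter_id(model_id):
    """Split '<provider>/<model>[:<suffix>]' and flag '~' alias namespaces."""
    namespace, _, rest = model_id.partition("/")
    model, _, suffix = rest.partition(":")
    return {
        "namespace": namespace.lstrip("~"),
        "floating_alias": namespace.startswith("~"),  # moving pointer, not a pinned build
        "model": model,
        "suffix": suffix or None,                     # e.g. "free"
    }

free_tier = parse_openrouter_id("nvidia/nemotron-3-ultra-550b-a55b:free")
alias = parse_openrouter_id("~anthropic/claude-fable-latest")
```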

5. Categorical dimension: model creators (Artificial Analysis)

Artificial Analysis stores a normalized model_creator (51 values), the organization that trained the model:

AI21 Labs, Alibaba, Allen Institute for AI, Amazon, Anthropic, Arcee AI, Baidu, ByteDance Seed,
China Mobile, Cohere, Databricks, Deep Cogito, DeepSeek, Google, IBM, Inception, InclusionAI,
Kimi, Korea Telecom, KwaiKAT, LG AI Research, Liquid AI, LongCat, MBZUAI Institute of Foundation
Models, Meta, Microsoft, MiniMax, Mistral, Motif Technologies, NVIDIA, Nanbeige, Naver,
Nous Research, OpenAI, OpenBMB, OpenChat, Perplexity, Prime Intellect, Reka AI, Sarvam,
ServiceNow, Snowflake, StepFun, Swiss AI Initiative, TII UAE, Tencent, Trillion Labs, Upstage,
Xiaomi, Z AI, xAI

This is the cleanest “who made it” axis across the three datasets; the OpenRouter namespace and Ollama family stems above are noisier proxies for the same idea.

6. Parameter-size and architecture notation

6.1 Plain parameter counts

<n>b = billions of parameters (7b, 8b, 70b, 405b, 671b, 1t = ~1 trillion). Ollama’s structured parameters field enumerates: 0.5B 0.6B 0.8B 1.1B 1.2B 1.3B 1.5B 1.6B 1.7B 1.8B 2B 2.4B 2.7B 3B 3.8B 4B 6B 6.7B 7B 7.8B 8B 9B 10B 10.7B 11B 12B 13B 14B 15B 16B 17B 20B 22B 24B 26B 27B 30B 31B 32B 33B 34B 35B 40B 67B 70B 72B 80B 90B 104B 110B 111B 120B 122B 123B 128B 132B 141B 180B 235B 236B 397B 405B 480B 671B 675B.
<n>m = millions of parameters (135m, 270m, 350m, 360m, 567m) - small/embedding models.

6.2 Mixture-of-Experts (MoE) notation

A sparse MoE model has many “expert” sub-networks; a router activates only a few per token, so active parameters ≪ total parameters. The IDs encode this three ways:

<E>x<n>b - number of experts x size each. 8x7b (Mixtral) = 8 experts of ~7B; because the experts share the attention layers, the total is about 47B rather than 56B. Also 8x22b, 16x17b, 128x17b. (Mixture-of-experts overview)
<n>E / <n>e - number of experts. Llama 4 Scout-17B-16E (16 experts), Maverick-17B-128E (128 experts). The 17B is the active count. (HF Llama-4-Scout-17B-16E, HF Llama-4-Maverick-17B-128E-Instruct)
a<n>b - Active billions per forward pass. qwen3:235b-a22b = 235B total / 22B active (128 experts, top-8 routed). Active stems observed: a1b a2b a3b a4b a9b a10b a12b a13b a17b a22b a35b a47b a55b. (OpenRouter Qwen3-235B-A22B, Qwen3 Technical Report, EmergentMind Qwen3-235B-A22B)

moe itself appears as a stem (e.g. granite3-moe, nomic-embed-text-v2-moe); its counterpart dense marks a non-MoE model (granite3-dense).
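
The three notations can be recognized with a few regular expressions. This is a sketch under stated assumptions: the names `parse_moe` and the patterns are invented here, and a bare `<n>b` is treated as the total count, which is wrong for Llama 4 style names where `17B` is the active count.

```python
import re

EXPERTS_X_SIZE = re.compile(r"(\d+)x(\d+(?:\.\d+)?)b", re.IGNORECASE)  # 8x7b
EXPERT_COUNT = re.compile(r"(\d+)e", re.IGNORECASE)                    # 16e
ACTIVE = re.compile(r"a(\d+(?:\.\d+)?)b", re.IGNORECASE)               # a22b
TOTAL = re.compile(r"(\d+(?:\.\d+)?)b", re.IGNORECASE)                 # 235b

def parse_moe(tokens):
    """Collect MoE facts from already-split ID tokens."""
    info = {}
    for tok in tokens:
        if m := EXPERTS_X_SIZE.fullmatch(tok):
            info["experts"] = int(m.group(1))
            info["expert_size_b"] = float(m.group(2))
        elif m := EXPERT_COUNT.fullmatch(tok):
            info["experts"] = int(m.group(1))
        elif m := ACTIVE.fullmatch(tok):
            info["active_b"] = float(m.group(1))
        elif m := TOTAL.fullmatch(tok):
            info["total_b"] = float(m.group(1))   # assumption: bare <n>b is the total
    if "active_b" in info and "total_b" in info:
        info["active_fraction"] = info["active_b"] / info["total_b"]
    return info

qwen = parse_moe(["235b", "a22b"])   # active_fraction is 22/235, a bit under 10%
mixtral = parse_moe(["8x7b"])
```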

6.3 Effective parameters (Matryoshka / MatFormer): `e2b`, `e4b`

Gemma 3n / Gemma 4 nano use E2B / E4B = Effective ~2B / ~4B parameters. The architecture is MatFormer (Matryoshka Transformer): one model with nested smaller submodels (E4B contains E2B), so you can slice a smaller model out of the larger one. Because of Per-Layer Embeddings (PLE) - large embedding tables used only for lookup - the total stored weights exceed the effective compute parameters.

7. Quantization and numeric precision (the GGUF block)

Quantization stores weights in fewer bits to shrink the model and speed up inference. The Ollama tags carry the GGUF scheme used by llama.cpp. General pattern: Q<bits>_<type>[_<size>].

7.1 What each part means

Q = quantized (integer block quantization).
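<bits> = nominal bits per weight (Q2=2-bit … Q8=8-bit). Effective bits-per-weight is a bit higher because of stored scales (e.g. the Q4_K block type costs 4.5 bpw, and a Q4_K_M file lands nearer 4.8 bpw because some tensors are kept at higher precision).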
K = k-quants: a super-block structure (blocks of 256 weights split into sub-blocks) with their own quantized scales/mins, so bits are allocated more cleverly than the legacy scheme. (llama.cpp discussion #2094)
_S / _M / _L = small / medium / large mix: how many tensors get bumped to higher precision. S = smallest/most-aggressive, L = largest/highest-fidelity, M = the common balanced choice.
Legacy _0 / _1 (no K): older round-to-nearest block quant. _0 stores a scale only; _1 stores scale and a min offset (asymmetric) - slightly better and slightly larger. Seen: Q4_0 Q4_1 Q5_0 Q5_1 Q8_0.
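
The pattern Q<bits>_<type>[_<size>] can be decoded mechanically. The sketch below is illustrative only: `parse_quant` and its result keys are hypothetical names, and it covers just the k-quant and legacy forms described above (a bare `Q8` or `F16` returns `None`).

```python
import re

QUANT_LABEL = re.compile(
    r"Q(?P<bits>\d)_(?:(?P<k>K)(?:_(?P<mix>[SML]))?|(?P<legacy>[01]))",
    re.IGNORECASE,
)

def parse_quant(label):
    """Decode labels such as Q4_K_M, Q6_K, Q5_1; return None if unrecognized."""
    m = QUANT_LABEL.fullmatch(label)
    if not m:
        return None
    bits = int(m.group("bits"))
    if m.group("k"):
        mix = m.group("mix")
        return {"bits": bits, "scheme": "k-quant",
                "mix": mix.upper() if mix else None}
    # Legacy block quant: _0 stores a scale only, _1 stores scale and min.
    return {"bits": bits, "scheme": "legacy",
            "asymmetric": m.group("legacy") == "1"}
```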

7.2 Quant labels found (Ollama)

Label	Bits (nominal)	Notes
`Q2_K`	2	k-quant, smallest, largest quality hit
`Q3_K_S` / `Q3_K_M` / `Q3_K_L`	3	k-quant, S/M/L mixes
`Q4_0` / `Q4_1`	4	legacy block quant (`_1` asymmetric)
`Q4_K_S` / `Q4_K_M`	4	k-quant; `Q4_K_M` is the common sweet spot (~4.8 bpw; the underlying `Q4_K` block is 4.5 bpw)
`Q5_0` / `Q5_1`	5	legacy block quant
`Q5_K_S` / `Q5_K_M`	5	k-quant
`Q6_K`	6	k-quant, near-lossless
`Q8_0` / `Q8`	8	8-bit, effectively lossless for most uses
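
To see why effective bits-per-weight exceeds the nominal figure, the per-block overhead can be worked out by hand. The block layouts below (32-weight legacy blocks with fp16 scale/min; a 256-weight Q4_K super-block with eight 6-bit scales, eight 6-bit mins, and two fp16 values) are stated here as assumptions about the llama.cpp formats, and the helper names are hypothetical.

```python
FP16_BITS = 16

def bits_per_weight(weights_per_block, bits, overhead_bits):
    """Average storage cost per weight, including per-block metadata."""
    return (weights_per_block * bits + overhead_bits) / weights_per_block

# Legacy blocks of 32 weights (assumed layout).
q4_0 = bits_per_weight(32, 4, FP16_BITS)        # one fp16 scale
q4_1 = bits_per_weight(32, 4, 2 * FP16_BITS)    # fp16 scale + fp16 min
q8_0 = bits_per_weight(32, 8, FP16_BITS)        # one fp16 scale

# k-quant super-block of 256 weights = 8 sub-blocks of 32 (assumed layout):
# each sub-block has a 6-bit scale and a 6-bit min, plus two fp16 super-scales.
q4_k = bits_per_weight(256, 4, 8 * (6 + 6) + 2 * FP16_BITS)

def approx_size_gb(params_billion, bpw):
    """Rough weight-only size in GB (10^9 bytes); ignores file metadata."""
    return params_billion * bpw / 8

size_70b = approx_size_gb(70, q4_k)
```

Under these assumptions `q4_0` and `q4_k` both come to 4.5 bpw, `q4_1` to 5.0, and `q8_0` to 8.5; a 70B model at 4.5 bpw needs roughly 39 GB for weights alone. Real files differ because the S/M/L mixes store some tensors at higher precision.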

7.3 Floating-point and integer formats (full precision and low precision)

F16 / FP16 = 16-bit float (1 sign / 5 exponent / 10 mantissa). Common “full precision” baseline for shipped weights.
BF16 = bfloat16 (1 / 8 / 7): FP32’s exponent range with less mantissa precision; widely preferred for training because the wide exponent range usually makes loss scaling unnecessary.
FP8 = 8-bit float, either E4M3 or E5M2.
INT8 / INT4 = uniform integer quantization (equal-width buckets), 8- and 4-bit.
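
The range-versus-precision trade-off between FP16 and BF16 can be observed with the standard library alone. This sketch emulates BF16 by truncating a float32 to its top 16 bits (real hardware usually rounds to nearest, so truncation is a simplification), and uses `struct`'s `e` format for FP16. The helper names are made up for this example.

```python
import struct

def bf16_roundtrip(x):
    """Keep only the top 16 bits of the float32 encoding (truncation)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    (y,) = struct.unpack(">f", struct.pack(">I", (bits32 >> 16) << 16))
    return y

def fp16_roundtrip(x):
    """Round-trip through IEEE half precision; None if x is out of range."""
    try:
        (y,) = struct.unpack(">e", struct.pack(">e", x))
        return y
    except (OverflowError, struct.error):
        return None

fine_step = 1.0 + 2.0 ** -10              # needs 10 mantissa bits
fp16_keeps_it = fp16_roundtrip(fine_step)  # FP16 has exactly 10
bf16_loses_it = bf16_roundtrip(fine_step)  # BF16 has only 7

big = 70000.0                             # above FP16's largest finite value
fp16_big = fp16_roundtrip(big)
bf16_big = bf16_roundtrip(big)
```

FP16 preserves the small increment but cannot represent 70000 at all, while BF16 drops the increment yet keeps the large value (coarsely).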

8. Advanced block / microscaling formats: NVFP4, MXFP8, MXFP4

These are the newer 4-/8-bit block-scaled formats that appear on recent Gemma 4 / Qwen 3.5+ / Laguna tags (gemma4:27b-nvfp4, qwen3.5:27b-mxfp8).

8.1 NVFP4 (NVIDIA FP4)

NVIDIA’s 4-bit float for Blackwell GPUs. Element type E2M1 (1 sign / 2 exponent / 1 mantissa). Uses a small block size of 16 values with an E4M3 FP8 per-block scale, plus an optional FP32 outer scale. The smaller block (vs MXFP4’s 32) reduces quantization error. Vendor-reported figures: ~3.5x smaller than FP16, <1% accuracy loss, ~2x FP8 throughput on Blackwell.
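
With only 4 bits, E2M1 can represent very few magnitudes, which is why the per-block scale matters so much. The sketch below decodes all codes assuming an exponent bias of 1 and no infinities or NaNs; `e2m1_value` is a hypothetical helper name.

```python
def e2m1_value(code):
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: 0 or 0.5 (assumes exponent bias 1).
        return sign * mantissa * 0.5
    return sign * (1 + mantissa / 2) * 2.0 ** (exponent - 1)

positive_grid = [e2m1_value(c) for c in range(8)]
# positive_grid == [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Every stored weight is one of these eight magnitudes (with a sign) multiplied by its block's scale.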

8.2 MX formats: MXFP8, MXFP6, MXFP4 (and MXINT8)

MX = “Microscaling”, an Open Compute Project (OCP) standard. Block-wise quantization with a fixed block size of 32 elements sharing one scale factor. Name = MX + element type + bits: MXFP4 = E2M1, MXFP8 = E4M3 or E5M2. (NVFP4 is essentially “MX-style but block-16 with an FP8 scale”.)
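
Block-wise scaling can be sketched in a few lines. This is a simplified MXFP4-style illustration, not a conformant implementation: it assumes a power-of-two shared scale chosen as 2^(floor(log2(max|x|)) - 2), rounds each element to the nearest E2M1 magnitude, and saturates at 6. All function names are hypothetical. NVFP4 would differ in block size (16) and scale type (FP8).

```python
import math

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block, emax_elem=2):
    """Return (shared_scale, quantized_elements) for one block."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 1.0, [0.0] * len(block)
    # Power-of-two scale so the largest value lands near the top of the grid.
    scale = 2.0 ** (math.floor(math.log2(amax)) - emax_elem)
    quantized = []
    for v in block:
        target = abs(v) / scale
        magnitude = min(E2M1_GRID, key=lambda g: abs(target - g))
        quantized.append(math.copysign(magnitude, v))
    return scale, quantized

def dequantize_block(scale, quantized):
    return [scale * q for q in quantized]

def mx_quantize(values, block_size=32):
    """Quantize a flat list block by block; each block gets its own scale."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

scale, q = quantize_block([12.0, 1.0, -4.0])
restored = dequantize_block(scale, q)
blocks = mx_quantize([float(i) for i in range(64)])
```

In the small example the block maximum is 12, so the shared scale is 2 and all three values happen to be exactly representable; most real values are not and incur rounding error.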

9. Runtime / packaging stems: MLX, GGUF

mlx - the model is packaged for Apple MLX, Apple’s array/ML framework for Apple silicon (unified memory, lazy evaluation, NumPy-like API). The team is ml-explore; “MLX” has no officially expanded acronym. An mlx-bf16 tag = MLX weights kept at bfloat16. Resources: GitHub ml-explore/mlx, Apple Open Source - MLX, MLX framework site, WWDC25 - Get started with MLX.
GGUF - the file/container format that the Q*/F16 Ollama tags imply (successor to GGML). Covered by the llama.cpp and APXML links in §7.

10. Training / architecture technique stems (rich)

These stems describe how a model was built or tuned, each with its own body of literature.

10.1 `qat` - Quantization-Aware Training

Simulates low-precision math during the forward pass while training, so the model learns to compensate for quantization error - giving higher quality than post-training quantization (PTQ) at the same bit width. Gemma QAT checkpoints reach near-FP16 quality at ~4-bit memory.
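
The "simulated low-precision math" is often implemented as a fake-quantize step: quantize, then immediately dequantize, so the forward pass sees the rounding error while values stay in floating point. A minimal scalar sketch, assuming symmetric signed integers and a given scale (the name `fake_quantize` is illustrative and no specific framework API is implied):

```python
def fake_quantize(x, scale, bits=4):
    """Quantize-dequantize x onto a signed integer grid with step `scale`."""
    qmin = -(2 ** (bits - 1))        # -8 for 4-bit
    qmax = 2 ** (bits - 1) - 1       #  7 for 4-bit
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

rounded = fake_quantize(0.6, 0.25)       # snaps to the nearest grid point
clipped_hi = fake_quantize(10.0, 0.25)   # saturates at qmax * scale
clipped_lo = fake_quantize(-10.0, 0.25)  # saturates at qmin * scale
```

During training, the gradient is typically passed through the rounding step unchanged (a straight-through estimator); that part is not shown here.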

10.2 `mtp` - Multi-Token Prediction

A training objective (and inference trick) where extra heads predict tokens t+2, t+3, … At inference the MTP head acts as a speculative-decoding draft module (DeepSeek-V3, Qwen3-Next), giving ~1.8x speedups. In Ollama tags it marks a build that ships the MTP head (qwen3.6:27b-mtp-q4_K_M).
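
A back-of-the-envelope model shows where a figure like ~1.8x can come from. Assume each drafted token is accepted independently with probability p and decoding stops at the first rejection; then the expected number of tokens emitted per verification step is 1 + p + p^2 + ... up to the number of draft tokens. This ignores the cost of running the draft head, and the function name is hypothetical.

```python
def expected_tokens_per_step(accept_rate, draft_tokens=1):
    """Expected tokens per step: 1 guaranteed + accepted draft tokens."""
    return sum(accept_rate ** i for i in range(draft_tokens + 1))

one_draft = expected_tokens_per_step(0.8)        # one extra predicted token
two_drafts = expected_tokens_per_step(0.8, 2)    # two extra predicted tokens
```

With a single draft token and an 80% acceptance rate the estimate is 1.8 tokens per step, in line with the speedup quoted above.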

10.3 MoE - Mixture of Experts

Covered in §6.2. Conditional computation: route each token to a few of many experts; dense is the non-MoE opposite.

10.4 `yarn` (and `gradient`) - context-window extension

YaRN = “Yet another RoPE extensioN”: piecewise “NTK-by-parts” frequency scaling of Rotary Position Embeddings plus an attention-softmax temperature, extending context length by fine-tuning on a small amount of data (the authors cite less than ~0.1% of the original pre-training data). Appears as yarn-llama2, yarn-mistral. llama3-gradient is a similarly context-extended Llama 3 (by Gradient AI).
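
The "NTK-by-parts" idea can be sketched as follows: high-frequency RoPE dimensions (many rotations within the original context) are left alone, low-frequency ones are interpolated by the scale factor, and a linear ramp blends the two. The defaults below (base 10000, ramp bounds 1 and 32, original context 4096, scale 4) are illustrative assumptions, as are the function names.

```python
import math

def yarn_inv_freqs(dim, base=10000.0, scale=4.0, orig_ctx=4096,
                   alpha=1.0, beta=32.0):
    """Per-dimension RoPE frequencies after NTK-by-parts scaling."""
    freqs = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)            # original RoPE frequency
        wavelength = 2 * math.pi / theta
        rotations = orig_ctx / wavelength     # full turns inside the old context
        # Ramp: 0 -> fully interpolate, 1 -> keep the original frequency.
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs

def yarn_attention_scale(scale):
    """Factor applied to attention logits: sqrt(1/t) = 0.1 * ln(s) + 1."""
    return 0.1 * math.log(scale) + 1.0

freqs = yarn_inv_freqs(128)
```

The first (fastest) dimension keeps its original frequency, while the last (slowest) one is divided by the full scale factor.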

10.5 `distill` - Knowledge distillation

Train a small “student” to mimic a large “teacher”. deepseek-r1 distilled Qwen/Llama students inherit R1’s chain-of-thought reasoning from ~800k R1-generated examples.

10.6 `dpo` - Direct Preference Optimization

An RL-free preference-alignment method: it directly raises the probability of preferred responses relative to dispreferred ones (measured against a frozen reference model), with no separate reward model or PPO loop. Appears as a suffix on some community tunes.
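
For a single preference pair, the DPO objective reduces to a logistic loss on a log-probability margin. The sketch below takes summed sequence log-probabilities as plain floats; the function and argument names are illustrative, and beta = 0.1 is just a commonly used default, not a value from this document.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # log(1 + exp(-m)) equals -log(sigmoid(m)).
    return math.log1p(math.exp(-margin))

no_preference = dpo_loss(-1.0, -1.0, -1.0, -1.0)   # policy == reference
prefers_chosen = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # chosen up, rejected down
```

When the policy matches the reference the loss is log 2; it falls as the policy shifts probability toward the chosen response.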

10.7 `laser` - Layer-Selective Rank Reduction

LASER = LAyer-SElective Rank Reduction: after training, replace selected weight matrices with low-rank (SVD) approximations; counterintuitively this can improve reasoning. Used by cognitivecomputations’ Dolphin “laser” tunes.

10.8 `uncensored` / abliteration

uncensored = a fine-tune with safety/refusals removed. The related technique abliteration (“ablate” + “obliterate”) removes the model’s “refusal direction” in activation space via representation engineering - editing weights rather than prompting around them.

11. Capability and variant suffixes

11.1 Post-training / tuning stage

Stem	Meaning
`base`	raw pretrained LM, no instruction following
`instruct`	instruction-tuned to follow prompts
`it`	instruction-tuned (Gemma’s label, same idea as `instruct`)
`chat`	tuned for multi-turn dialogue (often RLHF)
`text`	base/completion text variant (Ollama `variant`)
`code` / `coder` / `coding`	code-specialized
`dpo`	preference-aligned (see §10.6)

Resources: Red Hat - How to navigate LLM model names, Alex Ewerlof - Base vs Instruct vs Thinking, Medium - Base, Instruct, and Chat architectures.

11.2 Reasoning

Stem	Meaning
`reasoning` / `thinking` / `think` / `thinker`	model emits chain-of-thought before answering
`non` (as in `non-reasoning`)	the same model with thinking disabled
`deep` (`exaone-deep`, `deepscaler`)	reasoning-oriented variant
`low` / `medium` / `high`	reasoning-effort tier (compute vs latency), set in the prompt
`minimal` / `xhigh`	extra effort tiers exposed by newer GPT-5.x in the AA `name` field

qwq = “Qwen with Questions” reasoning series. Resources: Sebastian Raschka - Understanding Reasoning LLMs, NVIDIA - Chain-of-Thought prompting glossary, Qwen - QwQ-32B blog, OpenAI - Introducing gpt-oss (reasoning-effort), gpt-oss model card (arXiv).

11.3 Modality

Stem	Meaning
`vl` / `vision`	Vision-Language (text + image input)
`omni`	omni-modal (text/image/audio/video in, often speech out)
`ocr`	optical character recognition / document parsing
`audio` / `voxtral`	audio/speech input
`image` / `lyria`	image generation / audio generation specialist
`embed` / `embedding`	embedding model (vectors, not chat)
`reader` / `nuextract`	text extraction / reading

Resources: Qwen-VL (arXiv), Qwen3-VL Technical Report (arXiv), Qwen2.5-Omni (HF), LlamaIndex - What is Qwen-VL.

11.4 Safety / guardrail

guard / guardian / shield / safeguard = safety-classifier models that score prompts and responses against a risk taxonomy (Llama Guard, ShieldGemma, Granite Guardian, gpt-oss-safeguard). Resources: EmergentMind - Llama Guard 3, Medium - How Llama Guard improves AI safety.

11.5 Size / capability tiers (marketing scale)

Ordered roughly small -> large: nano, micro, mini, xs, lite, small, medium, large, and the “premium” stems pro, plus, max, ultra, premier, super. These are relative within a publisher’s lineup, not absolute sizes.

11.6 Speed / serving variants

flash, fast, turbo, instant, air, edge = latency-optimized serving variants. flash-lite = the cheapest/fastest tier. scout / maverick are Llama 4 codenames (not generic size words). nemo = Mistral-NeMo (Mistral x NVIDIA collaboration). next = a next-generation/preview architecture (qwen3-next, qwen3-coder-next).

11.7 Lifecycle / availability

preview, exp / experimental, alpha, beta = pre-release maturity. latest = moving pointer to the newest build. free (:free) = OpenRouter’s free-tier endpoint. cloud = cloud-hosted (vs local) Ollama endpoint. terminus / speciale = named point-release variants (DeepSeek). instant = Anthropic/AA fast tier.

12. Context-window stems

Tokens like 4k 8k 16k 32k 64k 128k 200k 256k 1m 1023 1048k encode the context window (maximum tokens). 128k is nominally 128,000 tokens, though many vendors mean 131,072 (128 x 1,024); 1m = ~1,000,000; 1048k = 1,048,576 (1 Mi). Context extension is done with techniques named in the ID: yarn and gradient (see §10.4).

Note: 1t is not a context size - it means ~1 trillion parameters (ling-1t, ring-1t, nemotron-3-ultra-...). Disambiguate by position: a bare 1t/671b near the size slot is parameters; a 1m/128k is context.
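
The unit letter does most of the disambiguation work, as this sketch shows. The classification rules are assumptions for illustration: `k` is read as context tokens, `b` and `t` as parameters, and `m` is reported as ambiguous because it can mean millions of parameters (135m) or a million-token context (1m), which only position in the ID can resolve. Values are nominal decimal counts.

```python
import re

def classify_size_token(token):
    """Return (kind, nominal_value) for tokens like 128k, 1m, 671b, 1t."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([kmbt])", token.lower())
    if not m:
        return None
    number, unit = float(m.group(1)), m.group(2)
    if unit == "k":
        return ("context_tokens", round(number * 1e3))
    if unit == "b":
        return ("parameters", round(number * 1e9))
    if unit == "t":
        return ("parameters", round(number * 1e12))
    # 'm' needs positional context: parameters (135m) or context (1m).
    return ("ambiguous_m", round(number * 1e6))
```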

13. Versioning and date-stamp conventions

v0 v1 v2 v3 v4, and dotted 3.1, 4.8, 2.5 = model version numbers.
r1, r7b, o1/o3/o4, k2, m2/m3 = series/generation labels baked into a name (DeepSeek R-series, OpenAI o-series, Kimi K2, MiniMax M-series).
4-digit YYMM date = release year+month: 2407 = Jul 2024, 2501 = Jan 2025, 2507 = Jul 2025 (Mistral, Qwen, Magistral, Devstral, Voxtral, Codestral). Many such stems appear: 2402 2407 2409 2411 2501 2502 2503 2505 2506 2507 2508 2509 2512 2603 ... alongside bare years (2024, 2025) and 4-digit MMDD forms (0106 0324 0528 0905 1210 ...).
8-digit YYYYMMDD = full release date in OpenRouter canonical_slug (...-20260528).
Date words also appear in AA names: (May 2026), dec, june, sep.
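
A hedged sketch of decoding the numeric date stems: 8-digit stems are read as YYYYMMDD, and 4-digit stems as YYMM only when the year part falls in an assumed 20 to 30 window and the month is valid. That window is an arbitrary choice for this example; it is what keeps bare years (2024) and MMDD forms (0528) from being misread. The function name is hypothetical.

```python
import re
from datetime import date

def parse_date_stem(stem):
    """Return a date for YYYYMMDD or YYMM stems, else None."""
    if re.fullmatch(r"\d{8}", stem):
        try:
            return date(int(stem[:4]), int(stem[4:6]), int(stem[6:]))
        except ValueError:
            return None
    if re.fullmatch(r"\d{4}", stem):
        yy, mm = int(stem[:2]), int(stem[2:])
        # Assumed window 2020-2030; month must be 1-12.
        if 20 <= yy <= 30 and 1 <= mm <= 12:
            return date(2000 + yy, mm, 1)   # day is unknown; 1 is a placeholder
    return None
```

MMDD stems such as `0528` cannot be resolved to a full date without the year from elsewhere in the name, so the sketch leaves them undecoded.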

Resources: Mistral - Changelog, Mistral - Large 2407 announcement, Starmorph - LLM Model Names Decoded.

14. Vendor model-series codenames

These tokens are proprietary product names (not technique stems). Listed with the creator so the ID parses cleanly:

Stem(s)	Creator / source	Notes
`nova`	Amazon	Amazon Nova family
`command`, `command-a`, `command-r`, `r7b`	Cohere	Command / Command-R series
`nemotron`, `nemo`	NVIDIA / Mistral	NVIDIA Nemotron; Mistral-NeMo
`granite`	IBM	Granite series
`jamba`	AI21	SSM-Transformer hybrid
`arctic`	Snowflake	Snowflake Arctic
`dbrx`	Databricks
`seed`, `doubao`, `ui-tars`	ByteDance	Seed / Doubao / UI-TARS GUI agent
`kimi`, `k2`	Moonshot	Kimi K2
`glm`	Z.ai / Zhipu	GLM series
`ernie`	Baidu
`hunyuan`	Tencent
`step`	StepFun
`ling`, `ring`	InclusionAI	Ling / Ring (1T MoE)
`minimax`, `m2`/`m3`	MiniMax	M-series
`exaone`	LG AI Research
`solar`	Upstage
`sonar`	Perplexity	search-grounded
`r1-1776`	Perplexity	uncensored DeepSeek-R1 retune (1776 = year of the US Declaration of Independence)
`trinity`, `virtuoso`	Arcee AI
`apriel`	ServiceNow
`apertus`	Swiss AI Initiative
`laguna`	Poolside	`m.1`, `xs.2`
`mimo`	Xiaomi
`kat`	KwaiKAT (Kuaishou)	KAT-Coder
`longcat`	LongCat / Meituan
`nanbeige`	Nanbeige
`motif`	Motif Technologies
`midm`	Korea Telecom (KT)	Mi:dm
`hyperclova`	Naver	HyperCLOVA X
`palmyra`	Writer
`saba`	Mistral	regional (Arabic/South Asian) model
`magnum`, `rocinante`, `cydonia`, `skyfall`, `unslopnemo`, `euryale`, `lunaris`, `mythomax`, `remm`, `hanami`	community RP finetuners (TheDrummer, sao10k, Undi95, Gryphe, anthracite)	roleplay/creative tunes
`owl`, `fusion`, `pareto`	OpenRouter	meta/aggregate models
`mercury`	Inception	diffusion LLM
`perceptron`, `mk1`	Perceptron
`fable`, `opus`, `sonnet`, `haiku`	Anthropic	Claude tiers
`grok`	xAI
`gpt`, `chatgpt`, `oss`, `codex`, `o1`/`o3`/`o4`, `4o`	OpenAI	`oss` = open-weight
`gemini`, `gemma`	Google
`phi`	Microsoft

15. Scraping artifacts (not real stems)

Some Ollama full_tag values were polluted by HTML/CSS during scraping and are not model-ID parts. Excluded from the analysis but listed for transparency:

border-bottom:1px, margin-top:12px, padding:16px, font-size:11px, padding-left:20px, padding:7px,
padding:10px, font-size:13px, font-family:-apple-system, font-weight:600, max-width:900px,
border-bottom:2px, font-weight:500, padding:8px

Resulting noise tokens (padding, font, px, margin, border, weight, collapse, align, center, mailto, etc.) should be ignored.
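
One simple way to drop these artifacts is to check whether the part before the colon is a CSS property rather than a model name. The property set below is taken from the polluted values listed above; the function name is hypothetical and the check is a heuristic, not a full CSS parser.

```python
# CSS property names observed in the polluted full_tag values above.
CSS_PROPERTIES = {
    "border-bottom", "margin-top", "padding", "padding-left",
    "font-size", "font-family", "font-weight", "max-width",
}

def is_css_artifact(full_tag):
    """True if the 'model name' half of a tag is really a CSS property."""
    name, _, _ = full_tag.partition(":")
    return name.strip().lower() in CSS_PROPERTIES

tags = ["llama3.1:70b", "font-size:11px", "font-family:-apple-system"]
clean = [t for t in tags if not is_css_artifact(t)]
```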

16. Stems not confidently decoded

After web research, these identifier fragments could not be expanded with confidence. They appear to be proprietary names or undocumented internal tags; no authoritative source was found:

rsnsft - in midm-250-pro-rsnsft (KT Mi:dm). Plausibly “reasoning SFT” but not confirmed by any source; left undecoded.
hy3 - the Hy3-preview model in the AA data; creator/expansion not identified.
jt - JT-35B-Flash / JT-MINI (creator “China Mobile” in AA). The JT initials were not tied to a documented expansion.
rnj-1 - EssentialAI rnj-1-instruct; “rnj” is an undocumented codename.
x1 (in l3.1-70b-hanami-x1), mk1 (perceptron-mk1), n2 (nex-n2-pro) - internal revision tags with no published meaning beyond “mark 1 / version 2”.
speciale / terminus - DeepSeek point-release codenames; descriptive only, no technical definition published.
trinity, virtuoso, owl, pareto, tars, muse, spark, kat - confirmed as product codenames (see §14) but with no decomposable technical meaning.

Bibliography

GGUF / quantization / numeric precision

NVFP4

MX / microscaling formats

Apple MLX

QAT

MTP (Multi-Token Prediction)

MoE / parameter notation

Effective parameters / MatFormer (Gemma 3n / 4)

YaRN / context extension

Distillation

DPO / base-instruct-chat suffixes

LASER

Reasoning / thinking models

Uncensored / abliteration

Modality (VL / Omni)

Safety / guard models

Date / version conventions
