Commit e27b184

feat: add mistral small 3.2 24b model to catalog (#3516)
Signed-off-by: Brian <[email protected]>
1 parent 15a8bd1 commit e27b184

File tree

1 file changed: +13 −0 lines

packages/backend/src/assets/ai.json

Lines changed: 13 additions & 0 deletions
@@ -289,6 +289,19 @@
     }
   ],
   "models": [
+    {
+      "id": "hf.mistralai.mistral-small-3.2-24b-instruct-2506",
+      "name": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
+      "description": "Mistral-Small-3.2-24B-Instruct-2506 is a minor update of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).\r\n\r\nSmall-3.2 improves in the following categories:\r\n- **Instruction following**: Small-3.2 is better at following precise instructions\r\n- **Repetition errors**: Small-3.2 produces less infinite generations or repetitive answers\r\n- **Function calling**: Small-3.2's function calling template is more robust (see [here](https://github.com/mistralai/mistral-common/blob/535b4d0a0fc94674ea17db6cf8dc2079b81cbcfa/src/mistral_common/tokens/tokenizers/instruct.py#L778) and [examples](#function-calling))\r\n\r\nIn all other categories Small-3.2 should match or slightly improve compared to [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).\r\n\r\n## Key Features\r\n- same as [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503#key-features)\r\n\r\n## Benchmark Results\r\nWe compare Mistral-Small-3.2-24B to [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).\r\nFor more comparison against other models of similar size, please check [Mistral-Small-3.1's Benchmarks'](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503#benchmark-results)\r\n\r\n### Text \r\n#### Instruction Following / Chat / Tone\r\n| Model | Wildbench v2 | Arena Hard v2 | IF (Internal; accuracy) |\r\n|-------|---------------|---------------|------------------------|\r\n| Small 3.1 24B Instruct | 55.6% | 19.56% | 82.75% |\r\n| **Small 3.2 24B Instruct** | **65.33%** | **43.1%** | **84.78%** |\r\n\r\n#### Infinite Generations\r\nSmall 3.2 reduces infinite generations by 2x on challenging, long and repetitive prompts.\r\n| Model | Infinite Generations (Internal; Lower is better) |\r\n|-------|-------|\r\n| Small 3.1 24B Instruct | 2.11% |\r\n| **Small 3.2 24B Instruct** | **1.29%** |\r\n\r\n#### STEM\r\n| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP Plus - Pass@5 | HumanEval Plus - Pass@5 | SimpleQA (TotalAcc) |\r\n|-------|------|---------------------|------|------------------------|---------------------------|-------------------|-------------------------|-------------------|\r\n| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.63% | 88.99% | 10.43% |\r\n| **Small 3.2 24B Instruct** | 80.50% | **69.06%** | 69.42% | 44.22% | 46.13% | **78.33%** | **92.90%** | **12.10%** |\r\n\r\n### Vision\r\n| Model | MMMU | Mathvista | ChartQA | DocVQA | AI2D |\r\n|-------|------|-----------|---------|--------|------|\r\n| Small 3.1 24B Instruct | **64.00%** | **68.91%** | 86.24% | 94.08% | 93.72% |\r\n| **Small 3.2 24B Instruct** | 62.50% | 67.09% | **87.4%** | 94.86% | 92.91% |\r\n\r\n## Usage\r\nThe model can be used with the following frameworks:\r\n- [`vllm (recommended)`](https://github.com/vllm-project/vllm)\r\n- [`transformers`](https://github.com/huggingface/transformers)\r\n\r\n**Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.\r\n**Note 2**: Add a system prompt from [SYSTEM_PROMPT.txt](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506/blob/main/SYSTEM_PROMPT.txt) for best results.\r\n\r\n### vLLM (recommended)\r\n#### Installation\r\n```\r\npip install vllm --upgrade\r\n```\r\nCheck installation:\r\n```\r\npython -c \"import mistral_common; print(mistral_common.__version__)\"\r\n```\r\n#### Serve\r\n```\r\nvllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2\r\n```\r\nRequires ~55 GB GPU RAM in bf16/fp16.\r\n\r\n#### Function Calling, Vision Reasoning & Instruction Following\r\nSupports multi-modal reasoning, function/tool calls, and precise instruction following using vLLM API or Transformers. See examples in original README.\r\n\r\n### Transformers\r\nInstall:\r\n```\r\npip install mistral-common --upgrade\r\n```\r\nUse `MistralTokenizer` and `Mistral3ForConditionalGeneration` with the system prompt and optional images for reasoning. Multi-modal inputs and outputs supported. Refer to Python snippets for examples of instruction following, vision reasoning, and function calls.",
+      "license": "Apache-2.0",
+      "url": "https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF/resolve/main/Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf",
+      "memory": 14300000000,
+      "sha256": "a3cc56310807ed0d145eaf9f018ccda9ae7ad8edb41ec870aa2454b0d4700b3c",
+      "backend": "llama-cpp",
+      "properties": {
+        "jinja": "true"
+      }
+    },
     {
       "id": "hf.qwen.qwen3-4b-GGUF",
       "name": "qwen/qwen3-4b-GGUF",
