39 changes: 36 additions & 3 deletions README.md
@@ -17,7 +17,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
</p>

## 📣Latest News
- [25/11/03] We have released v0.2, adding quantization support for new models such as `GLM-4.6` and `Qwen3-VL`, open-sourcing the Eagle3 speculative decoding training framework, and updating the Diffusion model quantization tools.
- [25/11/05] We have released v0.2, adding quantization support for new models such as `GLM-4.6`, `Qwen3-VL`, and `Qwen3-Omni`, open-sourcing the Eagle3 speculative decoding training framework, and updating the Diffusion model quantization tools.
- [25/09/30] We have released **SpecExit**, the reasoning early-exit algorithm: [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)🔥🔥🔥
- [25/09/26] We have released **TEQUILA**, the ternary quantization algorithm: [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)🔥🔥🔥
- [25/09/24] We now support NVFP4 PTQ quantization for the Qwen3 series models. We also open-source the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
@@ -171,7 +171,7 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
</td>
<td>
<ul style="padding-left: 0; list-style-position: inside;">
<li>Under Development</li>
<li><a href="https://github.com/Tencent/AngelSlim/blob/main/docs/source/models/qwen3_omni/qwen3_omni_quant.md">FP8-Static/Dynamic</a></li>
</ul>
</td>
<td>
@@ -510,7 +510,40 @@ Benchmark results for Qwen2.5VL series models with `BF16`、`FP8-Static`、`FP8-

</details>

#### 1.5 Other Models
#### 1.5 Qwen-Omni Series Models

**Qwen3-Omni Text to Text Benchmark**

Benchmark results for Qwen3-Omni series models with `BF16`, `FP8-Static`, and `FP8-Dynamic` on `aime25`, `gpqa_diamond`, and `mmlu_redux` are as follows:

<table>
<thead>
<tr><th>Model</th><th>Quantization</th><th>aime25</th><th>gpqa_diamond</th><th>mmlu_redux</th></tr>
</thead>
<tbody>
<tr><td rowspan="3">Qwen3-Omni-30B-A3B-Instruct</td><td>BF16</td><td>73.32</td><td>56.77</td><td>88.09</td></tr>
<tr><td>FP8-Static</td><td>71.33</td><td>56.57</td><td>87.91</td></tr>
<tr><td>FP8-Dynamic</td><td>73.33</td><td>55.15</td><td>88.07</td></tr>
</tbody>
</table>

<details>
<summary>Note</summary>

> - The above evaluation results were obtained by deploying with the vLLM framework and averaging over 5 runs (vLLM only supports the thinker component).
> - The hyperparameters used during evaluation are as follows:
> ```json
> {
>     "top_p": 0.95,
>     "temperature": 0.6,
>     "do_sample": true,
>     "max_model_len": 65536
> }
> ```

</details>
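For reference, the deployment settings above map onto vLLM's offline API roughly as follows — a minimal sketch, assuming the `Qwen/Qwen3-Omni-30B-A3B-Instruct` checkpoint id and a placeholder prompt; the actual `aime25`/`gpqa_diamond`/`mmlu_redux` harness is not shown here:

```python
# Minimal reproduction sketch of the evaluation-time settings; the model id
# and prompt are illustrative, not the exact benchmark runner.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-Omni-30B-A3B-Instruct", max_model_len=65536)
sampling = SamplingParams(temperature=0.6, top_p=0.95)  # do_sample=true -> stochastic decoding
outputs = llm.generate(["What is the sum of the first 100 positive integers?"], sampling)
print(outputs[0].outputs[0].text)
```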

#### 1.6 Other Models

Other models such as GLM-4.6, Qwen2.5, and Seed-OSS have been evaluated on benchmarks like `CEVAL`, `MMLU`, and `GSM8K` using quantization strategies including `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ`.

39 changes: 36 additions & 3 deletions README_cn.md
@@ -17,7 +17,7 @@
</p>

## 📣Latest News
- [25/11/03] We released v0.2, adding quantization support for more models including GLM-4.6 and Qwen3-VL, open-sourcing the Eagle3 speculative decoding training framework, and updating the Diffusion model quantization tools.
- [25/11/05] We released v0.2, adding quantization support for more models including GLM-4.6, Qwen3-VL, and Qwen3-Omni, open-sourcing the Eagle3 speculative decoding training framework, and updating the Diffusion model quantization tools.
- [25/09/30] We open-sourced **SpecExit**, a new reasoning early-exit algorithm: [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html) | [[vLLM Code]](https://github.com/vllm-project/vllm/pull/27192)🔥🔥🔥
- [25/09/30] We released **Tequila**, a new ternary quantization algorithm: [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)🔥🔥🔥
- [25/09/24] We added NVFP4 PTQ quantization for the Qwen3 series models and open-sourced the [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
@@ -172,7 +172,7 @@
</td>
<td>
<ul style="padding-left: 0; list-style-position: inside;">
<li>Under Development</li>
<li><a href="https://github.com/Tencent/AngelSlim/blob/main/docs/source/models/qwen3_omni/qwen3_omni_quant.md">FP8-Static/Dynamic</a></li>
</ul>
</td>
<td>
@@ -517,7 +517,40 @@ Benchmark results for Qwen2.5VL series models with `BF16`, `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, `I

</details>

#### 1.5 Other Models
#### 1.5 Qwen-Omni Series Models

**Qwen3-Omni Text to Text Benchmark**

Benchmark results for Qwen3-Omni series models with `BF16`, `FP8-Static`, and `FP8-Dynamic` on `aime25`, `gpqa_diamond`, and `mmlu_redux` are as follows:

<table>
<thead>
<tr><th>Model</th><th>Quantization</th><th>aime25</th><th>gpqa_diamond</th><th>mmlu_redux</th></tr>
</thead>
<tbody>
<tr><td rowspan="3">Qwen3-Omni-30B-A3B-Instruct</td><td>BF16</td><td>73.32</td><td>56.77</td><td>88.09</td></tr>
<tr><td>FP8-Static</td><td>71.33</td><td>56.57</td><td>87.91</td></tr>
<tr><td>FP8-Dynamic</td><td>73.33</td><td>55.15</td><td>88.07</td></tr>
</tbody>
</table>

<details>
<summary>Note</summary>

> - The above evaluation results were obtained by deploying with the vLLM framework and averaging over 5 runs (vLLM only supports the thinker component).
> - The hyperparameters used during evaluation are as follows:
> ```json
> {
>     "top_p": 0.95,
>     "temperature": 0.6,
>     "do_sample": true,
>     "max_model_len": 65536
> }
> ```

</details>

#### 1.6 Other Models

Other models such as GLM, Qwen2.5, and Seed-OSS have been evaluated on `CEVAL`, `MMLU`, and `GSM8K` using quantization strategies including `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ`.

17 changes: 14 additions & 3 deletions angelslim/compressor/quant/ptq.py
@@ -14,6 +14,7 @@

import json
import os
import warnings

import torch
from safetensors.torch import load_file
@@ -193,9 +194,19 @@ def _convert(self):
                )
                is not None
            ):
                self.quant_model.act_scales_dict[name] = self.ptq_hook.observer_dict[
                    sub_layer
                ].act_observer.scales()
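                # scales() raises ValueError when this layer received no
                # activations during calibration; fall back to 1.0 below.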
                try:
                    self.quant_model.act_scales_dict[name] = (
                        self.ptq_hook.observer_dict[sub_layer].act_observer.scales()
                    )
                except ValueError:
                    self.quant_model.act_scales_dict[name] = torch.tensor(
                        1.0, device=torch.cuda.current_device()
                    )
                    warnings.warn(
                        f"Not calibrated for {name}. Using default act scale 1.0.",
                        RuntimeWarning,
                        stacklevel=2,
                    )
            if (
                getattr(  # noqa: B009
                    self.ptq_hook.observer_dict[sub_layer], "kv_cache_observer"
1 change: 1 addition & 0 deletions angelslim/data/__init__.py
@@ -6,5 +6,6 @@

from .dataloader import DataLoaderFactory # noqa: F401
from .multimodal_dataset import MultiModalDataset # noqa: F401
from .omni_dataset import OmniDataset # noqa: F401
from .text2image_dataset import Text2ImageDataset # noqa: F401
from .text_dataset import TextDataset # noqa: F401
12 changes: 12 additions & 0 deletions angelslim/data/dataloader.py
@@ -20,6 +20,7 @@

from .base_dataset import BaseDataset
from .multimodal_dataset import MultiModalDataset
from .omni_dataset import OmniDataset
from .text2image_dataset import Text2ImageDataset
from .text_dataset import TextDataset

@@ -39,6 +40,7 @@ def create_data_loader(
    data_type: str = "auto",
    num_workers: int = 0,
    inference_settings: Dict = None,
    use_audio_in_video: bool = False,
    model_name: str = None,
) -> DataLoader:
    """
@@ -98,6 +100,16 @@
            num_samples=num_samples,
            inference_settings=inference_settings,
        )
    elif data_type == "OmniDataset":
        dataset = OmniDataset(
            processor=processor,
            device=device,
            max_length=max_length,
            num_samples=num_samples,
            data_source=data_source,
            is_hf_dataset=not os.path.isfile(data_source),
            use_audio_in_video=use_audio_in_video,
        )
    else:
        raise ValueError(f"Unsupported data type: {data_type}")

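A usage sketch for the new branch (assuming `create_data_loader` is importable as shown and that its leading parameters match the names used in the body above; the model id and file path are placeholders):

```python
from transformers import AutoProcessor

# Hypothetical import path; the factory may instead be reached through
# DataLoaderFactory in angelslim.data.
from angelslim.data.dataloader import create_data_loader

processor = AutoProcessor.from_pretrained("Qwen/Qwen3-Omni-30B-A3B-Instruct")
loader = create_data_loader(
    processor=processor,
    data_source="calib/omni_calib.jsonl",  # local file, so is_hf_dataset=False
    data_type="OmniDataset",
    max_length=4096,
    num_samples=128,
    use_audio_in_video=False,
)
for batch in loader:
    ...  # feed calibration batches to the PTQ observers
```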
2 changes: 1 addition & 1 deletion angelslim/data/multimodal_dataset.py
@@ -16,12 +16,12 @@
import os
from typing import Dict, List, Union

import qwen_vl_utils
from datasets import load_dataset
from PIL import Image
from tqdm import tqdm
from transformers import ProcessorMixin

from ..utils.lazy_imports import qwen_vl_utils
from .base_dataset import BaseDataset


127 changes: 127 additions & 0 deletions angelslim/data/omni_dataset.py
@@ -0,0 +1,127 @@
# Copyright 2025 Tencent Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from pathlib import Path
from typing import Dict, List, Union

from transformers import ProcessorMixin

from ..utils.lazy_imports import qwen_omni_utils
from .base_dataset import BaseDataset


class OmniDataset(BaseDataset):
    """Dataset for omni-modal (text, image, audio, and video) data"""

    def __init__(
        self,
        processor: ProcessorMixin,
        device: str = "cpu",
        max_length: int = 4096,
        num_samples: int = -1,
        data_source: Union[str, Dict] = None,
        is_hf_dataset: bool = False,
        use_audio_in_video: bool = False,
    ):
        super().__init__(processor, device, max_length)
        self.is_hf_dataset = is_hf_dataset
        self.use_audio_in_video = use_audio_in_video

        self._load_file_based_dataset(data_source, num_samples)

    def _load_file_based_dataset(self, data_path: str, num_samples: int):
        """Load dataset from local file system"""
        path_obj = Path(data_path)
        data_dir = path_obj.parent

        line_count = 0
        with open(data_path, "r") as f:
            for line in f:
                if num_samples > 0 and line_count >= num_samples:
                    break
                data = json.loads(line.strip())
                video_path = None
                audio_path = None
                image_path = None

                if "video_path" in data:
                    video_path = os.path.normpath(
                        os.path.join(data_dir, data["video_path"])
                    )
                if "audio_path" in data:
                    audio_path = os.path.normpath(
                        os.path.join(data_dir, data["audio_path"])
                    )
                if "image_path" in data:
                    image_path = os.path.normpath(
                        os.path.join(data_dir, data["image_path"])
                    )

                ms = data.get("messages", [])

                conversation = []
                for m in ms:
                    if m["role"] == "system":
                        conversation.append(
                            {
                                "role": "system",
                                "content": [{"type": "text", "text": m["content"]}],
                            }
                        )
                    elif m["role"] == "user":
                        content = []
                        text_content = m["content"]
                        text_content = (
                            text_content.replace("<video>", "")
                            .replace("<audio>", "")
                            .replace("<image>", "")
                        )
                        content.append({"type": "text", "text": text_content})
                        if video_path:
                            content.append({"type": "video", "video": video_path})
                        if audio_path:
                            content.append({"type": "audio", "audio": audio_path})
                        if image_path:
                            content.append({"type": "image", "image": image_path})
                        conversation.append(
                            {
                                "role": "user",
                                "content": content,
                            }
                        )
                self._process_and_append(conversation)
                line_count += 1

    def _process_and_append(self, messages: List[Dict]):
        """Process messages and append to dataset"""
        text = self.processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        audios, images, videos = qwen_omni_utils.process_mm_info(
            messages, use_audio_in_video=self.use_audio_in_video
        )

        # Process inputs
        inputs = self.processor(
            text=text,
            images=images,
            audios=audios,
            videos=videos,
            padding=True,
            return_tensors="pt",
            use_audio_in_video=self.use_audio_in_video,
        )
        self.data.append(inputs)
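The loader above consumes one JSON object per line, with optional `video_path`/`audio_path`/`image_path` fields resolved relative to the JSONL file. A sketch of writing a conforming calibration line (paths and wording are hypothetical):

```python
import json

# One calibration sample in the shape _load_file_based_dataset parses; the
# <video>/<audio> tags in the user text are stripped by the loader itself.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "<video><audio>Describe what happens in the clip."},
    ],
    "video_path": "clips/demo.mp4",  # resolved relative to the JSONL's directory
    "audio_path": "clips/demo.wav",
}
with open("omni_calib.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
```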
17 changes: 15 additions & 2 deletions angelslim/engine.py
@@ -73,6 +73,7 @@ def prepare_model(
        cache_dir=None,
        deploy_backend="vllm",
        using_multi_nodes=False,
        use_audio_in_video=False,
    ) -> Any:
        """Load pretrained model and tokenizer
        Args:
@@ -116,6 +117,16 @@
                using_multi_nodes=using_multi_nodes,
            )
            self.model_path = model_path
        elif self.series in ["Omni"]:
            if not model:
                self.slim_model.from_pretrained(
                    model_path,
                    torch_dtype=torch_dtype,
                    device_map=device_map,
                    trust_remote_code=trust_remote_code,
                    use_audio_in_video=use_audio_in_video,
                )
            self.model_path = model_path
        else:
            raise ValueError(f"Unsupported series: {self.series}")

@@ -131,6 +142,7 @@ def prepare_data(
        num_samples=128,
        shuffle=True,
        inference_settings=None,
        use_audio_in_video=False,
        model_name=None,
    ) -> Optional[Any]:
        """Prepare compression dataset"""
@@ -145,7 +157,7 @@
            data_type=data_type,
            processor=(
                self.slim_model.processor
                if self.series == "VLM"
                if self.series == "VLM" or self.series == "Omni"
                else self.slim_model.tokenizer
            ),
            device=self.slim_model.model.device,
@@ -155,6 +167,7 @@
            num_samples=num_samples,
            data_source=data_path,
            inference_settings=inference_settings,
            use_audio_in_video=use_audio_in_video,
            model_name=model_name,
        )
        self.max_seq_length = max_length
@@ -187,7 +200,7 @@ def prepare_compressor(
f"Compression method '{method_name}' not registered. "
f"Available methods: {CompressorFactory.get_available_compressor()}"
)
if self.series in ["LLM", "VLM"]:
if self.series in ["LLM", "VLM", "Omni"]:
global_config.update(self.model_path, self.max_seq_length)

if default_method:
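Taken together, the new Omni plumbing can be driven end to end roughly like this — a hedged sketch in which only `prepare_model`, `prepare_data`, and `prepare_compressor` appear in this diff; the constructor arguments, the compressor name string, and any subsequent run/save calls are assumptions:

```python
from angelslim.engine import Engine

engine = Engine()  # how the "Omni" series is selected is not visible in this diff
engine.prepare_model(
    model_path="Qwen/Qwen3-Omni-30B-A3B-Instruct",
    use_audio_in_video=False,  # new flag threaded through for the Omni series
)
engine.prepare_data(
    data_path="calib/omni_calib.jsonl",
    data_type="OmniDataset",
    max_length=4096,
    num_samples=128,
    use_audio_in_video=False,
)
engine.prepare_compressor("fp8_dynamic")  # assumed method name
```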
1 change: 1 addition & 0 deletions angelslim/models/__init__.py
@@ -15,4 +15,5 @@
from .diffusion import * # noqa: F401 F403
from .llm import * # noqa: F401 F403
from .model_factory import SlimModelFactory # noqa: F401
from .omni import * # noqa: F401 F403
from .vlm import * # noqa: F401 F403