Visual-language assistant with Pixtral and OpenVINO #2489

Open
matrix1233 opened this issue Oct 29, 2024 Discussed in #2479 · 2 comments

@matrix1233

Discussed in #2479

Originally posted by matrix1233 October 28, 2024
Hello,

I followed the solution from the OpenVINO documentation exactly (https://docs.openvino.ai/2024/notebooks/pixtral-with-output.html), but I keep running into a persistent error during the model conversion to ONNX.

Error:

```
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
```

Details:

For reference, I tested this both on Hugging Face Spaces and on my own server, with the same result.
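
For context, the 2GiB ceiling comes from protobuf: `torch.onnx.export` can only serialize a model this large when it is given an on-disk file path, so that the weights can be written out as ONNX external-data files next to the `.onnx` file. A minimal sketch of that call shape (with a toy model rather than Pixtral, purely for illustration, and not the `optimum-cli` internals) looks like this:

```python
# Minimal sketch: torch.onnx.export needs a real file path (not an in-memory
# buffer) for models whose serialized protobuf would exceed 2 GiB, so the
# weights can be stored as external data files in the same directory.
# The toy model below stays far below the limit; it only illustrates the call.
import torch

model = torch.nn.Linear(16, 16)
dummy_input = torch.randn(1, 16)

torch.onnx.export(
    model,
    (dummy_input,),
    "linear.onnx",              # destination path on disk
    input_names=["input"],
    output_names=["output"],
)
```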

Log:

```
optimum-cli export openvino -m "mistral-community/pixtral-12b" --weight-format int8 pixtral-12b/INT8
2024-10-27 20:34:11.520991: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-10-27 20:34:11.552790: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
No ROCm runtime is found, using ROCM_HOME='/opt/rocm-6.2.2'
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 997/997 [00:00<00:00, 13.9MB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 57.9k/57.9k [00:00<00:00, 717kB/s]
model-00001-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.99G/4.99G [01:58<00:00, 42.0MB/s]
model-00002-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.96G/4.96G [01:57<00:00, 42.1MB/s]
model-00003-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.91G/4.91G [01:56<00:00, 42.2MB/s]
model-00004-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.91G/4.91G [01:56<00:00, 42.0MB/s]
model-00005-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.26G/4.26G [01:41<00:00, 42.1MB/s]
model-00006-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [00:31<00:00, 42.4MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [10:04<00:00, 100.77s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  8.05it/s]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.74MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 177k/177k [00:00<00:00, 1.05MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.26M/9.26M [00:00<00:00, 13.9MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 5.97MB/s]
processor_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 162/162 [00:00<00:00, 2.54MB/s]
chat_template.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 24.9MB/s]
preprocessor_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 483/483 [00:00<00:00, 7.16MB/s]
We detected that you are passing past_key_values as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate Cache class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/cache_utils.py:447: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  or len(self.key_cache[layer_idx]) == 0  # the layer has no cache
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/cache_utils.py:432: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  elif len(self.key_cache[layer_idx]) == 0:  # fills previously skipped layers; checking for tensor causes errors
Starting from v4.46, the logits model output will have the same type as the model (except at train time, where it will always be FP32)
[ WARNING ] Unexpectedly found already patched module language_model.model.embed_tokens while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.q_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.k_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.v_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.o_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.

model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.38.mlp.down_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.q_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.k_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.v_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.o_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
.
.
.

[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.mlp.gate_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.mlp.up_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.mlp.down_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.lm_head while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/models/pixtral/modeling_pixtral.py:492: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  patch_embeds_list = [self.patch_conv(img.unsqueeze(0).to(self.dtype)) for img in pixel_values]
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/nncf/torch/dynamic_graph/wrappers.py:86: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  op1 = operator(*args, **kwargs)
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/models/pixtral/modeling_pixtral.py:448: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  for start, end in zip(block_start_idx, block_end_idx):
[ WARNING ] Unexpectedly found already patched module  while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
Export model to OpenVINO directly failed with: 
Config dummy inputs are not a subset of the model inputs: {'input'} vs {'kwargs', 'args'}.
Model will be exported to ONNX
Traceback (most recent call last):
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 382, in export_pytorch
    check_dummy_inputs_are_allowed(model, dummy_inputs)
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 97, in check_dummy_inputs_are_allowed
    raise ValueError(
ValueError: Config dummy inputs are not a subset of the model inputs: {'input'} vs {'kwargs', 'args'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/local/miniconda3/envs/onnx/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/commands/export/openvino.py", line 349, in run
    main_export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/__main__.py", line 393, in main_export
    submodel_paths = export_from_model(
                     ^^^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 701, in export_from_model
    export_models(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 504, in export_models
    export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 144, in export
    return export_pytorch(
           ^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 408, in export_pytorch
    return export_pytorch_via_onnx(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 256, in export_pytorch_via_onnx
    input_names, output_names = export_pytorch_to_onnx(
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 584, in export_pytorch
    onnx_export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
            ^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 663, in _optimize_graph
    _C._jit_pass_onnx_graph_shape_type_inference(
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.

```
@YuChern-Intel self-assigned this Oct 29, 2024
@eaidova
Collaborator

eaidova commented Oct 30, 2024

@matrix1233 could you please try installing optimum-intel from this branch: huggingface/optimum-intel#968?
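
(For anyone hitting the same issue before that PR is merged: a pull-request branch can usually be installed directly with pip, for example `pip install "git+https://github.com/huggingface/optimum-intel.git@refs/pull/968/head"`, since GitHub exposes every PR under `refs/pull/<number>/head`; checking out the branch referenced in huggingface/optimum-intel#968 works just as well.)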

@matrix1233
Author

Thanks! It works now.
