Visual-language assistant with Pixtral and OpenVINO #2489

Open
matrix1233 opened this issue Oct 29, 2024 Discussed in #2479 · 2 comments

@matrix1233

Discussed in #2479

Originally posted by matrix1233 October 28, 2024
Hello,

I followed the solution from the OpenVINO documentation exactly (https://docs.openvino.ai/2024/notebooks/pixtral-with-output.html), but I keep running into a persistent error during the model conversion to ONNX.

Error:

```
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
```

Details:

For reference, I tested this both on Hugging Face Spaces and on my own server, with the same result.
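
For context, the 2GiB ceiling comes from protobuf: `torch.onnx.export` can only serialize a model this large when it is given an on-disk file path, so that the weights can be written out as ONNX external-data files next to the `.onnx` file. A minimal sketch of that call shape (with a toy model rather than Pixtral, purely for illustration, and not the `optimum-cli` internals) looks like this:

```python
# Minimal sketch: torch.onnx.export needs a real file path (not an in-memory
# buffer) for models whose serialized protobuf would exceed 2 GiB, so the
# weights can be stored as external data files in the same directory.
# The toy model below stays far below the limit; it only illustrates the call.
import torch

model = torch.nn.Linear(16, 16)
dummy_input = torch.randn(1, 16)

torch.onnx.export(
    model,
    (dummy_input,),
    "linear.onnx",              # destination path on disk
    input_names=["input"],
    output_names=["output"],
)
```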

Log:

```
optimum-cli export openvino -m "mistral-community/pixtral-12b" --weight-format int8 pixtral-12b/INT8
2024-10-27 20:34:11.520991: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-10-27 20:34:11.552790: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
No ROCm runtime is found, using ROCM_HOME='/opt/rocm-6.2.2'
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 997/997 [00:00<00:00, 13.9MB/s]
model.safetensors.index.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 57.9k/57.9k [00:00<00:00, 717kB/s]
model-00001-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.99G/4.99G [01:58<00:00, 42.0MB/s]
model-00002-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.96G/4.96G [01:57<00:00, 42.1MB/s]
model-00003-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.91G/4.91G [01:56<00:00, 42.2MB/s]
model-00004-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.91G/4.91G [01:56<00:00, 42.0MB/s]
model-00005-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.26G/4.26G [01:41<00:00, 42.1MB/s]
model-00006-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [00:31<00:00, 42.4MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [10:04<00:00, 100.77s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  8.05it/s]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.74MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 177k/177k [00:00<00:00, 1.05MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.26M/9.26M [00:00<00:00, 13.9MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 5.97MB/s]
processor_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 162/162 [00:00<00:00, 2.54MB/s]
chat_template.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 24.9MB/s]
preprocessor_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 483/483 [00:00<00:00, 7.16MB/s]
We detected that you are passing past_key_values as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate Cache class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/cache_utils.py:447: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  or len(self.key_cache[layer_idx]) == 0  # the layer has no cache
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/cache_utils.py:432: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  elif len(self.key_cache[layer_idx]) == 0:  # fills previously skipped layers; checking for tensor causes errors
Starting from v4.46, the logits model output will have the same type as the model (except at train time, where it will always be FP32)
[ WARNING ] Unexpectedly found already patched module language_model.model.embed_tokens while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.q_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.k_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.v_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.0.self_attn.o_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.

model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.38.mlp.down_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.q_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.k_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.v_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.self_attn.o_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
.
.
.

[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.mlp.gate_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.mlp.up_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.model.layers.39.mlp.down_proj while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
[ WARNING ] Unexpectedly found already patched module language_model.lm_head while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/models/pixtral/modeling_pixtral.py:492: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  patch_embeds_list = [self.patch_conv(img.unsqueeze(0).to(self.dtype)) for img in pixel_values]
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/nncf/torch/dynamic_graph/wrappers.py:86: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  op1 = operator(*args, **kwargs)
/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/transformers/models/pixtral/modeling_pixtral.py:448: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  for start, end in zip(block_start_idx, block_end_idx):
[ WARNING ] Unexpectedly found already patched module  while applying ModuleExtension during PyTorch model conversion. Result of the conversion maybe broken. Depending on the exact issue it may lead to broken original model.
Export model to OpenVINO directly failed with: 
Config dummy inputs are not a subset of the model inputs: {'input'} vs {'kwargs', 'args'}.
Model will be exported to ONNX
Traceback (most recent call last):
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 382, in export_pytorch
    check_dummy_inputs_are_allowed(model, dummy_inputs)
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 97, in check_dummy_inputs_are_allowed
    raise ValueError(
ValueError: Config dummy inputs are not a subset of the model inputs: {'input'} vs {'kwargs', 'args'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/local/miniconda3/envs/onnx/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/commands/export/openvino.py", line 349, in run
    main_export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/__main__.py", line 393, in main_export
    submodel_paths = export_from_model(
                     ^^^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 701, in export_from_model
    export_models(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 504, in export_models
    export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 144, in export
    return export_pytorch(
           ^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 408, in export_pytorch
    return export_pytorch_via_onnx(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/openvino/convert.py", line 256, in export_pytorch_via_onnx
    input_names, output_names = export_pytorch_to_onnx(
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 584, in export_pytorch
    onnx_export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/__init__.py", line 375, in export
    export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
            ^^^^^^^^^^^^^^^^
  File "/home/local/miniconda3/envs/onnx/lib/python3.11/site-packages/torch/onnx/utils.py", line 663, in _optimize_graph
    _C._jit_pass_onnx_graph_shape_type_inference(
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.

```
@YuChern-Intel self-assigned this Oct 29, 2024
@eaidova
Collaborator

eaidova commented Oct 30, 2024

@matrix1233 could you please try installing optimum-intel from this branch: huggingface/optimum-intel#968?
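
(For anyone hitting the same issue before that PR is merged: a pull-request branch can usually be installed directly with pip, for example `pip install "git+https://github.com/huggingface/optimum-intel.git@refs/pull/968/head"`, since GitHub exposes every PR under `refs/pull/<number>/head`; checking out the branch referenced in huggingface/optimum-intel#968 works just as well.)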

@matrix1233
Author

Thanks! It works now.
