Floating-point precision issues in image_processing_deepseek_vl.py #43186

@frankenliu

Description

System Info

  • transformers version: 4.57.1
  • Platform: Linux-6.8.0-54-generic-x86_64-with-glibc2.39
  • Python version: 3.12.12
  • Huggingface_hub version: 0.36.0
  • Safetensors version: 0.6.2
  • Accelerate version: 1.11.0
  • Accelerate config: not found
  • DeepSpeed version: 0.18.1
  • PyTorch version (accelerator?): 2.9.0+cu128 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:

Who can help?

@yonigozlan @molbap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In image_processing_deepseek_vl.py, lines 164-182:

    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(image)

    height, width = get_image_size(image, input_data_format)
    max_size = max(height, width)

    size = get_size_dict(size, default_to_square=True)
    if size["height"] != size["width"]:
        raise ValueError(
            f"Output height and width must be the same. Got height={size['height']} and width={size['width']}"
        )
    size = size["height"]

    delta = size / max_size
    # Largest side becomes `size` and the other side is scaled according to the aspect ratio.
    output_size_nonpadded = [
        max(int(height * delta), self.min_size),
        max(int(width * delta), self.min_size),
    ]

With height=2522, width=928 and size=384, int() truncates the scaled height to 383 instead of the expected 384. To reproduce, run:

    height, width = 2522, 928
    size = 384
    delta = size / height
    out_size = [int(height * delta), int(width * delta)]
    print(f"out_size:{out_size}")
    new_out_size = [round(height * delta), round(width * delta)]
    print(f"new_out_size:{new_out_size}")

Output:

    out_size:[383, 141]
    new_out_size:[384, 141]
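
The truncation can be seen by inspecting the intermediate float directly. This is a minimal sketch using the same numbers as the reproduction; it assumes IEEE 754 double-precision floats (as used by CPython), and the variable name scaled is introduced only for illustration.

```python
height, size = 2522, 384

delta = size / height      # 384/2522 has no exact binary representation
scaled = height * delta    # mathematically 384, but computed in floating point

print(repr(scaled))        # the value actually stored
print(scaled < size)       # True when the product lands just below 384
print(int(scaled), round(scaled))  # int() truncates toward zero; round() goes to nearest
```

Because int() discards the fractional part, any product that lands even one ulp below the target integer loses a full pixel.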

Expected behavior

Change

    output_size_nonpadded = [
        max(int(height * delta), self.min_size),
        max(int(width * delta), self.min_size),
    ]

to

    output_size_nonpadded = [
        max(round(height * delta), self.min_size),
        max(round(width * delta), self.min_size),
    ]
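
For comparison, here is a hedged sketch of the proposed change next to one possible alternative. The helper names and the MIN_SIZE value of 14 are hypothetical, chosen only for illustration, and are not taken from the transformers source. Note that round() also changes the result for any side whose scaled value has a fractional part above 0.5 (for example, 141.6 becomes 142 rather than 141). If the original truncation semantics should be kept, integer arithmetic avoids the float entirely.

```python
def nonpadded_size_round(height, width, size, min_size):
    """Variant proposed in this issue: round to nearest instead of truncating."""
    delta = size / max(height, width)
    return [
        max(round(height * delta), min_size),
        max(round(width * delta), min_size),
    ]


def nonpadded_size_exact(height, width, size, min_size):
    """Alternative sketch: floor via integer arithmetic, so no float error occurs."""
    max_size = max(height, width)
    return [
        max(height * size // max_size, min_size),
        max(width * size // max_size, min_size),
    ]


MIN_SIZE = 14  # hypothetical value, for illustration only

print(nonpadded_size_round(2522, 928, 384, MIN_SIZE))
print(nonpadded_size_exact(2522, 928, 384, MIN_SIZE))
```

Both variants return [384, 141] for the input above; they differ only for inputs where the exact scaled side has a fractional part of 0.5 or more.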
