Description
System Info
- transformers version: 4.57.1
- Platform: Linux-6.8.0-54-generic-x86_64-with-glibc2.39
- Python version: 3.12.12
- Huggingface_hub version: 0.36.0
- Safetensors version: 0.6.2
- Accelerate version: 1.11.0
- Accelerate config: not found
- DeepSpeed version: 0.18.1
- PyTorch version (accelerator?): 2.9.0+cu128 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
In image_processing_deepseek_vl.py, lines 164-182:
if input_data_format is None:
    input_data_format = infer_channel_dimension_format(image)
height, width = get_image_size(image, input_data_format)
max_size = max(height, width)
size = get_size_dict(size, default_to_square=True)
if size["height"] != size["width"]:
    raise ValueError(
        f"Output height and width must be the same. Got height={size['height']} and width={size['width']}"
    )
size = size["height"]
delta = size / max_size
# Largest side becomes `size` and the other side is scaled according to the aspect ratio.
output_size_nonpadded = [
    max(int(height * delta), self.min_size),
    max(int(width * delta), self.min_size),
]
When height=2522, width=928, and size=384, the largest side comes out as 383 instead of 384. To reproduce, run:
height, width = 2522, 928
size = 384
delta = size / height
out_size = [int(height * delta), int(width * delta)]
print(f"out_size:{out_size}")
new_out_size = [round(height * delta), round(width * delta)]
print(f"new_out_size:{new_out_size}")
Output:
out_size:[383, 141]
new_out_size:[384, 141]
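The off-by-one comes from floating-point rounding: mathematically `height * (size / height)` equals `size`, but the float product can land just below it, and `int()` truncates toward zero. The sketch below shows this for the values above and sweeps a range of largest-side values to see which others are affected. The sweep range 1..4096 is an arbitrary choice for illustration, not something taken from the library.

```python
size = 384
height = 2522

delta = size / height
product = height * delta

# The product is expected to equal `size`, but float rounding can leave it
# slightly below, so int() drops it to size - 1 while round() recovers size.
print(repr(product))
print(int(product), round(product))

# Illustrative sweep (range chosen arbitrarily): largest-side values for which
# int() truncation fails to reproduce `size`.
affected = [h for h in range(1, 4097) if int(h * (size / h)) != size]
print(len(affected), 2522 in affected)
```

With `round()` the largest side comes back as `size` for every value in this sweep, since the float error is far smaller than 0.5.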
Expected behavior
Use round() instead of int() when computing the non-padded output size. Change
output_size_nonpadded = [
    max(int(height * delta), self.min_size),
    max(int(width * delta), self.min_size),
]
to
output_size_nonpadded = [
    max(round(height * delta), self.min_size),
    max(round(width * delta), self.min_size),
]
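To compare the two behaviors side by side, here is a minimal standalone sketch of the same arithmetic. `resize_target` and its `rounder` parameter are hypothetical names used only for this illustration and are not part of the transformers API; `min_size=14` is likewise an assumed value.

```python
def resize_target(height, width, size, min_size, rounder=round):
    """Hypothetical helper mirroring the snippet's arithmetic (not library code)."""
    # Scale so that the largest side maps to `size`.
    delta = size / max(height, width)
    return [
        max(rounder(height * delta), min_size),
        max(rounder(width * delta), min_size),
    ]


# min_size=14 is an assumption for illustration only.
current = resize_target(2522, 928, 384, 14, rounder=int)
proposed = resize_target(2522, 928, 384, 14, rounder=round)
print(current)   # [383, 141]
print(proposed)  # [384, 141]
```

One caveat about this change: Python's built-in `round()` rounds exact halves to the nearest even integer, so the shorter side may differ by one pixel from a round-half-up implementation in the rare case where the scaled value is exactly `x.5`.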