RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[16, 9, 336, 336] to have 3 channels, but got 9 channels instead #5

piantic · 2024-03-21T14:28:15Z

First of all, thank you for publishing a good paper.
As you mentioned in the issue, the benchmark performance is overall good.

Unfortunately, the weights are not public at the moment, so I am trying to train the model myself.
The pretraining stage ran fine.

But there are some issues in the fine-tuning stage.
The following runtime error keeps occurring during this stage:
RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[16, 9, 336, 336] to have 3 channels, but got 9 channels instead
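
For readers decoding this message: a convolution weight of size [1024, 3, 14, 14] has 1024 output channels, 3 input channels, and a 14x14 kernel, so the layer only accepts input whose second dimension is 3. The sketch below reproduces that shape check in plain Python, without PyTorch. The function names `check_conv_input` and `split_channel_stacked` are hypothetical, and the idea that 9 channels might be three RGB slices stacked along the channel axis is only an assumption for illustration; the thread does not confirm the actual cause.

```python
def check_conv_input(weight_shape, input_shape, groups=1):
    """Return None if the input fits the conv weight, else a short explanation.

    weight_shape: (out_channels, in_channels_per_group, kernel_h, kernel_w)
    input_shape:  (batch, channels, height, width)
    """
    _out_ch, in_ch_per_group, _kh, _kw = weight_shape
    _batch, in_ch, _height, _width = input_shape
    expected = in_ch_per_group * groups
    if in_ch != expected:
        return f"expected {expected} channels, but got {in_ch} channels instead"
    return None


def split_channel_stacked(input_shape, channels=3):
    """Shape obtained by moving channel-stacked slices onto the batch axis.

    ASSUMPTION: the extra channels are whole RGB slices stacked along dim 1.
    This is an illustrative guess, not the confirmed cause of the error.
    """
    batch, in_ch, height, width = input_shape
    if in_ch % channels != 0:
        raise ValueError(f"{in_ch} channels is not a multiple of {channels}")
    return (batch * (in_ch // channels), channels, height, width)


weight = (1024, 3, 14, 14)
reported = (16, 9, 336, 336)

print(check_conv_input(weight, reported))
# expected 3 channels, but got 9 channels instead

reshaped = split_channel_stacked(reported)
print(reshaped)                            # (48, 3, 336, 336)
print(check_conv_input(weight, reshaped))  # None
```

Under that assumption, a [16, 9, 336, 336] batch would correspond to 48 three-channel images, which is a shape the patch-embedding convolution would accept.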

I also checked the loss, and it stays at 0.0:
{'loss': 0.0, 'learning_rate': 1.6279069767441862e-06, 'epoch': 0.0}

I suspected your slice_logic and noticed that its output looked unusual.
But other issues say this is normal, so I don't think it is the problem:

Weird slice_logic test output #2

Could you please give me some advice on this?

gordonhu608 · 2024-03-24T19:24:27Z

Got this runtime error too, "RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[2, 9, 336, 336] to have 3 channels, but got 9 channels instead". Has this been solved?

piantic · 2024-03-29T08:41:27Z

@BubvieyKevin Thank you, let's wait until the code is ready again.

guozonghao96 · 2024-03-31T02:31:23Z

Thank you for identifying some issues with our code. We have also noticed the same problems and are currently working on resolving them.

gordonhu608 · 2024-04-04T04:55:50Z

Thanks to all the authors for this great work. How is progress on addressing this issue?

xrorrim · 2024-04-14T16:18:42Z

Thanks for reporting this problem. We have fixed it in the latest version of the code.

piantic · 2024-04-16T04:03:21Z

Thanks a lot. We will test it again.

gordonhu608 · 2024-04-16T17:13:04Z

I just tested the code again and still got this error, RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[4, 15, 336, 336] to have 3 channels, but got 15 channels instead. Does this problem also happen to other people?

lucasjinreal · 2024-04-18T04:08:25Z

I am not able to train either.

However, I still don't quite understand the code: the process_image part already resizes every single image to 336 resolution, so why does it still interpolate in the ViT?

Does anyone know about this part?

zyddnys · 2024-04-30T21:28:10Z

Change https://github.com/thunlp/LLaVA-UHD/blob/main/llava_uhd/train/llava-uhd/train.py#L766
to

if all(x is not None and x.shape == images[0].shape for x in images) and False:
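
Appending `and False` makes the condition always evaluate to False, so the `if` branch at that line never runs and execution falls through to whatever follows it. The sketch below illustrates the effect in plain Python. It assumes the collator follows the upstream LLaVA pattern (stack same-shaped images into one batch, otherwise keep them as a list); the function `collate_images`, its `force_list` flag, and the shape tuples standing in for tensors are all hypothetical, and the actual code at the linked line may differ.

```python
def collate_images(shapes, force_list=False):
    """Mimic a collator that stacks same-shaped images, else keeps a list.

    shapes: per-sample image shapes, e.g. (channels, height, width).
    force_list=True plays the role of appending `and False` to the condition.
    ASSUMPTION: the real collator stacks in the `if` branch and keeps a list
    in the `else` branch, as upstream LLaVA does.
    """
    same_shape = all(s is not None and s == shapes[0] for s in shapes)
    if same_shape and not force_list:
        # Stacking adds a leading batch dimension.
        return ("stacked", (len(shapes),) + tuple(shapes[0]))
    # Otherwise each image is passed on individually.
    return ("list", list(shapes))


batch = [(9, 336, 336), (9, 336, 336)]

print(collate_images(batch))
# ('stacked', (2, 9, 336, 336))

print(collate_images(batch, force_list=True))
# ('list', [(9, 336, 336), (9, 336, 336)])
```

In other words, the suggested edit changes how the images reach the model (as a list rather than one stacked tensor); it does not by itself change the shape of each individual image.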

gordonhu608 · 2024-04-30T21:36:39Z

Change https://github.com/thunlp/LLaVA-UHD/blob/main/llava_uhd/train/llava-uhd/train.py#L766 to
if all(x is not None and x.shape == images[0].shape for x in images) and False:

Does this change fix the training? And how are the training results when replicating LLaVA-UHD?

YFCYFC · 2024-05-08T03:05:08Z

https://github.com/thunlp/LLaVA-UHD/blob/main/llava_uhd/train/llava-uhd/train.py#L766

No, this does not fix the bug. I still hit the same error.

ParadoxZW · 2024-06-13T07:55:30Z

Hi, guys @piantic @zyddnys @lucasjinreal @YFCYFC @gordonhu608 @guozonghao96

I've released another implementation of LLaVA-UHD here, which I believe is more stable and elegant. The code of the new repo originates from this repo, but its overall quality is improved, and the training program has been tested to run normally without bugs.

When I reviewed this old repo and tried to fix this RuntimeError issue, I found that it contains a lot of hidden bugs and calculations with wrong logic (violating the spirit of the original paper), and that it is missing some necessary processing steps (such as image normalization). So I decided to rewrite the code and try my best to fix all these issues. I have now open-sourced my rewritten version.

You are very welcome to use it, and I look forward to your feedback.

guozonghao96 · 2025-01-04T02:14:41Z

Our repository has been fully improved, and almost all bugs have been eliminated. For details, please refer to the main branch and the LLaVA-UHD v1 branch. This issue is now closed. If there are any new problems, feel free to open a new issue.

thunlp deleted a comment from BubvieyKevin Apr 14, 2024

guozonghao96 closed this as completed Jan 4, 2025
