[Feature] Support any shape of Llava (from llava 1.6) #460
Conversation
Not working with ZeRO-3: #432 (comment)
QLoRA does not currently support ZeRO-3.
It is not an issue with 4-bit. I used full fine-tuning with no LoRA; however, the 'newline' parameter is somehow not compatible with ZeRO-3.
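The reported incompatibility is consistent with how DeepSpeed ZeRO-3 partitions every registered nn.Parameter across ranks: outside a gathered context, the locally visible tensor for a parameter such as LLaVA-1.6's image_newline has shape [0]. Below is a minimal sketch (not code from this PR; the Newline module is a hypothetical stand-in) of how the empty shard appears and how deepspeed.zero.GatheredParameters restores the full tensor:

```python
# Run under a distributed launcher (e.g. torchrun) with deepspeed installed.
import torch
import torch.nn as nn
import deepspeed


class Newline(nn.Module):
    """Hypothetical stand-in for a model holding an extra newline embedding."""

    def __init__(self, hidden_size: int = 4096):
        super().__init__()
        self.image_newline = nn.Parameter(torch.zeros(hidden_size))


deepspeed.init_distributed()

# zero.Init partitions parameters at construction time (ZeRO-3 behaviour).
with deepspeed.zero.Init():
    model = Newline()

# The local shard is empty, matching the "existing size (0)" in the traceback below.
print(model.image_newline.shape)  # torch.Size([0])

# Gathering the parameter temporarily materializes the full [4096] tensor for read access.
with deepspeed.zero.GatheredParameters([model.image_newline]):
    print(model.image_newline.shape)  # torch.Size([4096])
```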
@hhaAndroid @tpoisonooo When will this PR be merged? I need it urgently.
When I start pretraining, I get an error. What is this error? RuntimeError: The expanded size of the tensor (4096) must match the existing size (0) at non-singleton dimension 0. Target sizes: [4096, 32, 1]. Tensor sizes: [0, 1, 1]
model = self.train_loop.run() # type: ignore
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/loops.py", line 270, in run
self.runner.call_hook('before_train')
File "/usr/local/lib/python3.10/dist-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
getattr(hook, fn_name)(self, **kwargs)
File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 221, in before_train
self._generate_samples(runner, max_new_tokens=50)
File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/evaluate_chat_hook.py", line 207, in _generate_samples
self._eval_images(runner, model, device, max_new_tokens,
File "/export/App/training_platform/PinoModel/xtuner/xtuner/engine/hooks/anyshape_evaluate_chat_hook.py", line 53, in _eval_images
image_features = model.preprocess_for_pixel_values({
File "/export/App/training_platform/PinoModel/xtuner/xtuner/model/anyshape_llava.py", line 109, in preprocess_for_pixel_values
self.image_newline[:, None, None].expand(
RuntimeError: The expanded size of the tensor (4096) must match the existing size (0) at non-singleton dimension 0. Target sizes: [4096, 32, 1]. Tensor sizes: [0, 1, 1]
[2024-04-24 13:52:01,685] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2444983) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError
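The trace above fails exactly where anyshape_llava.py reads self.image_newline: under ZeRO-3 only an empty shard is visible locally, so expanding it to [4096, 32, 1] raises the error. A hedged sketch of one possible workaround (not part of this PR; expand_newline is a hypothetical helper) is to gather the parameter before reading it:

```python
import deepspeed


def expand_newline(image_newline, height, width):
    """Broadcast a possibly ZeRO-3-partitioned image_newline to [hidden_size, height, width]."""
    # GatheredParameters temporarily materializes the full parameter; it only
    # affects parameters that ZeRO-3 has actually partitioned.
    with deepspeed.zero.GatheredParameters([image_newline], modifier_rank=None):
        # Clone inside the context: outside it the storage is re-partitioned,
        # and expand() only returns a view of that storage.
        return image_newline[:, None, None].expand(-1, height, width).clone()
```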
@hhaAndroid Can this PR support Llama 3 8B?