Error when training peft model example #18

Open
Tachyon5 opened this issue Jun 8, 2023 · 6 comments


Tachyon5 commented Jun 8, 2023

Hi, I am trying to train the example training/peft-flan-t5-int8-summarization.ipynb.

I am using a p3dn.24xlarge (8 GPUs, 96 vCPUs, 768 GB RAM, 256 GB total GPU memory). I am simply trying to run the example directly on the machine, exactly as written, but I get this error when calling train().
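For reference, the training setup from the notebook that I am running is roughly the following (a condensed sketch from memory, not a verbatim copy; the exact max lengths and hyperparameters are whatever the example uses):

```python
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "philschmid/flan-t5-xxl-sharded-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# FLAN-T5-XXL loaded in int8, spread across the available GPUs by accelerate
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")

# LoRA on the T5 attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, lora_config)

# samsum summarization data; tokenization simplified here relative to the notebook
dataset = load_dataset("samsum")

def preprocess(sample):
    model_inputs = tokenizer(
        ["summarize: " + d for d in sample["dialogue"]], max_length=512, truncation=True
    )
    labels = tokenizer(text_target=sample["summary"], max_length=95, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["dialogue", "summary", "id"])

data_collator = DataCollatorForSeq2Seq(
    tokenizer, model=model, label_pad_token_id=-100, pad_to_multiple_of=8
)

training_args = Seq2SeqTrainingArguments(
    output_dir="lora-flan-t5-xxl",
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=5,
    logging_steps=500,
    save_strategy="no",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized["train"],
)
model.config.use_cache = False  # silence cache warnings; re-enable for inference
```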

trainer.train()

/home/ubuntu/.local/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  0%|          | 0/1155 [00:00<?, ?it/s]
You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization

Traceback (most recent call last):
  <module>:1
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:1633 in train
    return inner_training_loop(
  /home/ubuntu/.local/lib/python3.8/site-packages/accelerate/utils/memory.py:124 in decorator
    return function(batch_size, *args, **kwargs)
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:1902 in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:2645 in training_step
    loss = self.compute_loss(model, inputs)
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:2677 in compute_loss
    outputs = model(**inputs)
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl
    return forward_call(*args, **kwargs)
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py:171 in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py:181 in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py:89 in parallel_apply
    output.reraise()
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/_utils.py:644 in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.

Original Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/peft/peft_model.py", line 667, in forward
    return self.base_model(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1667, in forward
    encoder_outputs = self.encoder(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1061, in forward
    layer_outputs = checkpoint(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1057, in custom_forward
    return tuple(module(*inputs, use_cache, output_attentions))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 693, in forward
    self_attention_outputs = self.layer[0](
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    attention_output = self.SelfAttention(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 572, in forward
    attn_output = self.o(attn_output)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
    output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2048x2 and 1x4096)

philschmid (Owner) commented

Exact dataset? Exact code? No changes at all?

Tachyon5 (Author) commented Jun 9, 2023

Yes, I just ran the code exactly as it is in the notebook. The only differences are the machine and that I ran it in the IPython REPL.


tomdzh commented Jun 9, 2023

Got the same issue. It seems to happen only on multi-GPU machines. For instance, a g5.4xlarge (single GPU) works but a g5.12xlarge (4 GPUs) doesn't.
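If it helps anyone else, a possible workaround (untested sketch, assuming a single GPU has enough memory for the int8 model) is to hide all but one GPU before anything initializes CUDA, so the Trainer never takes the nn.DataParallel path that shows up in the traceback above:

```python
import os

# Must be set before torch / transformers initialize CUDA (i.e. at the very top
# of the notebook or script); setting it after the model is loaded has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # now reports 1, even on a g5.12xlarge
```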

Tachyon5 (Author) commented

I suspected exactly that. I spun up a single-GPU machine and it works, but it's painfully slow. I'm going to see whether creating a custom device_map has any effect, roughly as sketched below.
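Concretely, what I plan to try is pinning the whole model to a single device instead of device_map="auto" (untested; I'm not sure yet whether the Trainer still wraps the model in DataParallel when it sees multiple GPUs):

```python
from transformers import AutoModelForSeq2SeqLM

model_id = "philschmid/flan-t5-xxl-sharded-fp16"  # the checkpoint from the example notebook

# "" matches the root module, so every submodule is placed on GPU 0 rather than
# being sharded across devices by accelerate.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map={"": 0},
)
```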

philschmid (Owner) commented

Thank you, @tomdzh! Yes, the example with int8 is not yet working in a multi-GPU setup. You would need to combine PEFT with DeepSpeed or FSDP for that.
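Roughly, that means launching one process per GPU (e.g. with torchrun or the deepspeed launcher) and handing the Trainer a ZeRO config, along these lines. This is an untested sketch, and it drops load_in_8bit, since the int8 bitsandbytes layers are where the DataParallel run fails above:

```python
from transformers import Seq2SeqTrainingArguments

# Minimal ZeRO stage 2 config passed directly to the Trainer; the "auto" values
# are filled in from the TrainingArguments by the transformers DeepSpeed integration.
ds_config = {
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = Seq2SeqTrainingArguments(
    output_dir="lora-flan-t5-xxl",
    per_device_train_batch_size=8,
    learning_rate=1e-3,
    num_train_epochs=5,
    fp16=True,
    deepspeed=ds_config,  # launch with e.g.: torchrun --nproc_per_node=8 train.py
)
```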


tomdzh commented Jun 10, 2023

@philschmid it would be great to publish a blog post about combining PEFT with FSDP :).
