Error when training peft model example #18

Open
Tachyon5 opened this issue Jun 8, 2023 · 6 comments


Tachyon5 commented Jun 8, 2023

Hi, I am trying to train the example training/peft-flan-t5-int8-summarization.ipynb.

I am using a p3dn.24xlarge (8 GPUs, 96 vCPUs, 768 GB RAM, 256 GB total GPU memory). I am simply trying to run the example directly on the machine, exactly as written, but I get this error when calling train().
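For reference, the training setup from the notebook that I am running is roughly the following (a condensed sketch from memory, not a verbatim copy; the exact max lengths and hyperparameters are whatever the example uses):

```python
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "philschmid/flan-t5-xxl-sharded-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# FLAN-T5-XXL loaded in int8, spread across the available GPUs by accelerate
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")

# LoRA on the T5 attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, lora_config)

# samsum summarization data; tokenization simplified here relative to the notebook
dataset = load_dataset("samsum")

def preprocess(sample):
    model_inputs = tokenizer(
        ["summarize: " + d for d in sample["dialogue"]], max_length=512, truncation=True
    )
    labels = tokenizer(text_target=sample["summary"], max_length=95, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["dialogue", "summary", "id"])

data_collator = DataCollatorForSeq2Seq(
    tokenizer, model=model, label_pad_token_id=-100, pad_to_multiple_of=8
)

training_args = Seq2SeqTrainingArguments(
    output_dir="lora-flan-t5-xxl",
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=5,
    logging_steps=500,
    save_strategy="no",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized["train"],
)
model.config.use_cache = False  # silence cache warnings; re-enable for inference
```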

trainer.train()

/home/ubuntu/.local/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  0%|          | 0/1155 [00:00<?, ?it/s]
You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization

Traceback (most recent call last):
  <module>:1
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:1633 in train
    return inner_training_loop(
  /home/ubuntu/.local/lib/python3.8/site-packages/accelerate/utils/memory.py:124 in decorator
    return function(batch_size, *args, **kwargs)
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:1902 in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:2645 in training_step
    loss = self.compute_loss(model, inputs)
  /home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py:2677 in compute_loss
    outputs = model(**inputs)
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1501 in _call_impl
    return forward_call(*args, **kwargs)
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py:171 in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py:181 in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py:89 in parallel_apply
    output.reraise()
  /home/ubuntu/.local/lib/python3.8/site-packages/torch/_utils.py:644 in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.

Original Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/peft/peft_model.py", line 667, in forward
    return self.base_model(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1667, in forward
    encoder_outputs = self.encoder(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1061, in forward
    layer_outputs = checkpoint(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1057, in custom_forward
    return tuple(module(*inputs, use_cache, output_attentions))
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 693, in forward
    self_attention_outputs = self.layer[0](
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 600, in forward
    attention_output = self.SelfAttention(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 572, in forward
    attn_output = self.o(attn_output)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
    output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2048x2 and 1x4096)

philschmid (Owner) commented

Exact dataset? Exact code? No changes at all?

Tachyon5 (Author) commented Jun 9, 2023

Yes, I just ran the code exactly as it is in the notebook. The only differences are the machine and that I ran it in the IPython REPL.


tomdzh commented Jun 9, 2023

Got the same issue. It seems to happen only on multi-GPU machines. For instance, a g5.4xlarge (single GPU) works but a g5.12xlarge (4 GPUs) doesn't.
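If it helps anyone else, a possible workaround (untested sketch, assuming a single GPU has enough memory for the int8 model) is to hide all but one GPU before anything initializes CUDA, so the Trainer never takes the nn.DataParallel path that shows up in the traceback above:

```python
import os

# Must be set before torch / transformers initialize CUDA (i.e. at the very top
# of the notebook or script); setting it after the model is loaded has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # now reports 1, even on a g5.12xlarge
```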

Tachyon5 (Author) commented

I suspected exactly that. I spun up a single-GPU machine and it works, but it's painfully slow. I'm going to see whether creating a custom device_map has any effect, roughly as sketched below.
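Concretely, what I plan to try is pinning the whole model to a single device instead of device_map="auto" (untested; I'm not sure yet whether the Trainer still wraps the model in DataParallel when it sees multiple GPUs):

```python
from transformers import AutoModelForSeq2SeqLM

model_id = "philschmid/flan-t5-xxl-sharded-fp16"  # the checkpoint from the example notebook

# "" matches the root module, so every submodule is placed on GPU 0 rather than
# being sharded across devices by accelerate.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map={"": 0},
)
```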

philschmid (Owner) commented

Thank you, @tomdzh! Yes, the example with int8 is not yet working in a multi-GPU setup. You would need to combine PEFT with DeepSpeed or FSDP for that.
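Roughly, that means launching one process per GPU (e.g. with torchrun or the deepspeed launcher) and handing the Trainer a ZeRO config, along these lines. This is an untested sketch, and it drops load_in_8bit, since the int8 bitsandbytes layers are where the DataParallel run fails above:

```python
from transformers import Seq2SeqTrainingArguments

# Minimal ZeRO stage 2 config passed directly to the Trainer; the "auto" values
# are filled in from the TrainingArguments by the transformers DeepSpeed integration.
ds_config = {
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = Seq2SeqTrainingArguments(
    output_dir="lora-flan-t5-xxl",
    per_device_train_batch_size=8,
    learning_rate=1e-3,
    num_train_epochs=5,
    fp16=True,
    deepspeed=ds_config,  # launch with e.g.: torchrun --nproc_per_node=8 train.py
)
```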


tomdzh commented Jun 10, 2023

@philschmid it would be great to publish a blog post about combining PEFT with FSDP :).
