Encountered an MPS error (see title) when running this demo script. Please advise.
```python
## train ST
st.train_and_fit(
    callbacks=[cb_early_stopping],
    logger=[log_tb],
)
```
Error
```
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name | Type | Params
------------------------------
------------------------------
0         Trainable params
0         Non-trainable params
0         Total params
0.000     Total estimated model params size (MB)

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:293: The number of training batches (27) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.

Epoch 0:   0%  0/27 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [10], in <cell line: 2>()
      1 ## train ST
----> 2 st.train_and_fit(
      3     callbacks=[cb_early_stopping],
      4     logger=[log_tb],
      5 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/starling/starling.py:353, in ST.train_and_fit(self, accelerator, strategy, devices, num_nodes, precision, logger, callbacks, fast_dev_run, max_epochs, min_epochs, max_steps, min_steps, max_time, limit_train_batches, limit_val_batches, limit_test_batches, limit_predict_batches, overfit_batches, val_check_interval, check_val_every_n_epoch, num_sanity_val_steps, log_every_n_steps, enable_checkpointing, enable_progress_bar, enable_model_summary, accumulate_grad_batches, gradient_clip_val, gradient_clip_algorithm, deterministic, benchmark, inference_mode, use_distributed_sampler, profiler, detect_anomaly, barebones, plugins, sync_batchnorm, reload_dataloaders_every_n_epochs, default_root_dir)
    349 _locals.pop("self")
    351 trainer = pl.Trainer(**_locals)
--> 353 trainer.fit(self)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:545, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    543 self.state.status = TrainerStatus.RUNNING
    544 self.training = True
--> 545 call._call_and_handle_interrupt(
    546     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    547 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:44, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     42 if trainer.strategy.launcher is not None:
     43     return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 44 return trainer_fn(*args, **kwargs)
     46 except _TunerExitException:
     47     _call_teardown_hook(trainer)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:581, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    574 assert self.state.fn is not None
    575 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    576     self.state.fn,
    577     ckpt_path,
    578     model_provided=True,
    579     model_connected=self.lightning_module is not None,
    580 )
--> 581 self._run(model, ckpt_path=ckpt_path)
    583 assert self.state.stopped
    584 self.training = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:990, in Trainer._run(self, model, ckpt_path)
    985 self._signal_connector.register_signal_handlers()
    987 # ----------------------------
    988 # RUN THE TRAINER
    989 # ----------------------------
--> 990 results = self._run_stage()
    992 # ----------------------------
    993 # POST-Training CLEAN UP
    994 # ----------------------------
    995 log.debug(f"{self.__class__.__name__}: trainer tearing down")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1036, in Trainer._run_stage(self)
   1034 self._run_sanity_check()
   1035 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1036     self.fit_loop.run()
   1037 return None
   1038 raise RuntimeError(f"Unexpected state {self.state}")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:202, in _FitLoop.run(self)
    200 try:
    201     self.on_advance_start()
--> 202     self.advance()
    203     self.on_advance_end()
    204     self._restarting = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:359, in _FitLoop.advance(self)
    357 with self.trainer.profiler.profile("run_training_epoch"):
    358     assert self._data_fetcher is not None
--> 359     self.epoch_loop.run(self._data_fetcher)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:136, in _TrainingEpochLoop.run(self, data_fetcher)
    134 while not self.done:
    135     try:
--> 136         self.advance(data_fetcher)
    137         self.on_advance_end(data_fetcher)
    138         self._restarting = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:213, in _TrainingEpochLoop.advance(self, data_fetcher)
    211 batch = trainer.precision_plugin.convert_input(batch)
    212 batch = trainer.lightning_module._on_before_batch_transfer(batch, dataloader_idx=0)
--> 213 batch = call._call_strategy_hook(trainer, "batch_to_device", batch, dataloader_idx=0)
    215 self.batch_progress.increment_ready()
    216 trainer._logger_connector.on_batch_start(batch)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:309, in _call_strategy_hook(trainer, hook_name, *args, **kwargs)
    306 return None
    308 with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__}.{hook_name}"):
--> 309     output = fn(*args, **kwargs)
    311 # restore current_fx when nested context
    312 pl_module._current_fx_name = prev_fx_name

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py:269, in Strategy.batch_to_device(self, batch, device, dataloader_idx)
    267 device = device or self.root_device
    268 if model is not None:
--> 269     return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
    270 return move_data_to_device(batch, device)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/core/module.py:333, in LightningModule._apply_batch_transfer_handler(self, batch, device, dataloader_idx)
    329 def _apply_batch_transfer_handler(
    330     self, batch: Any, device: Optional[torch.device] = None, dataloader_idx: int = 0
    331 ) -> Any:
    332     device = device or self.device
--> 333     batch = self._call_batch_hook("transfer_batch_to_device", batch, device, dataloader_idx)
    334     batch = self._call_batch_hook("on_after_batch_transfer", batch, dataloader_idx)
    335     return batch

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/core/module.py:322, in LightningModule._call_batch_hook(self, hook_name, *args)
    319 else:
    320     trainer_method = call._call_lightning_datamodule_hook
--> 322 return trainer_method(trainer, hook_name, *args)
    323 hook = getattr(self, hook_name)
    324 return hook(*args)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:157, in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, **kwargs)
    154 pl_module._current_fx_name = hook_name
    156 with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
--> 157     output = fn(*args, **kwargs)
    159 # restore current_fx when nested context
    160 pl_module._current_fx_name = prev_fx_name

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/core/hooks.py:583, in DataHooks.transfer_batch_to_device(self, batch, device, dataloader_idx)
    532 def transfer_batch_to_device(self, batch: Any, device: torch.device, dataloader_idx: int) -> Any:
    533     """Override this hook if your :class:`~torch.utils.data.DataLoader` returns tensors wrapped in a custom data
    534     structure.
    (...)
    581 
    582     """
--> 583     return move_data_to_device(batch, device)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py:102, in move_data_to_device(batch, device)
     99     # user wrongly implemented the `_TransferableDataType` and forgot to return `self`.
    100     return data
--> 102 return apply_to_collection(batch, dtype=_TransferableDataType, function=batch_to)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py:66, in apply_to_collection(data, dtype, function, wrong_dtype, include_none, allow_frozen, *args, **kwargs)
     64     return function(data, *args, **kwargs)
     65 if data.__class__ is list and all(isinstance(x, dtype) for x in data):  # 1d homogeneous list
---> 66     return [function(x, *args, **kwargs) for x in data]
     67 if data.__class__ is tuple and all(isinstance(x, dtype) for x in data):  # 1d homogeneous tuple
     68     return tuple(function(x, *args, **kwargs) for x in data)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py:66, in <listcomp>(.0)
     64     return function(data, *args, **kwargs)
     65 if data.__class__ is list and all(isinstance(x, dtype) for x in data):  # 1d homogeneous list
---> 66     return [function(x, *args, **kwargs) for x in data]
     67 if data.__class__ is tuple and all(isinstance(x, dtype) for x in data):  # 1d homogeneous tuple
     68     return tuple(function(x, *args, **kwargs) for x in data)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py:96, in move_data_to_device.<locals>.batch_to(data)
     94 if isinstance(data, Tensor) and isinstance(device, torch.device) and device.type not in _BLOCKING_DEVICE_TYPES:
     95     kwargs["non_blocking"] = True
---> 96 data_output = data.to(device, **kwargs)
     97 if data_output is not None:
     98     return data_output

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
```
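For context: the final frame shows a float64 tensor being moved to the `mps` device, and Apple's MPS backend only supports float32. A likely cause is that the demo's inputs are built from NumPy arrays (NumPy defaults to float64), which `torch.from_numpy` turns into float64 tensors. Below is a minimal sketch of a workaround, casting inputs to float32 before training; `expr_matrix` is a hypothetical stand-in, not a name from the demo script:

```python
import numpy as np
import torch

# NumPy arrays default to float64, and torch.from_numpy() preserves that
# dtype, so the resulting tensors fail when moved to the MPS device.
# Casting to float32 up front avoids the TypeError.
expr_matrix = np.random.rand(100, 20)               # stand-in for the real input data
expr_tensor = torch.from_numpy(expr_matrix).float() # float64 -> float32
assert expr_tensor.dtype == torch.float32           # safe for the MPS backend
```

Alternatively, the traceback shows `train_and_fit` forwarding an `accelerator` argument to `pl.Trainer`, so `st.train_and_fit(accelerator="cpu", ...)` should sidestep the MPS limitation entirely, at the cost of training speed.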
I encountered the exact same error. Is there any update on a fix?