TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead. #51

Open
angelcymak opened this issue Jul 10, 2024 · 1 comment


Encountered an MPS error (see title) when running this demo script. Please advise.

## train ST
st.train_and_fit(
    callbacks=[cb_early_stopping],
    logger=[log_tb],
)

Error

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name | Type | Params
------------------------------
------------------------------
0         Trainable params
0         Non-trainable params
0         Total params
0.000     Total estimated model params size (MB)
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:436: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:293: The number of training batches (27) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 0: 0%
0/27 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [10], in <cell line: 2>()
      1 ## train ST
----> 2 st.train_and_fit(
      3     callbacks=[cb_early_stopping],
      4     logger=[log_tb],
      5 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/starling/starling.py:353, in ST.train_and_fit(self, accelerator, strategy, devices, num_nodes, precision, logger, callbacks, fast_dev_run, max_epochs, min_epochs, max_steps, min_steps, max_time, limit_train_batches, limit_val_batches, limit_test_batches, limit_predict_batches, overfit_batches, val_check_interval, check_val_every_n_epoch, num_sanity_val_steps, log_every_n_steps, enable_checkpointing, enable_progress_bar, enable_model_summary, accumulate_grad_batches, gradient_clip_val, gradient_clip_algorithm, deterministic, benchmark, inference_mode, use_distributed_sampler, profiler, detect_anomaly, barebones, plugins, sync_batchnorm, reload_dataloaders_every_n_epochs, default_root_dir)
    349 _locals.pop("self")
    351 trainer = pl.Trainer(**_locals)
--> 353 trainer.fit(self)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:545, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    543 self.state.status = TrainerStatus.RUNNING
    544 self.training = True
--> 545 call._call_and_handle_interrupt(
    546     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    547 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:44, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     42     if trainer.strategy.launcher is not None:
     43         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 44     return trainer_fn(*args, **kwargs)
     46 except _TunerExitException:
     47     _call_teardown_hook(trainer)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:581, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    574 assert self.state.fn is not None
    575 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    576     self.state.fn,
    577     ckpt_path,
    578     model_provided=True,
    579     model_connected=self.lightning_module is not None,
    580 )
--> 581 self._run(model, ckpt_path=ckpt_path)
    583 assert self.state.stopped
    584 self.training = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:990, in Trainer._run(self, model, ckpt_path)
    985 self._signal_connector.register_signal_handlers()
    987 # ----------------------------
    988 # RUN THE TRAINER
    989 # ----------------------------
--> 990 results = self._run_stage()
    992 # ----------------------------
    993 # POST-Training CLEAN UP
    994 # ----------------------------
    995 log.debug(f"{self.__class__.__name__}: trainer tearing down")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1036, in Trainer._run_stage(self)
   1034         self._run_sanity_check()
   1035     with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1036         self.fit_loop.run()
   1037     return None
   1038 raise RuntimeError(f"Unexpected state {self.state}")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:202, in _FitLoop.run(self)
    200 try:
    201     self.on_advance_start()
--> 202     self.advance()
    203     self.on_advance_end()
    204     self._restarting = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:359, in _FitLoop.advance(self)
    357 with self.trainer.profiler.profile("run_training_epoch"):
    358     assert self._data_fetcher is not None
--> 359     self.epoch_loop.run(self._data_fetcher)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:136, in _TrainingEpochLoop.run(self, data_fetcher)
    134 while not self.done:
    135     try:
--> 136         self.advance(data_fetcher)
    137         self.on_advance_end(data_fetcher)
    138         self._restarting = False

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:213, in _TrainingEpochLoop.advance(self, data_fetcher)
    211     batch = trainer.precision_plugin.convert_input(batch)
    212     batch = trainer.lightning_module._on_before_batch_transfer(batch, dataloader_idx=0)
--> 213     batch = call._call_strategy_hook(trainer, "batch_to_device", batch, dataloader_idx=0)
    215 self.batch_progress.increment_ready()
    216 trainer._logger_connector.on_batch_start(batch)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:309, in _call_strategy_hook(trainer, hook_name, *args, **kwargs)
    306     return None
    308 with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__}.{hook_name}"):
--> 309     output = fn(*args, **kwargs)
    311 # restore current_fx when nested context
    312 pl_module._current_fx_name = prev_fx_name

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py:269, in Strategy.batch_to_device(self, batch, device, dataloader_idx)
    267 device = device or self.root_device
    268 if model is not None:
--> 269     return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
    270 return move_data_to_device(batch, device)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/core/module.py:333, in LightningModule._apply_batch_transfer_handler(self, batch, device, dataloader_idx)
    329 def _apply_batch_transfer_handler(
    330     self, batch: Any, device: Optional[torch.device] = None, dataloader_idx: int = 0
    331 ) -> Any:
    332     device = device or self.device
--> 333     batch = self._call_batch_hook("transfer_batch_to_device", batch, device, dataloader_idx)
    334     batch = self._call_batch_hook("on_after_batch_transfer", batch, dataloader_idx)
    335     return batch

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/core/module.py:322, in LightningModule._call_batch_hook(self, hook_name, *args)
    319     else:
    320         trainer_method = call._call_lightning_datamodule_hook
--> 322     return trainer_method(trainer, hook_name, *args)
    323 hook = getattr(self, hook_name)
    324 return hook(*args)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:157, in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, **kwargs)
    154 pl_module._current_fx_name = hook_name
    156 with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
--> 157     output = fn(*args, **kwargs)
    159 # restore current_fx when nested context
    160 pl_module._current_fx_name = prev_fx_name

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pytorch_lightning/core/hooks.py:583, in DataHooks.transfer_batch_to_device(self, batch, device, dataloader_idx)
    532 def transfer_batch_to_device(self, batch: Any, device: torch.device, dataloader_idx: int) -> Any:
    533     """Override this hook if your :class:`~torch.utils.data.DataLoader` returns tensors wrapped in a custom data
    534     structure.
    535 
   (...)
    581 
    582     """
--> 583     return move_data_to_device(batch, device)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py:102, in move_data_to_device(batch, device)
     99     # user wrongly implemented the `_TransferableDataType` and forgot to return `self`.
    100     return data
--> 102 return apply_to_collection(batch, dtype=_TransferableDataType, function=batch_to)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py:66, in apply_to_collection(data, dtype, function, wrong_dtype, include_none, allow_frozen, *args, **kwargs)
     64     return function(data, *args, **kwargs)
     65 if data.__class__ is list and all(isinstance(x, dtype) for x in data):  # 1d homogeneous list
---> 66     return [function(x, *args, **kwargs) for x in data]
     67 if data.__class__ is tuple and all(isinstance(x, dtype) for x in data):  # 1d homogeneous tuple
     68     return tuple(function(x, *args, **kwargs) for x in data)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py:66, in <listcomp>(.0)
     64     return function(data, *args, **kwargs)
     65 if data.__class__ is list and all(isinstance(x, dtype) for x in data):  # 1d homogeneous list
---> 66     return [function(x, *args, **kwargs) for x in data]
     67 if data.__class__ is tuple and all(isinstance(x, dtype) for x in data):  # 1d homogeneous tuple
     68     return tuple(function(x, *args, **kwargs) for x in data)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/lightning_fabric/utilities/apply_func.py:96, in move_data_to_device.<locals>.batch_to(data)
     94 if isinstance(data, Tensor) and isinstance(device, torch.device) and device.type not in _BLOCKING_DEVICE_TYPES:
     95     kwargs["non_blocking"] = True
---> 96 data_output = data.to(device, **kwargs)
     97 if data_output is not None:
     98     return data_output

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
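
Two possible workarounds until this is fixed upstream, both untested sketches. The `accelerator` argument is genuinely part of `train_and_fit`'s signature (visible in the starling.py frame above); the float32 cast assumes the offending float64 batches come from the numpy matrix of the AnnData used to construct `st`, and `adata` is a placeholder name for that object.

import numpy as np

## Workaround 1: keep training on the CPU, which supports float64.
## `accelerator` is an explicit parameter of train_and_fit (see the
## signature in the starling.py frame of the traceback above).
st.train_and_fit(
    callbacks=[cb_early_stopping],
    logger=[log_tb],
    accelerator="cpu",
)

## Workaround 2 (assumption): if the float64 batches originate from the
## matrix of the AnnData used to build the ST object, downcast it to
## float32 *before* constructing st so the tensors are MPS-compatible.
## `adata` is a hypothetical name for that AnnData.
adata.X = adata.X.astype(np.float32)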

joaolsf commented Jan 21, 2025

I encountered the exact same error. Is there any update on a fix?
