docs/source-pytorch/common/trainer.rst — 107 additions & 2 deletions (109 changes)
@@ -246,6 +246,27 @@ Example::
See also: :ref:`gradient_accumulation` to enable more fine-grained accumulation schedules.
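
For instance, a per-epoch schedule can be configured through the :class:`~lightning.pytorch.callbacks.GradientAccumulationScheduler` callback; the schedule values below are purely illustrative.

.. code-block:: python

    from lightning.pytorch.callbacks import GradientAccumulationScheduler

    # accumulate 8 batches from epoch 0, 4 from epoch 4, and stop accumulating from epoch 8
    accumulator = GradientAccumulationScheduler(scheduling={0: 8, 4: 4, 8: 1})
    trainer = Trainer(callbacks=[accumulator])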


barebones
^^^^^^^^^

Whether to run in "barebones mode", where all features that may impact raw speed are disabled. This is meant for
analyzing the Trainer overhead and is discouraged during regular training runs.

When enabled, the following features are automatically deactivated:

- Checkpointing: ``enable_checkpointing=False``
- Logging: ``logger=False``, ``log_every_n_steps=0``
- Progress bar: ``enable_progress_bar=False``
- Model summary: ``enable_model_summary=False``
- Sanity checking: ``num_sanity_val_steps=0``

.. testcode::

# default used by the Trainer
trainer = Trainer(barebones=False)

# enable barebones mode for speed analysis
trainer = Trainer(barebones=True)
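
For comparison, the flags listed above can also be passed manually; the sketch below is only an approximation, since barebones mode may additionally skip other internal bookkeeping.

.. code-block:: python

    # rough manual equivalent of barebones=True (approximation only)
    trainer = Trainer(
        enable_checkpointing=False,
        logger=False,
        log_every_n_steps=0,
        enable_progress_bar=False,
        enable_model_summary=False,
        num_sanity_val_steps=0,
    )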

benchmark
^^^^^^^^^

@@ -364,6 +385,22 @@ will need to be set up to use remote filepaths.
# default used by the Trainer
trainer = Trainer(default_root_dir=os.getcwd())
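
Remote paths are also accepted once the corresponding filesystem dependencies are set up (see the note above); the bucket name below is hypothetical.

.. code-block:: python

    # store logs and checkpoints on a remote filesystem (hypothetical S3 bucket)
    trainer = Trainer(default_root_dir="s3://my_bucket/lightning_logs/")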


detect_anomaly
^^^^^^^^^^^^^^

Enable anomaly detection for the autograd engine. This will significantly slow down compute speed and is recommended
only for model debugging.

.. testcode::

# default used by the Trainer
trainer = Trainer(detect_anomaly=False)

# enable anomaly detection for debugging
trainer = Trainer(detect_anomaly=True)
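
Under the hood this relies on PyTorch's autograd anomaly detection. As a rough standalone sketch of what it surfaces (``model`` and ``batch`` are hypothetical):

.. code-block:: python

    import torch

    # a backward pass that produces NaN/Inf gradients raises an error pointing
    # at the forward operation responsible
    with torch.autograd.detect_anomaly():
        loss = model(batch).sum()
        loss.backward()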


devices
^^^^^^^

@@ -548,6 +585,24 @@ impact to subsequent runs. These are the changes enabled:
- If using the CLI, the configuration file is not saved.


gradient_clip_algorithm
^^^^^^^^^^^^^^^^^^^^^^^

The gradient clipping algorithm to use. Pass ``gradient_clip_algorithm="value"`` to clip by value, and
``gradient_clip_algorithm="norm"`` to clip by norm. If not set explicitly, gradients are clipped by norm (``"norm"``)
whenever ``gradient_clip_val`` is provided.

.. testcode::

# default used by the Trainer (defaults to "norm" when gradient_clip_val is set)
trainer = Trainer(gradient_clip_algorithm=None)

# clip by value
trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")

# clip by norm
trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="norm")


gradient_clip_val
^^^^^^^^^^^^^^^^^

@@ -624,6 +679,26 @@ Example::
# run through only 10 batches of the training set each epoch
trainer = Trainer(limit_train_batches=10)


limit_predict_batches
^^^^^^^^^^^^^^^^^^^^^

How much of the prediction dataset to check. The value is applied per device.

.. testcode::

# default used by the Trainer
trainer = Trainer(limit_predict_batches=1.0)

# run through only 25% of the prediction set
trainer = Trainer(limit_predict_batches=0.25)

# run for only 10 batches
trainer = Trainer(limit_predict_batches=10)

In the case of multiple prediction dataloaders, the limit applies to each dataloader individually.
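
For example, a hypothetical LightningModule that returns two prediction dataloaders would run at most 10 batches from each of them when ``limit_predict_batches=10`` is set (``dataset_a`` and ``dataset_b`` are placeholders for your datasets).

.. code-block:: python

    from torch.utils.data import DataLoader

    from lightning.pytorch import LightningModule


    class MyModel(LightningModule):
        def predict_dataloader(self):
            # the limit is applied to each of these dataloaders separately
            return [DataLoader(dataset_a), DataLoader(dataset_b)]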


limit_test_batches
^^^^^^^^^^^^^^^^^^

@@ -801,6 +876,23 @@ For customizable options use the :class:`~lightning.pytorch.callbacks.timer.Timer` callback.
In case ``max_time`` is used together with ``min_steps`` or ``min_epochs``, the ``min_*`` requirement
always has precedence.
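
As a brief sketch of the :class:`~lightning.pytorch.callbacks.timer.Timer` route mentioned above (the duration shown is an arbitrary example):

.. code-block:: python

    from lightning.pytorch.callbacks import Timer

    # stop after 12 hours, checking the elapsed time after every training step
    timer = Timer(duration="00:12:00:00", interval="step")
    trainer = Trainer(callbacks=[timer])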


model_registry
^^^^^^^^^^^^^^

If specified, the model will be uploaded to the Lightning Model Registry under the provided name.

.. testcode::

# default used by the Trainer
trainer = Trainer(model_registry=None)

# specify model name for model hub upload
trainer = Trainer(model_registry="my-model-name")

See `Lightning model registry docs <https://lightning.ai/docs/overview/finetune-models/model-registry>`_ for more info.


num_nodes
^^^^^^^^^

@@ -875,12 +967,25 @@ Useful for quickly debugging or trying to overfit on purpose.

# debug using a single consistent train batch and a single consistent val batch
trainer = Trainer(overfit_batches=1)

plugins
^^^^^^^

:ref:`Plugins` allow you to connect arbitrary backends, precision libraries, clusters, etc., and to modify core Lightning logic.
Examples of plugin types:

- :ref:`Checkpoint IO <checkpointing_expert>`
- `TorchElastic <https://pytorch.org/elastic/0.2.2/index.html>`_
- :ref:`Precision Plugins <precision_expert>`
- :class:`~lightning.pytorch.plugins.environments.ClusterEnvironment`

.. testcode::

# default used by the Trainer
trainer = Trainer(plugins=None)

# example using the built-in SLURM environment plugin
from lightning.fabric.plugins.environments import SLURMEnvironment
trainer = Trainer(plugins=[SLURMEnvironment()])


To define your own behavior, subclass the relevant class and pass it in. Here's an example linking up your own
:class:`~lightning.pytorch.plugins.environments.ClusterEnvironment`.