[DOCS] Cross-reference API/deep dives/tutorials (pytorch#1229)
Co-authored-by: Rafi Ayub <[email protected]>
SalmanMohammadi and RdoubleA authored Aug 3, 2024
1 parent d2d3088 commit 3653c4a
Showing 14 changed files with 97 additions and 59 deletions.
28 changes: 17 additions & 11 deletions docs/source/deep_dives/checkpointer.rst
@@ -115,21 +115,28 @@ fine-tuned checkpoints from torchtune with any post-training tool (quantization,
which supports the source format, without any code changes OR conversion scripts. This is one of the
ways in which torchtune interoperates with the surrounding ecosystem.

To be "state-dict invariant", the ``load_checkpoint`` and
``save_checkpoint`` methods make use of the weight converters available
`here <https://github.com/pytorch/torchtune/blob/main/torchtune/models/convert_weights.py>`_.
.. note::

To be state-dict "invariant" in this way, the ``load_checkpoint`` and ``save_checkpoint`` methods of each checkpointer
make use of weight converters which correctly map weights between checkpoint formats. For example, when loading weights
from Hugging Face, we apply a permutation to certain weights on load and save to ensure checkpoints behave exactly the same.
To further illustrate this, the Llama family of models uses a
`generic weight converter function <https://github.com/pytorch/torchtune/blob/898670f0eb58f956b5228e5a55ccac4ea0efaff8/torchtune/models/convert_weights.py#L113>`_
whilst some other models like Phi3 have their own `conversion functions <https://github.com/pytorch/torchtune/blob/main/torchtune/models/phi3/_convert_weights.py>`_
which can be found within their model folders.

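To make this concrete, the sketch below shows the general shape of such a converter.
The mapping here is a simplified illustration rather than torchtune's exact table; the
real mappings live in the ``convert_weights.py`` files linked above.

.. code-block:: python

    import re
    from typing import Dict

    import torch

    # Simplified, illustrative mapping from Hugging Face key names to torchtune key names
    _FROM_HF = {
        "model.embed_tokens.weight": "tok_embeddings.weight",
        "model.layers.{}.self_attn.q_proj.weight": "layers.{}.attn.q_proj.weight",
        "model.layers.{}.mlp.gate_proj.weight": "layers.{}.mlp.w1.weight",
        "lm_head.weight": "output.weight",
    }

    def hf_to_tune(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        """Rename keys from the HF format to torchtune's format (sketch only)."""
        converted = {}
        for key, value in state_dict.items():
            # Swap the layer index for a placeholder, look up the torchtune name,
            # then put the index back
            abstract_key = re.sub(r"\.(\d+)\.", ".{}.", key)
            layer = re.search(r"\.(\d+)\.", key)
            new_key = _FROM_HF.get(abstract_key, key)
            if layer is not None:
                new_key = new_key.format(layer.group(1))
            converted[new_key] = value
        return converted
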
|
Handling different Checkpoint Formats
-------------------------------------

torchtune supports three different
`checkpointers <https://github.com/pytorch/torchtune/blob/main/torchtune/utils/_checkpointing/_checkpointer.py>`_,
:ref:`checkpointers<checkpointing_label>`,
each of which supports a different checkpoint format.


**HFCheckpointer**
:class:`HFCheckpointer <torchtune.utils.FullModelHFCheckpointer>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This checkpointer reads and writes checkpoints in a format which is compatible with the transformers
framework from Hugging Face. As mentioned above, this is the most popular format within the Hugging Face
@@ -195,12 +202,11 @@ The following snippet explains how the HFCheckpointer is setup in torchtune conf
read directly from the ``config.json`` file. This helps ensure we either load the weights
correctly or error out in case of discrepancy between the HF checkpoint file and torchtune's
model implementations. This json file is downloaded from the hub along with the model checkpoints.
More details on how these are used during conversion can be found
`here <https://github.com/pytorch/torchtune/blob/main/torchtune/models/convert_weights.py>`_.

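As a rough illustration of the kind of information read from ``config.json`` (the path
and field names below follow the standard Hugging Face Llama config; this is a sketch,
not torchtune's actual code):

.. code-block:: python

    import json
    from pathlib import Path

    # Placeholder path to a downloaded Hugging Face checkpoint directory
    checkpoint_dir = Path("/tmp/Llama-2-7b-hf")

    with open(checkpoint_dir / "config.json") as f:
        hf_config = json.load(f)

    # These values drive the weight conversion; a mismatch with the torchtune model
    # definition should surface as an early error rather than silently corrupt weights
    num_heads = hf_config["num_attention_heads"]
    num_kv_heads = hf_config.get("num_key_value_heads", num_heads)
    embed_dim = hf_config["hidden_size"]
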
|
**MetaCheckpointer**
:class:`MetaCheckpointer <torchtune.utils.FullModelMetaCheckpointer>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This checkpointer reads and writes checkpoints in a format which is compatible with the original meta-llama
github repository.
@@ -259,7 +265,8 @@ The following snippet explains how the MetaCheckpointer is setup in torchtune co
|
**TorchTuneCheckpointer**
:class:`TorchTuneCheckpointer <torchtune.utils.FullModelTorchTuneCheckpointer>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This checkpointer reads and writes checkpoints in a format that is compatible with torchtune's
model definition. This does not perform any state_dict conversions and is currently used either
@@ -463,8 +470,7 @@ For this section we'll use the Llama2 13B model in HF format.
You can do this with any model supported by torchtune. You can find a full list
of models and model builders
`here <https://github.com/pytorch/torchtune/tree/main/torchtune/models>`__.
of models and model builders :ref:`here <models>`.

We hope this deep-dive provided a deeper insight into the checkpointer and
associated utilities in torchtune. Happy tuning!
15 changes: 9 additions & 6 deletions docs/source/deep_dives/configs.rst
@@ -47,8 +47,8 @@ for a particular run.
enable_fsdp: True
...
Configuring components using :code:`instantiate`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Configuring components using :func:`instantiate<torchtune.config.instantiate>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Many fields will require specifying torchtune objects with associated keyword
arguments as parameters. Models, datasets, optimizers, and loss functions are
common examples of this. You can easily do this using the :code:`_component_`
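
In code, the pattern looks roughly like this; the config fragment and model builder
below are examples rather than a specific built-in config:

.. code-block:: python

    from omegaconf import OmegaConf

    from torchtune import config

    # A config fragment with a _component_ field (example values)
    cfg = OmegaConf.create({"model": {"_component_": "torchtune.models.llama2.llama2_7b"}})

    # instantiate resolves the dotpath and calls it with any remaining fields as kwargs
    model = config.instantiate(cfg.model)
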
@@ -152,10 +152,10 @@ will automatically resolve it for you.
Validating your config
^^^^^^^^^^^^^^^^^^^^^^
We provide a convenient CLI utility, :code:`tune validate`, to quickly verify that
We provide a convenient CLI utility, :ref:`tune validate<validate_cli_label>`, to quickly verify that
your config is well-formed and all components can be instantiated properly. You
can also pass in overrides if you want to test out the exact commands you will run
your experiments with. If any parameters are not well-formed, :code:`tune validate`
your experiments with. If any parameters are not well-formed, :ref:`tune validate<validate_cli_label>`
will list out all the locations where an error was found.

.. code-block:: bash
@@ -216,6 +216,8 @@ the config itself. To enable quick experimentation, you can specify override val
to parameters in your config via the :code:`tune` command. These should be specified
as key-value pairs :code:`k1=v1 k2=v2 ...`

.. TODO (SalmanMohammadi) link this to the upcoming recipe docpage for the lora recipe
For example, to run the :code:`lora_finetune_single_device` recipe with custom model and tokenizer directories, you can provide overrides:

.. code-block:: bash
@@ -248,9 +250,10 @@ Removing config fields
You may need to remove certain parameters from the config when changing components
through overrides that require different keyword arguments. You can do so by using
the `~` flag and specifying the dotpath of the config field you would like to remove.
For example, if you want to override a built-in config and use the ``bitsandbytes.optim.PagedAdamW8bit``
For example, if you want to override a built-in config and use the
`bitsandbytes.optim.PagedAdamW8bit <https://huggingface.co/docs/bitsandbytes/main/en/reference/optim/adamw#bitsandbytes.optim.PagedAdamW8bit>`_
optimizer, you may need to delete parameters like ``foreach`` which are
specific to PyTorch optimizers. Note that this example requires that you have ``bitsandbytes``
specific to PyTorch optimizers. Note that this example requires that you have `bitsandbytes <https://github.com/bitsandbytes-foundation/bitsandbytes>`_
installed.

.. code-block:: yaml
6 changes: 4 additions & 2 deletions docs/source/deep_dives/recipe_deepdive.rst
@@ -68,6 +68,8 @@ For a complete working example, refer to the
in torchtune and the associated
`config <https://github.com/pytorch/torchtune/blob/main/recipes/configs/7B_full.yaml>`_.

.. TODO (SalmanMohammadi) ref to full finetune recipe doc
|
What Recipes are not?
@@ -209,7 +211,7 @@ You can learn all about configs in our :ref:`config deep-dive<config_tutorial_la
Config and CLI parsing using :code:`parse`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide a convenient decorator :func:`~torchtune.config.parse` that wraps
your recipe to enable running from the command-line with :code:`tune` with config
your recipe to enable running from the command-line with :ref:`tune <cli_label>` with config
and CLI override parsing.

.. code-block:: python
@@ -225,7 +227,7 @@ and CLI override parsing.
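
A minimal sketch of this pattern, with a placeholder body standing in for a real recipe
class, looks like the following:

.. code-block:: python

    import sys

    from omegaconf import DictConfig, OmegaConf

    from torchtune import config

    @config.parse
    def recipe_main(cfg: DictConfig) -> None:
        # cfg holds the YAML config merged with any CLI overrides
        print(OmegaConf.to_yaml(cfg))

    if __name__ == "__main__":
        sys.exit(recipe_main())
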
Running your recipe
^^^^^^^^^^^^^^^^^^^
You should be able to run your recipe by providing the direct paths to your custom
recipe and custom config using the :code:`tune` command with any CLI overrides:
recipe and custom config using the :ref:`tune <cli_label>` command with any CLI overrides:

.. code-block:: bash
2 changes: 1 addition & 1 deletion docs/source/install.rst
@@ -41,7 +41,7 @@ And should see the following output:
Install via ``git clone``
-------------------------

If you want the latest and greatest features from torchtune or if you want to become a contributor,
If you want the latest and greatest features from torchtune or if you want to `become a contributor <https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md>`_,
you can also install the package locally with the following command.

.. code-block:: bash
9 changes: 6 additions & 3 deletions docs/source/overview.rst
@@ -39,24 +39,27 @@ As you go through the tutorials and code, there are two concepts which will help

**Configs.** YAML files which help you configure training settings (dataset, model, checkpoint) and
hyperparameters (batch size, learning rate) without modifying code.
See the :ref:`All About Configs deep-dive <config_tutorial_label>` for more information.
See the ":ref:`All About Configs<config_tutorial_label>`" deep-dive for more information.

**Recipes.** Recipes can be thought of
as targeted end-to-end pipelines for training and optionally evaluating LLMs.
Each recipe implements a training method (eg: full fine-tuning) with a set of meaningful
features (eg: FSDP + Activation Checkpointing + Gradient Accumulation + Reduced Precision training)
applied to a given model family (eg: Llama2). See the :ref:`What Are Recipes? deep-dive<recipe_deepdive>` for more information.
applied to a given model family (eg: Llama2). See the ":ref:`What Are Recipes?<recipe_deepdive>`" deep-dive for more information.

|
.. _design_principles_label:

Design Principles
-----------------

torchtune embodies `PyTorch’s design philosophy <https://pytorch.org/docs/stable/community/design.html>`_, especially "usability over everything else".

**Native PyTorch**

torchtune is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (eg: Hugging Face Datasets, EleutherAI Eval Harness), all of the core functionality is written in PyTorch.
torchtune is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (eg: `Hugging Face Datasets <https://huggingface.co/docs/datasets/en/index>`_,
`EleutherAI's Eval Harness <https://github.com/EleutherAI/lm-evaluation-harness>`_), all of the core functionality is written in PyTorch.


**Simplicity and Extensibility**
10 changes: 9 additions & 1 deletion docs/source/tune_cli.rst
@@ -34,6 +34,8 @@ The ``--help`` option is convenient for getting more details about any command.
available options and their details. For example, ``tune download --help`` provides more information on how
to download files using the CLI.

.. _tune_download_label:

Download a model
----------------

@@ -102,6 +104,8 @@ with matching names. By default we ignore safetensor files, but if you want to i
built-in recipes or configs. For a list of supported model families and architectures, see :ref:`models<models>`.


.. _tune_ls_label:

List built-in recipes and configs
---------------------------------

@@ -123,11 +127,13 @@ The ``tune ls`` command lists out all the built-in recipes and configs within to
llama3/70B_full
...
.. _tune_cp_cli_label:

Copy a built-in recipe or config
--------------------------------

The ``tune cp <recipe|config> <path>`` command copies built-in recipes and configs to a provided location. This allows you to make a local copy of a library
recipe or config to edit directly for yourself.
recipe or config to edit directly for yourself. See :ref:`here <tune_cp_label>` for an example of how to use this command.

.. list-table::
:widths: 30 60
@@ -201,6 +207,8 @@ Further information on config overrides can be found :ref:`here <cli_override>`
tune run <RECIPE> --config <CONFIG> epochs=1
.. _validate_cli_label:

Validate a config
-----------------

4 changes: 2 additions & 2 deletions docs/source/tutorials/chat.rst
@@ -317,13 +317,13 @@ object.
)
.. note::
You can pass in any keyword argument for :code:`load_dataset` into all our
You can pass in any keyword argument for `load_dataset <https://huggingface.co/docs/datasets/v2.20.0/en/package_reference/loading_methods#datasets.load_dataset>`_ into all our
Dataset classes and they will honor them. This is useful for common parameters
such as specifying the data split with :code:`split` or configuration with
:code:`name`

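To see what ``split`` and ``name`` control, here is ``load_dataset`` used directly with
an example dataset; the same keyword arguments can be passed to a torchtune dataset
builder, which forwards them under the hood:

.. code-block:: python

    from datasets import load_dataset

    # "name" selects a configuration of the dataset, "split" selects the data split
    ds = load_dataset("glue", name="mrpc", split="train")
    print(ds[0])
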
Now we're ready to start fine-tuning! We'll use the built-in LoRA single device recipe.
Use the :code:`tune cp` command to get a copy of the :code:`8B_lora_single_device.yaml`
Use the :ref:`tune cp <tune_cp_cli_label>` command to get a copy of the :code:`8B_lora_single_device.yaml`
config and update it to use your new dataset. Create a new folder for your project
and make sure the dataset builder and message converter are saved in that directory,
then specify it in the config.
6 changes: 3 additions & 3 deletions docs/source/tutorials/datasets.rst
@@ -56,7 +56,7 @@ Hugging Face datasets
---------------------

We provide first class support for datasets on the Hugging Face hub. Under the hood,
all of our built-in datasets and dataset builders are using Hugging Face's ``load_dataset()``
all of our built-in datasets and dataset builders are using Hugging Face's `load_dataset() <https://huggingface.co/docs/datasets/v2.20.0/en/package_reference/loading_methods#datasets.load_dataset>`_
to load in your data, whether local or on the hub.

You can pass in a Hugging Face dataset path to the ``source`` parameter in any of our builders
Expand Down Expand Up @@ -97,7 +97,7 @@ on Hugging Face's `documentation. <https://huggingface.co/docs/datasets/en/loadi
Setting max sequence length
---------------------------

The default collator :func:`~torchtune.utils.collate.padded_collate` used in all
The default collator, :func:`~torchtune.utils.padded_collate`, used in all
our training recipes will pad samples to the max sequence length within the batch,
not globally. If you wish to set an upper limit on the max sequence length globally,
you can specify it in the dataset builder with ``max_seq_len``. Any sample in the dataset
@@ -250,7 +250,7 @@ Here is an example of a sample that is formatted with :class:`~torchtune.data.Al
# ### Response:
#
We provide `other instruct templates <data>`
We provide :ref:`other instruct templates <data>`
for common tasks such as summarization and grammar correction. If you need to create your own
instruct template for a custom task, you can inherit from :class:`~torchtune.data.InstructTemplate`
and create your own class.
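
As a sketch of what that might look like, assuming the ``format(sample, column_map)``
classmethod interface used by the built-in templates:

.. code-block:: python

    from typing import Any, Dict, Mapping, Optional

    from torchtune.data import InstructTemplate

    class PoliteRewriteTemplate(InstructTemplate):
        """Hypothetical template for a text-rewriting task."""

        template = "Rewrite the following text to be more polite:\n\n{text}\n\nRewritten: "

        @classmethod
        def format(
            cls, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
        ) -> str:
            # column_map lets you point the template at differently named dataset columns
            column_map = column_map or {}
            text_col = column_map.get("text", "text")
            return cls.template.format(text=sample[text_col])
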
6 changes: 4 additions & 2 deletions docs/source/tutorials/e2e_flow.rst
@@ -151,6 +151,8 @@ Run Evaluation using EleutherAI's Eval Harness

We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!

.. TODO (SalmanMohammadi) ref eval recipe docs
torchtune integrates with
`EleutherAI's evaluation harness <https://github.com/EleutherAI/lm-evaluation-harness>`_.
An example of this is available through the
@@ -169,7 +171,7 @@ will be easier than overriding all of these elements through the CLI.
tune cp eleuther_evaluation ./custom_eval_config.yaml \
For this tutorial we'll use the ``truthfulqa_mc2`` task from the harness.
For this tutorial we'll use the `truthfulqa_mc2 <https://github.com/sylinrl/TruthfulQA>`_ task from the harness.
This task measures a model's propensity to be truthful when answering questions and
measures the model's zero-shot accuracy on a question followed by one or more true
responses and one or more false responses. Let's first run a baseline without fine-tuning.
@@ -422,7 +424,7 @@ Uploading your model to the Hugging Face Hub
--------------------------------------------

Your new model is working great and you want to share it with the world. The easiest way to do this
is utilizing the ``huggingface-cli`` command, which works seamlessly with torchtune. Simply point the CLI
is utilizing the `huggingface-cli <https://huggingface.co/docs/huggingface_hub/en/guides/cli>`_ command, which works seamlessly with torchtune. Simply point the CLI
to your finetuned model directory like so:

.. code-block:: bash
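
If you prefer to stay in Python rather than use the CLI, the ``huggingface_hub`` library
offers the same functionality; a sketch with placeholder repo id and paths:

.. code-block:: python

    from huggingface_hub import HfApi

    api = HfApi()  # assumes you have already authenticated, e.g. with `huggingface-cli login`
    api.upload_folder(
        folder_path="/tmp/finetuned-llama",  # local directory with the fine-tuned checkpoint
        repo_id="my-username/my-finetuned-llama",
        repo_type="model",
    )
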
10 changes: 6 additions & 4 deletions docs/source/tutorials/first_finetune_tutorial.rst
@@ -70,7 +70,9 @@ Each recipe consists of three components:

torchtune provides built-in recipes for finetuning on single device, on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using memory efficient techniques like `LoRA <https://arxiv.org/abs/2106.09685>`_, and more! You can view all built-in recipes `on GitHub <https://github.com/pytorch/torchtune/tree/main/recipes>`_. You can also utilize the
:code:`tune ls` command to print out all recipes and corresponding configs.
:ref:`tune ls <tune_ls_label>` command to print out all recipes and corresponding configs.

.. TODO (SalmanMohammadi) point to recipe index page here.
.. code-block:: bash
Expand All @@ -92,7 +94,7 @@ a single device. For a more in-depth discussion on LoRA in torchtune, you can se
.. note::

**Why have a separate recipe for single device vs. distributed?** This is discussed in
:ref:`recipe_deepdive` but one of our core principles in torchtune is minimal abstraction and boilerplate code.
":ref:`recipe_deepdive`" but one of our :ref:`core principles <design_principles_label>` in torchtune is minimal abstraction and boilerplate code.
If you only want to train on a single GPU, our single-device recipe ensures you don't have to worry about additional
features like FSDP that are only required for distributed training.

@@ -119,7 +121,7 @@ you want to set the number of training epochs to 1.
**Copy the config through `tune cp` and modify directly**

If you want to make more substantial changes to the config, you can use the :code:`tune` CLI to copy it to your local directory.
If you want to make more substantial changes to the config, you can use the :ref:`tune <cli_label>` CLI to copy it to your local directory.

.. code-block:: bash
@@ -139,7 +141,7 @@ Training a model
----------------
Now that you have a model in the proper format and a config that suits your needs, let's get training!

Just like all the other steps, you will be using the :code:`tune` CLI tool to launch your finetuning run.
Just like all the other steps, you will be using the :ref:`tune <cli_label>` CLI tool to launch your finetuning run.

.. code-block:: bash