[DOCS] Cross-reference API/deep dives/tutorials (pytorch#1229)
Co-authored-by: Rafi Ayub <[email protected]>
SalmanMohammadi and RdoubleA authored Aug 3, 2024
1 parent d2d3088 commit 3653c4a
Showing 14 changed files with 97 additions and 59 deletions.
28 changes: 17 additions & 11 deletions docs/source/deep_dives/checkpointer.rst
@@ -115,21 +115,28 @@ fine-tuned checkpoints from torchtune with any post-training tool (quantization,
which supports the source format, without any code changes OR conversion scripts. This is one of the
ways in which torchtune interoperates with the surrounding ecosystem.

To be "state-dict invariant", the ``load_checkpoint`` and
``save_checkpoint`` methods make use of the weight converters available
`here <https://github.com/pytorch/torchtune/blob/main/torchtune/models/convert_weights.py>`_.
.. note::

To be state-dict "invariant" in this way, the ``load_checkpoint`` and ``save_checkpoint`` methods of each checkpointer
make use of weight converters which correctly map weights between checkpoint formats. For example, when loading weights
from Hugging Face, we apply a permutation to certain weights on load and save to ensure checkpoints behave exactly the same.
To further illustrate this, the Llama family of models uses a
`generic weight converter function <https://github.com/pytorch/torchtune/blob/898670f0eb58f956b5228e5a55ccac4ea0efaff8/torchtune/models/convert_weights.py#L113>`_
whilst some other models like Phi3 have their own `conversion functions <https://github.com/pytorch/torchtune/blob/main/torchtune/models/phi3/_convert_weights.py>`_
which can be found within their model folders.

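To make this concrete, the sketch below shows the general shape of such a converter.
The mapping here is a simplified illustration rather than torchtune's exact table; the
real mappings live in the ``convert_weights.py`` files linked above.

.. code-block:: python

    import re
    from typing import Dict

    import torch

    # Simplified, illustrative mapping from Hugging Face key names to torchtune key names
    _FROM_HF = {
        "model.embed_tokens.weight": "tok_embeddings.weight",
        "model.layers.{}.self_attn.q_proj.weight": "layers.{}.attn.q_proj.weight",
        "model.layers.{}.mlp.gate_proj.weight": "layers.{}.mlp.w1.weight",
        "lm_head.weight": "output.weight",
    }

    def hf_to_tune(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        """Rename keys from the HF format to torchtune's format (sketch only)."""
        converted = {}
        for key, value in state_dict.items():
            # Swap the layer index for a placeholder, look up the torchtune name,
            # then put the index back
            abstract_key = re.sub(r"\.(\d+)\.", ".{}.", key)
            layer = re.search(r"\.(\d+)\.", key)
            new_key = _FROM_HF.get(abstract_key, key)
            if layer is not None:
                new_key = new_key.format(layer.group(1))
            converted[new_key] = value
        return converted
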
|
Handling different Checkpoint Formats
-------------------------------------

torchtune supports three different
`checkpointers <https://github.com/pytorch/torchtune/blob/main/torchtune/utils/_checkpointing/_checkpointer.py>`_,
:ref:`checkpointers<checkpointing_label>`,
each of which supports a different checkpoint format.


**HFCheckpointer**
:class:`HFCheckpointer <torchtune.utils.FullModelHFCheckpointer>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This checkpointer reads and writes checkpoints in a format which is compatible with the transformers
framework from Hugging Face. As mentioned above, this is the most popular format within the Hugging Face
@@ -195,12 +202,11 @@ The following snippet explains how the HFCheckpointer is setup in torchtune conf
read directly from the ``config.json`` file. This helps ensure we either load the weights
correctly or error out in case of discrepancy between the HF checkpoint file and torchtune's
model implementations. This json file is downloaded from the hub along with the model checkpoints.
More details on how these are used during conversion can be found
`here <https://github.com/pytorch/torchtune/blob/main/torchtune/models/convert_weights.py>`_.

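As a rough illustration of the kind of information read from ``config.json`` (the path
and field names below follow the standard Hugging Face Llama config; this is a sketch,
not torchtune's actual code):

.. code-block:: python

    import json
    from pathlib import Path

    # Placeholder path to a downloaded Hugging Face checkpoint directory
    checkpoint_dir = Path("/tmp/Llama-2-7b-hf")

    with open(checkpoint_dir / "config.json") as f:
        hf_config = json.load(f)

    # These values drive the weight conversion; a mismatch with the torchtune model
    # definition should surface as an early error rather than silently corrupt weights
    num_heads = hf_config["num_attention_heads"]
    num_kv_heads = hf_config.get("num_key_value_heads", num_heads)
    embed_dim = hf_config["hidden_size"]
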
|
**MetaCheckpointer**
:class:`MetaCheckpointer <torchtune.utils.FullModelMetaCheckpointer>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This checkpointer reads and writes checkpoints in a format which is compatible with the original meta-llama
github repository.
@@ -259,7 +265,8 @@ The following snippet explains how the MetaCheckpointer is setup in torchtune co
|
**TorchTuneCheckpointer**
:class:`TorchTuneCheckpointer <torchtune.utils.FullModelTorchTuneCheckpointer>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This checkpointer reads and writes checkpoints in a format that is compatible with torchtune's
model definition. This does not perform any state_dict conversions and is currently used either
@@ -463,8 +470,7 @@ For this section we'll use the Llama2 13B model in HF format.
You can do this with any model supported by torchtune. You can find a full list
of models and model builders
`here <https://github.com/pytorch/torchtune/tree/main/torchtune/models>`__.
of models and model builders :ref:`here <models>`.

We hope this deep-dive provided a deeper insight into the checkpointer and
associated utilities in torchtune. Happy tuning!
15 changes: 9 additions & 6 deletions docs/source/deep_dives/configs.rst
@@ -47,8 +47,8 @@ for a particular run.
enable_fsdp: True
...
Configuring components using :code:`instantiate`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Configuring components using :func:`instantiate<torchtune.config.instantiate>`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Many fields will require specifying torchtune objects with associated keyword
arguments as parameters. Models, datasets, optimizers, and loss functions are
common examples of this. You can easily do this using the :code:`_component_`
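
In code, the pattern looks roughly like this; the config fragment and model builder
below are examples rather than a specific built-in config:

.. code-block:: python

    from omegaconf import OmegaConf

    from torchtune import config

    # A config fragment with a _component_ field (example values)
    cfg = OmegaConf.create({"model": {"_component_": "torchtune.models.llama2.llama2_7b"}})

    # instantiate resolves the dotpath and calls it with any remaining fields as kwargs
    model = config.instantiate(cfg.model)
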
@@ -152,10 +152,10 @@ will automatically resolve it for you.
Validating your config
^^^^^^^^^^^^^^^^^^^^^^
We provide a convenient CLI utility, :code:`tune validate`, to quickly verify that
We provide a convenient CLI utility, :ref:`tune validate<validate_cli_label>`, to quickly verify that
your config is well-formed and all components can be instantiated properly. You
can also pass in overrides if you want to test out the exact commands you will run
your experiments with. If any parameters are not well-formed, :code:`tune validate`
your experiments with. If any parameters are not well-formed, :ref:`tune validate<validate_cli_label>`
will list out all the locations where an error was found.

.. code-block:: bash
@@ -216,6 +216,8 @@ the config itself. To enable quick experimentation, you can specify override val
to parameters in your config via the :code:`tune` command. These should be specified
as key-value pairs :code:`k1=v1 k2=v2 ...`

.. TODO (SalmanMohammadi) link this to the upcoming recipe docpage for the lora recipe
For example, to run the :code:`lora_finetune_single_device` recipe with custom model and tokenizer directories, you can provide overrides:

.. code-block:: bash
@@ -248,9 +250,10 @@ Removing config fields
You may need to remove certain parameters from the config when changing components
through overrides that require different keyword arguments. You can do so by using
the `~` flag and specifying the dotpath of the config field you would like to remove.
For example, if you want to override a built-in config and use the ``bitsandbytes.optim.PagedAdamW8bit``
For example, if you want to override a built-in config and use the
`bitsandbytes.optim.PagedAdamW8bit <https://huggingface.co/docs/bitsandbytes/main/en/reference/optim/adamw#bitsandbytes.optim.PagedAdamW8bit>`_
optimizer, you may need to delete parameters like ``foreach`` which are
specific to PyTorch optimizers. Note that this example requires that you have ``bitsandbytes``
specific to PyTorch optimizers. Note that this example requires that you have `bitsandbytes <https://github.com/bitsandbytes-foundation/bitsandbytes>`_
installed.

.. code-block:: yaml
6 changes: 4 additions & 2 deletions docs/source/deep_dives/recipe_deepdive.rst
@@ -68,6 +68,8 @@ For a complete working example, refer to the
in torchtune and the associated
`config <https://github.com/pytorch/torchtune/blob/main/recipes/configs/7B_full.yaml>`_.

.. TODO (SalmanMohammadi) ref to full finetune recipe doc
|
What Recipes are not?
@@ -209,7 +211,7 @@ You can learn all about configs in our :ref:`config deep-dive<config_tutorial_la
Config and CLI parsing using :code:`parse`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide a convenient decorator :func:`~torchtune.config.parse` that wraps
your recipe to enable running from the command-line with :code:`tune` with config
your recipe to enable running from the command-line with :ref:`tune <cli_label>` with config
and CLI override parsing.

.. code-block:: python
@@ -225,7 +227,7 @@ and CLI override parsing.
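
A minimal sketch of this pattern, with a placeholder body standing in for a real recipe
class, looks like the following:

.. code-block:: python

    import sys

    from omegaconf import DictConfig, OmegaConf

    from torchtune import config

    @config.parse
    def recipe_main(cfg: DictConfig) -> None:
        # cfg holds the YAML config merged with any CLI overrides
        print(OmegaConf.to_yaml(cfg))

    if __name__ == "__main__":
        sys.exit(recipe_main())
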
Running your recipe
^^^^^^^^^^^^^^^^^^^
You should be able to run your recipe by providing the direct paths to your custom
recipe and custom config using the :code:`tune` command with any CLI overrides:
recipe and custom config using the :ref:`tune <cli_label>` command with any CLI overrides:

.. code-block:: bash
2 changes: 1 addition & 1 deletion docs/source/install.rst
@@ -41,7 +41,7 @@ And should see the following output:
Install via ``git clone``
-------------------------

If you want the latest and greatest features from torchtune or if you want to become a contributor,
If you want the latest and greatest features from torchtune or if you want to `become a contributor <https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md>`_,
you can also install the package locally with the following command.

.. code-block:: bash
9 changes: 6 additions & 3 deletions docs/source/overview.rst
@@ -39,24 +39,27 @@ As you go through the tutorials and code, there are two concepts which will help

**Configs.** YAML files which help you configure training settings (dataset, model, checkpoint) and
hyperparameters (batch size, learning rate) without modifying code.
See the :ref:`All About Configs deep-dive <config_tutorial_label>` for more information.
See the ":ref:`All About Configs<config_tutorial_label>`" deep-dive for more information.

**Recipes.** Recipes can be thought of
as targeted end-to-end pipelines for training and optionally evaluating LLMs.
Each recipe implements a training method (eg: full fine-tuning) with a set of meaningful
features (eg: FSDP + Activation Checkpointing + Gradient Accumulation + Reduced Precision training)
applied to a given model family (eg: Llama2). See the :ref:`What Are Recipes? deep-dive<recipe_deepdive>` for more information.
applied to a given model family (eg: Llama2). See the ":ref:`What Are Recipes?<recipe_deepdive>`" deep-dive for more information.

|
.. _design_principles_label:

Design Principles
-----------------

torchtune embodies `PyTorch’s design philosophy <https://pytorch.org/docs/stable/community/design.html>`_, especially "usability over everything else".

**Native PyTorch**

torchtune is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (eg: Hugging Face Datasets, EleutherAI Eval Harness), all of the core functionality is written in PyTorch.
torchtune is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (eg: `Hugging Face Datasets <https://huggingface.co/docs/datasets/en/index>`_,
`EleutherAI's Eval Harness <https://github.com/EleutherAI/lm-evaluation-harness>`_), all of the core functionality is written in PyTorch.


**Simplicity and Extensibility**
10 changes: 9 additions & 1 deletion docs/source/tune_cli.rst
@@ -34,6 +34,8 @@ The ``--help`` option is convenient for getting more details about any command.
available options and their details. For example, ``tune download --help`` provides more information on how
to download files using the CLI.

.. _tune_download_label:

Download a model
----------------

@@ -102,6 +104,8 @@ with matching names. By default we ignore safetensor files, but if you want to i
built-in recipes or configs. For a list of supported model families and architectures, see :ref:`models<models>`.


.. _tune_ls_label:

List built-in recipes and configs
---------------------------------

@@ -123,11 +127,13 @@ The ``tune ls`` command lists out all the built-in recipes and configs within to
llama3/70B_full
...
.. _tune_cp_cli_label:

Copy a built-in recipe or config
--------------------------------

The ``tune cp <recipe|config> <path>`` command copies built-in recipes and configs to a provided location. This allows you to make a local copy of a library
recipe or config to edit directly for yourself.
recipe or config to edit directly for yourself. See :ref:`here <tune_cp_label>` for an example of how to use this command.

.. list-table::
:widths: 30 60
@@ -201,6 +207,8 @@ Further information on config overrides can be found :ref:`here <cli_override>`
tune run <RECIPE> --config <CONFIG> epochs=1
.. _validate_cli_label:

Validate a config
-----------------

4 changes: 2 additions & 2 deletions docs/source/tutorials/chat.rst
@@ -317,13 +317,13 @@ object.
)
.. note::
You can pass in any keyword argument for :code:`load_dataset` into all our
You can pass in any keyword argument for `load_dataset <https://huggingface.co/docs/datasets/v2.20.0/en/package_reference/loading_methods#datasets.load_dataset>`_ into all our
Dataset classes and they will honor them. This is useful for common parameters
such as specifying the data split with :code:`split` or configuration with
:code:`name`

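To see what ``split`` and ``name`` control, here is ``load_dataset`` used directly with
an example dataset; the same keyword arguments can be passed to a torchtune dataset
builder, which forwards them under the hood:

.. code-block:: python

    from datasets import load_dataset

    # "name" selects a configuration of the dataset, "split" selects the data split
    ds = load_dataset("glue", name="mrpc", split="train")
    print(ds[0])
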
Now we're ready to start fine-tuning! We'll use the built-in LoRA single device recipe.
Use the :code:`tune cp` command to get a copy of the :code:`8B_lora_single_device.yaml`
Use the :ref:`tune cp <tune_cp_cli_label>` command to get a copy of the :code:`8B_lora_single_device.yaml`
config and update it to use your new dataset. Create a new folder for your project
and make sure the dataset builder and message converter are saved in that directory,
then specify it in the config.
6 changes: 3 additions & 3 deletions docs/source/tutorials/datasets.rst
@@ -56,7 +56,7 @@ Hugging Face datasets
---------------------

We provide first class support for datasets on the Hugging Face hub. Under the hood,
all of our built-in datasets and dataset builders are using Hugging Face's ``load_dataset()``
all of our built-in datasets and dataset builders are using Hugging Face's `load_dataset() <https://huggingface.co/docs/datasets/v2.20.0/en/package_reference/loading_methods#datasets.load_dataset>`_
to load in your data, whether local or on the hub.

You can pass in a Hugging Face dataset path to the ``source`` parameter in any of our builders
Expand Down Expand Up @@ -97,7 +97,7 @@ on Hugging Face's `documentation. <https://huggingface.co/docs/datasets/en/loadi
Setting max sequence length
---------------------------

The default collator :func:`~torchtune.utils.collate.padded_collate` used in all
The default collator, :func:`~torchtune.utils.padded_collate`, used in all
our training recipes will pad samples to the max sequence length within the batch,
not globally. If you wish to set an upper limit on the max sequence length globally,
you can specify it in the dataset builder with ``max_seq_len``. Any sample in the dataset
@@ -250,7 +250,7 @@ Here is an example of a sample that is formatted with :class:`~torchtune.data.Al
# ### Response:
#
We provide `other instruct templates <data>`
We provide :ref:`other instruct templates <data>`
for common tasks such as summarization and grammar correction. If you need to create your own
instruct template for a custom task, you can inherit from :class:`~torchtune.data.InstructTemplate`
and create your own class.
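
As a sketch of what that might look like, assuming the ``format(sample, column_map)``
classmethod interface used by the built-in templates:

.. code-block:: python

    from typing import Any, Dict, Mapping, Optional

    from torchtune.data import InstructTemplate

    class PoliteRewriteTemplate(InstructTemplate):
        """Hypothetical template for a text-rewriting task."""

        template = "Rewrite the following text to be more polite:\n\n{text}\n\nRewritten: "

        @classmethod
        def format(
            cls, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
        ) -> str:
            # column_map lets you point the template at differently named dataset columns
            column_map = column_map or {}
            text_col = column_map.get("text", "text")
            return cls.template.format(text=sample[text_col])
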
6 changes: 4 additions & 2 deletions docs/source/tutorials/e2e_flow.rst
@@ -151,6 +151,8 @@ Run Evaluation using EleutherAI's Eval Harness

We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!

.. TODO (SalmanMohammadi) ref eval recipe docs
torchtune integrates with
`EleutherAI's evaluation harness <https://github.com/EleutherAI/lm-evaluation-harness>`_.
An example of this is available through the
@@ -169,7 +171,7 @@ will be easier than overriding all of these elements through the CLI.
tune cp eleuther_evaluation ./custom_eval_config.yaml \
For this tutorial we'll use the ``truthfulqa_mc2`` task from the harness.
For this tutorial we'll use the `truthfulqa_mc2 <https://github.com/sylinrl/TruthfulQA>`_ task from the harness.
This task measures a model's propensity to be truthful when answering questions and
measures the model's zero-shot accuracy on a question followed by one or more true
responses and one or more false responses. Let's first run a baseline without fine-tuning.
@@ -422,7 +424,7 @@ Uploading your model to the Hugging Face Hub
--------------------------------------------

Your new model is working great and you want to share it with the world. The easiest way to do this
is utilizing the ``huggingface-cli`` command, which works seamlessly with torchtune. Simply point the CLI
is utilizing the `huggingface-cli <https://huggingface.co/docs/huggingface_hub/en/guides/cli>`_ command, which works seamlessly with torchtune. Simply point the CLI
to your finetuned model directory like so:

.. code-block:: bash
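
If you prefer to stay in Python rather than use the CLI, the ``huggingface_hub`` library
offers the same functionality; a sketch with placeholder repo id and paths:

.. code-block:: python

    from huggingface_hub import HfApi

    api = HfApi()  # assumes you have already authenticated, e.g. with `huggingface-cli login`
    api.upload_folder(
        folder_path="/tmp/finetuned-llama",  # local directory with the fine-tuned checkpoint
        repo_id="my-username/my-finetuned-llama",
        repo_type="model",
    )
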
10 changes: 6 additions & 4 deletions docs/source/tutorials/first_finetune_tutorial.rst
@@ -70,7 +70,9 @@ Each recipe consists of three components:

torchtune provides built-in recipes for finetuning on single device, on multiple devices with `FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`_,
using memory efficient techniques like `LoRA <https://arxiv.org/abs/2106.09685>`_, and more! You can view all built-in recipes `on GitHub <https://github.com/pytorch/torchtune/tree/main/recipes>`_. You can also utilize the
:code:`tune ls` command to print out all recipes and corresponding configs.
:ref:`tune ls <tune_ls_label>` command to print out all recipes and corresponding configs.

.. TODO (SalmanMohammadi) point to recipe index page here.
.. code-block:: bash
Expand All @@ -92,7 +94,7 @@ a single device. For a more in-depth discussion on LoRA in torchtune, you can se
.. note::

**Why have a separate recipe for single device vs. distributed?** This is discussed in
:ref:`recipe_deepdive` but one of our core principles in torchtune is minimal abstraction and boilerplate code.
":ref:`recipe_deepdive`" but one of our :ref:`core principles <design_principles_label>` in torchtune is minimal abstraction and boilerplate code.
If you only want to train on a single GPU, our single-device recipe ensures you don't have to worry about additional
features like FSDP that are only required for distributed training.

@@ -119,7 +121,7 @@ you want to set the number of training epochs to 1.
**Copy the config through `tune cp` and modify directly**

If you want to make more substantial changes to the config, you can use the :code:`tune` CLI to copy it to your local directory.
If you want to make more substantial changes to the config, you can use the :ref:`tune <cli_label>` CLI to copy it to your local directory.

.. code-block:: bash
@@ -139,7 +141,7 @@ Training a model
----------------
Now that you have a model in the proper format and a config that suits your needs, let's get training!

Just like all the other steps, you will be using the :code:`tune` CLI tool to launch your finetuning run.
Just like all the other steps, you will be using the :ref:`tune <cli_label>` CLI tool to launch your finetuning run.

.. code-block:: bash