[docs] update config file info (#147)

themattinthehatt · web-flow · commit cd10c8752f11 · 2024-04-15T12:38:28.000-04:00
* [docs] update min/max epochs description

* [docs] add info about starting model training from a checkpoint
diff --git a/docs/source/user_guide/config_file.rst b/docs/source/user_guide/config_file.rst
@@ -24,15 +24,31 @@ The config file contains several sections:
 Data parameters
 ===============
 
-* ``data.image_orig_dims.height/width``: the current version of Lightning Pose requires all training images to be the same size. We are working on an updated version without this requirement. However, if you plan to use the PCA losses (Pose PCA or multiview PCA) then all training images **must** be the same size, otherwise the PCA subspace will erroneously contain variance related to image size.
-
-* ``data.image_resize_dims.height/width``: images (and videos) will be resized to the specified height and width before being processed by the network. Supported values are {64, 128, 256, 384, 512}. The height and width need not be identical. Some points to keep in mind when selecting these values: if the resized images are too small, you will lose resolution/details; if they are too large, the model takes longer to train and might not train as well.
+* ``data.image_orig_dims.height/width``: the current version of Lightning Pose requires all
+  training images to be the same size.
+  We are working on an updated version without this requirement.
+  However, if you plan to use the PCA losses (Pose PCA or multiview PCA), all training images
+  **must** be the same size; otherwise the PCA subspace will erroneously contain variance related
+  to image size.
+
+* ``data.image_resize_dims.height/width``: images (and videos) will be resized to the specified
+  height and width before being processed by the network.
+  Supported values are {64, 128, 256, 384, 512}.
+  The height and width need not be identical.
+  Some points to keep in mind when selecting these values:
+  if the resized images are too small, you will lose resolution/details;
+  if they are too large, the model takes longer to train and might not train as well.
 
 * ``data.data_dir/video_dir``: update these to reflect your local paths
 
-* ``data.num_keypoints``: the number of body parts. If using a mirrored setup, this should be the number of body parts summed across all views. If using a multiview setup, this number should indicate the number of keyponts per view (must be the same across all views).
+* ``data.num_keypoints``: the number of body parts.
+  If using a mirrored setup, this should be the number of body parts summed across all views.
+  If using a multiview setup, this number should indicate the number of keypoints per view
+  (must be the same across all views).
 
-* ``data.keypoint_names``: keypoint names should reflect the actual names/order in the csv file. This field is necessary if, for example, you are running inference on a machine that does not have the training data saved on it.
+* ``data.keypoint_names``: keypoint names should reflect the actual names/order in the csv file.
+  This field is necessary if, for example, you are running inference on a machine that does not
+  have the training data saved on it.
 
 * ``data.columns_for_singleview_pca``: see the :ref:`Pose PCA documentation <unsup_loss_pcasv>`
 
@@ -45,19 +61,36 @@ Model/training parameters
 Below is a list of some commonly modified arguments related to model architecture/training.
 
 * ``training.train_batch_size``: batch size for labeled data
-* ``training.min_epochs`` / ``training.max_epochs``: length of training
+
+* ``training.min_epochs`` / ``training.max_epochs``: minimum and maximum length of training, in epochs.
+  An epoch is one full pass through the dataset.
+  As an example, if you have 400 labeled frames and ``training.train_batch_size=10``, then your
+  dataset is divided into 400/10 = 40 batches.
+  One "batch" in this case is equivalent to one "iteration" in DeepLabCut.
+  Therefore, 300 epochs at 40 batches per epoch equal 300 * 40 = 12,000 total batches
+  (or iterations).
+
 * ``model.model_type``:
 
     * regression: model directly outputs an (x, y) prediction for each keypoint; not recommended
     * heatmap: model outputs a 2D heatmap for each keypoint
-    * heatmap_mhcrnn: the "multi-head convolutional RNN", this model takes a temporal window of frames as input, and outputs two heatmaps: one "context-aware" and one "static". The prediction with the highest confidence is automatically chosen.
+    * heatmap_mhcrnn: the "multi-head convolutional RNN"; this model takes a temporal window of
+      frames as input and outputs two heatmaps: one "context-aware" and one "static".
+      The prediction with the highest confidence is automatically chosen.
 
-* ``model.losses_to_use``: defines the unsupervised losses. An empty list indicates a fully supervised model. Each element of the list corresponds to an unsupervised loss. For example, ``model.losses_to_use=[pca_multiview,temporal]`` will fit both a pca_multiview loss and a temporal loss. Options include:
+* ``model.losses_to_use``: defines the unsupervised losses.
+  An empty list indicates a fully supervised model.
+  Each element of the list corresponds to an unsupervised loss.
+  For example, ``model.losses_to_use=[pca_multiview,temporal]`` will fit both a pca_multiview loss
+  and a temporal loss. Options include:
 
     * pca_multiview: penalize inconsistencies between multiple camera views
     * pca_singleview: penalize implausible body configurations
     * temporal: penalize large temporal jumps
 
+* ``model.checkpoint``: to continue training from an existing checkpoint, set this parameter
+  to the absolute path of a PyTorch ``.ckpt`` file.
+
 See the :ref:`Unsupervised losses <unsupervised_losses>` section for more details on the various
 losses and their associated hyperparameters.
 
diff --git a/scripts/configs/config_default.yaml b/scripts/configs/config_default.yaml
@@ -95,6 +95,8 @@ model:
   heatmap_loss_type: mse
   # directory name for model saving
   model_name: test
+  # load model from checkpoint
+  checkpoint: null
 
 dali:
   general:
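The hypothetical helper below illustrates how the new ``model.checkpoint`` field is filled in. It works on a config represented as a plain Python dict, which is an assumption; the training scripts use their own config object.

The helper checks only the two requirements stated in the docs: an absolute path and a ``.ckpt`` file. Whether the file actually loads is left to the training code. Passing ``None`` restores the default ``checkpoint: null``, i.e. no checkpoint is loaded.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def set_model_checkpoint(cfg: Dict[str, Any], ckpt_path: Optional[str]) -> Dict[str, Any]:
    """Set `model.checkpoint` in a dict-based config (hypothetical helper, not a Lightning Pose API).

    Passing None restores the default (`checkpoint: null` in the YAML).
    """
    model_cfg = cfg.setdefault("model", {})
    if ckpt_path is None:
        model_cfg["checkpoint"] = None
        return cfg
    path = Path(ckpt_path)
    # The docs ask for an absolute path to a .ckpt file.
    if not path.is_absolute():
        raise ValueError(f"checkpoint path must be absolute, got {ckpt_path!r}")
    if path.suffix != ".ckpt":
        raise ValueError(f"expected a .ckpt file, got {path.name!r}")
    model_cfg["checkpoint"] = str(path)
    return cfg


if __name__ == "__main__":
    cfg = {"model": {"model_name": "test", "checkpoint": None}}
    # Hypothetical path, for illustration only.
    set_model_checkpoint(cfg, "/home/user/models/test/last.ckpt")
    print(cfg["model"]["checkpoint"])
```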