Full DPO Distributed #2275

Open · wants to merge 18 commits into base: main

Conversation

@sam-pi commented Jan 17, 2025

Context

Adapted from the great work in #1966

What is the purpose of this PR? Is it to

  • add a new feature

Please link to any issues this PR addresses: relates to #2082

Changelog

What are the changes made in this PR?

  • Adds full DPO distributed training configs and recipes, adapted from the LoRA DPO training recipe
  • Includes integration tests
  • Includes configs for Llama 3.1 8B and 70B models

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.

  • I did not change any public API

Commands and Sample Outputs

Full DPO Config

output_dir: .../Meta-Llama-3.1-8B-Instruct/full_dpo
model:
  _component_: torchtune.models.llama3_1.llama3_1_8b
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: .../Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
  max_seq_len: 1024
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: .../Meta-Llama-3.1-8B-Instruct
  checkpoint_files:
  - model-00001-of-00004.safetensors
  - model-00002-of-00004.safetensors
  - model-00003-of-00004.safetensors
  - model-00004-of-00004.safetensors
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA3
resume_from_checkpoint: false
ref_checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: .../Meta-Llama-3.1-8B-Instruct
  checkpoint_files:
  - model-00001-of-00004.safetensors
  - model-00002-of-00004.safetensors
  - model-00003-of-00004.safetensors
  - model-00004-of-00004.safetensors
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA3
dataset:
  _component_: torchtune.datasets.stack_exchange_paired_dataset
seed: null
shuffle: true
batch_size: 4
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  weight_decay: 0.05
  lr: 1.0e-06
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
loss:
  _component_: torchtune.rlhf.loss.DPOLoss
  beta: 0.05
  label_smoothing: 0
epochs: 1
max_steps_per_epoch: 2000
gradient_accumulation_steps: 4
compile: false
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  log_dir: ${output_dir}/logs
  project: torchtune
  name: llama3.1-8B-dpo_3605
log_every_n_steps: 1
log_peak_memory_stats: true
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: false
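
For readers comparing the two configs, here is a minimal sketch of how beta and label_smoothing enter a DPO-style loss (the standard formulation; the actual implementation used here is torchtune.rlhf.loss.DPOLoss, so treat the function below as illustrative):

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.05, label_smoothing=0.0):
    # Difference of policy and reference log-ratios for chosen vs. rejected
    logits = (policy_chosen_logps - policy_rejected_logps) - (
        ref_chosen_logps - ref_rejected_logps
    )
    # label_smoothing > 0 gives the "conservative" DPO variant
    losses = (
        -F.logsigmoid(beta * logits) * (1 - label_smoothing)
        - F.logsigmoid(-beta * logits) * label_smoothing
    )
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps).detach()
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps).detach()
    return losses, chosen_rewards, rejected_rewards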

LoRA DPO Config

output_dir: .../Meta-Llama-3.1-8B-Instruct/lora_dpo
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  lora_attn_modules:
  - q_proj
  - v_proj
  - output_proj
  apply_lora_to_mlp: true
  apply_lora_to_output: false
  lora_rank: 256
  lora_alpha: 256
  lora_dropout: 0.0
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: .../Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
  max_seq_len: 1024
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: .../Meta-Llama-3.1-8B-Instruct
  checkpoint_files:
  - model-00001-of-00004.safetensors
  - model-00002-of-00004.safetensors
  - model-00003-of-00004.safetensors
  - model-00004-of-00004.safetensors
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA3
resume_from_checkpoint: false
save_adapter_weights_only: false
dataset:
  _component_: torchtune.datasets.stack_exchange_paired_dataset
seed: null
shuffle: true
batch_size: 4
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  weight_decay: 0.05
  lr: 1.0e-05
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
loss:
  _component_: torchtune.rlhf.loss.DPOLoss
  beta: 0.1
  label_smoothing: 0
epochs: 1
max_steps_per_epoch: 100
gradient_accumulation_steps: 4
compile: false
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  log_dir: ${output_dir}/logs
  project: torchtune
  name: llama3.1-8Blora-dpo_3603
log_every_n_steps: 1
log_peak_memory_stats: true
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: false
[Screenshot, 2025-01-16: WandB run comparison of full DPO vs. LoRA DPO]


@facebook-github-bot added the CLA Signed label on Jan 17, 2025
@sam-pi (Author) commented Jan 17, 2025

@joecummings Please take a look and let me know if you have feedback!

@SalmanMohammadi (Collaborator) commented Jan 20, 2025

Hey @sam-pi! Thanks so much for adding this. I had a quick skim through and it looked good to me. I'll have a closer look soon. First, a couple of high level points.

Did you manage to train using these configs? If so, could you attach some evidence of successful runs (e.g. WandB links)?

I'm particularly interested in the hardware requirements for the 70B config. We may want to think about offering some additional memory performance improvements for this recipe in particular, such as different parallelization configurations for the reference model (which doesn't need gradients to be sharded), offloading the entire reference model to CPU, etc.

@sam-pi (Author) commented Jan 21, 2025

@SalmanMohammadi Please take a look at my training run screenshots and configs at the bottom of the PR summary (I tried re-uploading the screenshot of my WandB run). It shows a comparison of a rank/alpha 256 LoRA DPO run against a full DPO run (only 100 iterations).
For Llama3.1-70B-Instruct, I was able to run using 2 nodes with 8x H100 GPUs (I think this is just 2x the HW requirements for running a single non-quantized 70B).

@RdoubleA mentioned this pull request on Jan 21, 2025
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nnodes 1 --nproc_per_node 2 full_dpo_distributed --config llama3_1/70B_full_dpo checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
Collaborator:

Since you mentioned you trained on 2 nodes it'd be good to add the command you used here.

Separately, I'm going to try to see if I can find a config that can train on a single node with reasonable speeds.

@sam-pi (Author), Jan 23, 2025:

I looked into running this on 1 node and I couldn't find a way to get it to fit; if you do, please feel free to update. Otherwise, maybe it's not worth including 70B_full_dpo.yaml in the PR, since technically I only got this working with some custom scripts using sbatch and torchrun with --nnodes 2.

@ebsmothers (Contributor):

I know I'm late to this discussion, but at least for now I would leave out any config that cannot run on a single node. Now that #2301 is open, we do have a playbook on how to run our recipes on multiple nodes. At the same time, I don't want us to be in the business of maintaining a bunch of separate slurm scripts for every recipe. So the way I would sequence this is:

  1. Land this PR without the 70B config (but keep it in our back pocket)
  2. Figure out whether we can generalize our slurm script to be parametrized by recipe/config (it seems feasible to me, but admittedly I haven't tried and haven't used slurm in a while)
  3. If (2) works, add in 70B_full_dpo.yaml with similar run instructions to what's in Multinode support in torchtune #2301's 70B_full_multinode.yaml

@ebsmothers (Contributor):

Separately, at least for 70B full finetune, we can fit on a single node with CPU offload (see the fsdp_cpu_offload config). Not sure if it's sufficient here (or what the perf implications are). There are also optimizer-in-backward and 8-bit optimizers (maybe model quality implications for the latter though). And while I'm leaving random suggestions... if we are gonna do a 70B Llama model, why not 3.3?

@sam-pi (Author):

Thanks, I removed the 70B config for now! Fair point on using 3.3 - I stuck to 3.1 to keep it simple for now and I hope it could be adapted relatively easily to 3.3.

@EugenHotaj (Contributor):

Any updates on merging this to main? Really excited to use it 😄

_,
) = self.concatenated_forward(self._ref_model, batch)

loss, chosen_rewards, rejected_rewards = self._loss_fn(
Contributor:

Another heads up: we log these below but we're not taking GAS into account.

(lmk if these comments are unhelpful btw and I'll stop 🙂 -- just trying to get this PR to run / verify on our setup and commenting as I find discrepancies)

Collaborator:

> (lmk if these comments are unhelpful btw and I'll stop 🙂 -- just trying to get this PR to run / verify on our setup and commenting as I find discrepancies)

Not at all, your comments are incredibly helpful and more than welcome! Thanks for taking the time to help review.

Collaborator:

> Another heads up: we log these below but we're not taking GAS into account.

noob q: what's GAS?

Contributor:

Gradient Accumulation Steps

Collaborator:

Yeah you're totally right. We should update to correct for gradient accumulation steps.
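
A rough sketch of the kind of correction being discussed (variable names are illustrative, not the recipe's exact ones): accumulate the per-micro-batch metrics, normalize by the number of gradient accumulation steps, and only log once per optimizer step.

running_loss += scaled_loss.detach()  # scaled_loss is already divided by GAS for backward
running_chosen_rewards += chosen_rewards.mean().detach() / gradient_accumulation_steps
running_rejected_rewards += rejected_rewards.mean().detach() / gradient_accumulation_steps

if (idx + 1) % gradient_accumulation_steps == 0:
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    metric_logger.log_dict(
        {
            "loss": running_loss.item(),
            "rewards/chosen": running_chosen_rewards.item(),
            "rewards/rejected": running_rejected_rewards.item(),
        },
        step=global_step,
    )
    running_loss = running_chosen_rewards = running_rejected_rewards = 0.0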

Contributor:

In that case I assume the same holds for the LoRA DPO recipe too, right?

self._resume_from_checkpoint = cfg.resume_from_checkpoint
self._gradient_accumulation_steps = cfg.gradient_accumulation_steps
self._optimizer_in_bwd = cfg.get("optimizer_in_bwd", False)
self._clip_grad_norm = cfg.get("clip_grad_norm", None)
Contributor:

Looks like we're missing the actual grad clipping logic in the train step.
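
For reference, a minimal sketch of the missing step (mirroring other distributed recipes; exact placement in this recipe may differ): clip right before the optimizer step, using the clip_grad_norm value read from the config above.

if (idx + 1) % self._gradient_accumulation_steps == 0:
    if self._clip_grad_norm is not None:
        grad_norm = torch.nn.utils.clip_grad_norm_(
            self._model.parameters(), max_norm=float(self._clip_grad_norm)
        )
    self._optimizer.step()
    self._optimizer.zero_grad(set_to_none=True)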

@EugenHotaj (Contributor):

With the changes I mentioned in the comments I was able to get parity with NeMo's DPO using the same data / hparams. E.g. here are the loss curves:

[Screenshot, 2025-01-29: loss curves comparing this recipe with NeMo's DPO]

Really awesome work! Pretty excited to use this.

@SalmanMohammadi (Collaborator) commented Jan 30, 2025

> Any updates on merging this to main? Really excited to use it 😄

I'm going to investigate some alternative sharding strategies for the reference model and see if I can get single-node training working for 70B. Will update soon. @sam-pi, would you be up for looking into @EugenHotaj's comments above?

@SalmanMohammadi (Collaborator):

OK, so we're not blocking this PR: I'm going to leave exploring different parallelism strategies for a follow-up. Let's make the necessary fixes to this recipe and bring it in line with our other distributed recipes.

@sam-pi If the 70B config doesn't work on a single node, I'd also suggest we remove it for now and add it back in after patching in the changes from #2301. What do you think?

# formed by concatenating an equal number of "chosen" and "rejected".
len_chosen = concatenated_input_ids.shape[0] // 2

all_logits = model(concatenated_input_ids)
Contributor:

One way to reduce memory and potentially fit this on a single node is to call model(...) twice. Right now we're effectively doubling the batch size here, and might be causing OOMs.

Collaborator:

One scenario I can think of is if a user is OOMing and can't go below an effective batch size of 2 (by configuring a batch size of 1). I'd be interested in seeing the tradeoff here vs. the additional computation from two extra model forward passes (both the policy and reference models) - though I have a feeling the memory savings may not be worth it.

@sam-pi (Author):

I think that's a great idea, but I plan to leave it out of this PR if that's alright. It seems like a useful addition to all DPO recipes.
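
For anyone picking this up in a follow-up, a hedged sketch of the idea (illustrative only, not part of this PR): split the concatenated batch and run two forward passes, trading extra compute for lower peak activation memory.

def split_forward(model, concatenated_input_ids):
    # The batch is [chosen; rejected] stacked along dim 0, as in concatenated_forward
    len_chosen = concatenated_input_ids.shape[0] // 2
    chosen_logits = model(concatenated_input_ids[:len_chosen])
    rejected_logits = model(concatenated_input_ids[len_chosen:])
    return chosen_logits, rejected_logits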

@sam-pi (Author) commented Jan 30, 2025

> OK, so we're not blocking this PR: I'm going to leave exploring different parallelism strategies for a follow-up. Let's make the necessary fixes to this recipe and bring it in line with our other distributed recipes.
>
> @sam-pi If the 70B config doesn't work on a single node, I'd also suggest we remove it for now and add it back in after patching in the changes from #2301. What do you think?

Thanks, I will look into all these fixes today and also remove the 70B config for now.

@ebsmothers (Contributor) left a comment:

This is looking great, thanks so much for adding this @sam-pi! Aside from my inline comments, it'd be good to confirm that various features like compile, optimizer-in-backward, etc. are working and doing what we'd expect (we can even add e.g. compile to the recipe test).


# Train for two epochs
cmd_1 = f"""
tune run --nnodes 1 --nproc_per_node 1 full_dpo_distributed \
Contributor:

This feels a bit weird... why are we testing a distributed recipe on a single device? (Similar comment for the other commands in this file.)

@sam-pi (Author):

Ah, good point. I more or less copied this from the LoRA DPO tests; I will look into updating it.

Comment on lines 113 to 114
# epoch_folder = get_largest_iter_folder(tmpdir)
# epoch_folder_minus_one = f"epoch_{int(epoch_folder.split('_')[-1]) - 1}"
Contributor:

Why are these commented out? Can they be removed?

@sam-pi (Author):

I will remove them, thanks for catching that. These were copied from the LoRA DPO tests.

)

@pytest.mark.integration_test
def test_save_and_load_weights(self, tmpdir, monkeypatch):
Contributor:

I don't fully understand this test... it makes sense why we would do something like this for LoRA, where we are merging the weights in the final checkpoint. But here the model arch is the same, right? Do we see it as likely that something will go wrong during save and load that's not already accounted for by the resume-from-checkpoint test above?

@sam-pi (Author):

I was guessing it's good practice to make sure save/load works in general, but happy to remove if it doesn't make sense.

Contributor:

Yeah, mainly I am cognizant of us not having too many recipe tests (they take a bit of time and run on every PR). In this case I'd claim save and load is already pretty well covered by test_training_state_on_resume.

self._ref_model = self._setup_reference_model(
    cfg_model=cfg.model,
    fsdp_cpu_offload=cfg.get("fsdp_cpu_offload", False),
    reshard_after_forward=cfg.get("fsdp_reshard_after_forward", True),
Contributor:

It may be worth running some memory profiling on this recipe (especially since it's already enabled). It seems to me that by setting reshard_after_forward=True here, we never have both the reference model and the policy model weights gathered at the same time. Is that the correct understanding? If so, it's worth confirming that it happens in practice (especially given the discussion around fitting on a single node).

Collaborator:

@sam-pi you mentioned you had to set this to True, right? What did you find?

@sam-pi (Author):

I found that if this was set to False I was getting OOM issues. I didn't debug further than noting that it just works for me when set to True.
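
For context, a hedged sketch of the FSDP2-style wiring implied here (the import path and per-layer loop vary by PyTorch/torchtune version, so treat this as illustrative): with reshard_after_forward=True, each layer's unsharded reference-model weights are freed right after its forward, so they are not held alongside the policy model's.

from torch.distributed._composable.fsdp import fully_shard  # path may differ across versions

for layer in ref_model.layers:
    fully_shard(layer, reshard_after_forward=True)
fully_shard(ref_model, reshard_after_forward=True)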

# deleting logits here helps reduce (peak) memory usage - we only need them for metric logging
del policy_chosen_logits, policy_rejected_logits

with torch.no_grad():
Contributor:

I wonder whether we can just run in inference mode? Since the reference model never has grad updates, stuff like view tracking etc. afforded by no_grad shouldn't be relevant, right? Lmk if I'm way off base, otherwise worth a try imo.

@sam-pi (Author):

I thought that by running model.eval() we are setting inference mode - am I mistaken?

Collaborator:

model.eval() mainly disables behaviour like dropout and batch norm updates. I think setting no_grad on the params is probably overkill though.

I've had issues with inference mode causing additional recompiles in the inlined compiled transformer layers.

Contributor:

> I've had issues with inference mode causing additional recompiles in the inlined compiled transformer layers.

Ah, fair enough. Yeah, mainly I was thinking it could potentially give some minor speedups. But if it messes with compile then I agree it's not worth it.
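
Summarizing the thread, a minimal sketch of how the reference model ends up being treated (assumed from the discussion, not copied from the diff): eval mode plus a plain no_grad forward, avoiding inference_mode because of the recompile issue mentioned above.

self._ref_model.eval()  # disables dropout; does not affect gradient tracking
for p in self._ref_model.parameters():
    p.requires_grad_(False)  # possibly overkill per the comment above, but harmless

with torch.no_grad():  # no_grad rather than inference_mode, to play nicely with compile
    ref_chosen_logps, ref_rejected_logps, *_ = self.concatenated_forward(
        self._ref_model, batch
    )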


# Step with optimizer
if (idx + 1) % self._gradient_accumulation_steps == 0:
    self._optimizer.step()
Contributor:

I don't think we've actually enabled optimizer-in-backward here either.

Collaborator:

@sam-pi relevant code snippet to enable it:

if not self._optimizer_in_bwd:
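
For reference, a generic PyTorch sketch of what optimizer-in-backward does (not torchtune's exact helper; the recipe would wire it via the snippet linked above): one optimizer per parameter, stepped from a post-accumulate-grad hook so each gradient can be freed as soon as it is produced. Note that this is incompatible with gradient accumulation, since every micro-batch would step the optimizer.

import torch

optim_dict = {
    p: torch.optim.AdamW([p], lr=1e-6, weight_decay=0.05)
    for p in model.parameters()
}

def _optim_step(param: torch.Tensor) -> None:
    # Runs right after the gradient for this parameter is accumulated
    optim_dict[param].step()
    optim_dict[param].zero_grad(set_to_none=True)

for p in model.parameters():
    p.register_post_accumulate_grad_hook(_optim_step)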

time_per_step = time.perf_counter() - t0
log_dict = {
    "loss": loss_to_log,
    "lr": self._optimizer.param_groups[0]["lr"],
Contributor:

I believe we also have this utility now in case that's helpful here

Comment on lines 915 to 916
_,
_,
Contributor:

One final heads up: I was getting OOMs on 70B unless I deleted these logits as well. I also had to call gc.collect(); torch.cuda.empty_cache(), otherwise I'd sometimes OOM mid-training.

Since all we're doing with the logits is calling .mean() below, we could return the means directly from concatenated_forward. Then you don't need to do any explicit deletion (but maybe still have to call gc.collect() / torch.cuda.empty_cache()).
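
A sketch of that combination (names mirror the recipe loosely; illustrative only): keep only the scalar means for logging, drop the full logit tensors, and release cached memory between steps.

import gc
import torch

policy_chosen_logits_mean = policy_chosen_logits.detach().mean()
policy_rejected_logits_mean = policy_rejected_logits.detach().mean()
del policy_chosen_logits, policy_rejected_logits

gc.collect()
torch.cuda.empty_cache()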

@sam-pi (Author):

Thanks, I added in deleting the logits for now at least

@sam-pi (Author) commented Feb 1, 2025

@ebsmothers @EugenHotaj @SalmanMohammadi Please take a look at the updates from @bogdansalyp to fix metric syncing/averaging across ranks and accounting for gradient accumulation in metrics.
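
For reviewers skimming the diff, a minimal sketch of the kind of cross-rank averaging described (the helper name is hypothetical, not necessarily what the update adds): all-reduce the running metric and divide by world size before logging from rank zero.

import torch
import torch.distributed as dist

def mean_across_ranks(value: torch.Tensor) -> torch.Tensor:
    value = value.detach().clone()
    dist.all_reduce(value, op=dist.ReduceOp.SUM)
    return value / dist.get_world_size()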

name="llama3_1/8B_full_dpo",
file_path="llama3_1/8B_full_dpo.yaml",
),
Config(
Collaborator:

This can now be removed.
