Hello, thank you for sharing your work.
I was looking at the experimental setup details provided in the documentation (or README), and I have a question regarding the effective batch size calculation.
The table states:
- Hardware: 2 GPUs
- Details: batch size 8 per device, 4 gradient accumulation steps
- Resulting Batch Size: 32
However, based on the standard calculation (batch size per device × number of GPUs × gradient accumulation steps), the effective batch size should be 8 × 2 × 4 = 64.
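For reference, here is the calculation I used, as a minimal sketch assuming a standard data-parallel setup; the variable names below follow Hugging Face Trainer conventions and may not match the actual config keys in this repo:

```python
# Sanity check of the effective batch size for data-parallel training:
# effective = per-device batch * number of GPUs * gradient accumulation steps.
# Names mirror Hugging Face Trainer arguments; this repo's config may differ.
per_device_train_batch_size = 8
num_gpus = 2
gradient_accumulation_steps = 4

effective_batch_size = (
    per_device_train_batch_size * num_gpus * gradient_accumulation_steps
)
print(effective_batch_size)  # prints 64, not 32
```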
Could you please clarify whether the effective batch size is indeed 64 (and 32 is a typo), or whether a different batch size per device or number of accumulation steps was used?
Thank you!
