Fix _set_wandb_writer serialization issues #1806
Open
Description
Fix serialization issues in the _set_wandb_writer function that can cause failures when passing complex argument configurations to wandb.init().
Problem
The current implementation passes the args namespace directly to wandb as configuration, which can fail when the args contain non-serializable objects such as:
- bytes objects
- torch.Tensor instances
This leads to serialization errors during wandb initialization, preventing proper logging functionality.
Solution
Added a sanitization function, _clean(), that:
- Removes bytes, type, and callable objects from the configuration
- Converts torch.Tensor and numpy arrays to lists using .tolist()
- Falls back to repr() as a last resort for any remaining problematic objects
Changes Made
- Updated the _set_wandb_writer() function in megatron/training/global_vars.py
- Added the _clean() helper function for recursive sanitization
Testing
Type of Change
Impact
This fix ensures that wandb logging works reliably across different training configurations, especially when using complex argument setups that may include tensors, custom types, or other non-serializable objects. The change is minimal and focused, reducing the risk of introducing new issues while solving a real-world problem that can prevent proper experiment tracking.
Related Issues
Fixes potential TypeError and ValueError exceptions during wandb initialization when args contain non-serializable objects.