Skip to content

Conversation

jacobhinkle
Copy link
Collaborator

This fixes a bug where __post_init__ clobbered some values when loading dataclasses from json in the codediff tool. This corrupted the loaded code and diffs in the resulting objects.

@jacobhinkle jacobhinkle requested a review from xwang233 October 1, 2025 13:43
Copy link

github-actions bot commented Oct 1, 2025

Description

  • Fix dataclass deserialization in codediff tool

  • Prevent post_init from overwriting loaded values

  • Update fields to optional for proper JSON loading

  • Add early returns in post_init to skip logic during deserialization


Changes walkthrough 📝

Relevant files
Bug fix
codediff.py
Fix dataclass deserialization in codediff                               

tools/codediff/codediff.py

  • Changed fields in GitRev to optional with default None
  • Added early return in __post_init__ to skip logic when fields are
    already set
  • Updated CompiledKernel to skip __post_init__ during deserialization
  • Modified TestRun and other classes to use None defaults and prevent
    clobbering
  • +43/-20 

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The condition if self.abbrev is not None in __post_init__ of GitRev may not reliably prevent clobbering during deserialization, since abbrev could be None even after deserialization if it was missing in the input JSON. This might lead to unintended subprocess execution.

    if self.abbrev is not None:
        # Avoid running __post_init__ during deserialization
        return
    Possible Issue

    In CompiledKernel.__post_init__, checking launch_params for skipping initialization assumes it remains non-null after deserialization, but if it was null in the JSON, the parsing logic will be skipped incorrectly, potentially leaving the object in an inconsistent state.

    if self.launch_params is not None:
        # Avoid running __post_init__ during deserialization
        return
    Possible Issue

    The get_kernel method sets kern.code only if it is None, but returns kern immediately if kern.code is already set. However, the check if kern.code is not None occurs after accessing self.kernel_map, which may fail if the map or kernel doesn't exist, leading to potential KeyError.

    if kern.code is not None:
        return kern

    @jacobhinkle
    Copy link
    Collaborator Author

    !build

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants