Attribution Targets Encapsulation #52
Merged
Commits (20)
2149aa0
Release: 2025-10-21
hannamw b6bf15f
A proposed API enhancement (`AttributionTargets`) that encapsulates t…
speediedan 2b52ba6
slight clarification in a couple comments based on copilot review
speediedan 418b17f
Allow `offload_modules` to handle single module and container offload…
speediedan 45d6eee
use a smaller non-default batch_size for test_gemma_2_2b to expand th…
speediedan 3d8264a
initial changes to adapt original `AttributionTargets` encapsulation …
speediedan 32f79a4
Merge remote-tracking branch 'upstream/main' into attribution-targets
speediedan f90a7a4
Add pytest markers for long-running and high memory tests; adjust bat…
speediedan af101cf
Update unembedding matrix handling to auto-detect backend-variant ori…
speediedan 6f93ef4
Merge branch 'main' into attribution-targets
speediedan 2f8eb2b
minor type fix, clarify current vram gating mark
speediedan 6b76d08
adds integration tests, refactors proposed interface incorporating PR…
speediedan c8b4425
remove marks, add vram skipif conditions
speediedan f4ad1e9
revert comment hunks to submit in a separate PR
speediedan 787cf18
adjust `test_custom_target_correctness` to adopt our standard attribu…
speediedan 7ea7f14
very rough exploratory draft of attribution_targets_demo.ipynb to sol…
speediedan be7654d
updates to attribution_targets_demo.ipynb including:
speediedan 440f097
cleanup language and formatting, prettify with banner
speediedan 8219073
streamlined serial backend testing with `models_cpu` fixture and `cle…
speediedan 100af4d
restructured the demo to lead with the simpler target modes and extra…
speediedan
Conversations
Might be better here to just commit to a format, i.e. input either None, a sequence of strs, or a sequence of fully specified attribution targets. This cuts down on the number of cases and potential confusion.
Makes sense! Supporting the mixed-mode input is probably not worth the marginal complexity and potential user confusion. I've dropped the mixed-mode `list[int | str | tuple]` entirely. The revised interface for `attribution_targets` I'm proposing now is: `Sequence[str] | Sequence[TargetSpec] | torch.Tensor | None`

- `Sequence[str]`: token strings. `AttributionTargets` handles tokenization; we warn on multi-token strings and take the final token (lmk if you prefer we error instead, or only take the final token when a bool flag is set).
- `Sequence[TargetSpec]`: fully specified custom targets with explicit token_str, probability, and unembed vectors. I thought it was worth making the target specification a named tuple for clarity. We still accept regular tuples and create named tuples for the user if preferred, but keep the signature readable with this alias: `TargetSpec = CustomTarget | tuple[str, float, torch.Tensor]`
- `torch.Tensor`: my reasoning for accepting the `torch.Tensor` input is that it's convenient for downstream consumers that have pre-tokenized IDs they'd like to attribute without round-tripping through strings. The world-model analysis framework I'm building does this, invoking circuit-tracer's `attribute` via composable operations with pre-tokenized tensor inputs. If you feel strongly about this we can drop it and require users to convert to `Sequence[str]` or `Sequence[TargetSpec]`, but I think it's a nice-to-have that allows a greater breadth of downstream use cases and doesn't add much complexity.
- `None`: auto-select salient logits
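The dispatch over these input forms can be sketched in plain Python. This is a rough illustration only, not the PR's actual implementation: `CustomTarget` and `normalize_targets` are hypothetical names, unembed vectors are plain float lists rather than `torch.Tensor`s, and the pre-tokenized tensor branch is omitted so the sketch stays dependency-free.

```python
from typing import NamedTuple, Union

# Hypothetical sketch: field names follow the discussion above, but the
# unembed vector is a plain list of floats rather than a torch.Tensor.
class CustomTarget(NamedTuple):
    token_str: str
    probability: float
    unembed_vector: list

# regular tuples are accepted alongside the named tuple, per the comment
TargetSpec = Union[CustomTarget, tuple]

def normalize_targets(targets):
    """Normalize the accepted attribution_targets forms to one shape."""
    if targets is None:
        return None  # downstream code auto-selects salient logits
    if all(isinstance(t, str) for t in targets):
        return list(targets)  # token strings; tokenized later
    # coerce plain (token_str, probability, vector) tuples to CustomTarget
    return [t if isinstance(t, CustomTarget) else CustomTarget(*t)
            for t in targets]
```

Coercing plain tuples into the named tuple up front means downstream code only ever sees one fully specified target shape per branch.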
This sounds okay to me - I recognize that having multiple input formats could be convenient (even though I would like to cut down on the number of formats). Re: warning vs. erroring, I'd prefer to error here. If anything, I think I'd want to take the first token, but erroring still seems best.
Makes sense! Changed our multi-token string handling from `warnings.warn()` to `raise ValueError()`.
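The agreed-upon erroring behavior can be sketched as a small helper. The name and signature here are hypothetical; `encode` stands in for whatever tokenizer callable the implementation uses.

```python
def encode_single_token(token_str, encode):
    """Resolve a target string to exactly one token id.

    Raises ValueError (rather than warning) when the string is
    multi-token, per the review decision above. `encode` is any
    callable mapping a string to a list of token ids.
    """
    ids = encode(token_str)
    if len(ids) != 1:
        raise ValueError(
            f"attribution target {token_str!r} tokenizes to "
            f"{len(ids)} tokens; provide a single-token string"
        )
    return ids[0]
```

Failing fast here surfaces the ambiguity to the caller instead of silently attributing a token the user may not have intended.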