-
Notifications
You must be signed in to change notification settings - Fork 31
Vectorize O(N^2) bond feature computation with NumPy #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,272 @@ | ||||||
| """Test vectorized bond features computation. | ||||||
|
|
||||||
| The original implementation used O(N_token^2) nested Python loops to check | ||||||
| atom-level bonds between every token pair. For a 1000-token target, that's | ||||||
| ~500K iterations in pure Python. | ||||||
|
|
||||||
| The vectorized version uses NumPy advanced indexing on the atom adjacency | ||||||
| matrix, mapping bonded atom pairs directly to their token indices. This | ||||||
| eliminates the inner loops entirely. | ||||||
|
|
||||||
| Expected: ~50-100x speedup on the bond feature computation. | ||||||
|
||||||
| Expected: ~50-100x speedup on the bond feature computation. | |
| Expected: substantial (often order-of-magnitude) speedup on the bond feature computation. |
Copilot
AI
Mar 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module imports torch but never uses it. Removing the unused import will avoid unnecessary dependency loading and keeps the test file clean.
| import torch |
Copilot
AI
Mar 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 94 is over-indented relative to the function body (8 spaces instead of 4), which will raise an IndentationError and prevent the test module from importing. Align the # Inline constants ... comment and subsequent constants with the function indentation.
| # Inline constants to avoid importing pxdesign.data.constants (requires rdkit) | |
| # Inline constants to avoid importing pxdesign.data.constants (requires rdkit) |
Copilot
AI
Mar 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 149 is over-indented relative to the function body (8 spaces instead of 4), which will raise an IndentationError and prevent the test module from importing. Align the # Inline constants ... comment and subsequent constants with the function indentation.
Copilot
AI
Mar 3, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_vectorized_speedup() asserts a wall-clock speed comparison, which is often flaky across CI hardware/load and can cause nondeterministic failures. Consider marking this as a benchmark/slow test (skipped by default) or rewriting it to validate correctness only (keeping performance measurement out of the pass/fail criteria).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the design-token exclusion behavior: the previous loop skipped bonds only when the outer-loop token was a design residue, but the new vectorized filter drops any bond where either endpoint is a design token. If this semantic change is intentional, it should be called out in the PR description and ideally covered by a regression test against
Featurizer.get_bond_features(); if it’s meant to be a pure vectorization, adjust the filter to preserve the prior behavior.