Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update _split_one_line and remove whitespace parameter #686

Merged
merged 6 commits into from
Nov 18, 2024

Conversation

xvlaurent
Copy link
Contributor

This is an attempt to update _split_one_line function to avoid usage of 'whitespace' parameter and approximation made for '_atom_site' CIF block.

Performance have to be approved with a CI run.

Copy link

codspeed-hq bot commented Oct 25, 2024

CodSpeed Performance Report

Merging #686 will degrade performances by 19.46%

Comparing Discngine:split_line_enhancement (c465aa5) with main (8fd73bf)

Summary

❌ 2 (👁 2) regressions
✅ 45 untouched benchmarks

Benchmarks breakdown

Benchmark main Discngine:split_line_enhancement Change
👁 benchmark_get_structure[cif-False] 62.4 ms 77.5 ms -19.46%
👁 benchmark_get_structure[cif-True] 124.6 ms 139.3 ms -10.5%

@padix-key
Copy link
Member

padix-key commented Oct 27, 2024

Hi, thanks for preparing this PR. The benchmarks show that this variant is indeed slightly faster than than the regex-based one. However is still about 40% slower than the fast expect_whitespace=False hack. This makes parsing a structure without bonds about 20% slower (as shown above).

Still, I tend towards accepting this change, because it is much cleaner and technically gives the correct result, while currently it does not. If high performance is desired, I would recommend using BinaryCIF anyway. @biotite-dev has anyone a strong opinion about it?

I probably need a few more days to review the code, but for now could you reformat the code with ruff? Currently the CI check is failing.

@padix-key
Copy link
Member

Fixes #673

@xvlaurent
Copy link
Contributor Author

I tryed to reduce/optimize the code a bit more, but it seems there is no real gain...

@padix-key
Copy link
Member

This is okay, a few other people (including me) already tried that unsuccessfully. After Rust code is introduced into the project (#688), this is probably the first function that is replaced by fast Rust routine.

The PDBx test is currently failing, as it does not expect a generator. Could you either revert to the non-generator variant or adapt the test?

@xvlaurent xvlaurent force-pushed the split_line_enhancement branch from fdafb18 to 33f1936 Compare November 12, 2024 09:15
@xvlaurent
Copy link
Contributor Author

Finally, all tests are fine and branch is up to date!

Copy link
Member

@padix-key padix-key left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found this small typo. Otherwise the PR looks good to merge!

tests/database/test_rcsb.py Show resolved Hide resolved
# Loop over the line
while line:
# Strip leading whitespace(s)
striped_line = line.lstrip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small typo

Suggested change
striped_line = line.lstrip()
stripped_line = line.lstrip()

src/biotite/structure/io/pdbx/cif.py Outdated Show resolved Hide resolved
src/biotite/structure/io/pdbx/cif.py Outdated Show resolved Hide resolved
@xvlaurent
Copy link
Contributor Author

Typo fixed! Thanks.

@padix-key padix-key merged commit 77cf0f6 into biotite-dev:main Nov 18, 2024
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants