(by claude) Fix PDB sequence clusters: migrate off deprecated RCSB bc-*.out files #2239

Merged
jamesmkrieger merged 2 commits into prody:main from jamesmkrieger:fix/pdbclusters-rcsb-entity-clusters on Jun 19, 2026

@jamesmkrieger

Contributor

Fixes #2238 (discovered and written by claude too)

RCSB removed the legacy bc-{sqid}.out cluster files (now HTTP 404), so fetchPDBClusters retrieved no usable data and a subsequent loadPDBClusters crashed in os.path.getmtime with FileNotFoundError.

  • fetchPDBClusters: download from the current clusters-by-entity-{sqid}.txt endpoint, and refuse to save empty bodies or HTML error pages (so a failed download is never reported as success).
  • loadPDBClusters: re-check the cache file after fetching and raise a clear IOError instead of crashing in getmtime; parse on the last underscore so identifiers containing underscores (computed structure models, e.g. AF_AFP12345F1) still yield clean (identifier, entity) pairs.
  • listPDBCluster: the data now clusters by polymer entity rather than chain, so the second argument is an entity id; update matching and docstrings.
  • Add tests for fetch/list and the failed-download error path.
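
The checks described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names (`looks_like_cluster_data`, `split_entity_id`, `parse_clusters`, `find_cluster`) are hypothetical, and the file layout (one cluster per line, whitespace-separated `IDENTIFIER_ENTITY` tokens) and the sample identifiers are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Member = Tuple[str, str]  # (identifier, entity id)


def looks_like_cluster_data(body: bytes) -> bool:
    """Hypothetical check: reject empty bodies and HTML error pages."""
    text = body.strip()
    if not text:
        return False
    # An HTML error page usually announces itself near the top of the body.
    head = text[:200].lower()
    return b"<!doctype" not in head and b"<html" not in head


def split_entity_id(token: str) -> Member:
    """Split on the LAST underscore so identifiers that themselves
    contain underscores (e.g. AF_AFP12345F1) stay intact."""
    identifier, _, entity = token.rpartition("_")
    return identifier, entity


def parse_clusters(text: str) -> List[List[Member]]:
    """Assumed layout: one cluster per line, members separated by whitespace."""
    clusters = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:  # skip blank lines
            clusters.append([split_entity_id(t) for t in tokens])
    return clusters


def find_cluster(clusters: List[List[Member]],
                 identifier: str, entity: str) -> Optional[List[Member]]:
    """Return the cluster containing (identifier, entity), or None."""
    target = (identifier, str(entity))
    for cluster in clusters:
        if target in cluster:
            return cluster
    return None


# Illustrative input only; these lines are not real RCSB data.
sample = b"4HHB_1 AF_AFP12345F1_1\n1ABC_2\n"
if looks_like_cluster_data(sample):
    clusters = parse_clusters(sample.decode("ascii"))
    print(find_cluster(clusters, "AF_AFP12345F1", "1"))
```

Splitting with `rpartition("_")` rather than `split("_")` is what keeps a computed-structure-model identifier in one piece; a plain split would break `AF_AFP12345F1_1` into three fields.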

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In a fresh environment prody's package path is unset, so getPackagePath()
(reached via fetchPDBClusters) prompts with input(); under pytest's captured
output this raised "OSError: reading from stdin while output is captured!".
Point the package path at the writable test TEMPDIR in setUpModule and restore
it in tearDownModule.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@karolamik13 karolamik13 left a comment

Contributor

I have tested it. Everything is working fine.

@jamesmkrieger merged commit 8db982e into prody:main on Jun 19, 2026
6 checks passed
@jamesmkrieger deleted the fix/pdbclusters-rcsb-entity-clusters branch on June 19, 2026 18:23

Development

Successfully merging this pull request may close these issues.

Bug report: fetchPDBClusters broken — RCSB deprecated the legacy bc-*.out cluster files

2 participants