Support job-scoped custom CCD definitions for PTM inference #314

Open
taivu1998 wants to merge 1 commit into bytedance:main from taivu1998:tdv/issue-245-user-ccd-ptm

@taivu1998

Summary

Adds job-scoped custom CCD support for inference so users can provide custom PTM or ligand chemical components directly in the input JSON instead of editing the global Protenix CCD cache files.
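
As a rough illustration of the intended usage, the sketch below builds a single job in Python and writes it out as input JSON. Only `userCCDPath` and the `ptmType: "CCD_<code>"` syntax come from this PR. The surrounding layout (`name`, `sequences`, `proteinChain`, `modifications`, `ptmPosition`), the component code `XPT`, the sequence, and the file names are assumptions for illustration and may not match the final schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical job: one protein chain with a custom PTM at position 3.
# "XPT" is a made-up component code that the user-supplied CCD file
# is assumed to define.
job = {
    "name": "custom_ptm_demo",
    # Per this PR, resolved relative to the input JSON file.
    "userCCDPath": "custom_ptm.cif",
    "sequences": [
        {
            "proteinChain": {
                "sequence": "MASKGEELF",
                "count": 1,
                "modifications": [
                    {"ptmType": "CCD_XPT", "ptmPosition": 3},
                ],
            }
        }
    ],
}


def write_job(job: dict, directory: Path) -> Path:
    """Write a single-job input file and return its path."""
    path = directory / "input.json"
    path.write_text(json.dumps([job], indent=2))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_job(job, Path(tmp))
        print(out.read_text())
```

Because the custom component travels with the job, two jobs can reuse the same code with different definitions without touching the shared cache.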

This PR addresses #245.

Motivation

Issue #245 asks whether a user-generated PTM can be added to a specified residue at model input time. The existing workaround required mutating the shared CCD cache files, which is awkward for one-off inference jobs, hard to isolate between jobs, and easy to misconfigure.

Changes

  • Adds a per-job CCDProvider overlay that can load custom components from either:
    • userCCD: inline CCD mmCIF text
    • userCCDPath: a CCD mmCIF file path, resolved relative to the input JSON file
  • Threads that provider through inference-time polymer, ligand, ion, reference-feature, leaving-atom, mol-type, and canonical-residue-name logic.
  • Preserves existing global CCD behavior for jobs that do not provide custom CCD data.
  • Supports custom CCD PTMs via existing modification syntax such as ptmType: "CCD_<code>".
  • Keeps user-defined RDKit molecules scoped to the current job and passes them into geometry-feature generation when training-free guidance features are requested.
  • Updates the web-service request parser so top-level userCCD and userCCDPath are preserved in generated inference JSON.
  • Documents the new JSON fields and adds a minimal custom PTM example under examples/.
  • Adds focused tests for path-based CCD loading, inline CCD loading, invalid user CCD inputs, PTM position validation, polymer bond/leaving-atom behavior, canonical residue naming, reference features, and full feature generation with geometry features.
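
The overlay and path-resolution behavior described in the list above can be sketched roughly as follows. This is not the code in `protenix/data/core/custom_ccd.py`: the class name `OverlayCCD`, the helper `resolve_user_ccd_path`, the use of plain string values for components, and the handling of absolute paths are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Mapping, Optional


class OverlayCCD:
    """Hypothetical job-scoped lookup: user entries shadow the global table."""

    def __init__(
        self,
        global_ccd: Mapping[str, str],
        user_ccd: Optional[Mapping[str, str]] = None,
    ) -> None:
        self._global = global_ccd
        # Copy the user entries so the job never mutates caller-owned state.
        self._user = dict(user_ccd or {})

    def get(self, code: str) -> Optional[str]:
        """Return the job-level definition if present, else the global one."""
        if code in self._user:
            return self._user[code]
        return self._global.get(code)

    def is_user_defined(self, code: str) -> bool:
        """True when the component was supplied by the current job."""
        return code in self._user


def resolve_user_ccd_path(input_json: Path, user_ccd_path: str) -> Path:
    """Resolve a userCCDPath value against the input JSON's directory.

    Passing absolute paths through unchanged is an assumption of this sketch.
    """
    candidate = Path(user_ccd_path)
    if candidate.is_absolute():
        return candidate
    return input_json.parent / candidate
```

A job without custom CCD data would construct the overlay with no user entries, so every lookup falls through to the global table and existing behavior is unchanged.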

Validation

  • .venv-test/bin/python -m pytest tests/test_user_ccd_ptm.py -q
    • 7 passed
  • .venv-test/bin/python -m py_compile protenix/data/core/custom_ccd.py protenix/data/core/ccd.py protenix/data/core/parser.py protenix/data/inference/infer_dataloader.py protenix/data/inference/json_parser.py protenix/data/inference/json_to_feature.py protenix/web_service/colab_request_parser.py tests/test_user_ccd_ptm.py
  • uvx ruff check --select E9,F63,F7,F82 protenix/data/core/custom_ccd.py protenix/data/core/ccd.py protenix/data/core/parser.py protenix/data/inference/infer_dataloader.py protenix/data/inference/json_parser.py protenix/data/inference/json_to_feature.py protenix/web_service/colab_request_parser.py tests/test_user_ccd_ptm.py
  • uvx ruff check --select F401,F821,E9 protenix/data/core/custom_ccd.py protenix/data/core/ccd.py protenix/data/core/parser.py protenix/data/inference/infer_dataloader.py protenix/data/inference/json_parser.py protenix/data/inference/json_to_feature.py protenix/web_service/colab_request_parser.py tests/test_user_ccd_ptm.py
  • git diff --check

@taivu1998 taivu1998 marked this pull request as ready for review May 11, 2026 03:45
