
Support direct CIF/mmCIF template inputs for inference #312

Open
taivu1998 wants to merge 1 commit into bytedance:main from taivu1998:tdv/issue-263-custom-cif-template


@taivu1998

Summary

Adds inference support for directly supplied local CIF/mmCIF template files, addressing issue #263.

Users can now set proteinChain.templatesPath to a .cif or .mmcif file and, when the file contains multiple protein chains, provide proteinChain.templateChainId to pick one. The supplied structure is parsed, restricted to the selected chain, aligned to the query sequence, and converted into the same template feature contract already used by searched .a3m/.hhr templates and embedded JSON templates.
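As a rough illustration of the configuration described above, the sketch below builds an inference input in Python and serializes it to JSON. Only proteinChain.templatesPath and proteinChain.templateChainId come from this PR; the surrounding layout (name, sequences, sequence, count), the helper name build_input, and the example path are assumptions made for illustration and may not match the actual input schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_input(
    name: str,
    sequence: str,
    template_path: str,
    chain_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Build a single-chain inference input (hypothetical layout)."""
    chain: Dict[str, Any] = {
        "sequence": sequence,
        "count": 1,  # assumed field, not taken from the PR text
        "templatesPath": template_path,  # .cif or .mmcif file
    }
    # templateChainId is optional; it is only needed when the
    # structure file holds more than one protein chain.
    if chain_id is not None:
        chain["templateChainId"] = chain_id
    return [{"name": name, "sequences": [{"proteinChain": chain}]}]


if __name__ == "__main__":
    doc = build_input("demo", "MKTAYIAK", "templates/custom.cif", chain_id="A")
    print(json.dumps(doc, indent=2))
```

Leaving chain_id unset omits templateChainId entirely, which matches the description of the field as optional.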

Root Cause

The inference template dispatcher only accepted embedded template JSON plus search-derived .a3m and .hhr files. A user-supplied CIF/mmCIF structure path fell through to an "Unsupported template format" error. Separately, when an explicit template path was missing, preprocessing could silently replace it with automatic template-search output instead of honoring the user's intended custom template.
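The two failure modes above can be sketched as a suffix-based dispatch plus an explicit existence check. This is a minimal sketch, not the PR's code: the function names classify_template_path and require_explicit_template, the kind labels, and the error wording are all hypothetical.

```python
from pathlib import Path

# Suffix -> template kind. The suffixes come from the PR description;
# the kind labels are invented for this sketch.
_TEMPLATE_KINDS = {
    ".cif": "structure",
    ".mmcif": "structure",
    ".json": "embedded_json",
    ".a3m": "search",
    ".hhr": "search",
}


def classify_template_path(path: str) -> str:
    """Return the template kind for a path, or raise on unknown suffixes."""
    suffix = Path(path).suffix.lower()
    if suffix not in _TEMPLATE_KINDS:
        raise ValueError(f"Unsupported template format: {path!r}")
    return _TEMPLATE_KINDS[suffix]


def require_explicit_template(path: str) -> Path:
    """Fail loudly if a user-specified template file is missing.

    The point is to avoid silently falling back to automatic
    template search when the user asked for a specific file.
    """
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(f"Explicit templatesPath not found: {path!r}")
    return candidate
```

Raising in both cases keeps a typo in templatesPath from producing a prediction that quietly used a different template.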

Changes

  • Added direct CIF/mmCIF parsing and featurization through TemplateHitFeaturizer.parse_cif_template.
  • Extended simplified CIF parsing to retain all protein-like chains and support optional chain filtering.
  • Added shared template path/suffix helpers so direct CIF/mmCIF and embedded JSON templates do not require a local PDB mmCIF mirror.
  • Wired inference template dispatch for .cif and .mmcif paths with clear errors for ambiguous or missing chains.
  • Made preprocessing fail loudly when an explicit templatesPath is missing instead of silently replacing it.
  • Updated CLI/docs for .cif, .mmcif, .json, .a3m, .hhr, and templateChainId.
  • Added focused tests for direct CIF parsing, chain selection, alignment, preprocessing behavior, and mmCIF-dir gating.
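To make the chain-selection and alignment steps in the list above concrete, here is a small sketch under stated assumptions. It takes chains as a plain mapping from chain ID to one-letter sequence, which is a simplification of whatever the parser returns. The alignment uses the matching blocks from Python's difflib, swapped in purely for illustration; the PR does not say which alignment method it uses. Both function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def select_template_chain(
    chains: Dict[str, str], chain_id: Optional[str] = None
) -> Tuple[str, str]:
    """Pick one protein chain, raising on missing or ambiguous input."""
    if not chains:
        raise ValueError("No protein chains found in template file")
    if chain_id is not None:
        if chain_id not in chains:
            raise ValueError(
                f"Chain {chain_id!r} not found; available: {sorted(chains)}"
            )
        return chain_id, chains[chain_id]
    if len(chains) > 1:
        raise ValueError(
            f"Multiple protein chains {sorted(chains)}; set templateChainId"
        )
    only_id = next(iter(chains))
    return only_id, chains[only_id]


def map_query_to_template(query: str, template: str) -> Dict[int, int]:
    """Map query residue indices to template residue indices.

    Uses difflib matching blocks (exact identical runs only), a
    stand-in for a real sequence alignment.
    """
    matcher = SequenceMatcher(None, query, template, autojunk=False)
    mapping: Dict[int, int] = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset] = block.b + offset
    return mapping
```

Query positions with no counterpart in the template are simply absent from the mapping, so a downstream featurizer could treat them as unmasked gaps.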

Validation

  • uvx ruff check protenix/data/template/template_parser.py protenix/data/template/template_utils.py protenix/data/template/template_featurizer.py protenix/data/template/template_path.py protenix/data/inference/infer_dataloader.py runner/template_search.py runner/batch_inference.py tests/test_custom_cif_templates.py
  • python3.11 -m py_compile protenix/data/template/template_parser.py protenix/data/template/template_utils.py protenix/data/template/template_featurizer.py protenix/data/template/template_path.py protenix/data/inference/infer_dataloader.py runner/template_search.py runner/batch_inference.py tests/test_custom_cif_templates.py
  • git diff --check
  • uvx --python 3.11 --with pytest --with numpy --with rdkit --with biopython --with biotite --with requests --with typing-extensions pytest tests/test_custom_cif_templates.py -q (9 passed)
  • uvx --python 3.11 --with pytest --with numpy --with rdkit --with biopython --with biotite --with requests --with typing-extensions pytest tests/test_custom_cif_templates.py tests/test_json_template_parser.py tests/test_fetch_remote_cif.py -q (19 passed)

taivu1998 marked this pull request as ready for review on May 11, 2026 03:45