Add native score-only confidence CLI for existing structures #311

Open: taivu1998 wants to merge 1 commit into bytedance:main from taivu1998:tdv/issue-227-scoring-only
@taivu1998

Summary

Adds a native score-only workflow for Protenix so users can evaluate existing PDB/mmCIF coordinates with the model trunk and confidence heads without running diffusion sampling.

This addresses #227.

Motivation

Issue #227 asks whether Protenix can be used in a scoring-only mode, especially for complexes. The existing inference path always samples new coordinates through diffusion, while the confidence head already accepts predicted coordinates. This PR exposes that capability through a narrow, native workflow instead of requiring users to rely on a pinned external fork.

Changes

  • Adds fixed-coordinate inference support in Protenix.forward() and _main_inference_loop() via score_coordinates.
  • Adds InferenceRunner.score() to run the trunk, distogram, confidence head, and existing confidence summary code on supplied atom coordinates.
  • Adds protenix score CLI support for PDB/mmCIF inputs and directories.
  • Adds structure parsing and coordinate remapping utilities that:
    • preserve source chain IDs when possible through generated JSON id fields,
    • map source coordinates into Protenix featurized atom order,
    • detect duplicate atom keys and missing atoms,
    • support configurable missing-atom fallback policies.
  • Writes score outputs:
    • summary.csv,
    • <sample>/summary_confidence.json,
    • <sample>/chain_id_map.json,
    • optional <sample>/full_confidence.json,
    • optional <sample>/scored.cif,
    • failure records when applicable.
  • Documents protenix score in the README and training/inference instructions.
  • Adds focused unit tests for structure file collection, chain ID preservation, chain mapping, and coordinate remapping.
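The coordinate remapping step listed above can be illustrated with a minimal sketch. This is not the PR's code: the function name `remap_coordinates`, the `(chain_id, residue_index, atom_name)` key shape, and the policy names `"error"` and `"zero"` are assumptions made for illustration only. It shows the general idea of reordering source atoms into a target atom order while detecting duplicate keys and handling missing atoms under a configurable policy.

```python
from collections import Counter

# Hypothetical atom key: (chain_id, residue_index, atom_name).
AtomKey = tuple[str, int, str]
Xyz = tuple[float, float, float]


def remap_coordinates(
    source_atoms: list[tuple[AtomKey, Xyz]],
    target_order: list[AtomKey],
    missing_policy: str = "error",
) -> tuple[list[Xyz], list[bool]]:
    """Reorder source coordinates to follow target_order.

    Returns (coords, present_mask). The policy names are illustrative:
    "error" raises on any missing atom, "zero" fills it with the origin
    and marks it absent in the mask.
    """
    if missing_policy not in ("error", "zero"):
        raise ValueError(f"unknown missing_policy: {missing_policy!r}")

    # A key that appears twice in the source is ambiguous, so reject it.
    counts = Counter(key for key, _ in source_atoms)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate atom keys: {duplicates}")

    lookup = dict(source_atoms)
    missing = [key for key in target_order if key not in lookup]
    if missing and missing_policy == "error":
        raise ValueError(f"missing atoms: {missing}")

    coords = [lookup.get(key, (0.0, 0.0, 0.0)) for key in target_order]
    mask = [key in lookup for key in target_order]
    return coords, mask
```

Returning a presence mask alongside the coordinates lets a caller decide later whether placeholder positions should be excluded from downstream summaries, rather than baking that decision into the remapping step.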

Design Notes

The implementation deliberately keeps the first version small:

  • It reuses the existing inference preprocessing path for MSA, template, and RNA MSA handling.
  • It reuses the existing confidence summary implementation instead of adding separate scoring metrics.
  • It skips diffusion only when fixed coordinates are explicitly provided.
  • It does not add role-aware MSA caches, target/binder-specific outputs, ipSAE-style metrics, or multi-model scoring in this PR.
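The "skip diffusion only when fixed coordinates are explicitly provided" rule can be sketched as a small dispatch function. The sketch below is a simplified stand-in, not the actual body of `Protenix.forward()` or `_main_inference_loop()`: only the parameter name `score_coordinates` comes from this PR, while `select_coordinates` and `sample_with_diffusion` are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional, Sequence

Coords = Sequence[Sequence[float]]


def select_coordinates(
    sample_with_diffusion: Callable[[], Coords],
    score_coordinates: Optional[Coords] = None,
) -> tuple[Coords, bool]:
    """Return (coordinates, used_diffusion).

    Diffusion is skipped only when score_coordinates is explicitly given.
    The check is `is not None` rather than truthiness so that an empty or
    array-like value is never silently treated as "not provided".
    """
    if score_coordinates is not None:
        # Score-only path: use the supplied coordinates unchanged.
        return score_coordinates, False
    # Default path: behave as before and sample new coordinates.
    return sample_with_diffusion(), True
```

Keeping the default at `None` means existing callers that never pass fixed coordinates keep their current sampling behavior.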

Validation

Ran:

git diff --cached --check
uvx ruff check protenix/model/protenix.py runner/inference.py protenix/data/inference/structure_scoring.py runner/scoring.py runner/batch_inference.py tests/test_structure_scoring.py
python3 -m py_compile protenix/model/protenix.py runner/inference.py protenix/data/inference/structure_scoring.py runner/scoring.py runner/batch_inference.py tests/test_structure_scoring.py
uv run --python 3.11 ... python -m pytest tests/test_structure_scoring.py

Results:

  • Ruff passed.
  • Python compilation passed.
  • tests/test_structure_scoring.py passed: 7 tests.
  • Verified the Click group exposes score --help with LAYERNORM_TYPE=torch.

Local Environment Notes

I did not run a full model-backed protenix score on an example structure locally because this machine is missing the Protenix CCD runtime data file at /Users/vuductai/common/components.cif, which blocks the repo's normal parser path. On this macOS host, the CLI import smoke test also requires LAYERNORM_TYPE=torch to avoid the existing CUDA fused-layernorm compile path.

taivu1998 marked this pull request as ready for review on May 11, 2026, 03:45