Refactor cluster scripts to use GitHub sync instead of rsync #36

jeremymanning · 2025-11-07T12:04:37Z

Summary

Replaces rsync-based code syncing with a GitHub-based workflow in all cluster scripts.

Changes

Created: `scripts/cluster/setup_cluster.sh`

- Centralized idempotent setup function for all cluster scripts
- Clones/updates repo from ContextLab/giblet-responses (upstream)
- Downloads Sherlock dataset if needed (11 GB, checks for 15+ .nii.gz files)
- Creates conda environment giblet-py311 if missing
- Installs dependencies from requirements.txt
- All operations are re-runnable and safe (idempotent)
- Color-coded output with clear success/warning/error messages
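
The idempotent checks listed above might look roughly like the following. This is a minimal sketch, not the contents of `setup_cluster.sh`: the helper names `dataset_is_complete` and `ensure_conda_env`, their arguments, and the Python version passed to conda (inferred only from the `py311` suffix of the environment name) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating idempotent setup checks.

# Succeeds if data_dir already holds at least min_files .nii.gz files,
# so the large download can be skipped on re-runs.
dataset_is_complete() {
    local data_dir="$1"
    local min_files="${2:-15}"
    local count
    [ -d "$data_dir" ] || return 1
    count=$(find "$data_dir" -type f -name '*.nii.gz' | wc -l | tr -d ' ')
    [ "$count" -ge "$min_files" ]
}

# Creates the conda environment only when it does not exist yet.
ensure_conda_env() {
    local env_name="${1:-giblet-py311}"
    if conda env list | awk '{print $1}' | grep -qx "$env_name"; then
        echo "conda environment $env_name already exists"
    else
        # Assumed Python version; adjust to whatever the project requires.
        conda create -y -n "$env_name" python=3.11
    fi
}
```

Because each helper first checks the current state and only then acts, calling them repeatedly leaves an already-configured cluster untouched.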

Updated: `scripts/cluster/remote_train.sh`

- Removed: All rsync operations for code syncing
- Added: GitHub sync via a call to the `setup_cluster_environment()` function
- Kept: All existing functionality (screen sessions, training, monitoring, etc.)

Updated: `scripts/cluster/remote_evaluate.sh`

- Added: GitHub sync step (new Step 1) before evaluation
- Kept: rsync for syncing reconstruction results back to the local machine (intentional: results are too large for git)
- Updated: Step numbering in output messages

Benefits of GitHub Sync

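- Version Control: All code changes are tracked in git history
- Atomic Updates: A hard reset ensures a clean state every time
- No Conflicts: Untracked files are cleaned automatically
- Faster: Code is pulled from GitHub on the cluster instead of being transferred from the local machine by rsync over SSH
- Reliable: GitHub is the single source of truth
- Reproducible: The cluster runs exactly the same code as the local machine once changes are pushed and merged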
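
The hard reset and cleanup behind these benefits can be sketched as follows. This is an illustration under stated assumptions, not the code in `setup_cluster.sh`: the function name `sync_repo` and its three arguments are hypothetical, and the sketch assumes the existing checkout is already on the requested branch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a clone-or-update step with a hard reset.

sync_repo() {
    local repo_url="$1"        # repository to sync from
    local dest="$2"            # checkout location on the cluster
    local branch="${3:-main}"  # branch to track

    if [ -d "$dest/.git" ]; then
        # Existing checkout: fetch, then discard local edits to tracked files.
        git -C "$dest" fetch -q origin "$branch" || return 1
        git -C "$dest" reset -q --hard "origin/$branch" || return 1
        # Remove untracked files and directories (ignored files are kept).
        git -C "$dest" clean -q -fd || return 1
    else
        # First run: plain clone of the requested branch.
        git clone -q --branch "$branch" "$repo_url" "$dest" || return 1
    fi
}
```

Note that `git reset --hard` and `git clean -fd` permanently discard uncommitted work in the cluster checkout, which is why all edits should be made locally and pushed rather than made on the cluster.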

Usage

Before running cluster scripts, always commit and push your changes to origin, then wait for them to be merged into upstream (ContextLab/giblet-responses), since that is the repository the cluster clones from:

    git add .
    git commit -m "Update training code"
    git push origin main
    # Wait for merge to upstream

Then run cluster scripts as normal:

    # Training
    ./scripts/cluster/remote_train.sh --cluster tensor02 --gpus 6

    # Evaluation
    ./scripts/cluster/remote_evaluate.sh --cluster tensor02 --checkpoint path/to/checkpoint.pt
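
Since the cluster only sees code that has reached upstream, a local pre-flight check before launching either script could catch unpushed work. The following is a hedged sketch and not part of this PR: the function name `ready_for_cluster` is hypothetical, and it assumes a git remote named `upstream` points at ContextLab/giblet-responses.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: is the local HEAD already contained in upstream?

ready_for_cluster() {
    local remote="${1:-upstream}"  # assumed remote name for the upstream repo
    local branch="${2:-main}"
    local repo_dir="${3:-.}"

    # Any modified, staged, or untracked file means the cluster would not see it.
    if [ -n "$(git -C "$repo_dir" status --porcelain)" ]; then
        echo "uncommitted or untracked changes present" >&2
        return 1
    fi

    git -C "$repo_dir" fetch -q "$remote" "$branch" || return 1

    # HEAD must be reachable from the remote branch, i.e. already merged.
    if ! git -C "$repo_dir" merge-base --is-ancestor HEAD "$remote/$branch"; then
        echo "HEAD is not contained in $remote/$branch yet" >&2
        return 1
    fi
}
```

A possible use is `ready_for_cluster && ./scripts/cluster/remote_train.sh --cluster tensor02 --gpus 6`. The ancestor test reports a false negative when upstream squash-merges the changes, because the local commits then never appear in the upstream history.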

Testing

The scripts have been tested locally only. They have not yet been run on the cluster; the user will test them there after this PR is merged.

Related Issues

Closes #33
Closes #34

Generated with Claude Code

Co-Authored-By: Claude [email protected]

- Created scripts/cluster/setup_cluster.sh:
  - Centralized idempotent setup function for all cluster scripts
  - Clones/updates repo from ContextLab/giblet-responses (upstream)
  - Downloads Sherlock dataset if needed
  - Sets up conda environment (giblet-py311)
  - Installs dependencies from requirements.txt
  - All operations are re-runnable and safe
- Updated scripts/cluster/remote_train.sh:
  - Removed rsync code syncing
  - Added GitHub sync via setup_cluster_environment()
  - Keeps all existing functionality (screen, training, etc.)
- Updated scripts/cluster/remote_evaluate.sh:
  - Added GitHub sync step before evaluation
  - Keeps rsync for syncing results BACK to local machine (correct)
  - Updated step numbers in output

Benefits:
- Version control: All changes tracked in git
- Atomic updates: Hard reset ensures clean state
- No conflicts: Untracked files cleaned automatically
- Faster: No SSH file transfer needed
- Reproducible: Same code on local and remote

Closes ContextLab#33, Closes ContextLab#34

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

jeremymanning merged commit ce1d685 into ContextLab:main Nov 7, 2025
0 of 3 checks passed
