Add gene panel selection pipeline by Rwin2 · Pull Request #48 · aristoteleo/PantheonOS

Rwin2 · 2026-03-28T07:53:05Z

Summary

Gene panel selection skill (Steps 0-6) and toolset (SpaPROS, Random Forest, scGeneFit)
Automatic dataset retrieval from CELLxGENE Census when no dataset is provided
GPS MODE LOCK in leader to delegate the full pipeline to analysis_expert
Works for any biological context (not tied to specific tissues or diseases)

Files changed

gene_panel_selection.md — new skill file with the full workflow
gene_panel_selection_tool.py — new toolset with three selection methods
analysis_expert.md — added GPS workflow section and dataset search instructions
leader.md — added GPS MODE LOCK delegation logic
SKILL.md — added reference to the GPS skill
__init__.py — registered the GPS toolset

Adds the GPS workflow (skill + toolset) to the single_cell team. Includes dataset search from CELLxGENE when no data is provided, algorithmic selection (SpaPROS, RF, scGeneFit), and biological curation steps.

JiwaniZakir

The 230-line workflow embedded directly in analysis_expert.md duplicates content that already lives in .pantheon/skills/omics/gene_panel_selection.md. The agent is explicitly told to re-read that skill file "before each major step," yet the full procedure is also inlined here. This creates two sources of truth: if the skill file is updated (e.g., adding a new method or changing the ARI threshold), the template won't reflect it, and the agent will receive conflicting instructions. The skill file should be the single authoritative source: remove this inline copy and have the template only reference the skill file.
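One way to keep the two files from drifting is a small check that flags long lines copied verbatim between them. The sketch below is illustrative only: `duplicated_lines` and the `min_len` cutoff are hypothetical names and values, not part of this PR, and the function takes plain strings so it can run without the repository.

```python
def duplicated_lines(template_text: str, skill_text: str, min_len: int = 40) -> list[str]:
    """Return non-trivial lines that appear verbatim in both texts.

    min_len is an assumed cutoff that ignores short lines such as
    headings or blank lines, which are likely to repeat by coincidence.
    """
    # Collect the long lines of the skill file for fast membership tests.
    skill_lines = {
        line.strip()
        for line in skill_text.splitlines()
        if len(line.strip()) >= min_len
    }
    # Keep template lines (in order) that also occur in the skill file.
    return [
        line.strip()
        for line in template_text.splitlines()
        if line.strip() in skill_lines
    ]
```

In practice the two arguments would be the contents of analysis_expert.md and .pantheon/skills/omics/gene_panel_selection.md, and a non-empty result would signal that procedure text has been inlined again.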

The hardcoded magic numbers scattered through the prompt — max_constraints <= 1000 for select_scgenefit, n_hvg < 3000 for select_spapros, the >5% ARI degradation threshold in Phase 2 — are invisible to callers and can't be overridden per-invocation. If these constraints need tuning for a specific dataset, the only option is editing the template. Surfacing them as parameters in the toolset schema or as explicit leader-context fields would make the pipeline more flexible and testable.
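A minimal sketch of what surfacing these limits could look like, assuming a small configuration object is passed into the tool calls. `GPSLimits`, its field names, and its methods are hypothetical and do not come from gene_panel_selection_tool.py; only the three default values are taken from the prompt. The ">5% ARI degradation" rule is interpreted here as a relative drop against a baseline, which is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GPSLimits:
    """Hypothetical container for the limits currently hardcoded in the prompt."""

    max_constraints: int = 1000   # select_scgenefit: max_constraints <= 1000
    max_n_hvg: int = 3000         # select_spapros: n_hvg < 3000
    max_ari_drop: float = 0.05    # Phase 2: >5% ARI degradation (assumed relative)

    def check_scgenefit(self, max_constraints: int) -> None:
        """Raise if the requested constraint count exceeds the limit."""
        if max_constraints > self.max_constraints:
            raise ValueError(
                f"max_constraints={max_constraints} exceeds {self.max_constraints}"
            )

    def check_spapros(self, n_hvg: int) -> None:
        """Raise if the requested HVG count is not strictly below the limit."""
        if n_hvg >= self.max_n_hvg:
            raise ValueError(f"n_hvg={n_hvg} must be below {self.max_n_hvg}")

    def ari_degraded(self, baseline_ari: float, panel_ari: float) -> bool:
        """Return True if the panel ARI dropped by more than the allowed fraction."""
        if baseline_ari <= 0:
            raise ValueError("baseline_ari must be positive for a relative comparison")
        return (baseline_ari - panel_ari) / baseline_ari > self.max_ari_drop


# Defaults match the values in the prompt; a caller can override them per invocation.
default_limits = GPSLimits()
relaxed_limits = replace(default_limits, max_constraints=5000)
```

With this shape, tuning a limit for a specific dataset means passing a different `GPSLimits` instance instead of editing the template, and each check can be unit-tested on its own.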

There's also a typo in the modified line around line 144: "revelant skill" should be "relevant skill", which is minor but worth fixing given this text is part of the agent's behavioral contract.

Add gene panel selection pipeline

8c92637


JiwaniZakir reviewed Mar 29, 2026

Rwin2 wants to merge 1 commit into main from gps-clean
