Add gene panel selection pipeline #48

Open
Rwin2 wants to merge 1 commit into main from gps-clean


Rwin2 (Collaborator) commented Mar 28, 2026

Summary

  • Gene panel selection skill (Steps 0-6) and toolset (SpaPROS, Random Forest, scGeneFit)
  • Automatic dataset retrieval from CELLxGENE Census when no dataset is provided
  • GPS MODE LOCK in leader to delegate the full pipeline to analysis_expert
  • Works for any biological context (not tied to specific tissues or diseases)

Files changed

  • gene_panel_selection.md — new skill file with the full workflow
  • gene_panel_selection_tool.py — new toolset with three selection methods
  • analysis_expert.md — added GPS workflow section and dataset search instructions
  • leader.md — added GPS MODE LOCK delegation logic
  • SKILL.md — added reference to the GPS skill
  • __init__.py — registered the GPS toolset

Adds the GPS workflow (skill + toolset) to the single_cell team.
Includes dataset search from CELLxGENE when no data is provided,
algorithmic selection (SpaPROS, RF, scGeneFit), and biological curation steps.

JiwaniZakir left a comment

The 230-line workflow embedded directly in analysis_expert.md duplicates content that already lives in .pantheon/skills/omics/gene_panel_selection.md. The agent is explicitly told to re-read that skill file "before each major step," yet the full procedure is also inlined here. This creates two sources of truth: if the skill file is updated (e.g., adding a new method or changing the ARI threshold), the template won't reflect it, and the agent will receive conflicting instructions. The workflow should live entirely in the skill file as the single authoritative source, with the template only referencing it and the inline copy removed.
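To make the duplication concern concrete, here is a rough sketch of a drift check that could run in CI. It is not part of this PR: the function name `duplicated_lines` and the `min_chars` cutoff are hypothetical, and it only catches verbatim copies, not paraphrased ones.

```python
def duplicated_lines(template_text: str, skill_text: str, min_chars: int = 20) -> list[str]:
    """Return non-trivial template lines that also appear verbatim in the skill file.

    Hypothetical helper: `min_chars` filters out short lines (blank lines,
    bullets, headings) that would match by coincidence.
    """
    skill_lines = {line.strip() for line in skill_text.splitlines()}
    dupes = []
    for line in template_text.splitlines():
        stripped = line.strip()
        if len(stripped) >= min_chars and stripped in skill_lines:
            dupes.append(stripped)
    return dupes


# Toy inputs standing in for analysis_expert.md and gene_panel_selection.md.
template = (
    "Step 1: run select_spapros on the HVG subset\n"
    "See the skill file for details.\n"
)
skill = (
    "# GPS\n"
    "Step 1: run select_spapros on the HVG subset\n"
    "Step 2: evaluate ARI\n"
)
print(duplicated_lines(template, skill))
```

In practice the two strings would be read from the template and the skill file, and a non-empty result would fail the check.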

The hardcoded magic numbers scattered through the prompt — max_constraints <= 1000 for select_scgenefit, n_hvg < 3000 for select_spapros, the >5% ARI degradation threshold in Phase 2 — are invisible to callers and can't be overridden per-invocation. If these constraints need tuning for a specific dataset, the only option is editing the template. Surfacing them as parameters in the toolset schema or as explicit leader-context fields would make the pipeline more flexible and testable.
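As one possible shape for this, the limits could live in a small config object that callers override per invocation. This is only a sketch: `GPSLimits`, `check_request`, and `ari_degraded` are hypothetical names, not existing toolset APIs, and I am assuming the ">5%" ARI threshold means a relative drop from a positive baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPSLimits:
    """Hypothetical container for the thresholds currently hardcoded in the prompt."""
    max_constraints: int = 1000   # upper bound quoted for select_scgenefit
    max_n_hvg: int = 3000         # select_spapros is described as needing n_hvg < 3000
    max_ari_drop: float = 0.05    # Phase 2 ARI degradation tolerance (assumed relative)


def check_request(n_hvg: int, max_constraints: int, limits: GPSLimits = GPSLimits()) -> list[str]:
    """Return human-readable violations; an empty list means the request is within limits."""
    errors = []
    if max_constraints > limits.max_constraints:
        errors.append(f"max_constraints={max_constraints} exceeds {limits.max_constraints}")
    if n_hvg >= limits.max_n_hvg:
        errors.append(f"n_hvg={n_hvg} must be below {limits.max_n_hvg}")
    return errors


def ari_degraded(baseline_ari: float, new_ari: float, limits: GPSLimits = GPSLimits()) -> bool:
    """True if ARI fell by more than the allowed fraction of the baseline.

    Assumes baseline_ari > 0; the relative interpretation is a guess at the intent.
    """
    return (baseline_ari - new_ari) / baseline_ari > limits.max_ari_drop


# Defaults reproduce the current behavior; a caller can relax one limit per run.
print(check_request(n_hvg=3000, max_constraints=1000))
print(check_request(n_hvg=3000, max_constraints=1000, limits=GPSLimits(max_n_hvg=5000)))
```

Keeping the defaults equal to the current prompt values means existing behavior is unchanged unless a caller opts in, and the thresholds become unit-testable.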

There's also a typo in the modified line around line 144: "revelant skill" should be "relevant skill", which is minor but worth fixing given this text is part of the agent's behavioral contract.

