Adds the gene panel selection (GPS) workflow (skill + toolset) to the single_cell team. It includes dataset search from CELLxGENE when no data is provided, algorithmic selection (SpaPROS, RF, scGeneFit), and biological curation steps.
JiwaniZakir left a comment:
The 230-line workflow embedded directly in analysis_expert.md duplicates content that already lives in .pantheon/skills/omics/gene_panel_selection.md — the agent is explicitly told to re-read that skill file "before each major step," yet the full procedure is also inlined here. This creates two sources of truth: if the skill file is updated (e.g., adding a new method or changing the ARI threshold), the template won't reflect it, and the agent will receive conflicting instructions. The workflow content should either live entirely in the skill file (with the template just referencing it) or the skill file should be the authoritative source and this inline copy removed.
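One way to get a single source of truth is to have the template pull the skill file in at render time instead of inlining a copy. The sketch below assumes the prompt loader can expand a placeholder; the `{{skill:...}}` syntax, `render_template`, and `directory_reader` are hypothetical names for illustration, not existing Pantheon features.

```python
import re
from pathlib import Path
from typing import Callable

# Hypothetical placeholder syntax; not an existing Pantheon feature.
SKILL_REF = re.compile(r"\{\{skill:([^}]+)\}\}")


def directory_reader(root: Path) -> Callable[[str], str]:
    """Return a reader that loads skill files relative to `root`."""
    def read(rel_path: str) -> str:
        return (root / rel_path).read_text(encoding="utf-8")
    return read


def render_template(template: str, read_skill: Callable[[str], str]) -> str:
    """Expand every {{skill:...}} placeholder with the skill file's current text."""
    return SKILL_REF.sub(lambda m: read_skill(m.group(1).strip()), template)


template = (
    "## Gene panel selection\n"
    "{{skill:omics/gene_panel_selection.md}}\n"
)
# Stand-in content; in practice the reader would be
# directory_reader(Path(".pantheon/skills")).
skills = {"omics/gene_panel_selection.md": "stand-in skill text"}
print(render_template(template, skills.__getitem__))
```

Because the procedure text exists only in the skill file, updating a method or the ARI threshold there changes what the agent sees without touching analysis_expert.md.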
The hardcoded magic numbers scattered through the prompt — max_constraints <= 1000 for select_scgenefit, n_hvg < 3000 for select_spapros, the >5% ARI degradation threshold in Phase 2 — are invisible to callers and can't be overridden per-invocation. If these constraints need tuning for a specific dataset, the only option is editing the template. Surfacing them as parameters in the toolset schema or as explicit leader-context fields would make the pipeline more flexible and testable.
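A minimal sketch of surfacing those limits as overridable parameters, keeping the current values as defaults. `GPSThresholds` and its field and method names are hypothetical. `ari_acceptable` also assumes the ">5% ARI degradation" rule means a relative drop from a baseline ARI; if the prompt means an absolute drop of 0.05, the formula would need to change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPSThresholds:
    """Hypothetical container for limits currently hardcoded in the prompt."""

    scgenefit_max_constraints: int = 1000  # select_scgenefit: max_constraints <= 1000
    spapros_max_hvg: int = 3000  # select_spapros: n_hvg < 3000
    max_ari_drop: float = 0.05  # Phase 2: reject if ARI degrades by more than 5%

    def check_scgenefit(self, max_constraints: int) -> None:
        if max_constraints > self.scgenefit_max_constraints:
            raise ValueError(
                f"max_constraints={max_constraints} exceeds "
                f"{self.scgenefit_max_constraints}"
            )

    def check_spapros(self, n_hvg: int) -> None:
        if n_hvg >= self.spapros_max_hvg:
            raise ValueError(f"n_hvg={n_hvg} must be < {self.spapros_max_hvg}")

    def ari_acceptable(self, baseline_ari: float, panel_ari: float) -> bool:
        """True if the panel's ARI is within max_ari_drop (relative) of baseline."""
        if baseline_ari <= 0:
            raise ValueError("baseline_ari must be positive")
        relative_drop = (baseline_ari - panel_ari) / baseline_ari
        return relative_drop <= self.max_ari_drop


default = GPSThresholds()
strict = GPSThresholds(max_ari_drop=0.02)  # per-invocation override
print(default.ari_acceptable(0.80, 0.77))  # relative drop of 3.75%
print(strict.ari_acceptable(0.80, 0.77))
```

Passing an instance like this through the toolset call (or the leader context) lets a test or a specific dataset tighten a limit without editing the template.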
There's also a typo in the modified line around line 144: "revelant skill" should be "relevant skill", which is minor but worth fixing given this text is part of the agent's behavioral contract.
Summary
Files changed
- gene_panel_selection.md: new skill file with the full workflow
- gene_panel_selection_tool.py: new toolset with three selection methods
- analysis_expert.md: added GPS workflow section and dataset search instructions
- leader.md: added GPS MODE LOCK delegation logic
- SKILL.md: added reference to the GPS skill
- __init__.py: registered the GPS toolset