@mlgill (Collaborator) commented Aug 19, 2025

PR for tracking efforts to validate task -- for review by @polinabinder1 only.

mlgill and others added 30 commits June 20, 2025 11:25
* WIP data stuff

* Types cleanup

* First pass of datasets proposal

* Cleanup of dataset files

* All tasks except perturbation prediction

* Updates

* Add simple caching method

* Update note

* Ruff

* Remove datatype enum and datatypespec

* Add simple Embedding type

* Remove unload_data

* WIP decoupling all anndata references

* Move random seed to top level constants file

* Type conversion

* Fix imports, purge DataType

* More import purges

* Add back some base dataset validation

* Base dataset tests pass

* Single cell datasets pass

* Dataset tests all pass

* Working on task tests

* WIP task tests

* WIP

* More cleanup

* Note

* One task test bug fixed

* WIP

* WIP

* One more test fixed, two minor test issues remain

* Merge geneexpression type with embedding

* Improve arg name and error messages

* Fixing function calls

* Cross species test fixed, one test remains

* WIP perturb prediction task

* Remove unused conftest from tasks

* Oops

* Update src/czbenchmarks/tasks/base.py

Co-authored-by: Andrew Tolopko <[email protected]>

* Add cli todo

* FIXME for task.display_name

* Update type to CellRepresentation

* Improve error message in perturbseq dataset

* Validation for multiple organisms

* Bug fix

* Remove debugging code

* Add a fixme note

* Perturbation test passes ... finally! 🥳🫠

* Cleanup task inputs

* Unused fixtures

* Final changes

* Add fixme

---------

Co-authored-by: Andrew Tolopko <[email protected]>
* datasets.utils.load_dataset complete

* reuse existing s3 remote download code

* cache implemented. created file_utils with cache support


---------

Co-authored-by: atolopko <[email protected]>
* BaseDataset updates
- rename to Dataset
- new scaffolding to serialize task-specific inputs from dataset:
  - constructor takes a dataset_type_name param, used for output dir path during serialization
  - constructor requires a Path type (not str)
  - replace cache_data() with store_task_inputs()

* Change single cell dataset class hierarchy
- SingleCellDataset is now base class of SingleCellLabeledDataset and SingleCellPerturbationDataset.
- These two subclasses are now independent and enforce different schemas on the AnnData object.
- SingleCellPerturbationDataset no longer requires "cell_type" (label) column in AnnData object.
- SingleCellLabeledDataset takes label_column_key parameter in constructor, defaulting to "cell_type"
- One dataset class per module
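
A minimal sketch of the hierarchy described above, not the actual czbenchmarks code. It assumes a plain dict stands in for the AnnData `obs` table; `required_obs_columns`, `validate`, and `condition_key` are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import List, Mapping, Sequence

# Stand-in for AnnData.obs: column name -> per-cell values (assumption).
ObsTable = Mapping[str, Sequence[str]]


class SingleCellDataset:
    """Base class holding the checks shared by all single-cell datasets."""

    def __init__(self, path: Path, obs: ObsTable):
        # Path check happens in the constructor, before any validation step.
        if not isinstance(path, Path):
            raise TypeError("path must be a pathlib.Path, not str")
        self.path = path
        self.obs = obs

    def required_obs_columns(self) -> List[str]:
        # Hypothetical hook: each subclass declares its own schema.
        return []

    def validate(self) -> None:
        missing = [c for c in self.required_obs_columns() if c not in self.obs]
        if missing:
            raise ValueError(f"missing obs columns: {missing}")


class SingleCellLabeledDataset(SingleCellDataset):
    """Requires a label column, "cell_type" unless overridden."""

    def __init__(self, path: Path, obs: ObsTable, label_column_key: str = "cell_type"):
        super().__init__(path, obs)
        self.label_column_key = label_column_key

    def required_obs_columns(self) -> List[str]:
        return [self.label_column_key]


class SingleCellPerturbationDataset(SingleCellDataset):
    """Independent of the labeled subclass: no "cell_type" column needed."""

    def __init__(self, path: Path, obs: ObsTable, condition_key: str = "condition"):
        super().__init__(path, obs)
        self.condition_key = condition_key  # hypothetical parameter

    def required_obs_columns(self) -> List[str]:
        return [self.condition_key]
```

Because the two subclasses only share the base class, an `obs` table without a label column can pass validation as a perturbation dataset and still be rejected as a labeled one.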

* Dataset validation refactoring and test refactoring
- move dataset path validation to constructor, as it is too late to check this in _validate()
- pull up common validations from SingleCellLabeledDataset into SingleCellDataset
- have concrete dataset class tests all run parent class tests, to ensure concrete classes conform to the parent class validations
- wrap dataset tests into test classes to support this test inheritance scheme
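
The test inheritance scheme can be sketched roughly as follows. This is an illustration under assumptions, not the repository's test code: the toy `Dataset` classes, the `make_dataset` hook, and the test names are all hypothetical. It relies on pytest collecting inherited `test_*` methods when it collects a `Test*` subclass.

```python
from pathlib import Path


class Dataset:
    """Toy parent dataset (hypothetical)."""

    def __init__(self, path: Path):
        self.path = path


class SingleCellDataset(Dataset):
    """Toy concrete dataset (hypothetical)."""

    def __init__(self, path: Path, organism: str = "human"):
        super().__init__(path)
        self.organism = organism


class TestDataset:
    """Parent test class; subclasses override make_dataset()."""

    def make_dataset(self) -> Dataset:
        return Dataset(Path("data.bin"))

    def test_path_is_path_object(self):
        assert isinstance(self.make_dataset().path, Path)


class TestSingleCellDataset(TestDataset):
    """Inherits test_path_is_path_object and adds its own checks."""

    def make_dataset(self) -> SingleCellDataset:
        return SingleCellDataset(Path("cells.h5ad"))

    def test_has_organism(self):
        assert self.make_dataset().organism == "human"
```

Running pytest on this module would execute the parent test once against `Dataset` and again against `SingleCellDataset`, so a concrete class that breaks a parent-level validation fails its own test class.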

* Update example.py

* Initial updates to datasets docs (wip)

* Types refactoring
- move CellRepresentation to czb.tasks.types
- move ListLike to czb.types
- Remove "Base" prefixes from BaseTask → Task and validator classes
- Rename BaseDatasetValidator → DatasetValidator and BaseSingleCellLabeledValidator → SingleCellLabeledValidator
- Rename base.py → task.py to match the Task class name
- Update all imports and references throughout the codebase
- Add Task export to package-level init.py for easier importing
- Update documentation to reflect all changes
- Regenerate autoapi documentation
* Remove asserts

* Fix logic

* REVERT ME: remove excess tests

* Pydantic refactor: base class and clustering

* Pydantic embedding task

* Pydantic metadata integration

* Pydantic metadata prediction

* Pydantic cross species

* Pydantic perturbation

* Clean up imports

* Revert "REVERT ME: remove excess tests"

This reverts commit 09ab74f.

* Remove var from clustering task

* Change set_baseline -> compute_baseline

* Move all task display_name instance variables to class attribute

* Remove outdated comment about cache PR

* Evaluate changing ListLike type hints to collections.abc.Sequence

* Fix type issue

* Comment clean up

* pydantic class for task output

* Lint code

* Fix cli issue

* Fix accidental change

* update test readme

* feature: datasets write task inputs to files (#299)

* BaseDataset updates
- rename to Dataset
- new scaffolding to serialize task-specific inputs from dataset:
  - constructor takes a dataset_type_name param, used for output dir path during serialization
  - constructor requires a Path type (not str)
  - replace cache_data() with store_task_inputs()

* Change single cell dataset class hierarchy
- SingleCellDataset is now base class of SingleCellLabeledDataset and SingleCellPerturbationDataset.
- These two subclasses are now independent and enforce different schemas on the AnnData object.
- SingleCellPerturbationDataset no longer requires "cell_type" (label) column in AnnData object.
- SingleCellLabeledDataset takes label_column_key parameter in constructor, defaulting to "cell_type"
- One dataset class per module

* Dataset validation refactoring and test refactoring
- move dataset path validation to constructor, as it is too late to check this in _validate()
- pull up common validations from SingleCellLabeledDataset into SingleCellDataset
- have concrete dataset class tests all run parent class tests, to ensure concrete classes conform to the parent class validations
- wrap dataset tests into test classes to support this test inheritance scheme

* Update example.py

* Initial updates to datasets docs (wip)

* Types refactoring
- move CellRepresentation to czb.tasks.types
- move ListLike to czb.types

* Fix merge issue. Tests pass.

* Rename in progress

* Fix imports

* Fix list type

* Fixes and cleanup

* Removed metricinput dataclass

* Formatting

* AutoAPI build issue from a different PR.

* feat: Update example.py to use new task input classes (#319)

* rename BaseTask to Task and add to package exports

- Rename BaseTask class to Task in tasks/base.py
- Update all imports and inheritance references from BaseTask to Task
- Add Task to tasks/__init__.py exports for easier importing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* update documentation to reflect BaseTask → Task refactoring

- Update all references from BaseTask to Task in developer guides
- Fix broken links in datasets documentation
- Regenerate autoapi documentation to reflect code changes
- Update how-to guides and API reference documentation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* rename validator classes to remove Base prefixes

- Rename BaseDatasetValidator to DatasetValidator
- Rename BaseSingleCellLabeledValidator to SingleCellLabeledValidator
- Rename files base_dataset_validator.py to dataset_validator.py
- Rename files base_single_cell_validator.py to single_cell_validator.py
- Update all imports and references throughout codebase
- Update package exports in validators/__init__.py
- All tests passing (38 passed, 2 skipped)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* rename base.py to task.py to match the Task class

- Rename base.py to task.py in tasks directory
- Update all imports from .base to .task throughout codebase
- Update imports in tasks/__init__.py, cli/types.py, tests/utils.py
- Update imports in all task implementation files
- All tests passing (48 passed, 2 skipped, 14 deselected)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* update documentation for base.py to task.py rename

- Update developer_guides/tasks.md to reference task.py instead of base.py
- Update how_to_guides/add_new_task.md to reference task.py instead of base.py
- Update import statements from tasks.base to tasks.task
- Update file path references from base.py to task.py
- Regenerate autoapi documentation to reflect new file structure
- All task classes now correctly inherit from czbenchmarks.tasks.task.Task

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* Update example.py to use new TaskInput structure

- Add imports for TaskInput classes from all three tasks
- Replace old task_kwargs/metric_kwargs pattern with proper TaskInput objects
- Update ClusteringTask to use ClusteringTaskInput
- Update EmbeddingTask to use EmbeddingTaskInput
- Update MetadataLabelPredictionTask to use MetadataLabelPredictionTaskInput
- Combine all results into single dictionary and output as formatted JSON
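
A rough sketch of the pattern described above, assuming nothing about the real signatures. A stdlib frozen dataclass stands in for the pydantic model, and the fields of `ClusteringTaskInput`, the `run` signature, and `results_to_json` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ClusteringTaskInput:
    """Typed replacement for a loose task_kwargs dict (fields are hypothetical)."""

    labels: Sequence[str]
    n_neighbors: int = 15


class ClusteringTask:
    display_name = "clustering"  # class attribute, not set per instance

    def run(self, embedding: Sequence[Sequence[float]],
            task_input: ClusteringTaskInput) -> Dict[str, object]:
        # Toy "result": only checks shapes and counts distinct labels.
        if len(embedding) != len(task_input.labels):
            raise ValueError("embedding and labels must have the same length")
        return {
            "n_cells": len(embedding),
            "n_labels": len(set(task_input.labels)),
            "n_neighbors": task_input.n_neighbors,
        }


def results_to_json(results: Dict[str, Dict[str, object]]) -> str:
    """Combine per-task results into one formatted JSON document."""
    return json.dumps(results, indent=2, sort_keys=True)
```

A caller would build one input object per task, for example `ClusteringTask().run(embedding, ClusteringTaskInput(labels=labels))`, collect each result under the task's `display_name`, and pass the combined dictionary to `results_to_json`.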

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* use underscore for unused task input param names

* add more tasks to example.py

* feat: Remove Base prefixes from classes and rename files (#318)

- Remove "Base" prefixes from BaseTask → Task and validator classes
- Rename BaseDatasetValidator → DatasetValidator and BaseSingleCellLabeledValidator → SingleCellLabeledValidator
- Rename base.py → task.py to match the Task class name
- Update all imports and references throughout the codebase
- Add Task export to package-level init.py for easier importing
- Update documentation to reflect all changes
- Regenerate autoapi documentation

* ruff

---------

Co-authored-by: Claude <[email protected]>

---------

Co-authored-by: Andrew Tolopko <[email protected]>
Co-authored-by: Claude <[email protected]>
Created 2 new example notebooks for the bring-your-own-model use case.
- Basic use with model output embeddings
- Use with external model
- Use with external model with fine tuning
…umentation (#332)

Updated doc strings for accuracy and better API documentation.
- Add regression tests for clustering, embedding, label prediction, and cross-species tests (perturbation skipped, since not validated by compbio). Uses real, pre-generated model embeddings as input to tasks, comparing to real, past results that have been validated.
- Add end-to-end integration test (loading dataset and running tasks)
- Uses @pytest.mark.integration for selective test execution
- Update example.py to use new TaskInput structure with proper baseline workflow
- update test README
- ignore cli for code coverage
…e changes (#338)

* added new test cases. Validated existing test cases for new code changes
- refactor and reorganize cli code
- disabled "run" command: running tasks on model outputs is only partially implemented, so for now running tasks is supported only via Python code
- registry based cli
- removed unused file_cache.py module.
- fancy output for cli list command
* docs: docs updated for byomodel release
Example Notebooks for using czbenchmarks
…s for new perturb seq task (#364)

* Remove other perturb datasets and split key

* Temporarily deprecate older perturbseq datasets

* Ready to start testing

* Dataset directory update

* Debugging

* Dataset load_data runs to completion

* nit

* Formatting and improving pandas efficiency

* Update

* Fix utils tests

* WIP tests

* Additional working test

* No more split columns

* store_task_inputs works

* 10 genes

* Linting

* Fixing test

* Joblib runs

* Remove dual condition, validation works

* Tests pass

* Linting

* Unused imports causing linter failures

* Add example script

* update

* nit

* Docstring updates for errors

* Fix dimension

* Fixed docstring formatting in API

* Remove joblib since it doesn't accelerate results

* nit

* Add gene masking as an input param

* Cleanup joblib imports

* Update filters

* WIP updates to stored inputs

* Fixed tests

* Clean up FIXMEs

* Bug fix and expose min_de_genes

* Fix

* Expose them all

* FIXME

* Fix path

* Fix test

* Filter de_results

* Commit equivalency test to this PR

* equivalency test updates

* Add filter for DE, fixes comparison

* Feedback from Jasleen, Part 1/2

* control cells list order not being preserved

* Add ability to load in backed mode

* Ready for benchmarking

* default wilcoxon

* Fixed

* Cleanup

* Formatting

* Ruff

* Hard code parameters

* Add test

* Improve tests

* Bug fix

* gene --> condition

* Ruff

* Remove testing files, cleanup single cell dataset class comments

* Clean up utils

* Updates

* Requested comment

* Ruff

* Ruff

* Update variable name

* Fix tests

* uv
Base automatically changed from feat/perturbation_task to v1.0-byomodel August 23, 2025 00:09
Base automatically changed from v1.0-byomodel to main September 16, 2025 20:26
6 participants