RuntimeError when trying to run online MapMyCells mouse brain taxonomy programmatically #26

Open
Winterwind opened this issue Nov 4, 2024 · 1 comment

Comments

@Winterwind

Hi there,
First off, I'd like to say thank you for making this wonderful tool!

I'm trying to run MapMyCells using the code from this repo against the online 10x Whole Mouse Brain taxonomy (CCN20230722) that is used on the MapMyCells website. I want to run it programmatically so I don't have to use the otherwise unwieldy command-line invocation. My Python code for a test run looks like this (I am using the sample mouse brain h5ad file provided on the website):

from cell_type_mapper.cli.from_specified_markers import (
    FromSpecifiedMarkersRunner
)

config = {
    'precomputed_stats': {
        'path': '/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/taxonomies/10x_Whole_Mouse_Brain_taxonomy_(CCN20230722)/precomputed_stats_ABC_revision_230821.h5'
    },
    'query_markers': {
        'serialized_lookup': '/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/taxonomies/10x_Whole_Mouse_Brain_taxonomy_(CCN20230722)/mouse_markers_230821.json'
    },
    'type_assignment': {
        'n_processors': 4,
        'normalization': 'raw'
    },
    'query_path': 'wholemousebrain_ccn20230722_example_10kcells_550genes.h5ad',
    'extended_result_path': 'mapping_output.json',
    'csv_result_path': 'mapping_output.csv',
    'drop_level': 'CCN20230722_SUPT',
    'cloud_safe': False
}

mapping_runner = FromSpecifiedMarkersRunner(
    args=[], input_data=config)

mapping_runner.run()

However, I am getting the following error, and I can't seem to figure out what is causing it:

Traceback (most recent call last):
  File "/Users/ariandjahed/LocalRepos/SharedRepos/Bonsai_SpatialTranscriptomics/mapmycells_data/test_with_local_repo/mapmycellstest.py", line 26, in <module>
    mapping_runner.run()
  File "/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/src/cell_type_mapper/cli/from_specified_markers.py", line 71, in run
    run_mapping(
  File "/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/src/cell_type_mapper/cli/from_specified_markers.py", line 144, in run_mapping
    output = _run_mapping(
             ^^^^^^^^^^^^^
  File "/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/src/cell_type_mapper/cli/from_specified_markers.py", line 310, in _run_mapping
    create_marker_cache_from_specified_markers(
  File "/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/src/cell_type_mapper/type_assignment/marker_cache_v2.py", line 115, in create_marker_cache_from_specified_markers
    marker_lookup = validate_marker_lookup(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ariandjahed/LocalRepos/Other/cell_type_mapper/src/cell_type_mapper/type_assignment/marker_cache_v2.py", line 775, in validate_marker_lookup
    raise RuntimeError(error_msg)
RuntimeError: After comparing query data to reference data, no valid marker genes could be found at any level in the taxonomy.
Example of genes in query set:
['2900052N01Rik', '4930509J09Rik', '9330158H04Rik', 'A630012P03Rik', 'A830036E02Rik']

What would you advise? Any help would be appreciated.
Thank you so much in advance!

@danielsf
Collaborator

danielsf commented Dec 5, 2024

(Sorry for the late reply; I didn't see this issue pop up in my email for some reason).

The marker genes listed in mouse_markers_230821.json are all identified by their ENSEMBL ID. Based on the error message, it looks like the genes in your unlabeled dataset are identified using gene symbols. When the cell type mapper compares the two, it does not by default know the equivalence between a gene symbol and an ENSEMBL ID, so it thinks your dataset has no marker genes in it.
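One quick way to confirm this diagnosis is to check what fraction of the query gene identifiers look like mouse ENSEMBL gene IDs. The sketch below is only an illustration: the helper name `fraction_ensembl_like` is made up here, and the pattern assumes the usual mouse gene ID form (`ENSMUSG` followed by 11 digits, with an optional version suffix).

```python
import re

# Mouse ENSEMBL gene IDs look like "ENSMUSG" followed by 11 digits,
# optionally with a ".<version>" suffix (assumed pattern for this check).
ENSEMBL_MOUSE_GENE = re.compile(r"^ENSMUSG\d{11}(\.\d+)?$")


def fraction_ensembl_like(gene_ids):
    """Return the fraction of identifiers that look like mouse ENSEMBL gene IDs.

    Hypothetical helper for illustration; it is not part of cell_type_mapper.
    """
    gene_ids = list(gene_ids)
    if not gene_ids:
        return 0.0
    n_match = sum(1 for g in gene_ids if ENSEMBL_MOUSE_GENE.match(str(g)))
    return n_match / len(gene_ids)


# The example genes printed in the error message are gene symbols,
# so none of them match the ENSEMBL pattern.
query_genes = ['2900052N01Rik', '4930509J09Rik', '9330158H04Rik',
               'A630012P03Rik', 'A830036E02Rik']
print(fraction_ensembl_like(query_genes))
```

For a real h5ad file, the identifiers would typically come from something like `anndata.read_h5ad(path, backed='r').var_names`, assuming the genes are stored in the index of `var`.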

The recommended way to solve this problem is to re-generate your unlabeled data, identifying every gene by its ENSEMBL ID. I recommend this because the mapping can be degenerate: the same gene symbol can map to multiple ENSEMBL IDs, depending on the version of the ENSEMBL genome you are mapping to.
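To make the degeneracy concrete, here is a rough sketch of how one might sort gene symbols by how many ENSEMBL IDs they map to before renaming anything. The function `split_by_ambiguity`, the lookup table, and all symbols and IDs in it are placeholders invented for illustration; a real table would have to be built from whichever ENSEMBL release you trust.

```python
def split_by_ambiguity(symbols, symbol_to_ids):
    """Partition gene symbols by how many ENSEMBL IDs they map to.

    symbol_to_ids is a hypothetical dict mapping each gene symbol to a
    list of candidate ENSEMBL IDs. Returns (unique, ambiguous, missing).
    """
    unique, ambiguous, missing = {}, {}, []
    for symbol in symbols:
        candidates = symbol_to_ids.get(symbol, [])
        if len(candidates) == 1:
            # Exactly one candidate: safe to rename.
            unique[symbol] = candidates[0]
        elif len(candidates) > 1:
            # Several candidates: needs a manual decision.
            ambiguous[symbol] = list(candidates)
        else:
            # No candidate in the lookup table.
            missing.append(symbol)
    return unique, ambiguous, missing


# Placeholder symbols and IDs, made up purely for illustration.
lookup = {
    "GeneA": ["ENSMUSG_PLACEHOLDER_1"],
    "GeneB": ["ENSMUSG_PLACEHOLDER_2", "ENSMUSG_PLACEHOLDER_3"],
}
unique, ambiguous, missing = split_by_ambiguity(
    ["GeneA", "GeneB", "GeneC"], lookup)
print(unique)
print(ambiguous)
print(missing)
```

Only the symbols in `unique` can be renamed without a judgment call; the ones in `ambiguous` are where the choice of ENSEMBL version matters.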

However, if you run the mapper with the configuration parameter

{
...
"map_to_ensembl": True
...
}

the mapper will make its best guess at the equivalence between gene symbols and ENSEMBL IDs (really, it will just do the mapping using the version of the ENSEMBL genome that we at the Allen Institute used for sequencing the reference data).
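Applied to the script from the original post, that might look roughly like the sketch below. It assumes `map_to_ensembl` sits at the top level of the config dict, as the fragment above suggests; the helper names `enable_ensembl_mapping` and `run_mapper` are made up for illustration, and the two taxonomy paths are shortened to bare file names that need to point at your local copies.

```python
import copy


def enable_ensembl_mapping(config):
    """Return a copy of the config with 'map_to_ensembl' switched on.

    Hypothetical helper; assumes the flag is a top-level config key.
    """
    updated = copy.deepcopy(config)
    updated['map_to_ensembl'] = True
    return updated


def run_mapper(config):
    """Run the mapper with the given config.

    The import is deferred so this file loads even where
    cell_type_mapper is not installed.
    """
    from cell_type_mapper.cli.from_specified_markers import (
        FromSpecifiedMarkersRunner
    )
    runner = FromSpecifiedMarkersRunner(args=[], input_data=config)
    runner.run()


# Same keys as the configuration in the original post.
config = {
    'precomputed_stats': {
        'path': 'precomputed_stats_ABC_revision_230821.h5'
    },
    'query_markers': {
        'serialized_lookup': 'mouse_markers_230821.json'
    },
    'type_assignment': {
        'n_processors': 4,
        'normalization': 'raw'
    },
    'query_path': 'wholemousebrain_ccn20230722_example_10kcells_550genes.h5ad',
    'extended_result_path': 'mapping_output.json',
    'csv_result_path': 'mapping_output.csv',
    'drop_level': 'CCN20230722_SUPT',
    'cloud_safe': False
}

config_with_mapping = enable_ensembl_mapping(config)
# run_mapper(config_with_mapping)  # uncomment once the files above exist locally
```

Copying the config rather than editing it in place keeps the original settings around, which makes it easy to compare runs with and without the symbol-to-ENSEMBL mapping.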

Hope this helps.
