Mismatch between cell metadata and expression matrix #28

rwollman · 2023-09-17T20:36:12Z

In the 20230830 release, there is a mismatch in the number of cells between the expression matrix and metadata for the Allen MERFISH data. Metadata has 3938808 cells, and the expression matrix has 4334174 cells.

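Metadata was loaded with:
import os
import pandas as pd
# download_base is the same path defined in the expression snippet below
rpath = metadata['cell_metadata']['files']['csv']['relative_path']
file = os.path.join(download_base, rpath)
cell = pd.read_csv(file, dtype={"cell_label": str})
cell.shape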

Expression was loaded with:
import os
import anndata
download_base = '/orangedata/ExternalData/Allen_WMB_2023Sep05'
filename = expression_matrices['C57BL6J-638850']['raw']['files']['h5ad']['relative_path']
adata = anndata.read_h5ad(os.path.join(download_base, filename))
adata.shape

Both of these numbers differ from the 20230630 release, where the metadata and the expression matrix had the same number of cells, 4330907.

If the cell numbers are not the same, the spatial data becomes useless, because you can't match cells to their xy positions. For example, I suspect that the notebooks merfish_tutorial_1, 2a, and 2b show inaccurate maps of gene expression due to this issue (depending on how the filtered cells are distributed across sections).

tmchartrand · 2023-09-26T16:44:22Z

I can't explain the number mismatch, but I expect it's due to changes in some QC criteria. Maybe @mkunst23 can?
Just to note though, this is not an issue for using the remaining data as long as you join the anndata and metadata properly using the cell IDs.
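
A minimal sketch of such a join, using toy data rather than the real release files. The labels, gene names, and coordinates here are made up for illustration; `expr` stands in for the expression matrix (whose row labels would be `adata.obs_names`), and `cell` for the metadata table indexed by `cell_label`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the expression matrix: 5 cells x 2 genes (hypothetical values).
expr = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    index=pd.Index(["c1", "c2", "c3", "c4", "c5"], name="cell_label"),
    columns=["geneA", "geneB"],
)

# Toy stand-in for the metadata: fewer cells, and in a different order.
cell = pd.DataFrame(
    {"cell_label": ["c4", "c1", "c3"], "x": [4.0, 1.0, 3.0], "y": [0.4, 0.1, 0.3]}
).set_index("cell_label")

# Keep only the expression rows that have metadata, preserving expression order.
common = expr.index[expr.index.isin(cell.index)]
expr_qc = expr.loc[common]

# Reorder the metadata the same way, so row i is the same cell in both tables.
cell_qc = cell.loc[common]

# Alignment is by label, not by position.
assert (expr_qc.index == cell_qc.index).all()
print(len(expr), len(cell), len(common))
```

With a real AnnData object the same idea should carry over by building `common` from `adata.obs_names`, subsetting with `adata[common]`, and taking `cell.loc[common]` for the coordinates. The point is that rows are matched on `cell_label`, so differing cell counts do not matter.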

rwollman · 2023-09-26T16:47:07Z

Thanks, you are correct that I can avoid this with a proper merge. My bad and thanks for pointing this out.

mkunst23 · 2023-09-26T16:51:10Z

Yes, 4334174 is the cell count before filtering out cells with low average correlation scores (<0.5) when mapped against the reference taxonomy.
