CODEX protein annotation and Fig. 4f reproducibility #14

hurazh · 2024-08-22T09:26:43Z

Hi Bokai and Shuxiao,

I would first like to thank you for developing this interesting method and making it open source. I found the protein annotation and spatial pattern discovery section particularly exciting (as well as the corresponding benchmark against CELESTA and Astir). However, if my understanding is correct, the provided demo 2 (/docs/tonsil_codex_rnaseq.ipynb) and the experiment scripts in /Archiv/tonsil/code all match from scRNA-seq to protein, i.e. they perform the scRNA-seq annotation that corresponds to Extended Data Fig. 6 rather than the protein annotation shown in Fig. 4 of the manuscript.

My questions are:

1. Is there any difference between

    mf.model.Fusor(
        shared_arr1=protein_shared,
        shared_arr2=rna_shared,
        active_arr1=protein_active,
        active_arr2=rna_active,
        labels1=None,
        labels2=None)

and

    mf.model.Fusor(
        shared_arr1=rna_shared,
        shared_arr2=protein_shared,
        active_arr1=rna_active,
        active_arr2=protein_active,
        labels1=None,
        labels2=None)

If so, I assume I should use the latter for the CODEX protein annotation task, as you did in Fig. 4f, right?

2. When I try to reproduce the Fig. 4f results by following the pipeline in /docs/tonsil_codex_rnaseq.ipynb but swapping the parameters for RNA and protein (as described in Q1, and likewise for svd_components), the final result is quite different from Fig. 4f. I'm not sure I'm using the correct parameters and setup. Could you provide a script that reproduces Fig. 4f to clarify the usage?

Best,
Haoran

BokaiZhu · 2024-08-28T15:32:24Z

Hi Haoran,

Thanks for being interested in our method!

Q1: Is there any difference between the two sets of parameters? Yes, and in some cases the difference is very noticeable, usually when matching to spatial data. Arr1 is the modality whose cells are matched to the other modality; Arr2 is the modality they are matched to. In the extensive benchmarking for our study, we observed better performance when Arr1 has a stronger and cleaner population structure than Arr2. We therefore usually use scRNA-seq data as Arr1 and spatial data (e.g. CODEX) as Arr2, and this setup is consistent across all figures in the paper. Reversing the setup will likely decrease performance, mainly because Arr1 would then have a weaker population structure, which degrades the smoothing quality (other factors, such as cell numbers, also play a role). So I wouldn't be surprised if you see reduced performance after switching the matching direction.
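
To make the recommended direction concrete, here is a minimal sketch. The helper name `rna_to_protein_kwargs` is hypothetical and not part of MaxFuse; the keyword names are simply the ones used in the question above, and the toy lists only stand in for real feature matrices.

```python
def rna_to_protein_kwargs(rna_shared, protein_shared, rna_active, protein_active):
    """Arrange the inputs so that scRNA-seq is Arr1 and CODEX protein is Arr2.

    Hypothetical convenience helper: it only fixes the argument order so the
    modality with the cleaner population structure ends up in slot 1.
    """
    return dict(
        shared_arr1=rna_shared,
        shared_arr2=protein_shared,
        active_arr1=rna_active,
        active_arr2=protein_active,
        labels1=None,
        labels2=None,
    )


# Toy placeholders (rows = cells, columns = features).
rna_shared = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
protein_shared = [[1.0, 2.0], [3.0, 4.0]]
rna_active = [[0.1, 0.2, 0.9], [0.3, 0.4, 0.8], [0.5, 0.6, 0.7]]
protein_active = [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]

kwargs = rna_to_protein_kwargs(rna_shared, protein_shared, rna_active, protein_active)

# With MaxFuse installed, the object would then be built as in the question:
# fusor = mf.model.Fusor(**kwargs)
```

Keeping the argument order in one place like this makes it harder to swap the two modalities by accident between runs.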

Q2: Each dataset might have its own optimal parameters, so I would expect minor changes in results when using different svd_components, but as described in Q1, reversing the matching direction is likely not a good idea. From your description, it sounds like you have a scRNA-seq dataset that is already annotated and a CODEX dataset that you want to annotate. In this case, you can keep the matching direction (RNA -> Protein) and still get cell-matching results for all cells in either Arr1 or Arr2. For example, by calling get_matching(order=(2, 1), target='full_data') at the end, every Arr2 cell (after filtering) gets a matched cell in Arr1. Similarly, you can reverse the order to order=(1, 2) for the final matching results.
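
Once the full matching is available, the annotation step itself could look roughly like the sketch below. It assumes the matching is three parallel sequences (Arr1/RNA indices, Arr2/protein indices, scores) and that a higher score means a better match; please check both assumptions against the output of your installed version. The function `transfer_labels` and the toy data are illustrative only and not part of MaxFuse.

```python
def transfer_labels(matching, rna_labels, n_protein_cells):
    """Give each CODEX cell the label of its best-scoring matched RNA cell.

    Assumption: `matching` is (rna_indices, protein_indices, scores), three
    parallel sequences, and a higher score means a better match.
    CODEX cells without any match (e.g. filtered out) keep the label None.
    """
    rna_idx, protein_idx, scores = matching
    best_score = [None] * n_protein_cells
    protein_labels = [None] * n_protein_cells
    for r, p, s in zip(rna_idx, protein_idx, scores):
        # Keep only the strongest match when a CODEX cell appears more than once.
        if best_score[p] is None or s > best_score[p]:
            best_score[p] = s
            protein_labels[p] = rna_labels[r]
    return protein_labels


# Toy example: 3 annotated RNA cells, 4 CODEX cells (the last one unmatched).
rna_labels = ["B cell", "T cell", "Macrophage"]
matching = ([0, 1, 1, 2], [0, 1, 2, 2], [0.9, 0.8, 0.4, 0.7])

protein_labels = transfer_labels(matching, rna_labels, n_protein_cells=4)
print(protein_labels)  # ['B cell', 'T cell', 'Macrophage', None]
```

In a real run, `matching` would be the result of the get_matching call above and `rna_labels` the existing scRNA-seq annotation, indexed in the same order as Arr1.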

Please let me know if my clarification makes sense and let me know if you have any additional questions.

Best,
Bokai
