CODEX protein annotation and Fig. 4f reproducibility #14

Open
hurazh opened this issue Aug 22, 2024 · 1 comment

hurazh commented Aug 22, 2024

Hi Bokai and Shuxiao,

I would first like to thank you for developing this interesting method and making it open source. I found the protein annotation and spatial pattern discovery section particularly exciting (as well as the corresponding benchmark against CELESTA and Astir). However, if my understanding is correct, demo 2 (/docs/tonsil_codex_rnaseq.ipynb) and the experiment scripts in /Archiv/tonsil/code all match from scRNA-seq to protein, i.e. they perform the scRNA-seq annotation that corresponds to Extended Data Fig. 6 rather than the protein annotation shown in Fig. 4 of the manuscript.

My questions are:

  1. Is there any difference between

         mf.model.Fusor(
             shared_arr1=protein_shared,
             shared_arr2=rna_shared,
             active_arr1=protein_active,
             active_arr2=rna_active,
             labels1=None,
             labels2=None)

     and

         mf.model.Fusor(
             shared_arr1=rna_shared,
             shared_arr2=protein_shared,
             active_arr1=rna_active,
             active_arr2=protein_active,
             labels1=None,
             labels2=None)

     If so, I assume I should use the latter for the CODEX protein annotation task, as you did in Fig. 4f, right?
  2. When I try to reproduce the Fig. 4f results by following the pipeline in /docs/tonsil_codex_rnaseq.ipynb but swapping the RNA and protein parameters (as in Q1, and likewise for svd_components), the final result is quite different from Fig. 4f. I'm not sure whether I'm using the correct parameters and setup. Could you provide the script that reproduces Fig. 4f to clarify the usage?

Best,
Haoran

@BokaiZhu
Collaborator

Hi Haoran,

Thanks for being interested in our method!

Q1: Is there any difference between the two sets of parameters? Yes, there will be differences, and in some cases very noticeable ones, usually when matching to spatial data. Arr1 is the modality whose cells are matched to the other modality; Arr2 is the modality whose cells are matched to. In the extensive benchmarking for our study, we observed better performance when Arr1 has a stronger and cleaner population structure than Arr2. We therefore usually use scRNA-seq data as Arr1 and spatial data (e.g. CODEX) as Arr2, and this setup is consistent across all figures in our paper. Reversing the setup will likely decrease performance, mainly because Arr1 then has a worse population structure, which degrades the smoothing quality (other factors, such as cell numbers, also play a role). So I wouldn't be surprised if you see reduced performance when you switch the matching direction.
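
As a rough intuition for why the direction matters, here is a toy sketch showing that directed matching is generally not symmetric. It uses plain 1-D nearest-neighbour lookup, which is an assumption made purely for illustration and is not the MaxFuse algorithm; the `nearest` helper and the toy values are hypothetical.

```python
def nearest(query, reference):
    """For each query value, return the index of the closest reference value."""
    return [
        min(range(len(reference)), key=lambda j: abs(q - reference[j]))
        for q in query
    ]

# Toy 1-D "embeddings" for two modalities (made-up numbers).
rna = [0.0, 1.0]
codex = [0.9, 5.0]

# Match RNA cells to CODEX cells, stored as (rna_index, codex_index) pairs.
rna_to_codex = {(i, j) for i, j in enumerate(nearest(rna, codex))}

# Match CODEX cells to RNA cells, stored in the same (rna_index, codex_index) order.
codex_to_rna = {(i, j) for j, i in enumerate(nearest(codex, rna))}

# The two directions produce different sets of matched pairs.
print(rna_to_codex != codex_to_rna)
```

In this toy case both RNA cells pick the first CODEX cell, while both CODEX cells pick the second RNA cell, so swapping which modality plays the Arr1 role changes the pairs even before any smoothing is involved.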

Q2: Each dataset might have its own optimal parameters, so I would expect minor changes in the results when using different svd_components, but as described in Q1, reversing the matching direction is likely not a good idea. From your description, it sounds like you have an scRNA-seq dataset that has already been annotated and a CODEX dataset that you want to annotate. In this case, you can keep the matching direction (RNA -> Protein) and still get cell-matching results for all cells of either Arr1 or Arr2. For example, by calling get_matching(order=(2, 1), target='full_data') at the end, you will get a matched Arr1 cell for every Arr2 cell (after filtering). Similarly, you can reverse the order to order=(1, 2) for the final matching results.
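
Once a matching is available, transferring the RNA annotations to the CODEX cells could look roughly like the sketch below. It assumes the matching can be unpacked into three parallel sequences (Arr1 indices, Arr2 indices, scores); that layout is an assumption here, so check what get_matching returns in your installed version. The `transfer_labels` helper and the toy data are hypothetical and not part of MaxFuse.

```python
def transfer_labels(arr1_idx, arr2_idx, scores, arr1_labels):
    """Give each Arr2 (CODEX) cell the label of its best-scoring Arr1 (RNA) match.

    arr1_idx, arr2_idx and scores are parallel sequences describing matched
    pairs (assumed layout). arr1_labels holds one label per Arr1 cell.
    """
    best = {}  # Arr2 index -> (score, label) of the best match seen so far
    for i1, i2, score in zip(arr1_idx, arr2_idx, scores):
        if i2 not in best or score > best[i2][0]:
            best[i2] = (score, arr1_labels[i1])
    return {i2: label for i2, (score, label) in best.items()}

# Toy example with made-up indices, scores and labels.
rna_labels = ["B cell", "T cell", "Plasma"]
arr1_idx = [0, 1, 1, 2]   # matched RNA cells
arr2_idx = [0, 1, 2, 2]   # matched CODEX cells (cell 2 appears twice)
scores = [0.9, 0.8, 0.4, 0.7]

codex_labels = transfer_labels(arr1_idx, arr2_idx, scores, rna_labels)
print(codex_labels)
```

CODEX cells that were filtered out have no entry in the returned dictionary, so they stay unannotated rather than receiving a guessed label.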

Please let me know whether this clarification makes sense and whether you have any additional questions.

Best,
Bokai
