Optimized SCC Implementation and Removed TensorFlow Dependencies #250
Those updates are awesome; thanks for your contribution! I noticed you've specified matplotlib<=3.6.2 and pandas<=1.5.3 in your changes. Is there a specific reason for keeping these versions? We're currently updating these dependencies, and I suspect these pins may be causing some dependency issues in the CI.
Thank you for your feedback! I’m glad you found the updates helpful. Regarding the specific versions of
However, I understand the importance of keeping our dependencies up to date. If the current versions are causing CI issues, I'm more than willing to help investigate and test with the latest versions to ensure everything works smoothly.
@Sichao25 However, the latest version of anndata seems to have fixed the support bug for pandas versions greater than 2. More testing is needed to confirm compatibility.
Thanks for sharing your concerns. We are updating important dependencies like About scc, it would be nice to have STAGATE introduced as a new option. How about making it an optional dependency that users can install themselves? Specifically, we won't add a new dependency directly to
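One common way to make a package such as the one backing STAGATE optional is to import it lazily, only when the feature is actually used. The following is a minimal sketch of that pattern, not spateo's implementation: the helper name `require_optional` is hypothetical, and the module name and install hint passed to it are placeholders chosen for illustration.

```python
import importlib


def require_optional(module_name, feature, install_hint):
    """Import an optional dependency, or raise an ImportError that explains
    which feature needs it and how to install it.

    Hypothetical helper for illustration; not part of spateo.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: {install_hint}"
        ) from exc


# A module that exists is returned as usual.
json_module = require_optional("json", "JSON export", "(bundled with Python)")

# A missing module produces an actionable message instead of a bare failure.
try:
    require_optional(
        "a_package_that_is_not_installed",  # placeholder module name
        "The STAGATE option",
        "pip install <package>",
    )
except ImportError as err:
    message = str(err)
```

With this pattern the core install stays unchanged, and only users who call the optional feature need the extra package.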
Hi @Starlitnightly, thanks for this pull request. Your idea of switching from TensorFlow to PyTorch is a great one. I also like your optimize_cluster function, which can definitely be used to smooth the cluster layer on the space
# The intermediate model gets the output of the bottleneck layer,
# which acts as the projection layer.
self.intermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer(bname).output)
@Starlitnightly it looks like your updated code doesn't set the intermediate_layer_model?
so it is always None?
I implemented intermediate_layer_model inside the AutoEncoder later on, so it should really be removed from self. I need to test it a bit further.
I have tested this function on simulated data, and it runs well. However, I haven't found specific use cases for this function. Could someone provide specific scenarios where this function is used so that I can further optimize it?
@Starlitnightly regarding your comments on scc, we found that scc works well on several datasets given its simple formulation. (1) Using STAGATE to replace PCA seems like a good idea. (2) Are you saying spateo's Leiden results are worse than scanpy's?
I compared the implementations of Leiden in spateo and scanpy. They are in fact the same, so the results are the same; it is the
I'm going to try to implement STAGATE in spateo in a future PR. Since the author doesn't give a usable package directly in
Maybe more state-of-the-art clustering methods need to be added in this commit.
The latest tutorial can be viewed in the pull request of spateo-tutorial: https://github.com/aristoteleo/spateo-tutorials/blob/bf32d5739f380948e76cbe07c99e3e8d6d1e3627/5_cluster_digitization/1_bin_scc.ipynb It should be noted that CAST and STAGATE do not perform well on the current dataset, but they outperform SCC on other datasets. Additionally, in the new tutorial, I have modified the method of spatial domain annotation to use dictionary-based annotation instead of sequential annotation.
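To illustrate the difference, here is a minimal sketch of dictionary-based annotation, assuming that "sequential annotation" means supplying domain names in cluster order. The function name, cluster labels, and domain names are hypothetical and are not taken from the tutorial.

```python
def annotate_domains(cluster_labels, annotation, default="unassigned"):
    """Map each cluster label to a domain name via an explicit dictionary.

    Unlike a positional list of names, the mapping does not depend on the
    order in which clusters happen to be numbered, and labels without an
    entry fall back to `default` instead of shifting the other names.
    """
    return [annotation.get(str(label), default) for label in cluster_labels]


# Hypothetical per-bin cluster labels and domain names.
scc_labels = ["0", "2", "1", "0", "3"]
domain_names = {"0": "domain A", "1": "domain B", "2": "domain C"}

annotated = annotate_domains(scc_labels, domain_names)
print(annotated)
# ['domain A', 'domain C', 'domain B', 'domain A', 'unassigned']
```

If the clustering is rerun and the cluster numbering changes, only the dictionary keys need to be updated.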
There are a few packages providing commands. Try e.g. `pip install scanpy- scripts`! positional arguments: {settings} options: -h, --help show this help message and exit in pySTAGATE
Excellent work. I am going to merge the pull request now.
When installing `spateo` on macOS, I encountered version incompatibility errors. For instance, the `vtk` package requires a minimum Python version of 3.9. Additionally, installing both `torch` and `tensorflow` simultaneously can lead to package conflicts. After thoroughly reviewing the implementation of the relevant TensorFlow code, I rewrote all TensorFlow-related code using PyTorch and removed all TensorFlow-related dependencies from `requirements.txt`.

Updates:
- `NLPCA` using PyTorch instead of TensorFlow.
- `weighted_binary_crossentropy` using PyTorch instead of TensorFlow.
- `calculate_leiden_partition` and added `logger.info`.
- `optimize_cluster`.

Notes:
After carefully comparing the clustering effects of Leiden in `scanpy` and `spateo`, I found that the enhanced effect is due to `dynamo.tl.neighbors` compared to `scanpy.pp.neighbors`. The exact reason for this enhancement still needs further investigation.

Additionally, it is important to note that `scc` does not achieve state-of-the-art (SOTA) performance and does not yield better results on the gold standard dataset of human cortical neurons.

Since `scc` is an adjacency matrix that directly combines spatial neighborhoods and PCA neighborhoods, mclust is not applicable and was not included in this PR. Perhaps I could introduce STAGATE into the `spateo` framework and use it in place of PCA. However, this would introduce a new dependency, `pyg`, which might complicate updating the existing `requirements.txt` and incur additional installation costs for users.
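The description of `scc` as an adjacency matrix that combines spatial and PCA neighborhoods can be made concrete with a small pure-Python sketch. It assumes binary, symmetrized k-nearest-neighbor graphs and a plain sum of the two; spateo's actual neighbor search and weighting may differ, and all function names and toy data here are hypothetical.

```python
import math


def knn_adjacency(points, k):
    """Symmetric binary k-nearest-neighbor graph as a list of neighbor sets."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def combined_adjacency(spatial, embedding, k_spatial, k_embedding):
    """Sum the spatial and embedding kNN graphs (assumed combination rule).

    Returns {(i, j): weight}; weight is 2 when both graphs link i and j,
    and 1 when only one of them does.
    """
    adj_s = knn_adjacency(spatial, k_spatial)
    adj_e = knn_adjacency(embedding, k_embedding)
    weights = {}
    for i in range(len(spatial)):
        for j in adj_s[i] | adj_e[i]:
            weights[(i, j)] = int(j in adj_s[i]) + int(j in adj_e[i])
    return weights


# Toy data: four bins with 2D coordinates and a 1D stand-in for PCA space.
spatial = [(0, 0), (1, 0), (10, 0), (11, 0)]
embedding = [(0,), (1,), (50,), (4,)]
weights = combined_adjacency(spatial, embedding, k_spatial=1, k_embedding=1)
```

Because the result is a graph rather than a feature matrix, it suits graph-based methods such as Leiden, which is consistent with the note that mclust does not apply.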