Optimized `SCC` Implementation and Removed `TensorFlow` Dependencies #250

Starlitnightly · 2024-07-22T22:24:41Z

When installing spateo on macOS, I encountered version incompatibility errors. For instance, the vtk package requires a minimum Python version of 3.9. Additionally, installing both torch and tensorflow simultaneously can lead to package conflicts. After thoroughly reviewing the implementation of the relevant TensorFlow code, I rewrote all TensorFlow-related code using PyTorch and removed all TensorFlow-related dependencies from requirements.txt.

Updates:

Re-Implemented NLPCA using PyTorch instead of TensorFlow.
Re-Implemented weighted_binary_crossentropy using PyTorch instead of TensorFlow.
Re-implemented calculate_leiden_partitionand added logger.info.
Optimized label assignment by majority voting in the neighborhood using optimize_cluster.

Notes:

After carefully comparing the clustering effects of Leiden in scanpy and spateo, I found that the enhanced effect is due to dynamo.tl.neighbors compared to scanpy.pp.neighbors. The exact reason for this enhancement still needs further investigation.

Additionally, it is important to note that scc does not achieve state-of-the-art (SOTA) performance and does not yield better results on the gold standard dataset of human cortical neurons.

Since scc is an adjacency matrix that directly combines spatial neighborhoods and PCA neighborhoods, mclust is not applicable and was not included in this PR. Perhaps I could introduce STAGATE into the spateo framework and use it in place of PCA. However, this would introduce a new dependency, pyg, which might complicate updating the existing requirements.txt and incur additional installation costs for users.

…pendencies aristoteleo#244

…as a writeup

Starlitnightly · 2024-07-22T22:30:33Z

More info about optimize_cluster

Sichao25 · 2024-07-24T16:19:20Z

Those updates are awesome; thanks for your contribution! I noticed you've specified matplotlib<=3.6.2 and pandas<=1.5.3 in your changes. Is there a specific reason for keeping these versions? We're currently updating these dependencies, and I also feel they may be causing some dependency issues in the CI.

Starlitnightly · 2024-07-25T05:38:41Z

Thank you for your feedback! I’m glad you found the updates helpful.

Regarding the specific versions of matplotlib and pandas, there are a couple of reasons for those choices:

Pandas: Version 1.5.3 is the last release before Pandas 2.0, which replaced the internal engine with Polars. This update introduced significant changes, including the renaming of many functions and alterations in formatting. Consequently, many packages have limited compatibility to Pandas 1.5.3 to avoid these disruptions.
Matplotlib: For matplotlib, the version restriction to 3.6.2 or 3.6.3 is due to changes in the parameters of some plotting functions in higher versions, such as cmap. To maintain consistency and avoid potential issues, we opted to restrict it to 3.6.2. Without this restriction, we would need to implement version checks before calling plotting functions to ensure compatibility.

However, I understand the importance of keeping our dependencies up-to-date. If the current versions are causing CI issues, I’m more than willing to help investigate and test with the latest versions to ensure everything works smoothly.

Starlitnightly · 2024-07-25T05:52:41Z

@Sichao25 But the latest version of anndata seems to have solved the support bug for pandas versions greater than 2. This requires more testing to ensure compatibility

scverse/anndata#948

Sichao25 · 2024-07-25T16:28:16Z

Thanks for sharing your concerns. We are updating important dependencies like NumPy, pandas, and matplotlib to ensure compatibility with other tools. The current plan includes support for pandas >=1.5.3 and matplotlib>=3.6.2. These updates will make it easier for you to pass the CI and test your changes. Feel free to join the discussion here to share your ideas or report any dependency issues.

About scc, it will be nice to have STAGATE introduced as a new option. How about setting it as an optional dependency that users can install themselves? Specifically, we won't add new dependency directly to requirements.txt, but we will check for its installation before using it by some code like:

try:
    import pyg
except ImportError:

Xiaojieqiu

Hi @Starlitnightly thanks for this pull request. Your ideas on switching from tensorflow to pytorch is a great one. I also like your optimize_cluster function which can definitely be used to smooth the cluster layer on the space

Xiaojieqiu · 2024-07-25T17:56:41Z

spateo/tdr/models/models_backbone/backbone_methods.py


-        # The itermediate model gets the output of the bottleneck layer,
-        # which acts as the projection layer.
-        self.intermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer(bname).output)


@Starlitnightly it looks like your updated code doesn't set the intermediate_layer_model?

so it is always None?

I implemented intermediate_layer_model inside the AutoEncoder later on, which should really be removed in self, I need to test it a bit further

I have tested this function on simulated data, and it runs well. However, I haven't found specific use cases for this function. Could someone provide specific scenarios where this function is used so that I can further optimize it?

spateo/tools/cluster/find_clusters.py

spateo/tools/cluster/leiden.py

Xiaojieqiu · 2024-07-25T18:46:41Z

@Starlitnightly regarding your comments on scc, we found scc works well on several dataset given its simple formulation. (1) but use STAGATE to replace pca seem like a good idea. (2) are you saying spateo's leiden results are worse than scanpy?

Starlitnightly · 2024-07-27T10:51:40Z

@Starlitnightly regarding your comments on scc, we found scc works well on several dataset given its simple formulation. (1) but use STAGATE to replace pca seem like a good idea. (2) are you saying spateo's leiden results are worse than scanpy?

I compared the implementation of leiden in spateo and scanpy, in fact both are the same, so the result is the same, it is the neighbors that cause the difference in leiden. spateo's results are better compared to scanpy's

Starlitnightly · 2024-07-27T11:12:30Z

Thanks for sharing your concerns. We are updating important dependencies like NumPy, pandas, and matplotlib to ensure compatibility with other tools. The current plan includes support for pandas >=1.5.3 and matplotlib>=3.6.2. These updates will make it easier for you to pass the CI and test your changes. Feel free to join the discussion here to share your ideas or report any dependency issues.

About scc, it will be nice to have STAGATE introduced as a new option. How about setting it as an optional dependency that users can install themselves? Specifically, we won't add new dependency directly to requirements.txt, but we will check for its installation before using it by some code like:
try:
    import pyg
except ImportError:

I'm going to try to implement STAGATE in spateo in a future PR. Since the author doesn't give a usable package directly in pypi, a lot of files and code may need to be added!

Starlitnightly · 2024-10-01T13:14:40Z

Maybe it need to be added more method for cluster SOTA in this commit.

STAGATE: https://github.com/QIFEIDKN/STAGATE_pyG
CAST: https://github.com/wanglab-broad/CAST/blob/main/setup.py

Starlitnightly · 2024-10-09T20:09:42Z

The latest tutorial can be viewed in the pull request of spateo-tutorial: https://github.com/aristoteleo/spateo-tutorials/blob/bf32d5739f380948e76cbe07c99e3e8d6d1e3627/5_cluster_digitization/1_bin_scc.ipynb

It should be noted that CAST and STAGATE do not perform well on the current dataset, but they outperform SCC on other datasets. Additionally, in the new tutorial, I have modified the method of spatial domain annotation to use dictionary-based annotation instead of sequential annotation.

There are a few packages providing commands. Try e.g. `pip install scanpy- scripts`! positional arguments: {settings} options: -h, --help show this help message and exit in pySTAGATE

Xiaojieqiu · 2024-10-09T21:12:34Z

excellent work. I am going to merge the pull request now.

Starlitnightly added 4 commits July 23, 2024 03:31

Rewrite all tensorflow related code in pytorch and remove relevant de…

01bc07d

…pendencies aristoteleo#244

Optimized graph transformation of for leiden computation, discarding …

b386bf5

…as a writeup

Added to Optimize the label by majority voting in the neighborhood.

b299d05

Added to Optimize the label by majority voting in the neighborhood.

c6a7bd6

Update requirements.txt

5d59ed7

Xiaojieqiu reviewed Jul 25, 2024

View reviewed changes

Xiaojieqiu mentioned this pull request Jul 25, 2024

Optimized SCC performance & fix some bugs #249

Open

update using instead of

1099262

Merge branch 'main' into main

5371f04

Starlitnightly added 4 commits October 10, 2024 01:17

Merge branch 'aristoteleo:main' into main

ee677cc

Fixed the error of subset of cell using isin (aristoteleo#144)

ccd623c

other

5b68e1a

Re-Added cluster_method arg in scc

b7d315c

Starlitnightly mentioned this pull request Oct 9, 2024

Fixed the error and added smooth function to optimate the clusters aristoteleo/spateo-tutorials#5

Merged

Added CAST and STAGATE to cluster

3b84fbc

remove the usage: scanpy [-h] {settings} ...

882d8b4

There are a few packages providing commands. Try e.g. `pip install scanpy- scripts`! positional arguments: {settings} options: -h, --help show this help message and exit in pySTAGATE

Xiaojieqiu merged commit 3c01986 into aristoteleo:main Oct 9, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimized `SCC` Implementation and Removed `TensorFlow` Dependencies #250

Optimized `SCC` Implementation and Removed `TensorFlow` Dependencies #250

Starlitnightly commented Jul 22, 2024

Starlitnightly commented Jul 22, 2024

Sichao25 commented Jul 24, 2024

Starlitnightly commented Jul 25, 2024

Starlitnightly commented Jul 25, 2024

Sichao25 commented Jul 25, 2024 •

edited by Xiaojieqiu

Loading

Xiaojieqiu left a comment

Xiaojieqiu Jul 25, 2024

Xiaojieqiu Jul 25, 2024

Starlitnightly Jul 27, 2024

Starlitnightly Jul 27, 2024

Xiaojieqiu commented Jul 25, 2024

Starlitnightly commented Jul 27, 2024

Starlitnightly commented Jul 27, 2024

Starlitnightly commented Oct 1, 2024 •

edited

Loading

Starlitnightly commented Oct 9, 2024

Xiaojieqiu commented Oct 9, 2024

Optimized SCC Implementation and Removed TensorFlow Dependencies #250

Optimized SCC Implementation and Removed TensorFlow Dependencies #250

Conversation

Starlitnightly commented Jul 22, 2024

Updates:

Notes:

Starlitnightly commented Jul 22, 2024

Sichao25 commented Jul 24, 2024

Starlitnightly commented Jul 25, 2024

Starlitnightly commented Jul 25, 2024

Sichao25 commented Jul 25, 2024 • edited by Xiaojieqiu Loading

Xiaojieqiu left a comment

Choose a reason for hiding this comment

Xiaojieqiu Jul 25, 2024

Choose a reason for hiding this comment

Xiaojieqiu Jul 25, 2024

Choose a reason for hiding this comment

Starlitnightly Jul 27, 2024

Choose a reason for hiding this comment

Starlitnightly Jul 27, 2024

Choose a reason for hiding this comment

Xiaojieqiu commented Jul 25, 2024

Starlitnightly commented Jul 27, 2024

Starlitnightly commented Jul 27, 2024

Starlitnightly commented Oct 1, 2024 • edited Loading

Starlitnightly commented Oct 9, 2024

Xiaojieqiu commented Oct 9, 2024

Optimized `SCC` Implementation and Removed `TensorFlow` Dependencies #250

Optimized `SCC` Implementation and Removed `TensorFlow` Dependencies #250

Sichao25 commented Jul 25, 2024 •

edited by Xiaojieqiu

Loading

Starlitnightly commented Oct 1, 2024 •

edited

Loading