Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map new data when using Seurat object #120

Open
PeggySze opened this issue Apr 27, 2021 · 9 comments
Open

Map new data when using Seurat object #120

PeggySze opened this issue Apr 27, 2021 · 9 comments

Comments

@PeggySze
Copy link

PeggySze commented Apr 27, 2021

Hi,
I have precomputed two seurat objects. After infering trajectory for one seurat object, I'd like to map cells from the other seurat object to the precomputed trajectory. However, there is an error when I run map_new_data() function :

adata_combined = st.map_new_data(WT_adata,Mutant_adata)
Top principal components are being used for mapping ...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-56-f419d4836daf> in <module>
----> 1 adata_combined = st.map_new_data(WT_adata,Mutant_adata)

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in map_new_data(adata_ref, adata_new, use_radius)
   4059         print('Top principal components are being used for mapping ...')
   4060         if('top_pcs' not in adata_ref.uns_keys()):
-> 4061             raise ValueError("top principal components are not selected yet in `adata_ref`")
   4062         trans = adata_ref.uns['top_pcs']
   4063         top_pcs_feature = adata_ref.uns['params']['select_top_principal_components']['feature']

ValueError: top principal components are not selected yet in `adata_ref`

The following is my code :

import anndata as ad
import stream as st

# Load the first seurat object
WT_adata = ad.read_loom('"./my_WT_object.loom")
st.set_workdir(WT_adata,'./stream_result')
WT_adata.obsm['top_pcs'] = WT_adata.obsm['pca_cell_embeddings']
WT_adata.obsm['X_vis_umap'] = WT_adata.obsm['umap_cell_embeddings']

st.dimension_reduction(WT_adata,method='mlle',feature='top_pcs',n_jobs=4,n_neighbors=30)
st.seed_elastic_principal_graph(WT_adata,n_clusters=10)
st.elastic_principal_graph(WT_adata,epg_alpha=0.05,epg_mu=0.05,epg_lambda=0.05,epg_trimmingradius=0.1)
st.extend_elastic_principal_graph(WT_adata)
st.plot_dimension_reduction(WT_adata,color=['label'],n_components=2,show_graph=True,show_text=True)
st.plot_branches(WT_adata,show_text=True)

# Load the second seurat object
Mutant_adata = ad.read_loom("./my_mutant_object.loom")
Mutant_adata.obsm['top_pcs'] = Mutant_adata.obsm['pca_cell_embeddings']
Mutant_adata.obsm['X_vis_umap'] = Mutant_adata.obsm['umap_cell_embeddings']

# mapping
adata_combined = st.map_new_data(WT_adata,Mutant_adata)

I tried to run select_top_principal_components() function for WT_adata, but there is another error :

st.select_top_principal_components(WT_adata,n_pc=15,first_pc=True)
using all the features ...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-63-358562d189fb> in <module>
----> 1 st.select_top_principal_components(WT_adata,n_pc=15,first_pc=True)

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in select_top_principal_components(adata, feature, n_pc, max_pc, first_pc, use_precomputed, save_fig, fig_name, fig_path, fig_size, pad, w_pad, h_pad)
   1035         else:
   1036             print('using all the features ...')
-> 1037             trans = sklearn_pca.fit(adata.X)
   1038             X_pca = trans.transform(adata.X)
   1039 #             X_pca = sklearn_pca.fit_transform(adata.X)

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/sklearn/decomposition/_pca.py in fit(self, X, y)
    350             Returns the instance itself.
    351         """
--> 352         self._fit(X)
    353         return self
    354 

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/sklearn/decomposition/_pca.py in _fit(self, X)
    392         # This is more informative than the generic one raised by check_array.
    393         if issparse(X):
--> 394             raise TypeError('PCA does not support sparse input. See '
    395                             'TruncatedSVD for a possible alternative.')
    396 

TypeError: PCA does not support sparse input. See TruncatedSVD for a possible alternative.

Could anyone please tell me how to solve this problem? Thanks a lot !

@huidongchen
Copy link
Collaborator

It looks like Mutant_adata.X is a sparse matrix, which resulted in the error you are seeing. Unfortunately for the current version, sparse matrix is not supported yet.

One quick fix for it is to convert sparse matrices into dense matrices, i.e.

WT_adata.X = WT_adata.X.A
Mutant_adata.X = Mutant_adata.X.A

@PeggySze
Copy link
Author

I tried to convert sparse matrices into dense matrices. However, I got the following error :

WT_adata.X = WT_adata.X.A
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-3d42edc1ccff> in <module>
      1 ##  convert sparse matrices into dense matrices
----> 2 WT_adata.X = WT_adata.X.A

AttributeError: 'numpy.ndarray' object has no attribute 'A'

@huidongchen
Copy link
Collaborator

Hmmm, they seem already numpy arrays, which kind of contradicts the previous error.. is there a dummy dataset you can share to reproduce the issue? I m happy to take a closer look.

@PeggySze
Copy link
Author

PeggySze commented May 13, 2021

Sorry for the late reply. I tried to rerun the scripts and found out that converting sparse matrices into dense matrices works. Howerver, I still encounter another problem when mapping data.

# mapping
adata_combined = st.map_new_data(WT_adata,Mutant_adata)

Top principal components are being used for mapping ...
method 'mlle' is being used for mapping ...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-26-f419d4836daf> in <module>
----> 1 adata_combined = st.map_new_data(WT_adata,Mutant_adata)

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in map_new_data(adata_ref, adata_new, use_radius)
   4129     shared_var_key = [x for x in adata_new.var_keys() if x in adata_ref.var_keys()]
   4130     adata_combined.obs = adata_combined.obs[shared_obs_key+['batch']]
-> 4131     adata_combined.var = adata_combined.var[shared_var_key]
   4132     adata_combined.uns['workdir'] = adata_new.uns['workdir']
   4133     for key in adata_ref.uns_keys():

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3028             if is_iterator(key):
   3029                 key = list(key)
-> 3030             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   3031 
   3032         # take() does not accept boolean indexers

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1264             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1265 
-> 1266         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1267         return keyarr, indexer
   1268 

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1306             if missing == len(indexer):
   1307                 axis_name = self.obj._get_axis_name(axis)
-> 1308                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1309 
   1310             ax = self.obj._get_axis(axis)

KeyError: "None of [Index(['Selected'], dtype='object')] are in the [columns]"

It seems that my adata contain something unexpected var after converting from Seurat object. Here is my adata :

WT_adata
AnnData object with n_obs × n_vars = 6936 × 17612
    obs: 'ClusterID', 'ClusterName', 'integrated_snn_res_0_2', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_mt', 'rough_cell_type', 'seurat_clusters', 'kmeans', 'node', 'branch_id', 'branch_id_alias', 'branch_lam', 'branch_dist', 'S0_pseudotime', 'S1_pseudotime'
    var: 'Selected'
    uns: 'workdir', 'var_genes', 'pca_variance_ratio', 'top_pcs', 'params', 'trans_mlle', 'rough_cell_type_color', 'epg', 'flat_tree', 'seed_epg', 'seed_flat_tree', 'ori_epg', 'epg_obj', 'ori_epg_obj', 'branch_id_alias_color', 'stream_S1'
    obsm: 'pca_cell_embeddings', 'umap_cell_embeddings', 'top_pcs', 'X_vis_umap', 'var_genes', 'pca', 'X_mlle', 'X_dr', 'X_spring', 'X_stream_S1'
    varm: 'pca_feature_loadings'
    layers: 'scale_data'

Mutant_adata
AnnData object with n_obs × n_vars = 7216 × 17612
    obs: 'ClusterID', 'ClusterName', 'integrated_snn_res_0_2', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_mt', 'rough_cell_type', 'seurat_clusters'
    var: 'Selected'
    obsm: 'pca_cell_embeddings', 'umap_cell_embeddings', 'top_pcs', 'X_vis_umap'
    varm: 'pca_feature_loadings'
    layers: 'scale_data'

@huidongchen
Copy link
Collaborator

huidongchen commented May 14, 2021

The two anndata objects both look fine to me. It's difficult to understand what's going on without knowing the output of Mutant_adata.var and Mutant_adata.var_keys(). Not sure if you can share a subset of both anndata objects. it would be much easier for me to give some more useful feedback on it.

@PeggySze
Copy link
Author

Here are a subset of WT_adata and Mutant_adata which both downsampled to 1000 cells. You can download the data from this url : https://mab.to/5HqGKVtVt
The following is my code :

import anndata as ad
import stream as st

WT_adata = ad.read_loom("Seurat_1000cells_integrated_WT_germline_scRNA.loom")
##  convert sparse matrices into dense matrices
WT_adata.X = WT_adata.X.A
## Feature selection
st.select_variable_genes(WT_adata,loess_frac=0.01, n_genes=2000)
## select top principal components using variable genes
st.select_top_principal_components(WT_adata,n_pc=30,feature='var_genes',first_pc=False)
WT_adata.obsm['top_pcs'] = WT_adata.obsm['pca_cell_embeddings']
## dimension_reduction
st.dimension_reduction(WT_adata,method='mlle',feature='top_pcs',n_jobs=4,n_neighbors=30)
st.plot_dimension_reduction(WT_adata,color=['rough_cell_type'],n_components=2,show_graph=False,show_text=False)
## Trajectory inference
# Seeding the initial elastic principal graph
st.seed_elastic_principal_graph(WT_adata,n_clusters=10)
st.elastic_principal_graph(WT_adata,epg_alpha=0.05,epg_mu=0.05,epg_lambda=0.05,epg_trimmingradius=0.1)
#Extend leaf branch to reach further cells 
st.extend_elastic_principal_graph(WT_adata)
## mapping
Mutant_adata = ad.read_loom("Seurat_1000cells_integrated_Mutant_germline_scRNA.loom")
Mutant_adata.obsm['top_pcs'] = Mutant_adata.obsm['pca_cell_embeddings']
#  convert sparse matrices into dense matrices
Mutant_adata.X = Mutant_adata.X.A
adata_combined = st.map_new_data(WT_adata,Mutant_adata)

Thanks a lot for your help!

@huidongchen
Copy link
Collaborator

huidongchen commented May 18, 2021

Hi,

Thanks for sharing the data. I was able to reproduce the error.

Basically the error was caused by the auto-change of variable keys of the same name when concatenating two anndata objects (The same variable key Selected is present in both WT_adata.var and Mutant_adata.var).

A simple workaround would be to rename either one of them, e.g.

Mutant_adata = ad.read_loom("./Seurat_1000cells_integrated_Mutant_germline_scRNA.loom")

st.set_workdir(WT_adata, './stream_result')
st.set_workdir(Mutant_adata, './stream_mapping_result')
Mutant_adata.var.rename(columns={"Selected": "Selected_mutant"},inplace=True)

Mutant_adata.obsm['top_pcs'] = Mutant_adata.obsm['pca_cell_embeddings']
#  convert sparse matrices into dense matrices
Mutant_adata.X = Mutant_adata.X.A
adata_combined = st.map_new_data(WT_adata,Mutant_adata)

The above code works for me.

@PeggySze
Copy link
Author

PeggySze commented May 30, 2021

Thanks for your help and sorry to bother you again. I can sucessfully run map_new_data() function successfully now. However, when I tired to genereate the dimension reduction plot , there was another error :

st.plot_flat_tree(adata_combined,color=['batch'],show_graph=True,show_text=True)

TypeError                                 Traceback (most recent call last)
<ipython-input-68-66a7b0482b27> in <module>
      1 ## since we learnt graph on visualization manifold, the dimension reduction plot will be the same as the visualization plot
----> 2 st.plot_flat_tree(adata_combined,color=['batch'],show_graph=True,show_text=True)

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in plot_flat_tree(adata, color, dist_scale, fig_size, fig_ncol, fig_legend_ncol, fig_legend_order, vmin, vmax, alpha, pad, w_pad, h_pad, show_text, show_graph, save_fig, fig_path, fig_name, plotly)
   2595                 # ax_i.get_legend().texts[0].set_text("")
   2596             else:
-> 2597                 vmin_i = df_plot[ann].min() if vmin is None else vmin
   2598                 vmax_i = df_plot[ann].max() if vmax is None else vmax
   2599                 sc_i = ax_i.scatter(df_plot_shuf['FlatTree1'], df_plot_shuf['FlatTree2'],

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
  11211         )
  11212         def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
> 11213             return NDFrame.min(self, axis, skipna, level, numeric_only, **kwargs)
  11214 
  11215         # pandas\core\generic.py:11009: error: Cannot assign to a method

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
  10715     def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
  10716         return self._stat_function(
> 10717             "min", nanops.nanmin, axis, skipna, level, numeric_only, **kwargs
  10718         )
  10719 

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/generic.py in _stat_function(self, name, func, axis, skipna, level, numeric_only, **kwargs)
  10710             return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
  10711         return self._reduce(
> 10712             func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  10713         )
  10714 

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4171         if isinstance(delegate, ExtensionArray):
   4172             # dispatch to ExtensionArray interface
-> 4173             return delegate._reduce(name, skipna=skipna, **kwds)
   4174 
   4175         else:

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/arrays/_mixins.py in _reduce(self, name, skipna, **kwargs)
    271         meth = getattr(self, name, None)
    272         if meth:
--> 273             return meth(skipna=skipna, **kwargs)
    274         else:
    275             msg = f"'{type(self).__name__}' does not implement reduction '{name}'"

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    197                 else:
    198                     kwargs[new_arg_name] = new_arg_value
--> 199             return func(*args, **kwargs)
    200 
    201         return cast(F, wrapper)

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in min(self, skipna, **kwargs)
   1957         nv.validate_minmax_axis(kwargs.get("axis", 0))
   1958         nv.validate_min((), kwargs)
-> 1959         self.check_for_ordered("min")
   1960 
   1961         if not len(self._codes):

/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in check_for_ordered(self, op)
   1435         if not self.ordered:
   1436             raise TypeError(
-> 1437                 f"Categorical is not ordered for operation {op}\n"
   1438                 "you can use .as_ordered() to change the "
   1439                 "Categorical to an ordered one\n"

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one

I tried to run adata_combined.obs['batch'].as_ordered() to change batch colunm to an ordered one. However, it stiil failed to generate dimension reduction plot. The link to download WT_adata and Mutant_adata is changed to https://mab.to/oWopBLhAh

@huidongchen
Copy link
Collaborator

huidongchen commented Jun 2, 2021

No problem. I'm glad the above solution worked out for you.

Re the plotting error, unfortunately I was not able to reproduce it. With the same dataset you shared previously, the same function works for me.

But it seems to have something to do with the data type. Can you change the data type from category to str with the following code and give it a try again?

adata_combined.obs['batch'] = adata_combined.obs['batch'].astype(str)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants