Data Availability #9

Open
marianogabitto opened this issue Oct 26, 2022 · 8 comments

@marianogabitto

Hi,
Is it possible to get the data from a source other than Baidu Cloud? It is inaccessible.

Thanks

@adamtongji
Contributor

The sciCAR demo has been uploaded to my Dropbox folder. Could you check whether it is accessible?

https://www.dropbox.com/sh/p0wyyeuutzw9je9/AAAD362HYPXDGrjgltXpYmHza?dl=0

The input format is the same across all demo datasets, so you can test your code with the smallest sciCAR cell line dataset.

@marianogabitto
Author

Hi, I was able to download it. Would it be possible to get the two additional toy datasets that you use at the beginning of the paper, the SNARE-seq one and the paired one?

Thanks a lot!

@marianogabitto
Author

One more thing: do you have the raw data? The files you shared are normalized versions of it.
Thanks!

@adamtongji
Contributor

Hi,
The raw data (FASTQ files and count matrices) for these datasets can be downloaded from GEO using the accession IDs given in the original data papers.

scMVP should take a scRNA matrix restricted to the top DEGs (differentially expressed genes) and a TF-IDF-normalized or binary scATAC matrix as input. Including more scATAC peaks and using normalized scATAC input both improve the latent embedding and imputation performance of scMVP.
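For the binary scATAC option mentioned above, a minimal sketch of binarizing a raw peak count matrix could look like the following. The function name `binarize_peaks` and the cells-by-peaks orientation are assumptions made for illustration; this is not code from the scMVP repository, and it does not cover the DEG selection step for the scRNA matrix.

```python
import numpy as np


def binarize_peaks(counts):
    """Mark each peak as accessible (1) or not (0) in each cell.

    `counts` is assumed to be a cells x peaks array of non-negative
    raw scATAC counts (hypothetical layout, chosen for this sketch).
    """
    counts = np.asarray(counts)
    # Any non-zero count is treated as "accessible".
    return (counts > 0).astype(np.float32)


# Tiny made-up example: 2 cells x 3 peaks.
demo = binarize_peaks([[0, 3, 1], [2, 0, 0]])
```

For large datasets a sparse representation would be preferable; the dense array here only keeps the sketch short.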

The datasets exceed the limit of my Dropbox account, so I uploaded the other two cell line datasets to Google Drive at the following link:
https://drive.google.com/drive/folders/18ymTLyMb_wD20O4Z2qkOXBQt5yoDTvea?usp=sharing

@Citugulia40

Hi,
How did you generate the "sciCAR_cell_annot.txt" file? I have the barcodes, features, and count matrices from both scRNA and scATAC. How can I get the annotation file?

Thanks

@adamtongji
Contributor

> Hi, how did you generate the "sciCAR_cell_annot.txt" file? I have the barcodes, features, and count matrices from both scRNA and scATAC. How can I get the annotation file?
>
> Thanks

You can download the "sciCAR_cell_annot.txt" file directly from the demo dataset folder on the Baidu cloud disk or from the Dropbox folder (https://www.dropbox.com/sh/p0wyyeuutzw9je9/AAAD362HYPXDGrjgltXpYmHza?dl=0).

@EddieBio

Hi,

I found that there is no TF-IDF code in your repository. Should we process it by ourselves in advance?

@adamtongji
Contributor

> Hi,
>
> I found that there is no TF-IDF code in your repository. Should we process it by ourselves in advance?

The scATAC profiles in the demo datasets have already been TF-IDF normalized using Seurat.

We compared performance using raw binary scATAC counts versus TF-IDF-transformed scATAC profiles and found consistently (slightly) higher accuracy with the TF-IDF-transformed profiles. If you apply our tool to your own raw-count scATAC data, we suggest performing TF-IDF on the scATAC data in advance.
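For readers working in Python rather than Seurat, a minimal sketch of one common log-scaled TF-IDF variant is shown below. The thread does not say which TF-IDF variant or parameters were used for the demo data, so the formula, the `scale_factor` default, the function name `tfidf_normalize`, and the cells-by-peaks orientation are all assumptions made for illustration, not the authors' preprocessing.

```python
import numpy as np


def tfidf_normalize(counts, scale_factor=1e4):
    """Log-scaled TF-IDF for a cells x peaks count matrix (illustrative).

    TF  = count / total counts in that cell
    IDF = number of cells / total counts for that peak
    Result = log(1 + TF * IDF * scale_factor)
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Term frequency: normalize each cell (row) by its total count.
    cell_totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, cell_totals,
                   out=np.zeros_like(counts), where=cell_totals > 0)

    # Inverse document frequency: peaks seen in few cells get more weight.
    peak_totals = counts.sum(axis=0, keepdims=True)
    idf = np.divide(n_cells, peak_totals,
                    out=np.zeros_like(peak_totals), where=peak_totals > 0)

    return np.log1p(tf * idf * scale_factor)


# Tiny made-up example: 2 cells x 3 peaks.
raw = np.array([[1, 0, 1],
                [0, 2, 0]])
normalized = tfidf_normalize(raw)
```

Empty cells and peaks with no counts are mapped to zero instead of raising a division warning. For real datasets a sparse implementation would be needed, since scATAC matrices are usually far too large to hold densely.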


4 participants