Pass the databases as parameters, skipping downloading #50

Open
ypriverol opened this issue May 7, 2022 · 2 comments
Labels: enhancement (Improvement for existing functionality), feature-request (Request for a new pipeline feature)

Comments

@ypriverol
Member

@DongdongdongW has reported that the COSMIC download sometimes fails. From the email:

During this process, I encountered some problems. For some reason, the COSMIC database cannot be downloaded. At the same time, the VCF file from ENSEMBL is missing in the pipeline. So I chose to download the files from these databases myself and generate the proteogenomics database via pypgatk. When selecting from the COSMIC database and cBioPortal, I only selected data for cell line A549 and the lung cancer type. The size of the database, including the decoys, generated by the most popular pypgatk is 3.21 GB.

We can add download logic using wget, and also add an option for the user to provide the COSMIC file as a pipeline parameter, in which case the pipeline does not need to download it.
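To make the proposal concrete, here is a minimal sketch of the "use the provided file, otherwise download" decision. It is written in Python purely for illustration and is not pipeline code; the names `resolve_database` and `make_wget_downloader`, and any URL passed in, are hypothetical. The only external tool assumed is `wget` with its `-O` output option.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def make_wget_downloader(url: str) -> Callable[[Path], None]:
    """Build a downloader that fetches `url` with wget (hypothetical helper)."""
    def download(dest: Path) -> None:
        # check=True raises CalledProcessError if wget exits with an error,
        # so a failed download is not silently ignored.
        subprocess.run(["wget", "-O", str(dest), url], check=True)
    return download


def resolve_database(
    user_path: Optional[str],
    download: Callable[[Path], None],
    target: Path,
) -> Path:
    """Return the user-supplied file if one was given, else download to `target`."""
    if user_path:
        path = Path(user_path)
        if not path.is_file():
            # Fail early instead of falling back to a download the user
            # explicitly tried to avoid.
            raise FileNotFoundError(f"Provided database file not found: {path}")
        return path
    download(target)
    return target
```

Passing the downloader in as a callable keeps the decision logic separate from the transfer itself, so the same check could be reused for each database.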

@ypriverol ypriverol added the enhancement Improvement for existing functionality label May 7, 2022
@ypriverol ypriverol added the feature-request Request for a new pipeline feature label May 7, 2022
@ypriverol ypriverol added this to the DLS2 milestone May 7, 2022
@DongdongdongW
Collaborator

I have set the parameters to upload the COSMIC files.

@husensofteng
Collaborator

I think it would actually be good to add a parameter, e.g. downloaded_data_dir or similar, pointing to a directory where the user can put pre-downloaded files for the pipeline to use.

At each download section in the pipeline we can skip downloading the files that already exist in the given directory. Though, I don't know if there is a nice way to implement this in DSL2.
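As a rough illustration of the skip-if-present idea, the sketch below splits a list of expected files into those already available in the directory and those that still need downloading. It is plain Python rather than DSL2, and the function name `split_by_presence` and the file names in the usage note are made up for the example; only `downloaded_data_dir` comes from the suggestion above. Treating empty files as missing is an added assumption, meant to catch downloads that were interrupted.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def split_by_presence(
    expected_files: Iterable[str],
    downloaded_data_dir: Optional[str],
) -> Tuple[List[Path], List[str]]:
    """Return (paths found in the directory, names that still need downloading)."""
    base = Path(downloaded_data_dir) if downloaded_data_dir else None
    found: List[Path] = []
    missing: List[str] = []
    for name in expected_files:
        candidate = base / name if base is not None else None
        # Assumption: an empty file is treated as a failed download.
        if (
            candidate is not None
            and candidate.is_file()
            and candidate.stat().st_size > 0
        ):
            found.append(candidate)
        else:
            missing.append(name)
    return found, missing
```

For example, `split_by_presence(["cosmic.tsv", "ensembl.vcf"], params_dir)` would return the files to reuse and the names to hand to the download step; when no directory is given, everything ends up in the second list.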

Development

No branches or pull requests

3 participants