
Create images specific to a given species? #6

Open
nunogit opened this issue Feb 9, 2018 · 12 comments

Comments

@nunogit
Contributor

nunogit commented Feb 9, 2018

If we decide to do this, should we make it for all species?

@marvinm2
Collaborator

marvinm2 commented Feb 9, 2018

For personal use, species-specific images would save a lot of time downloading and running the Docker image. For the cloud, Minishift/OpenShift, or another online location, why not all species at once?

@Chris-Evelo

Most people will not use more than 1 or 2 species, and certain combinations make a lot more sense than others. No one wants rice and cows for instance, unless you really want all. But people might want human + mouse + rat.

@nunogit
Contributor Author

nunogit commented Feb 10, 2018

Ok. That makes a lot of sense... But then how can we select the important species to package?

@Chris-Evelo

I am not sure. How much effort will be involved in making these containers, and how easy will it be to update them? I am tempted to make:

  1. An empty one with instructions on how to add databases.
  2. A human research related one that has human, mouse, rat (and possibly also the metabolomics, reactions and gene-variant databases)
  3. A complete one with everything we have.

And finally, indicate that people can contact us for other specific combinations they might want.

Best, Chris
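
The three image variants proposed above could be captured in a small lookup table that a build script reads. This is only a sketch: the variant names, the dataset identifiers, and the `datasets_for` helper are hypothetical and do not correspond to real BridgeDb file names or to any existing build tooling.

```python
# Hypothetical dataset identifiers, used purely for illustration.
ALL_DATASETS = [
    "human", "mouse", "rat", "cow", "rice",
    "metabolites", "reactions", "gene_variants",
]

# Hypothetical variant names for the three proposed images.
IMAGE_VARIANTS = {
    # 1. Empty image: users add databases themselves.
    "empty": [],
    # 2. Human research image; the last three entries were only
    #    "possibly" included in the proposal.
    "human-research": [
        "human", "mouse", "rat",
        "metabolites", "reactions", "gene_variants",
    ],
    # 3. Complete image with everything available.
    "complete": list(ALL_DATASETS),
}


def datasets_for(variant: str) -> list[str]:
    """Return a copy of the dataset list bundled into an image variant."""
    if variant not in IMAGE_VARIANTS:
        known = ", ".join(sorted(IMAGE_VARIANTS))
        raise ValueError(f"unknown variant {variant!r}; known variants: {known}")
    return list(IMAGE_VARIANTS[variant])


if __name__ == "__main__":
    for name in IMAGE_VARIANTS:
        print(name, "->", datasets_for(name) or "(none, add your own)")
```

Other requested combinations would then just be extra entries in the table.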

@nunogit
Contributor Author

nunogit commented Feb 14, 2018

Well, the build of a full image takes about 1 hour + upload (~20 minutes).

So, I think that idea can work. The empty one can prompt people for which databases they want. It can be quite easy from a user perspective.

I still have a question though. We currently have BridgeDb releases. We should probably release a Docker image with each one, shouldn't we? For reproducibility... or is that not relevant here?

If the answer is yes and we decide to have a custom-built container (like @Chris-Evelo mentions in option 1), we should also allow people to pick their own version of the BridgeDb database, shouldn't we? If so, we must keep a history of the BridgeDb files?!

@Chris-Evelo

Yes, I think both make sense. Update the containers with the releases, and archive at least the mapping files. Egon suggested also doing the latter on Zenodo or Figshare.

@nunogit
Contributor Author

nunogit commented Feb 15, 2018

Ok. I have to look at their storage capabilities. For FAIR, I've been looking at archive.org.
It might be an interesting solution for storing data like the bdb files.

@egonw
Member

egonw commented Feb 16, 2018

@nunogit, can we have these Docker images be built on Jenkins?

I suggest we plan for one general Docker image, which downloads all files (ideally from an archive such as Figshare, Archive.org, Zenodo, or otherwise; I don't have a strong preference, as long as it gives DOIs :) )... @Chris-Evelo, with a FAIRMappings.org idea, that could be the source of information on where to download the latest files...

And we have one or more dedicated images... @marvinm2, can you check the use cases to see which species we need for OpenRiskNet (see #9) ?

@nunogit
Contributor Author

nunogit commented Feb 16, 2018

> @nunogit, can we have these Docker images be built on Jenkins?

@egonw I don't see why not. I haven't done it yet though. I will have a look.

> I don't have a strong preference, as long as it gives DOIs :)...

I am not sure archive.org gives DOIs. I will check. I don't have a strong preference either. My only requirement is that it is persistent (long term), and it would be highly preferable that they publish their storage policy (so we can have it for FAIR): is it funded, for how long, and for how long will the data be maintained?
In any case, we should start creating a mechanism to publish the bdb databases systematically to those repositories.

> which downloads all files

So, is it your opinion that the databases should be downloaded after downloading the image?

@Chris-Evelo

I am not sure... I am OK with Zenodo and Figshare as a backup facility that happens to give DOI-based persistent IDs. But I am not sure that means we should not maintain our own data repository, which can also easily hold other information. I would not be surprised if we actually have to. But I understand the idea that you need a persistent dataset ID. Maybe we can work with identifiers.org and discuss in the ELIXIR identifiers workgroup whether that really should be a DOI, and if so, whether identifiers.org can possibly become an issuing instance?

@Chris-Evelo

BTW, if you indeed download the mapping files after installing BridgeDb, it should not be hard to add a download configuration file in which you simply select the datasets you want.
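
A download configuration of that kind could look roughly like the sketch below, which turns a small INI-style file into a list of files to fetch. Everything specific here is an assumption made for illustration: the `[download]` section and its keys, the `example.org` base URL, the release label, and the `<name>.bridge` file naming are placeholders, not an existing BridgeDb convention. The `release` key also shows one way users could pin a particular version of the mapping files.

```python
import configparser

# Hypothetical configuration; section name, keys, URL and release label
# are placeholders and not an existing BridgeDb format.
EXAMPLE_CONFIG = """
[download]
base_url = https://example.org/bridgedb
release = 2018-02
datasets = human, mouse, rat
"""


def planned_downloads(config_text: str) -> list[str]:
    """Build the list of file URLs selected by the configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["download"]

    base = section["base_url"].rstrip("/")
    release = section["release"]
    # Split the comma-separated dataset list and drop empty entries.
    names = [n.strip() for n in section["datasets"].split(",") if n.strip()]

    # The "<name>.bridge" naming is assumed for this example only.
    return [f"{base}/{release}/{name}.bridge" for name in names]


if __name__ == "__main__":
    for url in planned_downloads(EXAMPLE_CONFIG):
        print(url)
```

The function only computes the URLs; the actual fetching (and any checksum or DOI lookup) would be a separate step once the archive location is decided.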

@egonw
Member

egonw commented Mar 9, 2018

Another thought is that even single-species Docker images make sense... because of the way OpenRiskNet works (and, as I learned this week, platforms like Phenomenal too): they spin up services on demand. And people can just as easily spin up two Docker containers (one for each species) as one (with both species).

@egonw egonw removed their assignment Mar 9, 2018