This repository has been archived by the owner on Oct 4, 2023. It is now read-only.
Home
Nathan Hammond edited this page Jul 15, 2015
The eXtremely Portable Pipeline Framework (XPPF) is a tool for creating and executing bioinformatics workflows.
We observed that many bioinformaticians are doing analysis in a way that is not repeatable or traceable, and that cannot be easily migrated to a new compute environment. For example:
- Depending on local software installations and environment settings
- Executing commands manually
- Failing to record analysis details and results in a database
- Depending on file names and paths that may change and cannot be verified
XPPF is designed as an easy-to-use analysis platform that remedies these problems, making your analyses repeatable, your results traceable, and your workflows portable to other platforms.
- XPPF uses Docker to make your pipelines platform-agnostic and to avoid using local software installations or environment settings
- The same pipelines can be run in the cloud, on a local cluster, or on your laptop
- XPPF will seamlessly pull data from your local path, a remote file server, or an object store
- XPPF analyses and results are defined as JSON documents that can be easily shared by email
- JSON documents defining analyses and results have the same meaning anywhere, with no dependencies on your software configuration, local environment settings, or file locations. External references to files or applications (Docker images) can always be verified using a cryptographic hash of contents
- XPPF lets you run all your analysis with encryption of data in transit and at rest
- The same features that make XPPF sharable help to ensure repeatability: input files are verified with a cryptographic hash, and applications are stored in Docker containers that can be verified by image ID
- Analyses and results are recorded in a persistent database
- When answering questions like "Where did this file come from?", "What software version did we use to produce this result?", or "What settings did we use for this?", you should never be scrambling through your notes or digging through output logs. XPPF keeps track of result provenance and can tell you all the steps that were performed, from import of the original input data to production of the final result.
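The idea of verifying files by a cryptographic hash of their contents, rather than by name or path, can be illustrated with a short Python sketch. This is not XPPF's actual code or document schema: the field names `hash_function` and `hash_value`, the helper names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_input(path, file_ref):
    """Check a local file against a file reference.

    `file_ref` is a hypothetical structure, not XPPF's real schema:
    {"hash_function": "sha256", "hash_value": "<hex digest>"}
    """
    if file_ref["hash_function"] != "sha256":
        raise ValueError("this sketch only handles sha256")
    return sha256_of_file(path) == file_ref["hash_value"]


def demo():
    """Show that verification follows file contents, not file names."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "reads.fastq")
        with open(original, "wb") as handle:
            handle.write(b"@read1\nACGT\n+\nIIII\n")

        # Round-trip through JSON to mimic a reference shared as a document.
        file_ref = json.loads(json.dumps({
            "hash_function": "sha256",
            "hash_value": sha256_of_file(original),
        }))
        ok_original = verify_input(original, file_ref)

        # Renaming the file does not affect verification.
        renamed = os.path.join(workdir, "renamed.fastq")
        os.rename(original, renamed)
        ok_renamed = verify_input(renamed, file_ref)

        # Changing the contents makes verification fail.
        with open(renamed, "ab") as handle:
            handle.write(b"@read2\nTTTT\n+\nIIII\n")
        ok_modified = verify_input(renamed, file_ref)

        return ok_original, ok_renamed, ok_modified
```

In this sketch, a reference keeps matching after the file is renamed or moved, and stops matching as soon as the contents change. That is the property that lets a document refer to a file without depending on its path.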
XPPF is under active development. To get involved, contact [email protected]
- Nathan Hammond
- Isaac Liao
- Ziliang Qian