This repository has been archived by the owner on Oct 4, 2023. It is now read-only.
Nathan Hammond edited this page Jun 13, 2015 · 16 revisions

Welcome to the XPPF Wiki!

This is a place for the development team to document XPPF while it is under development. Feel free to use this as scratch space for half-baked ideas.

What is XPPF?

The eXtremely Portable Pipeline Framework (XPPF) is a platform for creating and running analysis pipelines that can be executed in a variety of environments (local, job scheduler, cloud) and that can use a variety of persistent storage types (local, cloud).

It also provides a convenient means of exactly describing an analysis performed, the executables used, and the input files used, such that an analysis can be shared with others for review.
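As a rough illustration of what "exactly describing" an analysis could mean, the sketch below builds a record from the executable, its version, its arguments, and content hashes of the input files, then derives an identifier from that record. All names here (`FileRef`, `AnalysisRecord`, `record_id`, the example executable and file names) are hypothetical and are not taken from XPPF itself; this is only one plausible way to make an analysis description shareable and checkable.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class FileRef:
    """Hypothetical reference to an input file, identified by content hash."""
    name: str
    sha256: str


@dataclass(frozen=True)
class AnalysisRecord:
    """Hypothetical description of one analysis step (not XPPF's schema)."""
    executable: str
    executable_version: str
    arguments: tuple
    inputs: tuple  # tuple of FileRef

    def to_json(self) -> str:
        # sort_keys gives a stable serialization, so equal records
        # always produce identical text.
        return json.dumps(asdict(self), sort_keys=True)

    def record_id(self) -> str:
        # The identifier depends on every field, including input hashes.
        return sha256_of_bytes(self.to_json().encode("utf-8"))


# Illustrative data only: the file contents are passed in as bytes.
original = AnalysisRecord(
    executable="example-tool",
    executable_version="1.0",
    arguments=("--fast",),
    inputs=(FileRef("reads.txt", sha256_of_bytes(b"ACGT")),),
)
same_again = AnalysisRecord(
    executable="example-tool",
    executable_version="1.0",
    arguments=("--fast",),
    inputs=(FileRef("reads.txt", sha256_of_bytes(b"ACGT")),),
)
changed_input = AnalysisRecord(
    executable="example-tool",
    executable_version="1.0",
    arguments=("--fast",),
    inputs=(FileRef("reads.txt", sha256_of_bytes(b"ACGA")),),
)
```

Under these assumptions, a reviewer who receives the JSON form can recompute the identifier and confirm that the same executable version and the same input contents were used; any change to an input file changes the identifier.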

Direct access to the storage system for inputs and outputs is not required to run an analysis. Rather, data and analyses are published through a web interface. This allows XPPF to handle access controls and access logging, useful for regulatory compliance and for reducing the chance of accidental data loss from human error. The analysis records track the provenance of each result, and built-in environment control and version management ensure reproducibility as long as the underlying applications are deterministic.

Runtime environment management is fully automated and reproducible, requiring no intervention from the pipeline operator.

The result is a convenient way to develop pipelines that can be moved or shared, and that generate data that can be tracked, queried, and shared.

User stories

For a full list of user stories to be supported, see the User Stories wiki page.

Planned feature highlights:

Data security

  • encrypted data transmission
  • encrypted data storage
  • user access management

Traceability

  • access logging
  • analysis logs
  • data provenance tracking

Repeatability

  • enforces consistency of processing environment and installed apps
  • version management

Flexibility

  • processing may be configured as local, cloud, or job scheduler
  • storage may be configured as local or cloud
  • backend web server may be run locally or remotely, including cloud
  • database may be MySQL or SQLite and can be local or remote, including cloud solutions
  • the same pipeline can be run on any of the above without modification, simply by changing the configuration
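
The idea of running one pipeline under different configurations can be sketched as follows. This is a minimal illustration, not XPPF's actual configuration format: `RunConfig`, `ALLOWED`, `PIPELINE`, `plan`, and the step names are all hypothetical, and the allowed values are simply the options listed above.

```python
from dataclasses import dataclass

# Options taken from the feature list above; the key and value
# spellings are assumptions made for this sketch.
ALLOWED = {
    "processing": {"local", "cloud", "job_scheduler"},
    "storage": {"local", "cloud"},
    "database": {"mysql", "sqlite"},
}


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical runtime configuration, kept separate from the pipeline."""
    processing: str
    storage: str
    database: str

    def __post_init__(self):
        # Reject any value that is not one of the listed options.
        for key, allowed in ALLOWED.items():
            value = getattr(self, key)
            if value not in allowed:
                raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")


# The pipeline definition contains no environment details at all.
PIPELINE = ("step_1", "step_2", "step_3")  # hypothetical step names


def plan(pipeline, config):
    """Pair each unchanged pipeline step with the chosen environment."""
    return [(step, config.processing, config.storage) for step in pipeline]


laptop = RunConfig(processing="local", storage="local", database="sqlite")
hosted = RunConfig(processing="cloud", storage="cloud", database="mysql")

laptop_plan = plan(PIPELINE, laptop)
hosted_plan = plan(PIPELINE, hosted)
```

In this sketch the two plans contain exactly the same steps in the same order; only the environment fields differ, which is the property the last bullet describes.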

Portability

  • runs in Docker containers, which allow platform-agnostic pipeline execution
  • supports cloud platforms for ease of deployment
  • database import and export functions simplify migration
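
For the SQLite case, database export and import could look roughly like the sketch below, which uses the `iterdump` and `executescript` methods of Python's standard `sqlite3` module. The helper names `export_sql` and `import_sql` and the `analyses` table are hypothetical; XPPF's own import and export functions are not described on this page and may work differently.

```python
import sqlite3


def export_sql(conn: sqlite3.Connection) -> str:
    """Dump the whole database as a SQL script (schema plus rows)."""
    return "\n".join(conn.iterdump())


def import_sql(script: str) -> sqlite3.Connection:
    """Load a SQL script into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# Build a small source database with a hypothetical table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE analyses (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO analyses (name) VALUES (?)", ("demo",))
source.commit()

# Export to text, then import into a separate database.
dump = export_sql(source)
target = import_sql(dump)
rows = target.execute("SELECT id, name FROM analyses").fetchall()
```

Because the dump is plain SQL text, it can be copied between machines or checked into storage before being loaded elsewhere; migrating between SQLite and MySQL would additionally need dialect handling that this sketch does not attempt.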