pypaftol: Home of paftools

This repository contains the paftools Python module, which provides functionality for recovery and analysis of target capture data in the PAFTOL Project. Some of this functionality should also be useful beyond PAFTOL, e.g. for processing HybSeq data in general. Functionality can be accessed either via the paftools script or via the Python API.

The documentation provided here is intended to help beta-testers with getting started.

Usage

Description and examples of usage are provided in our usage and tutorial readme.

Prerequisites

Building and installing the paftol module requires:

  • Python 2.7.x (Python 3.x is currently not supported)
  • Python 2.7.x development package (libpython-all-dev)
  • BioPython 1.66 (newer versions are likely to work as well) (python-biopython, and also python-biopython-sql)
  • setuptools (python-setuptools)
  • epydoc (python-epydoc)
  • GNU C compiler (gcc) and associated tools
  • GNU make

The following bioinformatics applications and suites are required for full functionality of the module and the paftools script:

  • Trimmomatic: to use Trimmomatic via paftools, a small shell script named trimmomatic must be available from the command line (i.e. on your PATH):
#!/bin/bash
# Forward all command line arguments to the Trimmomatic jar, preserving quoting
java -jar <FULL_PATH_TO>/trimmomatic-0.39.jar "$@"
  • blast
  • spades
  • samtools
  • bwa
  • exonerate
  • mafft
  • clustalo (aka clustal-omega)
  • emboss
  • embassy-phylip
  • fastqc (currently exactly version 0.11.5 is required)
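A quick way to see which of these tools are visible on your PATH is a small Python check. The sketch below is not part of paftools. The executable names it looks for (e.g. blastn, spades.py) are assumptions about how the tools are typically installed and may differ on your system, and the emboss and embassy-phylip suites are left out because they consist of many separate programs.

```python
import os


def find_executable(name, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Assumed executable names; adjust these to match your installation.
REQUIRED_TOOLS = ["trimmomatic", "blastn", "spades.py", "samtools", "bwa",
                  "exonerate", "mafft", "clustalo", "fastqc"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools from the list that cannot be found on PATH."""
    return [t for t in tools if find_executable(t, path) is None]


if __name__ == "__main__":
    for tool in missing_tools():
        print("not found on PATH: %s" % tool)
```

The script prints nothing if every listed executable is found. It does not check versions, so the fastqc version requirement still needs to be verified separately.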

Additional prerequisites for PAFTOL internal use include:

  • Python mysql.connector

These prerequisites should generally be provided on the cluster.

Installation Guide

  1. Clone the repository
git clone https://github.com/RBGKew/pypaftol
  2. Install by running the command
make hinstall

This will install the package in $HOME/lib/python, which is the standard directory for installing Python modules for use in your account only. You'll need to ensure that your PYTHONPATH environment variable includes this directory; see the Tips section below.

  3. Check that the installation was successful by running
paftools -h

This should give you a help message listing the paftools subcommands currently available.

  4. If you would like an HTML version of the API documentation for the paftools package and its subpackages, run the command
make doc

At the time of writing this README, this installation process works on the cluster. Feedback is very welcome.

Additional information about installation is available in Advanced_Install.md.

Tips

Setting up PYTHONPATH

PYTHONPATH is an environment variable which the Python interpreter uses to obtain a list of directories to search for modules when executing an import statement. By default, this variable won't include any directories in your home directory, so if you want to install modules in your personal space, you'll need to add the directory where you install them. This can be done with the following snippet of bash code:

if test -z "$PYTHONPATH" ; then
  PYTHONPATH=${HOME}/lib/python
else
  PYTHONPATH="${HOME}/lib/python:${PYTHONPATH}"
fi
export PYTHONPATH
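To confirm that the setting has taken effect, you can ask Python itself, since the directories listed in PYTHONPATH are added to sys.path at startup. The following is a minimal sketch, not part of paftools; the helper names user_module_dir and is_on_search_path are made up for this illustration.

```python
import os
import sys


def user_module_dir():
    """Return the per-user module directory, $HOME/lib/python."""
    return os.path.join(os.path.expanduser("~"), "lib", "python")


def is_on_search_path(directory, search_path=None):
    """Return True if directory is one of the module search path entries."""
    if search_path is None:
        search_path = sys.path
    wanted = os.path.realpath(directory)
    # Skip empty entries, which stand for the current directory.
    return any(os.path.realpath(p) == wanted for p in search_path if p)


if __name__ == "__main__":
    d = user_module_dir()
    if is_on_search_path(d):
        print("%s is on the module search path" % d)
    else:
        print("%s is missing; check PYTHONPATH" % d)
```

Run it in a fresh shell after exporting PYTHONPATH, so that the interpreter picks up the new value.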

To add a new subcommand (called foo here) to the paftools script:

  • identify the (mandatory) parameters and options required by foo
  • write a function addFooParser that takes an argparse parser as an argument and adds the relevant parameters and options to it
  • write a function runFoo that takes a single argNamespace parameter, containing the argument namespace generated by the parser, and uses the attributes in that namespace to execute the command
  • finally, wire everything up by calling addFooParser in paftoolsMain and by calling p.set_defaults(func=runFoo) on the subparser
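The steps above can be sketched with the standard argparse module as follows. This is an illustration only: the paftoolsMain shown here is a simplified stand-in for the real function, the infile and --number arguments are hypothetical, and runFoo is attached to the subparser with argparse's set_defaults method.

```python
import argparse


def addFooParser(p):
    """Add the parameters and options of the hypothetical foo subcommand."""
    p.add_argument("infile", help="input file (mandatory)")
    p.add_argument("-n", "--number", type=int, default=1,
                   help="an optional integer (hypothetical)")


def runFoo(argNamespace):
    """Execute foo using the attributes of the parsed argument namespace."""
    return "foo: %s x %d" % (argNamespace.infile, argNamespace.number)


def paftoolsMain(argv=None):
    """Simplified stand-in for the real paftoolsMain."""
    parser = argparse.ArgumentParser(prog="paftools")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # exit with a usage error if no subcommand is given
    p = subparsers.add_parser("foo", help="example subcommand")
    addFooParser(p)
    # Record which function runs this subcommand.
    p.set_defaults(func=runFoo)
    args = parser.parse_args(argv)
    # Dispatch to the function selected by the chosen subcommand.
    return args.func(args)


print(paftoolsMain(["foo", "reads.fastq", "-n", "3"]))  # prints "foo: reads.fastq x 3"
```

Storing the run function in the namespace keeps the dispatch in paftoolsMain to a single line, so adding another subcommand only requires a new pair of add and run functions plus the wiring.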