Density analysis of sampling schemes

This repository contains the code accompanying the paper (bioRxiv)

A near-tight lower bound on the density of forward sampling schemes

It includes:

  1. A script (run-benchmarks.sh) to benchmark existing sampling schemes, using the repo RagnarGrootKoerkamp/minimizers.
  2. Code to run our ILP (integer linear program, run-ilp.py) that searches for optimal sampling schemes for small parameters.
  3. A Python notebook (plots.ipynb) that plots all results and lower bounds.

Requirements

Listed below are the necessary packages and corresponding versions we used to perform the analysis.

Python

Other

Running the benchmarks

Benchmarks can be generated via

./run-benchmarks.sh

which takes around an hour on a machine with 6 cores.

Instructions for running the ILP models

The ILP models are built with gurobipy.

The run-ilp.py script constructs and optimizes either a forward model or, with --local, a local model. To run multiple models with one command, supply multiple window sizes (-w), k-mer sizes (-k), and alphabet sizes (--sigma). One ILP is constructed for each combination of w, k, and sigma.

python run-ilp.py -w 2 3 4 -k 1 2 3 4 5 --sigma 2 3 4 --verbose
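
To get a sense of how many models a single invocation builds, the combinations can be enumerated directly. The sketch below is illustrative and not taken from run-ilp.py: it assumes the script takes the full Cartesian product of the supplied values, and the function name parameter_grid is hypothetical.

```python
from itertools import product


def parameter_grid(window_sizes, kmer_sizes, alphabet_sizes):
    """Return every (w, k, sigma) combination, one per ILP to build.

    Hypothetical helper; assumes a plain Cartesian product of the inputs.
    """
    return list(product(window_sizes, kmer_sizes, alphabet_sizes))


# Values from the example command above.
grid = parameter_grid([2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4])
print(len(grid))  # 3 * 5 * 3 = 45 combinations
```

Under that assumption the example command builds 45 models, so it may be worth starting with a smaller grid when trying the script for the first time.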

All options are listed with --help:

>$ python run-ilp.py --help
usage: run-ilp.py [-h] -w WINDOW_SIZE [WINDOW_SIZE ...] -k KMER [KMER ...] --sigma SIGMA [SIGMA ...] [--local] [-o OUTPUT] [--time-limit TIME_LIMIT] [-t THREADS] [-v]

options:
  -h, --help            show this help message and exit
  -w WINDOW_SIZE [WINDOW_SIZE ...], --window-size WINDOW_SIZE [WINDOW_SIZE ...]
  -k KMER [KMER ...], --kmer KMER [KMER ...]
  --sigma SIGMA [SIGMA ...]
  --local               Find minimum local scheme density
  -o OUTPUT, --output OUTPUT
                        Path to output directory
  --time-limit TIME_LIMIT
                        Time limit (in seconds)
  -t THREADS, --threads THREADS
  -v, --verbose         Log ILP to output
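
As a usage illustration, the options listed above can be combined from a small Python driver, for example to run local-scheme models with a time limit. This is a minimal sketch: the helper build_ilp_command and the output directory name results are hypothetical, and only the flags themselves come from the help text.

```python
import subprocess


def build_ilp_command(w, k, sigma, local=False, output=None,
                      time_limit=None, threads=None, verbose=False):
    """Assemble an argv list for run-ilp.py using the flags shown in --help.

    Hypothetical helper, not part of the repository.
    """
    cmd = ["python", "run-ilp.py"]
    cmd += ["-w", *map(str, w)]
    cmd += ["-k", *map(str, k)]
    cmd += ["--sigma", *map(str, sigma)]
    if local:
        cmd.append("--local")
    if output is not None:
        cmd += ["-o", str(output)]
    if time_limit is not None:
        cmd += ["--time-limit", str(time_limit)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if verbose:
        cmd.append("-v")
    return cmd


# "results" is an assumed output directory name.
cmd = build_ilp_command([2, 3], [1, 2], [4], local=True,
                        output="results", time_limit=3600, threads=6)
print(" ".join(cmd))

# To actually launch it (requires run-ilp.py and a working Gurobi install):
# subprocess.run(cmd, check=True)
```

Building the argument list separately from launching it makes it easy to inspect the command before committing to a long-running solve.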