stats_utils

A collection of scripts for common procedures (e.g. PCA)

Requirements

Requirements vary by script, and this list will probably become outdated quickly. Please check the requirements of each individual script.

See also the requirements files and the Dockerfile for more information.

Most of the scripts are written in R or Python, so at a minimum you will need:

  • R >= 3.2
  • Python >= 3.5
  • r-docopt
  • r-data.table
  • r-ggplot2
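
The minimum versions listed above can be checked before installing. The following is a minimal sketch and not part of stats_utils: the helper names `meets_minimum` and `check_environment` are hypothetical, and it only checks that `Rscript` is on the PATH rather than parsing the R version or looking for the R packages.

```python
import shutil
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 5)


def meets_minimum(version, minimum):
    """Return True if a version tuple is at least the minimum tuple."""
    return tuple(version[:len(minimum)]) >= tuple(minimum)


def check_environment():
    """Report whether Python is recent enough and Rscript is on the PATH."""
    return {
        "python_ok": meets_minimum(sys.version_info, MIN_PYTHON),
        # shutil.which returns None when the executable is not found.
        "rscript_found": shutil.which("Rscript") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "yes" if ok else "no")
```

If `rscript_found` is reported as `no`, R is probably not installed or not active in the current environment.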

Installation

# You may want to create a specific environment with conda first, then run:
pip install git+https://github.com/EpiCompBio/stats_utils.git

To use

# Create a folder or a whole data science project, e.g. project_quickstart -n my_project
cd my_project/results
mkdir tests
cd tests
# You may need to install missing dependencies, e.g.:
conda install r-docopt r-data.table r-ggplot2 r-cowplot r-ggthemes
# Simulate some data:
simulate_cont_var.py -h
simulate_cont_var.py --createDF --sample-size=1000 --var-size=50 -O cont_var_sim_data
# The file will have rows as features/variables and columns as
# samples/individuals. Transpose it for prcomp in run_PCA:
transpose.R -I cont_var_sim_data.tsv
# Run principal components:
run_PCA.R -h
run_PCA.R -I cont_var_sim_data.transposed.tsv
# Check the outputs:
head cont_var_sim_data* | cut -f1-5
# View the plot ('open' is the macOS command; on Linux use xdg-open instead):
open top_10_PCs_cont_var_sim_data.transposed.pca.svg
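
The transpose step above swaps rows and columns so that samples become rows, which is the orientation R's `prcomp` expects. As an illustration of that reshaping only (this is not the code in `transpose.R`, and the function name `transpose_tsv` and the tiny example table are made up for this sketch), the same operation can be written with the Python standard library:

```python
import csv
import io


def transpose_tsv(text):
    """Transpose a tab-separated table given as a string.

    Assumes every row has the same number of fields.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) groups the i-th field of every row, i.e. the columns.
    transposed = zip(*rows)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(transposed)
    return out.getvalue()


# Hypothetical input: variables as rows, individuals as columns.
example = "id\tind1\tind2\nvar1\t0.5\t1.2\nvar2\t-0.3\t0.8\n"
print(transpose_tsv(example))
```

Transposing twice returns the original table, which is a quick sanity check that no fields were lost.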

Contribute

Pull requests welcome!

Support

If you run into any problems or have questions or suggestions, please report them in the issue tracker.