Skip to content

Latest commit

 

History

History
125 lines (92 loc) · 4.18 KB

README.md

File metadata and controls

125 lines (92 loc) · 4.18 KB

scATS

ATS idenfication and quantification tools

This code still under development

Installation

git clone [email protected]:ygidtu/scATS.git

cd scATS

# install scATS as command line tool
python3 setup.py install

# Or just run source code
python3 main.py

Usage

Usage: main.py [OPTIONS] COMMAND [ARGS]...

  Welcome

  This function is used to test the function of sashimi plotting

  Created by [email protected] at 2018.12.19 :return:

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  ats      Inference :param debug: enable debug mode
  count    Postprocess: count ats and calculate psi

ats

Identify ATSs based on aligned BAM files.

➜  afe git:(master) ✗ python main.py ats --help
Usage: main.py ats [OPTIONS] BAMS...

  Inference

  :param debug: enable debug mode

Options:
  -g, --gtf PATH                 The path to genome annotation file in GTF format.   [required]
  -o, --output PATH              The path to output file.   [required]
  -u, --utr-length INTEGER       The length of UTR.
  --n-max-ats INTEGER RANGE      The maximum number of ATSs in same UTR.
  --n-min-ats INTEGER RANGE      The minimum number of ATSs in same UTR.
  --min-ws FLOAT                 The minimum weight of ATSs.
  --min-reads INTEGER            The minimum number of reads in UTR.
  --max-unif-ws FLOAT            The maximum weight of uniform component.
  --max-beta INTEGER             The maximum std for ATSs.
  --fixed-inference              Inference with fixed parameters.
  -d, --debug                    Enable debug mode to get more debugging
                                 information, Never used this while
                                 running.
  -p, --processes INTEGER RANGE  How many cpu to use.
  --remove-duplicate-umi         Only kept reads with different UMIs for
                                 ATS inference.
  -h, --help                     Show this message and exit.

count

Quantification of ATSs.

➜  afe git:(master) ✗ python main.py count --help
Usage: main.py count [OPTIONS]

  Postprocess: count ats and calculate psi

Options:
  -i, --ats PATH                 The path to inferred ats sites.
  -b, --bam PATH                 The file contains path to bams.
  --delimiter TEXT               The delimiter of input bam list
  -o, --output PATH              The path to output utr file, bed format.
  -p, --processes INTEGER RANGE  How many cpu to use.
  -c, --compress                 Wheter to save in gzip format.
  --bulk                         Wheter the input bam is Nanopore or
                                 PacBio.
  -h, --help                     Show this message and exit.

Example

Please prepare the genome annotation file in GTF format and corresponding genome sequance file in fasta format first.

cd simulation

# generate simulation data
python simulation.py /path/to/gtf /path/to/fasta --n_jobs 2 --total 10000

# run ats model
python ../main.py ats -g ./tss.gtf -o ./inferred_sites.txt -p 4 ./simu.bam

The output file details

utr     gene_name       transcript_name number_of_reads inferred_sites   alpha_arr       beta_arr        ws      L
1:1168755-1169255:+     MIR429  MIR429-201      27      1168924,1169077,1169096,1169101 169,322,341,346 5,5,10,5        0.07156744317872403,0.8035226895855316,0.03289719913137757,0.08391005080588618,0.008102617298480465     500
1:1891221-1891721:+     AL109917.1      AL109917.1-201  10      1891223,1891474,1891539 2,253,318       5,5,10  0.3979824461238993,0.2978253162745582,0.2929895754731792,0.01120266212836311    500
  • utr: the genomic location of UTR.
  • gene_name: the name of host gene of this UTR.
  • transcript_name: the name of host transcripts of this UTR.
  • number_of_reads: the number of reads used to infer ATS sites in corresponding UTR.
  • inferred_sites: the inferred ATS sites, multiple sites were seperate by comma.
  • alpha_arr: the alpha of guassion distribution.
  • beta_arr: the beta of guassion distribution.
  • ws: weights of each ATS sites, and weights of reads not belong to each ATS sites.
  • L: the length of UTR, which may larger than 500 (default UTR length). Due to overlapped UTR merging.