Checkout and build

$ git clone https://github.com/cloudozer/BWT.git
$ cd BWT
$ git checkout master
$ ./rebar get-deps
# Set up the domain config files (bwtm.dom and bwtw.dom)
$ make

Getting DNA files

Download an archive: https://docs.google.com/uc?id=0B2DPaltm6IwpYVFHOEZYSGpldHc&export=download
Extract it to the BWT folder

Run test on local machine using 2 workers

$ ./scripts/start_local.sh SRR770176_1.fastq GL000193.1 2

Cluster node requirements

Friendly Linux
Xen
Erlang OTP 17
Git
Internet access

Master node Setup

Edit the domain config file 'bwtm.dom': set the expected number of workers, the SSH port, etc.

$ make
$ sudo xl create -c bwtm.dom

Worker node Setup

Edit the domain config file 'bwtw.dom': set the master IP address, etc.

$ make
$ sudo xl create -c bwtw.dom

Secure Shell connection to a Ling node

$ ssh %NODE_HOST% -p %PORT%   # (password: 1)

[Disregard the information below this line]

Big file processing

These are small tools to help process large files in Erlang. In general, the strategy is to read in the file as an array of possibly overlapping Erlang binary "chunks". These can then be processed in parallel/concurrently.
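The chunking strategy can be sketched as follows. This is a minimal illustration, not the code in bio_pfile.erl: the module name chunk_sketch and the function split/3 are hypothetical, and the sketch assumes the data is already in memory as one binary rather than read from a file. Every chunk after the first is extended backwards by Overlap bytes, so a pattern no longer than Overlap bytes that straddles a chunk boundary is still fully contained in one chunk.

```erlang
-module(chunk_sketch).
-export([split/3]).

%% Hypothetical helper: split Bin into at most N chunks.
%% Every chunk except the first starts Overlap bytes before its nominal
%% position, so neighbouring chunks share Overlap bytes.
%% Returns [{{StartPos, Length}, ChunkBinary}].
split(Bin, N, Overlap) when is_binary(Bin), N > 0, Overlap >= 0 ->
    Size = byte_size(Bin),
    Step = max(1, (Size + N - 1) div N),   % ceiling division, at least 1
    split(Bin, Size, Step, Overlap, 0, []).

%% Pos is the nominal (non-overlapping) start of the next chunk.
split(_Bin, Size, _Step, _Overlap, Pos, Acc) when Pos >= Size ->
    lists:reverse(Acc);
split(Bin, Size, Step, Overlap, Pos, Acc) ->
    Start = max(0, Pos - Overlap),         % reach back into the previous chunk
    Len = min(Pos + Step, Size) - Start,   % clip the last chunk at end of data
    Chunk = binary:part(Bin, Start, Len),
    split(Bin, Size, Step, Overlap, Pos + Step, [{{Start, Len}, Chunk} | Acc]).
```

For example, chunk_sketch:split(<<"ABCDEFGHIJ">>, 3, 2) returns [{{0,4},<<"ABCD">>},{{2,6},<<"CDEFGH">>},{{6,4},<<"GHIJ">>}]. binary:part/3 creates sub-binaries that reference the original data, so the chunks do not copy the input.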

How to run

download bio_pfile.erl
launch Erlang shell
compile: c(bio_pfile).
run: bio_pfile:read(FileName,NumberOfChunks) or bio_pfile:read(FileName,NumberOfChunks,SizeOfOverlap), both of which return a list of chunk elements: {{StartPos,Length},BinaryData}

Example

1> Data = bio_pfile:read("../data/GCA_000001405.15_GRCh38_full_analysis_set.fna",10000).
[{{0,32553715},<<">chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd     AS"...>>},
{{32543715,32563715},<<"ACCTCATAGATTGGTCATCTTTTTCTC\nCTATATTTCTCTAATATTTAATCTCTCTCTCTCTCTCTCTTTGTATGTGCATTGCCTTTGGAGAGATTTC\nC"...>>},
{{65097430,32563715},<<"AATCAAGAAAATATGTTTACCAAAA\nTGCATTGCAATTTTCCCAAACCTGAGTCTTCAAATAACAAACATGAACTTATAGGTACTGTGAACTAGAA"...>>},
{{97651145,32563715},<<"CAAGAATTGAGGTTTGGGAAACT\nCCATCTAGATTTCAGAGGATGTATGGAAATACCTGGATGTCCAGGCAGTAGTTTGCTGCAAGGGTGTG"...>>},
{{813832875,32563715},<<"TT\nT"...>>},
{{846386590,...},<<...>>},
{{...},...},
{...}|...]
2> length(Data).                                                                        
10001
3> lists:nth(1,Data).                                                                   
{{0,32553715},<<">chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRC"...>>}
4>

run: bio_pfile:spawn_find_pattern(ChunkArray,BinaryPattern), which returns a list of all the start positions where the pattern was found, each as {StartPosition,LengthOfPattern}.

Example

4>  bio_pfile:spawn_find_pattern(Data,<<"TATATTCAGTCTTTCTAACACCATTTATTGAAGAGACTGTAG">>).
[{162758595,42}]
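A concurrent search over chunks of the form {{StartPos,Length},BinaryData} might look like the sketch below. It is an assumption about how such a search could be structured, not the actual implementation of spawn_find_pattern: the module name find_sketch and its functions are hypothetical. One process is spawned per chunk, each process converts its chunk-relative match offsets to absolute positions, and duplicates coming from overlapping regions are removed at the end. Note that binary:matches/2 reports non-overlapping matches only and requires a non-empty pattern.

```erlang
-module(find_sketch).
-export([find/2]).

%% Hypothetical helper: search every chunk for Pattern in its own process.
%% Chunks is a list of {{StartPos, Length}, Binary}.
%% Returns a sorted list of {AbsolutePosition, PatternLength}.
find(Chunks, Pattern) when is_binary(Pattern) ->
    Parent = self(),
    Refs = [spawn_search(Parent, Chunk, Pattern) || Chunk <- Chunks],
    Hits = lists:append([collect(Ref) || Ref <- Refs]),
    lists:usort(Hits).                     % sort and drop overlap duplicates

%% Start one worker; the unique reference tags its reply.
spawn_search(Parent, {{Start, _Length}, Bin}, Pattern) ->
    Ref = make_ref(),
    spawn_link(fun() ->
        %% binary:matches/2 returns offsets relative to the chunk,
        %% so shift them by the chunk's start position.
        Hits = [{Start + Pos, Len}
                || {Pos, Len} <- binary:matches(Bin, Pattern)],
        Parent ! {Ref, Hits}
    end),
    Ref.

%% Wait for the reply tagged with Ref.
collect(Ref) ->
    receive
        {Ref, Hits} -> Hits
    end.
```

With the three overlapping chunks [{{0,4},<<"ABCD">>},{{2,6},<<"CDEFGH">>},{{6,4},<<"GHIJ">>}], find_sketch:find(Chunks, <<"GH">>) returns [{6,2}]: the match is seen in two chunks but reported once.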
