GitHub - ponnhide/flashpy: Fast python code to merge paired-end reads

flashpy

flashpy is a Python module for merging paired-end reads generated by high-throughput DNA sequencing systems such as Illumina MiSeq, HiSeq, and NovaSeq. It reimplements the algorithm of FLASH (https://github.com/ebiggers/flash) in Cython, so it runs fast: it merges 10,000 sequence pairs in about 1 second.

Installation

Build the Cython extension in place from the repository root:
python setup.py build_ext --inplace
Then add the directory where you cloned the repository to PYTHONPATH.

Usage

flashpy provides only two functions: merge and flash. The merge function merges a single pair of reads. The flash function simply iterates the merge function over the read pairs in a given pair of FASTQ files.
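The paired iteration that flash performs can be pictured with the minimal Python sketch below. It is not flashpy's code: the helper names `read_fastq` and `iterate_pairs` are hypothetical, the parser assumes Phred+33 quality encoding (each quality character becomes `ord(c) - 33`, i.e. the kind of decoded integer list that score1 and score2 expect), and it treats the first whitespace-delimited token of each header as the sequence key. Its key check is simplified to requiring matching keys at the same position in both files.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# (key, sequence, decoded quality scores)
Record = Tuple[str, str, List[int]]


def read_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[Record]:
    """Yield records from 4-line FASTQ text (a file handle or a list of lines).

    Assumes Phred+33 quality encoding and uses the first whitespace-delimited
    token of the header (without '@') as the record key.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\r\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\r\n")
        key = header[1:].split()[0]
        # Decode each ASCII quality character into an integer Phred score.
        yield key, seq, [ord(c) - offset for c in qual]


def iterate_pairs(lines1: Iterable[str], lines2: Iterable[str],
                  merge_fn: Callable, key_check: bool = True) -> Dict[str, object]:
    """Call merge_fn(seq1, seq2, score1, score2) on each read pair, by key."""
    results = {}
    for (k1, s1, q1), (k2, s2, q2) in zip(read_fastq(lines1), read_fastq(lines2)):
        # Simplified stand-in for key_check: records at the same position
        # must share the same key.
        if key_check and k1 != k2:
            raise ValueError(f"unpaired records: {k1!r} vs {k2!r}")
        results[k1] = merge_fn(s1, s2, q1, q2)
    return results
```

Since merge takes seq1, seq2, score1 and score2 as its first four parameters, it could in principle be passed as `merge_fn`; that pairing is inferred from the documented signature, not confirmed.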

merge(seq1=None, seq2=None, score1=None, score2=None, min_overlap=50, max_overlap=300, allow_outies=True, min_identity=0.5, max_identity=1.0)
Merge a single sequence pair, seq1 and seq2.
- seq1: str
  The DNA sequence.
- seq2: str
  The DNA sequence paired with the seq1.
- score1: list of int
  The quality values for seq1, already decoded from their ASCII characters into integers. The list must have the same length as seq1.
- score2: list of int
  The quality values for seq2, already decoded from their ASCII characters into integers. The list must have the same length as seq2.
- min_overlap: int
  The minimum overlap length between two sequences, seq1 and seq2.
- max_overlap: int
  The maximum overlap length between two sequences, seq1 and seq2.
- allow_outies: bool
  If True, also try to merge seq1 and seq2 in the "outie" orientation.
- min_identity: float
  Minimum allowed sequence identity between the overlapping regions of seq1 and seq2.
- max_identity: float
  If the identity of an overlapping region exceeds max_identity, the function stops searching and returns the result for that overlap, even if a better overlap may exist at another position.
return
merged_sequence (str), merged_score (list of int), identity (float), overlap_length (int), overlap_direction ("innie" or "outie")
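To make these parameters concrete, here is a simplified, hypothetical sketch of an "innie" overlap merge in plain Python. It is not flashpy's Cython implementation and only approximates the FLASH approach. Its assumptions: seq2 is passed as sequenced (so it is reverse-complemented internally), overlaps are scanned from longest to shortest, identity >= max_identity is treated as the early-exit condition, quality values at mismatches use an invented rule, None is returned when no overlap qualifies, and outies are not attempted. The names `merge_innie` and `reverse_complement` are illustrative only.

```python
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_innie(seq1: str, seq2: str, score1: List[int], score2: List[int],
                min_overlap: int = 50, max_overlap: int = 300,
                min_identity: float = 0.5, max_identity: float = 1.0,
                ) -> Optional[Tuple[str, List[int], float, int, str]]:
    """Simplified innie merge of read 1 with the reverse complement of read 2.

    Returns (merged_sequence, merged_score, identity, overlap_length, "innie"),
    or None if no overlap reaches min_identity (an assumption).
    """
    rc2 = reverse_complement(seq2)
    rq2 = score2[::-1]  # qualities follow the reversed read
    best: Optional[Tuple[float, int]] = None  # (identity, overlap_length)
    longest = min(max_overlap, len(seq1), len(rc2))
    lowest = max(min_overlap, 1)
    # Scan candidate overlaps from longest to shortest (an assumption).
    for length in range(longest, lowest - 1, -1):
        tail, head = seq1[-length:], rc2[:length]
        identity = sum(a == b for a, b in zip(tail, head)) / length
        if identity < min_identity:
            continue
        if best is None or identity > best[0]:
            best = (identity, length)
        if identity >= max_identity:
            break  # stop early instead of searching for a better overlap
    if best is None:
        return None

    identity, length = best
    start = len(seq1) - length  # overlap begins here in seq1
    bases: List[str] = []
    quals: List[int] = []
    for i in range(length):
        b1, q1 = seq1[start + i], score1[start + i]
        b2, q2 = rc2[i], rq2[i]
        if b1 == b2:
            bases.append(b1)
            quals.append(max(q1, q2))
        else:
            # Hypothetical mismatch rule: keep the higher-quality base and
            # lower its quality to the difference of the two scores.
            bases.append(b1 if q1 >= q2 else b2)
            quals.append(abs(q1 - q2))
    merged_seq = seq1[:start] + "".join(bases) + rc2[length:]
    merged_score = score1[:start] + quals + rq2[length:]
    return merged_seq, merged_score, identity, length, "innie"
```

Because this sketch scans from the longest overlap, the default max_identity of 1.0 ends the search at the first perfect overlap it meets; a different scan order would change which overlap the early exit returns.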
flash(read1=None, read2=None, min_overlap=50, max_overlap=300, allow_outies=True, min_identity=0.5, max_identity=1.0, show_progress=True, key_check=True)
Merge the reads in a pair of FASTQ files.
- read1: str
  FASTQ file path.
- read2: str
  FASTQ file path paired with the read1.
- min_overlap: int
  Same as the min_overlap parameter of merge. The value is applied to all sequence pairs.
- max_overlap: int
  Same as the max_overlap parameter of merge. The value is applied to all sequence pairs.
- allow_outies: bool
  Same as the allow_outies parameter of merge. The value is applied to all sequence pairs.
- min_identity: float
  Same as the min_identity parameter of merge. The value is applied to all sequence pairs.
- max_identity: float
  Same as the max_identity parameter of merge. The value is applied to all sequence pairs.
- show_progress: bool
  If True, display a progress bar.
- key_check: bool
  If True, for each sequence key in read2, the function checks whether the same sequence key exists in read1.
return
merged_reads, overlap_distributions
- merged_reads: dict
```
 {*key1* (common sequence key of *r1_key* and *r2_key*):
     {"r1_key"  : Original sequence key in read1,
      "r2_key"  : Original sequence key in read2 paired with *r1_key*,
      "seq"     : Merged sequence,
      "quality" : Merged score,
      "identity": Sequence identity of the overlapping region},
  *key2*: ...,
  ...
  }
```
- overlap_distributions: dict
```
{*key1* (("innie" or "outie", *overlap_length*)): Number of paired sequences that share the overlapping region of length *overlap_length*, 
 *key2* : ...,
 ...
 } 
```
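The two return values can be illustrated with a small sketch that assembles dictionaries of the documented shape from per-pair merge results, using `collections.Counter` to tally `(direction, overlap_length)` keys. This is not how flashpy necessarily builds them: the function name `summarize`, the use of None for pairs that failed to merge, and the rule that derives the common key from r1_key are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# merge's documented return value:
# (merged_sequence, merged_score, identity, overlap_length, overlap_direction)
MergeResult = Tuple[str, List[int], float, int, str]


def summarize(pairs: Iterable[Tuple[str, str, Optional[MergeResult]]]
              ) -> Tuple[Dict[str, dict], Dict[Tuple[str, int], int]]:
    """Build merged_reads- and overlap_distributions-shaped dicts.

    `pairs` yields (r1_key, r2_key, result). A result of None marks a pair
    that did not merge (an assumption). The common key is taken as the part
    of r1_key before the first whitespace (also an assumption).
    """
    merged_reads: Dict[str, dict] = {}
    distribution: Counter = Counter()
    for r1_key, r2_key, result in pairs:
        if result is None:
            continue  # unmerged pairs appear in neither dict
        seq, quality, identity, overlap_length, direction = result
        key = r1_key.split()[0]
        merged_reads[key] = {
            "r1_key": r1_key,
            "r2_key": r2_key,
            "seq": seq,
            "quality": quality,
            "identity": identity,
        }
        # Count pairs per (orientation, overlap length).
        distribution[(direction, overlap_length)] += 1
    return merged_reads, dict(distribution)
```

A Counter keyed by tuple maps directly onto the layout of overlap_distributions, and converting it to a plain dict at the end matches the documented return type.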
