Commit 2bbd145

wording
1 parent 7f6d715 commit 2bbd145

File tree

1 file changed

+35
-34
lines changed

man/formats/vsearch-fastq.5.md

+35-34
@@ -10,40 +10,41 @@ and their corresponding quality scores

 # DESCRIPTION

-(WIP)
-
-In fastq files, each entry is made of _sequence header_ starting with
-a symbol '@', a nucleotidic _sequence_ (same rules as for fasta
-sequences), a _quality header_ starting with a symbol '+', and a
-_quality string_ of ASCII characters (offset 33 or 64), each one
-encoding the quality value of the corresponding position in the
-nucleotidic sequence.
-
-#(./fragments/sequences.md)
-
-In fastq files, each entry is made of a _sequence header_, a
-_sequence_, a _quality header_, ... The header is defined as the
-string comprised between the initial '>' symbol and the first space,
-tabulation, or new line symbol, unless the `--notrunclabels` option is
-in effect, in which case the entire line is included.
-
-The _header_ should contain printable ascii characters (33-126). The
-program will terminate with a fatal error if there are unprintable
-ascii characters (see `ascii(7)`). A warning will be issued if
-non-ascii characters (128-255) are encountered.
-
-If the header matches the pattern '>[;]size=integer;label', the
-pattern '>label;size=integer;label', or the pattern
-'>label;size=integer[;]', vsearch will interpret integer as the number
-of occurrences (or abundance) of the sequence in the study. That
-abundance information is used or created during chimera detection,
-clustering, dereplication, sorting and searching.
-
-
-
-# EXAMPLES
-
-(give examples of valid and invalid fastq files)
+In fastq files, each entry is made of *sequence header* starting with
+a symbol '@', a nucleotidic *sequence*, a *quality header* starting
+with a symbol '+', and a *quality string* of ASCII characters (offset
+33 or 64), each one encoding the quality value of the corresponding
+position in the nucleotidic sequence.
+
+The *sequence header* is defined as the string comprised between the
+initial '@' symbol and the first space, tabulation, or new line
+symbol, unless the `--notrunclabels` option is in effect, in which
+case the entire line is included.
+
+The sequence header should contain printable ASCII characters
+(33-126). The program will terminate with a fatal error if there are
+unprintable ASCII characters (see `ascii(7)`). A warning will be
+issued if non-ASCII characters (128-255) are encountered.
+
+If the sequence header contains patterns such as `[@;]size=integer[;]`
+or `[@;]ee=float[;]`, vsearch can interpret these annotations and use
+them for chimera detection, clustering, dereplication, filtering and
+sorting.
+
+#(./fragments/format_sequence.md)
+
+The *quality header* is defined as the string comprised between the
+initial '+' symbol and the first space, tabulation, or new line
+symbol.
+
+The *quality string* is a string of ASCII characters, starting after
+the end of the quality header line and ending before the next header
+line, or the file's end. The range of valid ASCII characters can
+extend from '!' to '~' when the offset is 33, and from '@' to '~' when
+the offset is 64. vsearch silently ignores ASCII characters 9 to 13,
+and exits with an error message if ASCII characters 0 to 8, 14 to 31,
+‘.’ or ‘-’ are present. All other ASCII or non-ASCII characters are
+stripped and complained about in a warning message.


 # SEE ALSO
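
Illustration (not part of the commit): the revised DESCRIPTION text can be read as a small set of parsing rules. The Python sketch below is one possible reading of those rules, not vsearch's implementation. The function names (`parse_label`, `parse_annotations`, `decode_quality`) are hypothetical, the regular expressions are an assumed interpretation of `[@;]size=integer[;]` and `[@;]ee=float[;]`, and the quality decoder simply rejects any character outside the stated range instead of reproducing vsearch's ignore, warn, and error behaviour.

```python
import re

# Assumed reading of the annotation patterns, applied to the label
# after the leading '@' has been removed.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")
EE_RE = re.compile(r"(?:^|;)ee=(\d+(?:\.\d+)?)(?:;|$)")


def parse_label(header_line, notrunclabels=False):
    """Return the label of a sequence header line starting with '@'."""
    if not header_line.startswith("@"):
        raise ValueError("a sequence header must start with '@'")
    body = header_line[1:].rstrip("\r\n")
    if notrunclabels:
        # Keep the entire line, as described for --notrunclabels.
        return body
    # Otherwise stop at the first space or tabulation.
    return re.split(r"[ \t]", body, maxsplit=1)[0]


def parse_annotations(label):
    """Extract size (abundance) and ee annotations, when present."""
    annotations = {}
    size = SIZE_RE.search(label)
    if size:
        annotations["size"] = int(size.group(1))
    ee = EE_RE.search(label)
    if ee:
        annotations["ee"] = float(ee.group(1))
    return annotations


def decode_quality(quality, offset=33):
    """Convert a quality string into a list of integer quality values."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    lowest = "!" if offset == 33 else "@"
    scores = []
    for symbol in quality:
        # Simplification: reject anything outside the valid range.
        if not (lowest <= symbol <= "~"):
            raise ValueError(f"invalid quality symbol: {symbol!r}")
        scores.append(ord(symbol) - offset)
    return scores


label = parse_label("@seq1;size=5;ee=0.25 extra text\n")
print(label, parse_annotations(label), decode_quality("II5!"))
```

With an offset of 64, the symbol '!' falls below the valid range, so `decode_quality("!", 64)` raises `ValueError` in this sketch.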
