Counts seems incorrect #2

enricorox · 2022-12-29T19:11:12Z

Hi!
I'm a computer engineering student working on my master's thesis, which is essentially about improving UST (see here if interested).

I wrote a simple C++ program that extracts the canonical kmers from the simplitigs and pairs each one, in order, with its count from the UST output files.
I then sorted the resulting kmer list and compared it to the one computed by Jellyfish-2.

The counts differ, although the kmers are the same. Can you confirm this?

How to reproduce

Extract kmers and counts from the UST output files:

g++ kmers-extractor.cpp -o kmers-extractor
./kmers-extractor <kmer-size> <ust-fasta> <ust-counts>
sort ust-kmers.txt -o ust-kmers-sorted.txt

Extract kmers and counts from the starting sequence (not the bcalm output):

jellyfish-linux count -m <kmer-size> -C -s 100M -L 2 <starting-fasta>
jellyfish-linux dump -c mer_counts.jf > kmers.txt
sort kmers.txt -o kmers-sorted.txt

Compare the two files:

cmp kmers-sorted.txt ust-kmers-sorted.txt

kmers-extractor is attached.

Note that kmers with abundance 1 are ignored.
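
For readers without the attachment, the extraction step can be sketched roughly as below. This is only an illustration of the idea, not the attached kmers-extractor.cpp: the function names `reverse_complement` and `canonical_kmers` are made up here, and it assumes that the counts file holds one count per kmer of each simplitig, in left-to-right order. The canonical form is taken to be the lexicographically smaller of a kmer and its reverse-complement, which matches what Jellyfish's -C option does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reverse-complement of a DNA string (non-ACGT characters become 'N').
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N';
        }
    }
    return rc;
}

// Pair every kmer of `seq` (in canonical form) with its count.
// Assumption: counts[i] belongs to the kmer starting at position i.
std::vector<std::pair<std::string, unsigned>>
canonical_kmers(const std::string& seq,
                const std::vector<unsigned>& counts,
                std::size_t k) {
    assert(seq.size() >= k && counts.size() == seq.size() - k + 1);
    std::vector<std::pair<std::string, unsigned>> out;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        const std::string kmer = seq.substr(i, k);
        const std::string rc = reverse_complement(kmer);
        // Canonical kmer: the smaller of the two strands.
        out.emplace_back(std::min(kmer, rc), counts[i]);
    }
    return out;
}
```

Writing each pair as "kmer count" on its own line and sorting the result gives a file in the same shape as the output of jellyfish dump -c.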


enricorox · 2022-12-30T18:16:22Z

To make things simple, here is an easy example.

Let's take the first 31-mer of the first simplitig: CCCTGACAAAAAGGGCCCCAAGCTTCCAATA
Take the first count of the counts file: 3.
Find it, or its reverse-complement TATTGGAAGCTTGGGGCCCTTTTTGTCAGGG, in the unitigs file: it's on unitig 0.
Its count there is the last element of the unitig's counts vector: 2.

I think it's because you don't reverse the unitig counts vector when you reverse-complement the unitig.
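
To show what I mean, here is a minimal sketch of the behaviour I would expect. It is not UST's actual code: the `CountedUnitig` type and the function names are hypothetical, and it assumes one count per kmer stored in left-to-right order. When a unitig is reverse-complemented, its first kmer becomes the last one, so the counts vector has to be reversed together with the sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration: a unitig plus one count per kmer,
// where counts[i] belongs to the kmer starting at position i.
struct CountedUnitig {
    std::string seq;
    std::vector<unsigned> counts;
};

// Complement of a single base (non-ACGT characters become 'N').
char complement(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Reverse-complement the sequence AND reverse the counts, so that
// counts[i] still refers to the kmer starting at position i.
CountedUnitig reverse_complement(const CountedUnitig& u, std::size_t k) {
    assert(u.seq.size() >= k && u.counts.size() == u.seq.size() - k + 1);
    CountedUnitig rc;
    rc.seq.resize(u.seq.size());
    std::transform(u.seq.rbegin(), u.seq.rend(), rc.seq.begin(), complement);
    rc.counts.assign(u.counts.rbegin(), u.counts.rend());
    return rc;
}
```

For example, with k = 2 the unitig AACG with counts 3, 5, 2 becomes CGTT with counts 2, 5, 3. If the counts were left as 3, 5, 2, the kmer CG would be reported with the count of AA, which is the kind of mismatch I see above.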
