Performance question #280
First of all, thanks for the excellent library. I'm working on a program that copies certain references into a new BAM file. Source here. It's running slower than I would expect, managing only ~6 MB/s at 100% CPU usage. I suspect I may be using noodles incorrectly. Any suggestions?
Hi @anderspitman,
When looking to optimize, I first recommend profiling with performance counters to better understand where time is being spent during execution. For example, on a Linux system using perf:
This shows that 80+% of the time is spent in the DEFLATE codec. As with htslib, linking to an optimized DEFLATE implementation for BGZF reading and writing is beneficial for this particular application. noodles-bgzf supports linking to libdeflate via a feature flag.
This is a simple change that provides a fair amount of improvement, assuming there is no requirement preventing you from linking to C libraries.
Depending on how performance-sensitive your application is, the query reader can be improved to reduce allocations. Also note that noodles performs more comprehensive validation during serialization than other implementations. This can look unfavorable in benchmarks, but skipping it would come at the expense of correctness. If this validation seems unnecessary for your use case, you may be interested in using rust-htslib instead.
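On the allocation point, one general technique for a copy loop like this is to reuse a single buffer for every record instead of allocating a new one per record. The sketch below is std-only and is not noodles' API; `copy_records_for_refs` is a hypothetical helper. It assumes both streams carry already-decompressed BAM alignment records (the BGZF layer and the BAM header are handled elsewhere), where each record is a little-endian `i32` `block_size` followed by `block_size` bytes whose first four bytes are the little-endian `i32` `refID`.

```rust
use std::collections::HashSet;
use std::io::{self, Read, Write};

/// Hypothetical helper: copies records whose reference ID is in `keep`
/// from `reader` to `writer`, reusing one buffer for all records.
/// Assumes uncompressed BAM record data (no BGZF, no header).
/// Returns the number of records written.
pub fn copy_records_for_refs<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    keep: &HashSet<i32>,
) -> io::Result<u64> {
    // One buffer reused for every record instead of a fresh allocation each time.
    let mut buf: Vec<u8> = Vec::new();
    let mut len_bytes = [0u8; 4];
    let mut written = 0;

    loop {
        // EOF at a record boundary ends the loop. In this sketch a truncated
        // length prefix is also treated as end of input.
        match reader.read_exact(&mut len_bytes) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }

        let block_size = usize::try_from(i32::from_le_bytes(len_bytes))
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "negative block_size"))?;
        if block_size < 4 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "record too short for refID"));
        }

        // `resize` never shrinks capacity, so once the largest record has been
        // seen, no further allocations happen.
        buf.resize(block_size, 0);
        reader.read_exact(&mut buf)?;

        let ref_id = i32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
        if keep.contains(&ref_id) {
            // Write the record back out unchanged, length prefix included.
            writer.write_all(&len_bytes)?;
            writer.write_all(&buf)?;
            written += 1;
        }
    }

    Ok(written)
}
```

Passing records through as raw bytes also skips decoding and re-encoding fields, which only works because the records are copied unmodified; anything that edits a record would need a proper decoder.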