read recruitment: memory scaling behavior for many targets #17

@ptrebert

Hi,
as discussed, here are the proposed enhancements:

locityper v1.2.0 @ 2025-10-07 19:33:08

parameterization:
locityper recruit --input HIFI_READS --seqs-all TARGETS --distinct --output OUTPUT --minimizer 21 15 --chunk-size 500 --match-len 10000 --threads 12

...
Collected 311391945 minimizers across 43380 loci and 43380 sequences
...
Cgroup mem limit exceeded ...
# fails with ~350 GB of available memory

The HiFi reads are a single SMRT cell dataset (Revio). Other runs with the same or similar input finish successfully, which points to the number of target sequences as the root cause.

Suggested enhancements:

  1. no solution, but a heads-up for users:
     • mention the scaling behavior in the docs / CLI help; if it is confirmed that the number of target sequences is the problem, recommend a maximum number of targets per target file, so that users know right away how to split the problem (divide and conquer)
  2. desired solution: implement chunking for processing target sequences
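
Until something like this exists in the tool, the divide-and-conquer workaround from point 1 could be done on the user side. Below is a minimal sketch, assuming that TARGETS is a plain FASTA file and that records can be separated freely (the log above reports one sequence per locus). The function names, the `records_per_chunk` parameter, and the output naming scheme are hypothetical and not part of locityper.

```python
from typing import Iterable, Iterator, List


def chunk_fasta_records(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTA lines, each holding at most `records_per_chunk` records."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be >= 1")
    chunk: List[str] = []
    n_records = 0
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; flush first if the current chunk is full.
            if n_records == records_per_chunk:
                yield chunk
                chunk, n_records = [], 0
            n_records += 1
        chunk.append(line)
    if chunk:
        yield chunk


def split_fasta_file(path: str, prefix: str, records_per_chunk: int) -> List[str]:
    """Write chunks to `<prefix>.<index>.fa` (hypothetical naming) and return the paths."""
    out_paths: List[str] = []
    with open(path) as handle:
        for index, chunk in enumerate(chunk_fasta_records(handle, records_per_chunk)):
            out_path = f"{prefix}.{index:04d}.fa"
            with open(out_path, "w") as out:
                out.writelines(chunk)
            out_paths.append(out_path)
    return out_paths
```

Each resulting file would then be passed as TARGETS in a separate recruitment run. A suitable `records_per_chunk` is exactly the recommendation requested in point 1, so no value is suggested here. If several records belong to the same locus, they would need to stay in the same chunk, which this sketch does not handle.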

Best,
Peter
