MRG: avoid clones by using new Signature::try_into() -> KmerMinHash #471

Merged: 138 commits, Oct 15, 2024
Commits
480f319
refactor & rename & consolidate
ctb Aug 17, 2024
e6b1c5b
remove 'lower'
ctb Aug 17, 2024
153f246
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Aug 18, 2024
0d7a556
add cargo doc output for private fn
ctb Aug 18, 2024
df753db
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Aug 18, 2024
1da0cf3
add a few comments/docs
ctb Aug 18, 2024
2e7f027
switch to dev version of sourmash
ctb Aug 18, 2024
6b9e00f
tracking
ctb Aug 18, 2024
2747935
cleaner
ctb Aug 18, 2024
4f49ef8
cleanup
ctb Aug 18, 2024
af1c82d
load rocksdb natively
ctb Aug 18, 2024
53924d6
foo
ctb Aug 18, 2024
e5faed8
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Aug 19, 2024
7649375
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Aug 19, 2024
e4618f0
Merge branch 'ctb_misc_cleanup' into ctb_misc2
ctb Aug 19, 2024
3462f92
cargo fmt
ctb Aug 19, 2024
9823ef6
upd
ctb Aug 20, 2024
bfb5053
upd
ctb Aug 20, 2024
c311a69
fix fmt
ctb Aug 20, 2024
28b43d8
MRG: create `MultiCollection` for collections that span multiple file…
ctb Aug 20, 2024
a1b19ae
clippy fixes
ctb Aug 20, 2024
51a14ac
compiling again
ctb Aug 20, 2024
99bd174
cleanup
ctb Aug 20, 2024
36d33a5
bump sourmash to v0.15.1
ctb Aug 21, 2024
02bf7e9
Merge branch 'bump_sourmash' into ctb_misc2
ctb Aug 21, 2024
7f0b010
check if is rocksdb
ctb Aug 21, 2024
b9972c6
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Aug 21, 2024
5561911
weird error
ctb Aug 21, 2024
dfe56d3
use remove_unwrap branch of sourmash
ctb Aug 21, 2024
e6e80f3
get index to work with MultiCollection
ctb Aug 21, 2024
fed4db3
old bug now fixed
ctb Aug 21, 2024
f5331ef
clippy, format, and fix
ctb Aug 21, 2024
8f90129
make names clearer
ctb Aug 21, 2024
4511347
ditch MultiCollection for index, at least for now
ctb Aug 21, 2024
4ea6730
testy testy
ctb Aug 21, 2024
ac35b24
getting closer
ctb Aug 21, 2024
741a44a
update sourmash
ctb Aug 22, 2024
d429205
mark failing tests
ctb Aug 22, 2024
994fcec
upd
ctb Aug 24, 2024
8451259
cargo fmt
ctb Aug 24, 2024
91b04b5
MRG: test exit from `pairwise` and `multisearch` if no loaded sketche…
ctb Aug 24, 2024
b3e5b81
MRG: switch to more efficient use of `Collection` by removing cloning…
ctb Aug 24, 2024
97db857
MRG: add tests for RocksDB/RevIndex, standalone manifests, and flexib…
ctb Aug 24, 2024
551758f
reenable and fix test_fastgather.py::test_indexed_against
ctb Aug 24, 2024
e3e95fc
impl Deref for MultiCollection
ctb Aug 24, 2024
8d39a4f
clippy
ctb Aug 24, 2024
3439592
switch to using load_sketches method
ctb Aug 24, 2024
d3fa529
deref doesn't actually make sense for MultiCollection
ctb Aug 24, 2024
5a20381
update to latest sourmash code
ctb Aug 27, 2024
6563b0a
update to latest sourmash code
ctb Aug 27, 2024
45608c7
simplify
ctb Aug 27, 2024
bd256dd
update to latest sourmash code
ctb Aug 27, 2024
675974e
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Aug 27, 2024
afa0faf
remove unnecessary flag
ctb Aug 27, 2024
89b1c08
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Sep 8, 2024
d120547
MRG: support & test loading of standalone manifests within pathlists …
ctb Sep 9, 2024
73c7f53
MRG: documentation updates based on new collection loading (#444)
ctb Sep 9, 2024
e32638a
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Sep 9, 2024
fcae8e7
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Sep 10, 2024
74d7367
Update src/lib.rs
ctb Sep 10, 2024
4e28d64
switch unwrap to expect
ctb Sep 10, 2024
74e2217
Merge branch 'ctb_misc2' of github.com:sourmash-bio/sourmash_plugin_b…
ctb Sep 10, 2024
de35cd5
move unwrap to expect
ctb Sep 10, 2024
1e5ac07
minor cleanup
ctb Sep 10, 2024
388a49a
cargo fmt
ctb Sep 11, 2024
7be1883
provide legacy method to avoid xfail on index loading
ctb Sep 15, 2024
679b972
switch to using reference
ctb Sep 15, 2024
a9143d0
update docs to reflect pathlist behavior
ctb Sep 15, 2024
574cd28
test recursive nature of MultiCollection
ctb Sep 15, 2024
a5b4299
re-enable test that is now passing
ctb Sep 15, 2024
74b9ae6
update to latest sourmash
ctb Sep 16, 2024
9df421d
upd sourmash
ctb Sep 16, 2024
ccd26da
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Sep 16, 2024
847917f
update sourmash
ctb Sep 17, 2024
9733d47
mut MultiCollection
ctb Sep 18, 2024
019fd1b
cleanup
ctb Sep 18, 2024
780fbda
update after merge of sourmash-bio/sourmash#3305
ctb Sep 21, 2024
84934a7
fix contains_revindex
ctb Sep 22, 2024
56fb948
add trace commands for tracing loading
ctb Sep 22, 2024
6550683
use released version of sourmash
ctb Sep 25, 2024
b510e8e
add support for ignoring abundance
ctb Oct 2, 2024
0993b39
cargo fmt
ctb Oct 2, 2024
ac82fb3
avoid downsampling until we know there is overlap
ctb Oct 4, 2024
7ea9a40
change downsample to true; add panic assertion
ctb Oct 5, 2024
03b9da0
move downsampling side guard
ctb Oct 5, 2024
b954daa
eliminate redundant overlap check
ctb Oct 5, 2024
b0bcc66
move calc_abund_stats
ctb Oct 5, 2024
a2871c0
extract abundance code into own function; avoid downsampling if poss
ctb Oct 5, 2024
d853ef3
cleanup
ctb Oct 5, 2024
207efb2
Merge branch 'toggle_manysearch_abund' into ctb_misc2
ctb Oct 5, 2024
453f943
fmt
ctb Oct 5, 2024
5380325
Merge branch 'toggle_manysearch_abund' into ctb_misc2
ctb Oct 6, 2024
4f5fefd
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Oct 9, 2024
69fd38b
update to next sourmash release
ctb Oct 11, 2024
ee580b6
cargo fmt
ctb Oct 11, 2024
9814051
upd sourmash
ctb Oct 11, 2024
d27b03e
correct numbers
ctb Oct 11, 2024
e35111a
upd sourmash
ctb Oct 12, 2024
4778862
upd sourmash
ctb Oct 12, 2024
bd18277
Merge branch 'update_sourmash_latest' into ctb_misc2
ctb Oct 12, 2024
2563b0b
upd sourmash
ctb Oct 12, 2024
a0e02ef
upd sourmash
ctb Oct 13, 2024
9b448c8
use new try_into() and eliminate several clone()s
ctb Oct 13, 2024
58502d8
Merge branch 'update_sourmash_latest' into ctb_misc2
ctb Oct 13, 2024
4a780f4
refactor a bit more
ctb Oct 13, 2024
253e676
use new try_into() in manysearch; flag clones
ctb Oct 13, 2024
d0553b9
avoid a few more clones
ctb Oct 13, 2024
1fe6045
eliminate more clone
ctb Oct 13, 2024
53794a0
fix mismatched clauses
ctb Oct 13, 2024
7f95044
note minhash
ctb Oct 13, 2024
e5814bd
fix mastiff_manygather
ctb Oct 13, 2024
e42dd43
avoid more clone
ctb Oct 13, 2024
671d844
resolve comments
ctb Oct 13, 2024
66560c8
microchange
ctb Oct 13, 2024
8ea048c
microchange 2
ctb Oct 13, 2024
c371acb
eliminate more clone: fastgather
ctb Oct 13, 2024
81fc651
avoid more clone: fastmultigather
ctb Oct 13, 2024
fb2302a
refactor to avoid more clones
ctb Oct 13, 2024
fd31f03
rm one more clone
ctb Oct 13, 2024
b4192c3
cleanup
ctb Oct 13, 2024
c4519a8
cargo fmt
ctb Oct 13, 2024
aefd909
cargo fmt
ctb Oct 13, 2024
44df8f8
deallocate collection?
ctb Oct 13, 2024
c43f0d9
deallocate collection?
ctb Oct 13, 2024
87118de
upd sourmash
ctb Oct 13, 2024
1f10c29
Merge branch 'ctb_misc2' into avoid_clones
ctb Oct 13, 2024
ee296e7
cargo fmt
ctb Oct 13, 2024
78eefe1
Merge branch 'ctb_misc2' into avoid_clones
ctb Oct 13, 2024
a5bf5fa
fix merge foo
ctb Oct 13, 2024
521fbb4
Merge branch 'ctb_misc2' into avoid_clones
ctb Oct 13, 2024
cfa6095
try out new sourmash PR
ctb Oct 14, 2024
3971652
upd latest sourmash branch
ctb Oct 14, 2024
564fdc7
upd sourmash
ctb Oct 14, 2024
fff1cd9
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Oct 14, 2024
ae74f66
Merge branch 'ctb_misc2' into avoid_clones
ctb Oct 14, 2024
2d8b2bb
merge
ctb Oct 15, 2024
e6633ea
upd
ctb Oct 15, 2024
580598c
Merge branch 'main' of github.com:sourmash-bio/sourmash_plugin_branch…
ctb Oct 15, 2024
27 changes: 15 additions & 12 deletions src/fastgather.rs
@@ -35,11 +35,15 @@ pub fn fastgather(
  // get single query sig and minhash
  let query_sig = query_collection.get_first_sig().expect("no queries!?");

- // @CTB avoid clone?
- let query_sig_ds = query_sig.clone().select(selection)?; // downsample
- let query_mh = match query_sig_ds.minhash() {
- Some(query_mh) => query_mh,
- None => {
+ let query_filename = query_sig.filename();
+ let query_name = query_sig.name();
+ let query_md5 = query_sig.md5sum();
+
+ // clone here is necessary b/c we use full query_sig in consume_query_by_gather
+ let query_sig_ds = query_sig.select(selection)?; // downsample
+ let query_mh = match query_sig_ds.try_into() {
+ Ok(query_mh) => query_mh,
+ Err(_) => {
  bail!("No query sketch matching selection parameters.");
  }
  };
@@ -68,7 +72,7 @@ pub fn fastgather(
  );

  // load a set of sketches, filtering for those with overlaps > threshold
- let result = load_sketches_above_threshold(against_collection, query_mh, threshold_hashes)?;
+ let result = load_sketches_above_threshold(against_collection, &query_mh, threshold_hashes)?;
  let matchlist = result.0;
  let skipped_paths = result.1;
  let failed_paths = result.2;
@@ -91,12 +95,9 @@ pub fn fastgather(
  }

  if prefetch_output.is_some() {
- let query_filename = query_sig.filename();
- let query_name = query_sig.name();
- let query_md5 = query_sig.md5sum();
  write_prefetch(
- query_filename,
- query_name,
+ query_filename.clone(),
+ query_name.clone(),
  query_md5,
  prefetch_output,
  &matchlist,
@@ -106,7 +107,9 @@ pub fn fastgather(

  // run the gather!
  consume_query_by_gather(
- query_sig,
+ query_name,
+ query_filename,
+ query_mh,
  scaled as u64,
  matchlist,
  threshold_hashes,
Expand Down
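The fastgather hunks above read the name, filename, and md5 from the signature first, then consume the signature to obtain its sketch, so no clone of the sketch is needed. A minimal sketch of that ownership pattern follows. `Sig`, `Sketch`, and `split_sig` are hypothetical stand-ins invented for illustration; they are not the sourmash `Signature` or `KmerMinHash` API, and the real conversion may behave differently.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical stand-in for a signature holding zero or more sketches.
#[derive(Debug, Clone)]
pub struct Sig {
    pub name: String,
    pub filename: String,
    pub sketches: Vec<Sketch>,
}

/// Hypothetical stand-in for a MinHash sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Sketch {
    pub hashes: Vec<u64>,
}

impl TryFrom<Sig> for Sketch {
    type Error = String;

    // Takes the signature by value and moves its only sketch out,
    // so the (potentially large) hash vector is never copied.
    fn try_from(mut sig: Sig) -> Result<Self, Self::Error> {
        if sig.sketches.len() == 1 {
            Ok(sig.sketches.remove(0))
        } else {
            Err(format!(
                "expected exactly one sketch, found {}",
                sig.sketches.len()
            ))
        }
    }
}

/// Copy out the small metadata strings first, then consume the signature.
pub fn split_sig(sig: Sig) -> Result<(String, String, Sketch), String> {
    let name = sig.name.clone();
    let filename = sig.filename.clone();
    // `sig` is moved here; it cannot be used afterwards.
    let sketch: Sketch = sig.try_into()?;
    Ok((name, filename, sketch))
}
```

Calling `split_sig` on a signature with exactly one sketch returns the metadata and the sketch; any other sketch count returns an `Err`. The design trade-off in this sketch is that short strings are cloned so the large sketch can be moved rather than copied.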
33 changes: 22 additions & 11 deletions src/fastmultigather.rs
@@ -92,16 +92,25 @@ pub fn fastmultigather(
  let query_name = query_sig.name();
  let query_md5 = query_sig.md5sum();

- let query_mh = query_sig.minhash().expect("cannot get sketch");
+ let query_mh: KmerMinHash = query_sig.try_into().expect("cannot get sketch");
+
+ // CTB refactor
+ let query_scaled = query_mh.scaled();
+ let query_ksize = query_mh.ksize().try_into().unwrap();
+ let query_hash_function = query_mh.hash_function().clone();
+ let query_seed = query_mh.seed();
+ let query_num = query_mh.num();
+
  let mut matching_hashes = if save_matches { Some(Vec::new()) } else { None };
  let matchlist: BinaryHeap<PrefetchResult> = against
  .iter()
  .filter_map(|against| {
  let mut mm: Option<PrefetchResult> = None;
- if let Ok(overlap) = against.minhash.count_common(query_mh, false) {
+ if let Ok(overlap) = against.minhash.count_common(&query_mh, false) {
  if overlap >= threshold_hashes {
  if save_matches {
- if let Ok(intersection) = against.minhash.intersection(query_mh)
+ if let Ok(intersection) =
+ against.minhash.intersection(&query_mh)
  {
  matching_hashes.as_mut().unwrap().extend(intersection.0);
  }
@@ -126,8 +135,8 @@ pub fn fastmultigather(

  // Save initial list of matches to prefetch output
  write_prefetch(
- query_filename,
- query_name,
+ query_filename.clone(),
+ query_name.clone(),
  query_md5,
  Some(prefetch_output),
  &matchlist,
@@ -136,7 +145,9 @@ pub fn fastmultigather(

  // Now, do the gather!
  consume_query_by_gather(
- query_sig.clone(),
+ query_name,
+ query_filename,
+ query_mh,
  scaled as u64,
  matchlist,
  threshold_hashes,
@@ -151,12 +162,12 @@ pub fn fastmultigather(
  if let Ok(mut file) = File::create(&sig_filename) {
  let unique_hashes: HashSet<u64> = hashes.into_iter().collect();
  let mut new_mh = KmerMinHash::new(
- query_mh.scaled(),
- query_mh.ksize().try_into().unwrap(),
- query_mh.hash_function().clone(),
- query_mh.seed(),
+ query_scaled,
+ query_ksize,
+ query_hash_function,
+ query_seed,
  false,
- query_mh.num(),
+ query_num,
  );
  new_mh
  .add_many(&unique_hashes.into_iter().collect::<Vec<_>>())
13 changes: 8 additions & 5 deletions src/manysearch.rs
@@ -73,12 +73,15 @@ pub fn manysearch(
  // against downsampling happens here
  match coll.sig_from_record(record) {
  Ok(against_sig) => {
- if let Some(against_mh) = against_sig.minhash() {
+ let against_name = against_sig.name();
+ let against_md5 = against_sig.md5sum();
+
+ if let Ok(against_mh) = against_sig.try_into() {
  for query in query_sketchlist.iter() {
  // avoid calculating details unless there is overlap
  let overlap = query
  .minhash
- .count_common(against_mh, true)
+ .count_common(&against_mh, true)
  .expect("incompatible sketches")
  as f64;

@@ -115,7 +118,7 @@ pub fn manysearch(
  median_abund,
  std_abund,
  ) = if calc_abund_stats {
- downsample_and_inflate_abundances(&query.minhash, against_mh)
+ downsample_and_inflate_abundances(&query.minhash, &against_mh)
  .ok()?
  } else {
  (None, None, None, None, None)
@@ -124,10 +127,10 @@ pub fn manysearch(
  results.push(SearchResult {
  query_name: query.name.clone(),
  query_md5: query.md5sum.clone(),
- match_name: against_sig.name(),
+ match_name: against_name.clone(),
  containment: containment_query_in_target,
  intersect_hashes: overlap as usize,
- match_md5: Some(against_sig.md5sum()),
+ match_md5: Some(against_md5.clone()),
  jaccard: Some(jaccard),
  max_containment: Some(max_containment),
  average_abund,
18 changes: 11 additions & 7 deletions src/mastiff_manygather.rs
@@ -61,19 +61,23 @@ pub fn mastiff_manygather(
  // query downsampling happens here
  match coll.sig_from_record(record) {
  Ok(query_sig) => {
+ let query_filename = query_sig.filename();
+ let query_name = query_sig.name();
+ let query_md5 = query_sig.md5sum();
+
  let mut results = vec![];
- if let Some(query_mh) = query_sig.minhash() {
+ if let Ok(query_mh) = query_sig.try_into() {
  let _ = processed_sigs.fetch_add(1, atomic::Ordering::SeqCst);
  // Gather!
  let (counter, query_colors, hash_to_color) =
- db.prepare_gather_counters(query_mh);
+ db.prepare_gather_counters(&query_mh);

  let matches = db.gather(
  counter,
  query_colors,
  hash_to_color,
  threshold,
- query_mh,
+ &query_mh,
  Some(selection.clone()),
  );
  if let Ok(matches) = matches {
@@ -94,9 +98,9 @@ pub fn mastiff_manygather(
  unique_intersect_bp: match_.unique_intersect_bp(),
  gather_result_rank: match_.gather_result_rank(),
  remaining_bp: match_.remaining_bp(),
- query_filename: query_sig.filename(),
- query_name: query_sig.name().clone(),
- query_md5: query_sig.md5sum().clone(),
+ query_filename: query_filename.clone(),
+ query_name: query_name.clone(),
+ query_md5: query_md5.clone(),
  query_bp: query_mh.n_unique_kmers() as usize,
  ksize: ksize as usize,
  moltype: query_mh.hash_function().to_string(),
@@ -128,7 +132,7 @@ pub fn mastiff_manygather(
  } else {
  eprintln!(
  "WARNING: no compatible sketches in path '{}'",
- query_sig.filename()
+ query_filename
  );
  let _ = skipped_paths.fetch_add(1, atomic::Ordering::SeqCst);
  }
56 changes: 30 additions & 26 deletions src/utils/mod.rs
@@ -25,7 +25,6 @@ use sourmash::manifest::{Manifest, Record};
  use sourmash::selection::Selection;
  use sourmash::signature::{Signature, SigsTrait};
  use sourmash::sketch::minhash::KmerMinHash;
- use sourmash::storage::SigStore;
  use stats::{median, stddev};
  use std::collections::{HashMap, HashSet};
  use std::hash::{Hash, Hasher};
@@ -662,9 +661,9 @@ pub fn report_on_collection_loading(
  #[allow(clippy::too_many_arguments)]
  pub fn branchwater_calculate_gather_stats(
  orig_query: &KmerMinHash,
- query: KmerMinHash,
+ query: &KmerMinHash,
  // these are separate in PrefetchResult, so just pass them separately in here
- match_mh: KmerMinHash,
+ match_mh: &KmerMinHash,
  match_name: String,
  match_md5: String,
  match_size: usize,
@@ -749,7 +748,7 @@ pub fn branchwater_calculate_gather_stats(
  average_abund = n_unique_weighted_found as f64 / abunds.len() as f64;

  // todo: try to avoid clone for these?
- median_abund = median(abunds.iter().cloned()).unwrap();
+ median_abund = median(abunds.iter().cloned()).expect("cannot calculate median");
  std_abund = stddev(abunds.iter().cloned());
  }

@@ -788,7 +787,9 @@ pub fn branchwater_calculate_gather_stats(
  /// removing matches in 'matchlist' from 'query'.

  pub fn consume_query_by_gather(
- query: SigStore,
+ query_name: String,
+ query_filename: String,
+ orig_query_mh: KmerMinHash,
  scaled: u64,
  matchlist: BinaryHeap<PrefetchResult>,
  threshold_hashes: u64,
@@ -817,56 +818,59 @@ pub fn consume_query_by_gather(

  let mut last_matches = matching_sketches.len();

- let location = query.filename();
-
- let orig_query_mh = query.minhash().unwrap();
  let query_bp = orig_query_mh.n_unique_kmers() as usize;
  let query_n_hashes = orig_query_mh.size();
  let mut query_moltype = orig_query_mh.hash_function().to_string();
  if query_moltype.to_lowercase() == "dna" {
  query_moltype = query_moltype.to_uppercase();
  }
  let query_md5sum: String = orig_query_mh.md5sum().clone();
- let query_name = query.name().clone();
  let query_scaled = orig_query_mh.scaled() as usize;

- let mut query_mh = orig_query_mh.clone();
- let mut orig_query_ds = orig_query_mh.clone().downsample_scaled(scaled)?;
- // to do == use this to subtract hashes instead
- // let mut query_mht = KmerMinHashBTree::from(orig_query_mh.clone());
- let total_weighted_hashes = orig_query_mh.sum_abunds();
- let ksize = orig_query_mh.ksize();
- let calc_abund_stats = orig_query_mh.track_abundance();
+ let orig_query_size = orig_query_mh.size();
+ let mut last_hashes = orig_query_size;

- let mut last_hashes = orig_query_mh.size();
+ // this clone is necessary because we iteratively change things!
+ // to do == use this to subtract hashes instead
+ // let mut query_mh = KmerMinHashBTree::from(orig_query_mh.clone());
+ let mut query_mh = orig_query_mh.clone();

- // some items for full gather results
+ let mut orig_query_ds = orig_query_mh.downsample_scaled(scaled)?;
+
+ // track for full gather results
  let mut sum_weighted_found = 0;
+ let total_weighted_hashes = orig_query_mh.sum_abunds();
+ let ksize = orig_query_mh.ksize();

  // set some bools
+ let calc_abund_stats = orig_query_mh.track_abundance();
  let calc_ani_ci = false;
  let ani_confidence_interval_fraction = None;

  eprintln!(
  "{} iter {}: start: query hashes={} matches={}",
- location,
+ query_filename,
  rank,
- orig_query_mh.size(),
+ orig_query_size,
  matching_sketches.len()
  );

  while !matching_sketches.is_empty() {
  let best_element = matching_sketches.peek().unwrap();

  query_mh = query_mh.downsample_scaled(best_element.minhash.scaled())?;
- orig_query_ds = orig_query_ds.downsample_scaled(best_element.minhash.scaled())?;
+
+ // CTB: won't need this if we do not allow multiple scaleds;
+ // see sourmash-bio/sourmash#2951
+ orig_query_ds = orig_query_ds
+ .downsample_scaled(best_element.minhash.scaled())
+ .expect("cannot downsample");

  //calculate full gather stats
  let match_ = branchwater_calculate_gather_stats(
  &orig_query_ds,
- query_mh.clone(),
- // KmerMinHash::from(query.clone()),
- best_element.minhash.clone(),
+ &query_mh,
+ &best_element.minhash,
  best_element.name.clone(),
  best_element.md5sum.clone(),
  best_element.overlap as usize,
@@ -896,7 +900,7 @@ pub fn consume_query_by_gather(
  unique_intersect_bp: match_.unique_intersect_bp,
  gather_result_rank: match_.gather_result_rank,
  remaining_bp: match_.remaining_bp,
- query_filename: query.filename(),
+ query_filename: query_filename.clone(),
  query_name: query_name.clone(),
  query_md5: query_md5sum.clone(),
  query_bp,
@@ -937,7 +941,7 @@ pub fn consume_query_by_gather(

  eprintln!(
  "{} iter {}: remaining: query hashes={}(-{}) matches={}(-{})",
- location,
+ query_filename,
  rank,
  query_mh.size(),
  sub_hashes,
Expand Down
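The utils/mod.rs hunks above change `branchwater_calculate_gather_stats` to take `&KmerMinHash` arguments, which removes two clones per iteration of the gather loop. The effect of borrowing inside such a loop can be sketched as below. `Candidate`, `count_shared`, and `greedy_gather` are hypothetical names for illustration only. This is a deliberately simplified greedy loop and not the plugin's gather algorithm: it does not re-rank candidates or downsample.

```rust
use std::collections::BinaryHeap;

/// Hypothetical prefetch match; the derived ordering compares `overlap` first,
/// so a max-heap pops the largest overlap first.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Candidate {
    pub overlap: usize,
    pub hashes: Vec<u64>,
}

/// Both inputs are borrowed, so the caller keeps ownership and nothing is cloned.
pub fn count_shared(query: &[u64], candidate: &[u64]) -> usize {
    query.iter().filter(|h| candidate.contains(h)).count()
}

/// Simplified greedy loop: take the best remaining candidate, record how many
/// hashes it still shares with the query, then remove those hashes from the query.
pub fn greedy_gather(mut query: Vec<u64>, mut candidates: BinaryHeap<Candidate>) -> Vec<usize> {
    let mut found = Vec::new();
    while let Some(best) = candidates.pop() {
        // Borrow the shrinking query and the candidate's hashes for the comparison.
        let shared = count_shared(&query, &best.hashes);
        if shared == 0 {
            continue;
        }
        found.push(shared);
        // The query is mutated in place, which is why one owned copy is needed.
        query.retain(|h| !best.hashes.contains(h));
    }
    found
}
```

In this sketch only the query is owned, because the loop mutates it. Everything passed to the comparison helper is a reference, which corresponds to the PR's `&query_mh` and `&best_element.minhash` arguments.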
13 changes: 8 additions & 5 deletions src/utils/multicollection.rs
@@ -322,13 +322,16 @@ impl MultiCollection {
  _idx,
  record.internal_location()
  );
- let selected_sig = sig.clone().select(selection).ok()?;
- let minhash = selected_sig.minhash()?.clone();
+
+ let sig_name = sig.name();
+ let sig_md5 = sig.md5sum();
+ let selected_sig = sig.select(selection).ok()?;
+ let minhash = selected_sig.try_into().expect("cannot extract sketch");

  Some(SmallSignature {
  location: record.internal_location().to_string(),
- name: sig.name(),
- md5sum: sig.md5sum(),
+ name: sig_name,
+ md5sum: sig_md5,
  minhash,
  })
  }
@@ -357,7 +360,7 @@ impl MultiCollection {
  .par_iter()
  .filter_map(|(coll, _idx, record)| match coll.sig_from_record(record) {
  Ok(sig) => {
- let sig = sig.clone().select(selection).ok()?;
+ let sig = sig.select(selection).ok()?;
  Some(Signature::from(sig))
  }
  Err(_) => {
Expand Down
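The multicollection.rs hunks above use `.ok()?` inside a `filter_map` closure: a failed `select` becomes `None`, and the record is dropped from the output instead of aborting the whole load. A minimal sketch of that idiom follows, using a hypothetical `parse_record` function in place of signature loading. It uses a sequential iterator, whereas the PR uses rayon's `par_iter`.

```rust
use std::num::ParseIntError;

/// Hypothetical fallible loader: parse one record or report an error.
pub fn parse_record(record: &str) -> Result<u64, ParseIntError> {
    record.trim().parse::<u64>()
}

/// Keep every record that loads; silently drop the ones that fail.
pub fn load_valid(records: &[&str]) -> Vec<u64> {
    records
        .iter()
        .filter_map(|record| {
            // `.ok()` turns Result into Option; `?` returns None from the
            // closure on failure, so filter_map skips this record.
            let value = parse_record(record).ok()?;
            Some(value)
        })
        .collect()
}
```

Because the closure takes each item and passes ownership of the parsed value straight into `Some(...)`, no intermediate copy is made. This is the same reason the PR can drop `sig.clone()` before `select`.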