Best approach to handling searches across multiple object schemas

Hey! We have several different object types and a unified search across them, including:
- Workflows - with names, descriptions, folders
- Notebooks - with titles and content
- Actions - with just the action name
- Launch configs - with just the launch config name
- etc.

Each of these has its own set of field weightings. For workflows, for example, the most important field is the name, followed by the description and then the enclosing folder. We apply these weightings at query time with `BoostQuery`s (snippet below).

```rust
// Add term queries for all words except the last one.
if words.len() > 1 {
    for word in &words[0..words.len() - 1] {
        for (field, weight) in self.weighted_search_fields.values() {
            let term = Term::from_field_text(*field, word);
            let term_query = build_term_query(term);
            let weighted_query = Box::new(BoostQuery::new(
                term_query,
                // Boost the term query by the field weight, normalized by the total weight so the final
                // score is in the range of roughly 0-5. Complex queries might have a score exceeding 5.
                *weight * SCORE_BOOST_FACTOR / self.normalizing_factor,
            ));
            subqueries.push((Occur::Should, weighted_query));
        }
    }
}
```
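To make the normalization in that comment concrete, here is a small plain-Rust model of the boost arithmetic. It assumes `normalizing_factor` is the sum of the field weights (as "normalized by the total weight" suggests), and it uses a hypothetical `SCORE_BOOST_FACTOR` of 5.0 and made-up Workflow weights. It does not call Tantivy; it only computes the boost values.

```rust
// Hypothetical value, chosen to match the "roughly 0-5" comment above.
const SCORE_BOOST_FACTOR: f32 = 5.0;

/// Boost for each field's term query: the field weight scaled by
/// SCORE_BOOST_FACTOR and divided by the sum of all field weights
/// (assumed to be what `normalizing_factor` holds).
fn field_boosts(weights: &[(&'static str, f32)]) -> Vec<(&'static str, f32)> {
    let normalizing_factor: f32 = weights.iter().map(|(_, w)| w).sum();
    weights
        .iter()
        .map(|&(field, w)| (field, w * SCORE_BOOST_FACTOR / normalizing_factor))
        .collect()
}

fn main() {
    // Made-up Workflow weights: name > description > folder.
    let workflow_weights = [("name", 3.0), ("description", 1.5), ("folder", 0.5)];
    for (field, boost) in field_boosts(&workflow_weights) {
        println!("{field}: {boost:.3}");
    }
}
```

With these numbers the boosts come out to 3.0, 1.5 and 0.5, so by construction they sum to `SCORE_BOOST_FACTOR`. The final Tantivy score can still exceed that, because each boosted term query's underlying BM25 score is not bounded by 1.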

**Currently, we've structured this as multiple Tantivy full-text searchers: one per data source, each with its own schema for that object type.** When the user enters a search term in the command palette, we run the query against each of these searchers asynchronously and return an aggregated, ranked set of results.
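The aggregation step can be sketched in plain Rust as below. `Hit` and `aggregate` are hypothetical names, and the sketch assumes each searcher has already returned its own scored hits (for example from a top-k collector); it only models the merge-and-rank step, not the Tantivy calls.

```rust
/// A hit from one per-type searcher (hypothetical shape, not a Tantivy type).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    source: &'static str,
    id: String,
    score: f32,
}

/// Flatten every searcher's hits into one list and keep the `limit` best.
fn aggregate(per_source: Vec<Vec<Hit>>, limit: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_source.into_iter().flatten().collect();
    // Highest score first; f32::total_cmp gives a total order even with NaN.
    all.sort_by(|a, b| b.score.total_cmp(&a.score));
    all.truncate(limit);
    all
}

fn main() {
    let workflows = vec![Hit { source: "workflows", id: "w1".into(), score: 2.0 }];
    let actions = vec![
        Hit { source: "actions", id: "a1".into(), score: 4.5 },
        Hit { source: "actions", id: "a2".into(), score: 0.5 },
    ];
    for hit in aggregate(vec![workflows, actions], 2) {
        println!("{} {} {:.2}", hit.source, hit.id, hit.score);
    }
}
```

Sorting by raw score across separate indexes assumes the scores are comparable. BM25 statistics such as IDF and average field length are computed per index, so the same match can score differently in two indexes; the field-weight normalization in the snippet above does not account for that.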

However, **we've found that the number of threads we spin up grows proportionally with the number of data sources,** which isn't great (related to https://github.com/quickwit-oss/tantivy/issues/702).

An approach we're considering is the following:
- Define a unified schema containing every field from every object type, with no inherent weightings/boosts
- Objects like Actions would simply leave empty any fields that don't apply to their object type
- Extend the query-time logic to first filter by object type, and then **use type-conditional `BoostQuery`s to apply each type's weights**
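A plain-Rust model of the proposed scoring might look like the sketch below. Everything in it is hypothetical: the `ObjectType` enum, field names, weights, and the whitespace-token match standing in for a `TermQuery`. In Tantivy this would presumably become one `BooleanQuery` per object type (a clause on a type field plus the boosted `Should` clauses), but the sketch deliberately avoids Tantivy and only shows the type-conditional weighting over a single shared field set.

```rust
use std::collections::HashMap;

// Hypothetical constant, chosen to match the "roughly 0-5" comment in the issue.
const SCORE_BOOST_FACTOR: f32 = 5.0;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ObjectType {
    Workflow,
    Notebook,
    Action,
}

/// A document in the unified index: all types share one field set, and
/// fields that do not apply to a type are simply absent (empty).
struct Doc {
    object_type: ObjectType,
    fields: HashMap<&'static str, String>,
}

// Hypothetical per-type field weights; field names are illustrative.
const WORKFLOW_WEIGHTS: &[(&str, f32)] = &[("name", 3.0), ("description", 1.5), ("folder", 0.5)];
const NOTEBOOK_WEIGHTS: &[(&str, f32)] = &[("title", 3.0), ("content", 1.0)];
const ACTION_WEIGHTS: &[(&str, f32)] = &[("name", 1.0)];

fn weights(t: ObjectType) -> &'static [(&'static str, f32)] {
    match t {
        ObjectType::Workflow => WORKFLOW_WEIGHTS,
        ObjectType::Notebook => NOTEBOOK_WEIGHTS,
        ObjectType::Action => ACTION_WEIGHTS,
    }
}

/// Type-conditional scoring: only the weights of the document's own type
/// apply, normalized by that type's total weight.
fn score(doc: &Doc, terms: &[&str]) -> f32 {
    let w = weights(doc.object_type);
    let total: f32 = w.iter().map(|(_, x)| x).sum();
    let mut s = 0.0_f32;
    for term in terms {
        for &(field, weight) in w {
            // Stand-in for a TermQuery: a case-insensitive whole-token match.
            let matched = doc.fields.get(field).map_or(false, |text| {
                text.split_whitespace().any(|tok| tok.eq_ignore_ascii_case(term))
            });
            if matched {
                s += weight * SCORE_BOOST_FACTOR / total;
            }
        }
    }
    s
}

/// Optionally filter by object type first, then rank by the type-conditional score.
fn search<'a>(docs: &'a [Doc], terms: &[&str], only: Option<ObjectType>) -> Vec<(&'a Doc, f32)> {
    let mut hits: Vec<(&'a Doc, f32)> = docs
        .iter()
        .filter(|d| only.map_or(true, |t| d.object_type == t))
        .map(|d| (d, score(d, terms)))
        .filter(|&(_, s)| s > 0.0)
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}

fn doc(object_type: ObjectType, fields: &[(&'static str, &str)]) -> Doc {
    Doc {
        object_type,
        fields: fields.iter().map(|&(k, v)| (k, v.to_string())).collect(),
    }
}

fn main() {
    let docs = vec![
        doc(ObjectType::Workflow, &[("name", "deploy service"), ("folder", "ops")]),
        doc(ObjectType::Action, &[("name", "deploy")]),
        doc(ObjectType::Notebook, &[("title", "release notes"), ("content", "how to deploy")]),
    ];
    for (d, s) in search(&docs, &["deploy"], None) {
        println!("{:?}: {:.2}", d.object_type, s);
    }
}
```

Normalizing per type means an Action that matches on its only field can outrank a Workflow that matches on its name, as happens with the example data; whether that is desirable is a tuning decision for the weights.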

This would result in a **single searcher running async.**

**Is this the recommended approach for searching across different object types with different schemas?** Thanks!

Best approach to handling searches across multiple object schemas #2665