98-complete-documentation-section-on-search (#103)
* Working on search docs

* Completed Basic Search section: Missing Faceted Search now

* Fixed bug in search_meta chaining
- (cf #102)

* Prevent facet from crashing due to kwargs

* Fixed detection of append facet case

* Prevent kwargs from causing Compound search to crash

* Completed search page doc
VianneyMI committed Jan 27, 2024
1 parent de652a4 commit 726425f
Showing 6 changed files with 212 additions and 43 deletions.
3 changes: 2 additions & 1 deletion docs/intro/mongodb-umbrella.md
@@ -37,7 +37,8 @@ In the [following page](mongodb-aggregation-framework.md), we will do a deep-div

[Atlas Search](https://www.mongodb.com/docs/atlas/atlas-search/atlas-search-overview/) is a full-text search service that is fully integrated with MongoDB Atlas. It allows you to perform text search on your data and is based on [Apache Lucene](https://lucene.apache.org/).

As Atlas Search is a part of the aggregation framework, `monggregate` also offers a way to use it.
As the aggregation framework is the entry point for Atlas Search, `monggregate` also offers a way to use it.
Check out the [search page](../tutorial/search.md) for more details.

## **MongoDB Latest Features**

178 changes: 177 additions & 1 deletion docs/tutorial/search.md
@@ -1 +1,177 @@
Coming soon !
The aggregation framework provides advanced search functionality through the `$search` and `$searchMeta` stages.

Note: the `$search` and `$searchMeta` stages are only available with MongoDB Atlas.

## **Atlas Search**

Atlas Search offers features similar to those of other search engines such as Elasticsearch or Algolia.
Such features include:

- Full-text search
- Fuzzy search
- Autocompletion
- Highlighting
- Faceting
- Geospatial search
- Relevance scoring
- Query analytics

You can see a more detailed list of features [here](https://www.mongodb.com/docs/atlas/atlas-search/atlas-search-overview/).

## **Using Atlas Search through monggregate**

As with the other stages, `monggregate` defines a class and a `pipeline` method for the search stages.
However, the search stages differ slightly from the other stages: they are themselves very similar to pipelines.
You will better grasp this concept in [one of the sections below](#search-pipelines).

The search stages define their own set of operators called **search operators**.
Below is a non-exhaustive list of the search operators:

* Autocomplete
* Compound
* Text
* Regex

As with the other stages, the search stages can be enhanced with one or several operators. Unlike the other stages, the search stages require at least one operator.
The operators listed above are some of the most commonly used.

The `text` operator is the central operator for full-text search. It takes an optional `fuzzy` parameter that enables fuzzy search.

The `autocomplete` operator provides autocompletion.

The `compound` operator combines several search operators, giving each of them a different weight or role through the clause types `filter`, `must`, `mustNot` and `should`.

* `filter` clauses define text that must be present in the documents matching the query.
* `must` clauses are similar to `filter` clauses, but they also affect the relevance score of the documents.
* `mustNot` clauses define text that must not be present in the documents matching the query.
* `should` clauses define text that may be present in the documents matching the query. They also affect the relevance score of the documents. A minimum number of matching `should` clauses can be required through the `minimumShouldMatch` parameter.
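
For reference, the clause types above map onto the raw `compound` operator of a `$search` stage. The following is a minimal sketch written by hand as a plain Python dictionary; it is not `monggregate` output. The `fruits` index, the `description` and `category` fields and the query strings are assumptions chosen for illustration.

```python
# Raw Atlas Search stage written by hand (not generated by monggregate).
# Index name, field names and query strings are illustrative assumptions.
search_stage = {
    "$search": {
        "index": "fruits",
        "compound": {
            # Must match; does not contribute to the relevance score.
            "filter": [{"text": {"query": "fruit", "path": "category"}}],
            # Must match; contributes to the relevance score.
            "must": [{"text": {"query": "varieties", "path": "description"}}],
            # Must not match.
            "mustNot": [{"text": {"query": "apples", "path": "description"}}],
            # Optional matches that raise the relevance score.
            "should": [
                {"text": {"query": "sweet", "path": "description"}},
                {"text": {"query": "crisp", "path": "description"}},
            ],
            # At least one of the two `should` clauses has to match.
            "minimumShouldMatch": 1,
        },
    }
}

compound = search_stage["$search"]["compound"]
print(sorted(compound))
```

Each clause type holds a list of operators, which is what allows several operators to be combined in a single stage.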

The `facet` collector (a sort of operator) performs faceting on the results of the search. It is a powerful feature, common in good search experiences.

Again, the search features are so vast that they could have their own package, but fortunately for you, they have been included in `monggregate`.

How do you build search queries with `monggregate`? Let's see that in the next section.
In the next sections, we will only talk about the `$search` stage, but everything applies to the `$searchMeta` stage as well.

## **Basic Search**

The `Search` class and the `search` method have default parameters so that it is easy to get started quickly.

Building your search request is as simple as the following code:

```python

# Assumes `pipeline` is an existing monggregate `Pipeline` instance.
pipeline.search(
    path="description",
    query="apple",
)

```

By default, the search uses the `text` operator.

You can also enhance the search experience with a fuzzy search, just by adding the `fuzzy` parameter:


```python

from monggregate.search.commons import FuzzyOptions

pipeline.search(
    path="description",
    query="apple",
    fuzzy=FuzzyOptions(
        max_edits=2
    )
)

```
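
As a point of comparison, here is a hand-written sketch of the raw Atlas Search stage that a fuzzy `text` search corresponds to. It assumes that `max_edits` maps to the `maxEdits` option of the `text` operator; the dictionary is written manually and is not captured from `monggregate`.

```python
# Hand-written raw stage; assumes `max_edits` maps to Atlas Search `maxEdits`.
fuzzy_search_stage = {
    "$search": {
        "text": {
            "query": "apple",
            "path": "description",
            # Tolerate up to two single-character edits between query and term.
            "fuzzy": {"maxEdits": 2},
        }
    }
}

print(fuzzy_search_stage["$search"]["text"]["fuzzy"])
```

A stage like this one can be passed as the first element of the list given to PyMongo's `collection.aggregate`.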

You can build even richer search queries by adding more operators to your search stage as shown in the next section.

## **Search Pipelines**


The search stages can be composed of multiple search operators, thus defining a compound search.
As such, unlike with other stages, calling the `search` method on a `pipeline` object several times does not add a new `search` stage each time. Instead, each call completes the existing `search` stage by appending a new clause or a new facet.

NOTE: The `$search` stage has to be the first stage of the pipeline.

As an example, the following code:
```python
pipeline.search(
    index="fruits",
    operator_name="compound"
).search(
    clause_type="must",
    query="varieties",
    path="description"
).search(
    clause_type="mustNot",
    query="apples",
    path="description"
)
```
will generate the following pipeline:
```json
[
    {
        "$search": {
            "index": "fruits",
            "compound": {
                "must": {
                    "query": "varieties",
                    "path": "description"
                },
                "mustNot": {
                    "query": "apples",
                    "path": "description"
                }
            }
        }
    }
]
```

This example was copied from a past version* of the official MongoDB documentation and has simply been adapted to the `monggregate` syntax.
Let's review what is going on here.

The first `search` call initializes a `$search` stage with an "empty" `compound` operator.
The second `search` call completes the `compound` operator by adding a `text` operator in a `must` clause.
The third `search` call appends a `text` operator in a `mustNot` clause to the `compound` operator.

In the end, the generated query returns documents that contain the word "varieties" in the "description" field but do not contain the word "apples" in that field.
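
To make the chaining behaviour concrete, here is a toy model of it. `ToySearchPipeline` is a hypothetical class written for this page only; it is not how `monggregate` is implemented and it only handles `text` clauses inside a `compound` operator.

```python
from typing import Any, Dict, List, Optional


class ToySearchPipeline:
    """Toy model of the chaining behaviour; not monggregate's implementation."""

    def __init__(self) -> None:
        self.stages: List[Dict[str, Any]] = []

    def search(
        self,
        *,
        index: Optional[str] = None,
        clause_type: Optional[str] = None,
        query: Optional[str] = None,
        path: Optional[str] = None,
    ) -> "ToySearchPipeline":
        if not self.stages:
            # First call: create the single $search stage with an empty compound.
            stage: Dict[str, Any] = {"$search": {"compound": {}}}
            if index is not None:
                stage["$search"]["index"] = index
            self.stages.append(stage)
        if clause_type is not None:
            # Later calls: append a text clause to the existing compound operator.
            compound = self.stages[0]["$search"]["compound"]
            compound.setdefault(clause_type, []).append(
                {"text": {"query": query, "path": path}}
            )
        return self


toy = (
    ToySearchPipeline()
    .search(index="fruits")
    .search(clause_type="must", query="varieties", path="description")
    .search(clause_type="mustNot", query="apples", path="description")
)
print(len(toy.stages))
```

However many times `search` is called, the toy pipeline holds a single stage whose `compound` operator accumulates the clauses.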

*Unfortunately, the current version of the documentation no longer provides such an example. We plan to update this page to use the movies collection instead.

## **Faceted Search**

Unlike previous sections, this section will be illustrated with the `search_meta` method instead of the `search` method, as it is a bit more relevant in the context of faceted search.

`monggregate` eases the process of building faceted search queries.

You can initialize a faceted search query as follows:

```python
pipeline = Pipeline()
pipeline.search_meta(
    index="fruits",
    collector_name="facet",
)
```

Then, you can add facets to your search query as follows:

```python
pipeline.search_meta(
    facet_type="string",
    path="category",
)
```

The first code sample initializes the faceted search, but it is not usable as is: at least one facet must be added to the search query.

The second code sample adds a facet to the search query. The facet is of type `string` and is computed on the `category` field.

After initializing the faceted search, you can add as many facets as you want. You can also add other search operators, in any order, so that the facets are computed on the results of the search.
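
For comparison, a faceted query expressed directly as a raw `$searchMeta` stage could look like the sketch below. It is written by hand as a plain Python dictionary and is not captured from `monggregate`; the facet name `categoryFacet` and the `text` operator restricting the counted documents are illustrative assumptions.

```python
# Hand-written raw stage; facet name, fields and query are illustrative.
search_meta_stage = {
    "$searchMeta": {
        "index": "fruits",
        "facet": {
            # Optional operator restricting which documents are counted.
            "operator": {"text": {"query": "apple", "path": "description"}},
            "facets": {
                # One entry per facet, keyed by an arbitrary facet name.
                "categoryFacet": {"type": "string", "path": "category"},
            },
        },
    }
}

facets = search_meta_stage["$searchMeta"]["facet"]["facets"]
print(list(facets))
```

Adding another facet amounts to adding another key under `facets`, which mirrors the repeated `search_meta` calls described above.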
1 change: 1 addition & 0 deletions docs/tutorial/vector-search.md
@@ -0,0 +1 @@
Coming soon !
47 changes: 15 additions & 32 deletions src/monggregate/pipeline.py
@@ -34,7 +34,7 @@
Unwind,
Unset
)
from monggregate.stages.search.base import OperatorLiteral
from monggregate.stages.search.base import SearchBase, OperatorLiteral
from monggregate.search.operators import OperatorMap
from monggregate.search.operators.compound import Compound, ClauseType
from monggregate.search.collectors.facet import Facet, FacetType
@@ -1059,14 +1059,14 @@ def search_meta(

# If pipeline is not empty then the first stage must be Search stage.
# If so, adds the operator to the existing stage using Compound.
elif len(self) >= 1 and isinstance(self.stages[0], Search):
elif len(self) >= 1 and isinstance(self.stages[0], SearchMeta):
kwargs.update({
# "collector_name":collector_name,
"operator_name":operator_name,
"path":path,
"query":query,
})
has_facet_arg = self.__has_facet_arg(**kwargs)
has_facet_arg = self.__has_facet_arg(facet_type=facet_type, **kwargs)
if has_facet_arg:
self._append_facet(facet_type, **kwargs)
else:
@@ -1148,33 +1148,38 @@ def _append_clause(

minimum_should_match = kwargs.pop("minimum_should_match", default_minimum_should_match)

kwargs.update({
"path":path,
"query":query
})

if isinstance(first_stage.collector, Facet):
if isinstance(first_stage.collector.operator, Compound):
# Add clause to existing compound
first_stage.__get_operators_map__(operator_name=operator_name)(clause_type, path=path, query=query, **kwargs)
first_stage.__get_operators_map__(operator_name=operator_name)(clause_type, **kwargs)
elif first_stage.collector.operator is None:
# Create a compound operator with the to-be operator as a clause
new_operator = Compound(minimum_should_match=minimum_should_match)
new_operator.__get_operators_map__(operator_name=operator_name)(clause_type, path=path, query=query, **kwargs)
new_operator.__get_operators_map__(operator_name=operator_name)(clause_type, **kwargs)
first_stage.operator = new_operator
else:
# Retrieve current operator and create a compound operator
# and add the current operator as a clause
new_operator = Compound(should=[first_stage.collector.operator], minimum_should_match=minimum_should_match)
new_operator.__get_operators_map__(operator_name=operator_name)(clause_type, path=path, query=query, **kwargs)
new_operator.__get_operators_map__(operator_name=operator_name)(clause_type, **kwargs)
first_stage.operator = new_operator
elif isinstance(first_stage.operator, Compound):
# Add clause to existing compound
first_stage.__get_operators_map__(operator_name=operator_name)(clause_type, path=path, query=query, **kwargs)
first_stage.__get_operators_map__(operator_name=operator_name)(clause_type, **kwargs)
elif first_stage.operator is not None:
# Create a compound operator with the to-be operator as a clause
new_operator = Compound(minimum_should_match=minimum_should_match)
new_operator.__get_operators_map__(operator_name=operator_name)(clause_type, path=path, query=query, **kwargs)
new_operator.__get_operators_map__(operator_name=operator_name)(clause_type, **kwargs)
first_stage.operator = new_operator

else:
# Create an operator
first_stage.operator = OperatorMap[operator_name](path=path, query=query, **kwargs)
first_stage.operator = OperatorMap[operator_name](**kwargs)

return None

@@ -1208,7 +1213,7 @@ def __has_facet_arg(cls, **kwargs:Any)->bool:
has_facet_arg = False

for arg in facet_args:
if arg in kwargs:
if arg in kwargs and kwargs[arg] is not None:
has_facet_arg = True
break

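The change to the check in this hunk matters because `search_meta` now passes `facet_type` explicitly, so the key is present in `kwargs` even when its value is `None`. The standalone sketch below contrasts the two checks; `FACET_ARGS` and both function names are hypothetical stand-ins, not the names used in `pipeline.py`.

```python
from typing import Any

# Hypothetical list of facet-related argument names, for illustration only.
FACET_ARGS = ("facet_type", "num_buckets", "boundaries", "default")


def has_facet_arg_old(**kwargs: Any) -> bool:
    """Previous check: key presence alone counts as a facet argument."""
    return any(arg in kwargs for arg in FACET_ARGS)


def has_facet_arg_new(**kwargs: Any) -> bool:
    """Updated check: the key must be present and its value not None."""
    return any(kwargs.get(arg) is not None for arg in FACET_ARGS)


# A call that only carries operator arguments, with facet_type left unset.
call_kwargs = {"facet_type": None, "path": "description", "query": "apple"}
print(has_facet_arg_old(**call_kwargs), has_facet_arg_new(**call_kwargs))
# → True False
```

With the previous check, such a call would be routed to the facet branch even though no facet was requested.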
@@ -1435,25 +1440,3 @@ def unset(self, field:str=None, fields:list[str]|None=None)->Self:
)

return self

if __name__ =="__main__":
from datetime import datetime
from monggregate.search.collectors import StringFacet, NumericFacet

pipeline = Pipeline()
pipeline.search_meta(
index="movies",
collector_name="facet",
operator=Search.Range(
path="released",
gte=datetime(year=2000, month=1, day=1),
lte=datetime(year=2015, month=1, day=31)
),
facets=[
StringFacet(name="directorsFacet", path="directors", num_buckets=7),
NumericFacet(name="yearFacet", path="year", boundaries=[2000, 2005, 2010, 2015]),
]
)
search_stage = pipeline[0]
statement = search_stage.statement
print(statement)
3 changes: 2 additions & 1 deletion src/monggregate/search/collectors/facet.py
@@ -1009,7 +1009,8 @@ def facet(
type:FacetType='string',
num_buckets:int|None=None,
boundaries:list[int|float]|list[datetime]|None=None,
default:str|None=None
default:str|None=None,
**kwargs:Any # NOTE : To prevent errors from passing extra arguments, cf #100 on GitHub <VM, 22/01/2024>
)->Self:

if type=="string":