RDoc-2499_Corax-6.0-updates #1689
{INFO Boosting is also available at the query level. You can read more about it [here](../client-api/session/querying/text-search/boost-search-results). /}

{NOTE: }
When using [Corax](../indexes/search-engine/corax) as the search engine, [boosting is available](../indexes/search-engine/corax#supported-features)
Maybe `indexing-time boosting`, because it may suggest we do not support query-time boosting
done
There are some Corax issues marked with Documentation required tag:
https://issues.hibernatingrhinos.com/issues/RavenDB?q=%23Corax%20AND%20%20%23%7BDocumentation%20Required%7D
@reebhub @maciejaszyk please verify if something needs to be added to the docs. If it's already done then @reebhub please remove the tag.
{NOTE: }

* **Corax** is RavenDB's native search engine, introduced in RavenDB
version 5.4 as an in-house searching alternative for Lucene.
Maybe "introduced in RavenDB version 6.0", since in 5.4 it was an experimental feature that supported only a very limited set of features. I'd like to avoid confusing our users into thinking that they can use Corax in 5.4. Corax in 6.0 is a completely rewritten feature.
done
Documentation/6.0/Raven.Documentation.Pages/indexes/search-engine/corax.markdown
| Feature | Supported by Corax | Comment
|----------------|--------------------|--------
| Dynamic Fields | `yes` | Corax' handling of dynamic fields is similar to Lucene's.
Why do we have this comment? What does it mean? What was the intention here? CC @maciejaszyk
removed
Trying to use Corax with an unimplemented method (see
[Supported Features](../../indexes/search-engine/corax#supported-features) above)
will generate a `System.NotImplementedException` exception and end the search.
It will throw `NotSupportedInCoraxException`. Right @maciejaszyk?
done
Documentation/6.0/Raven.Documentation.Pages/indexes/search-engine/corax.markdown
You can use Corax as your search engine, but explicitly disable the indexing
of complex objects.
When you disable the **indexing** of a field this way, the field's contents
can still be **stored and projected**.
Maybe rephrase to "can still be stored so it might be used in projection queries" with a link to: https://ravendb.net/docs/article-page/5.4/csharp/indexes/querying/projections#projections-and-stored-fields ?
done
#### If Corax Encounters a Complex Property While Indexing:

* If an auto index exists for the document, Corax will throw
`System.NotSupportedException` to notify the user that a search
@maciejaszyk I checked the code and indeed it throws `NotSupportedException`. Shouldn't it be changed to `NotSupportedInCoraxException`?
Yes.
@arekpalinski @maciejaszyk pls update me when it's implemented and i'll change it here
@reebhub this got changed to NotSupportedInCoraxException
| | [score()](../../indexes/querying/sorting#ordering-by-score) | `yes` |
| | [spatial.distance()](../../client-api/session/querying/how-to-make-a-spatial-query#spatial-sorting) | `yes` |

---
Other unsupported features / cases that I found in code.
Querying:
- throw new NotSupportedInCoraxException($"{nameof(Corax)} doesn't support {nameof(Explanations)} yet.");
- throw new NotSupportedInCoraxException($"Corax doesn't support 'Distinct' operation on collection bigger than int32 ({int.MaxValue}).");
Indexing:
- throw new NotSupportedInCoraxException($"{nameof(Corax)} does not support indexing objects that are not points on a world map.");
CC @maciejaszyk
Also "Custom analyzers" aren't supported in Indexing - https://ravendb.net/docs/article-page/5.4/csharp/studio/database/settings/custom-analyzers
added `Explanations`, `Distinct` and `Custom analyzers`.
Indexing objects is documented (where WKT shapes are mentioned)
* [Indexing.Corax.IncludeSpatialDistance](../../server/configuration/indexing-configuration#indexing.corax.includespatialdistance)
Used to include spatial information in document metadata when sorting by distance.
We have more Corax specific settings. Look for "Indexing.Corax." here: https://github.com/ravendb/ravendb/blob/v6.0/src/Raven.Server/Config/Categories/IndexingConfiguration.cs
done
"There are some Corax issues marked with Documentation required tag: @reebhub @maciejaszyk please verify if something needs to be added to the docs. If it's already done then @reebhub please remove the tag." these are the issues, 3 of the 4 are done, the fourth isn't ready for documentation yet.
Set the Corax index compression max documents limit used for dictionary creation.

* [Indexing.Corax.MaxMemoizationSizeInMb](../../server/configuration/indexing-configuration#indexing.corax.maxmemoizationsizeinmb)
The maximum amount of memory that Corax can use for a memoization clause during query processing.
We just added a new config option related to initial dictionary creation:
added
* The benefits of compression dictionaries are most pronounced for large collections.
* If upon creation there is less than 10000 documents in the collections involved,
it may make sense to manually force an index reset after reaching 100000 documents
to force a retraining.
Note that 10000 docs is configurable via `Indexing.Corax.DocumentsLimitForCompressionDictionaryCreation` mentioned above. Also now we have `Indexing.Corax.MaxAllocationsAtDictionaryTrainingInMb` - 2GB on x64 machines and 128MB on 32-bit machines
done
make the index more readable.

* **Adding a Compound Field**
In an index definition, add a compound field using the `CompoundFields.Add` method.
This is still in PR but we'll have a strongly typed API:
More specifically see the usage here:
ravendb/ravendb@04d16c6#diff-948bc0c50471bb01c99e3fac74d035edc0a452dd6015dfad197f36dd03ecfc6dR149-R150
sample updated
{PANEL: Compound Fields}

A compound field is a Corax index field comprised of multiple simple data elements.
Currently a compound field can be composed of exactly 2 fields.
done
Documentation/6.0/Raven.Documentation.Pages/indexes/search-engine/corax.markdown
- **Default**: `128`
- **Scope**: Server-wide, per database, or per index

{PANEL/}
Mark this as expert level.
done
LGTM
* If an auto index exists for the document, Corax will throw
`System.NotSupportedException` to notify the user that a search
that makes no sense has been attempted.
that is not supported in Corax
done
@@ -11,6 +11,14 @@ By default, if the number of returned results exceeds **2048**, the server will
The threshold can be adjusted by changing the `PerformanceHints.MaxNumberOfResults` configuration value.
{INFO/}

{INFO:Limits}
When [Corax](../../indexes/search-engine/corax) is used as the search engine,
indexes of more than `int.MaxValue` documents can be created and used.
Put the number here, that is not always clear to users
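For reference, `int.MaxValue` is the largest value of a 32-bit signed integer, so the number to spell out in the text would be:

$$
\texttt{int.MaxValue} = 2^{31} - 1 = 2{,}147{,}483{,}647
$$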
done
@@ -11,6 +11,14 @@ By default, if the number of returned results exceeds **2048**, the server will
The threshold can be adjusted by changing the `PerformanceHints.MaxNumberOfResults` configuration value.
{INFO/}

{INFO:Limits}
When [Corax](../../indexes/search-engine/corax) is used as the search engine,
indexes of more than `int.MaxValue` documents can be created and used.
It's `Integer.MAX_VALUE` for Java, and let's just put the number here.
done
@@ -11,6 +11,14 @@ By default, if the number of returned results exceeds **2048**, the server will
The threshold can be adjusted by changing the `PerformanceHints.MaxNumberOfResults` configuration value.
{INFO/}

{INFO:Limits}
When [Corax](../../indexes/search-engine/corax) is used as the search engine,
indexes of more than `int.MaxValue` documents can be created and used.
Same
done
Documentation/6.0/Raven.Documentation.Pages/indexes/search-engine/corax.markdown
| Searching by [Regex](../../client-api/session/querying/text-search/using-regex) | `yes`
| [Fuzzy Search](../../client-api/session/querying/text-search/fuzzy-search) | **no**
| [Explanations](../../client-api/session/querying/debugging/include-explanations) | **no**
| [Distinct](../../indexes/querying/distinct) operation on a collection with more than int32 ({int.MaxValue}) documents | **no**
This is not supported by Lucene either
removed
not sure though what's better: removing it completely or explaining it isn't supported by both (?)
* [Indexing.Corax.IncludeDocumentScore](../../server/configuration/indexing-configuration#indexing.corax.includedocumentscore)
Choose whether to include the score value in document metadata when sorting by score.

* [Indexing.Corax.IncludeSpatialDistance](../../server/configuration/indexing-configuration#indexing.corax.includespatialdistance)
Choose whether to include spatial information in document metadata when sorting by distance.
Why would I want one or the other?
I assume there is a perf cost associated with enabling those?
Yes, IIRC this is related to performance. CC @maciejaszyk
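As a rough illustration of what enabling these two options could look like, here is a `settings.json` sketch. The option names are the ones quoted in the diff above; the `true` values and the choice to set them server-wide are assumptions for illustration only. Since the thread suggests a performance cost, they are presumably worth enabling only when the score or distance is actually needed in the document metadata.

```json
{
    "Indexing.Corax.IncludeDocumentScore": true,
    "Indexing.Corax.IncludeSpatialDistance": true
}
```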
{NOTE: }
Training stops when it reaches either the
[number of documents](../../server/configuration/indexing-configuration#indexing.corax.documentslimitforcompressiondictionarycreation)
threshold (10000 docs by default) or the
Suggested change:
- threshold (10000 docs by default) or the
+ threshold (10,000 docs by default) or the
done
[number of documents](../../server/configuration/indexing-configuration#indexing.corax.documentslimitforcompressiondictionarycreation)
threshold (10000 docs by default) or the
[amount of memory](../../server/configuration/indexing-configuration#indexing.corax.maxallocationsatdictionarytraininginmb)
threshold (2GB/128MB by default). Both thresholds are configurable.
What is 128MB here? You refer to 32 bits, which can safely be skipped
removed 128MB
Training will stop when it reaches this limit.

- **Type**: `SizeUnit.Megabytes`
- **Default**: 2 GB for x64, or 128 MB for x86 (32 bits)
Suggested change:
- - **Default**: 2 GB for x64, or 128 MB for x86 (32 bits)
+ - **Default**: 2 GB for 64 bits systems, or 128 MB for 32 bits systems
done
Training will stop when it reaches this limit.

- **Type**: `int`
- **Default**: `100000`
That says 10K above, which one is right?
right, fixed (it's 100,000)
| Feature | Supported by Corax
|------------------------------------------------------------------------|--------------------
| [MoreLikeThis](../../indexes/querying/morelikethis) | `yes`
Add that Custom Sorters are Not supported by Corax as well
added, thanks
select new
{
    // Handling the field as a string will allow Corax to index it
    Location = JsonConvert.Serialize(order.ShipTo.Location)
Such index will not compile:
(25,32): error CS0103: The name 'JsonConvert' does not exist in the current context, IndexDefinitionProperty='', ProblematicText='' at Raven.Server.Documents.Indexes.Static.IndexCompiler.CompileInternal(String originalName, String cSharpSafeName, MemberDeclarationSyntax class, IndexDefinition definition) in C:\Builds\RavenDB-6.0-Nightly\20230907-0851\src\Raven.Server\Documents\Indexes\Static\IndexCompiler.cs:line 244
done
…omment (because using the suggested index wouldn't compile)
[number of documents](../../server/configuration/indexing-configuration#indexing.corax.documentslimitforcompressiondictionarycreation)
threshold (100,000 docs by default) or the
[amount of memory](../../server/configuration/indexing-configuration#indexing.corax.maxallocationsatdictionarytraininginmb)
threshold (2GB by default). Both thresholds are configurable.
Update: We changed that so now it's dependent on the amount of installed memory on the machine:
https://github.com/ravendb/ravendb/pull/17298/files#diff-a26823a405dc6540477281200849f5d7918be61c063cd3228b6034fbbc852fd2R44-R72
Maybe just mention that without saying explicitly exact number. It's up to 2GB.
--> or the amount of memory threshold (up to 2GB).
changed in #1695
Corax indexes will not train compression dictionaries if they are created in the
testing studio interface, because it is designed for indexing prototyping and the
training process will add unnecessary overhead.
{NOTE/}
Here we could reference Test Index feature article (once we'll have it)
-->
{NOTE: Corax and Test Index}
Corax indexes will not train compression dictionaries if they are created in the Test Index interface, because the testing interface is designed for indexing prototyping and the training process will add unnecessary overhead.
{NOTE/}
changed in #1695
![Index Definition](images/corax-02_index-definition.png "Index Definition")
1. Open the index **Configuration** tab.
2. Select the search engine you prefer for this index.
![Per-Index Search Engine](images/corax-03_index-definition_searcher-select.png "Per-Index Search Engine")
I see `Corax - experimental` on this image - it's no longer experimental. We need to update the image.
retook all snapshots for the article
changed in #1695
#### If Corax Encounters a Complex Property While Indexing:

* If an auto index exists for the document, Corax will throw
`System.NotSupportedException` to notify the user that a search
Update: We're gonna change that behavior for auto indexes: https://issues.hibernatingrhinos.com/issue/RavenDB-21430/Corax-Alert-users-about-indexing-of-complex-objects-in-auto-indexes-instead-of-throwing-an-error
-->
- If an auto index exists for the document Corax will alert the user:
The value of '{fieldName}' field is a complex object. Indexing it as a text isn't supported. You should consider querying on individual fields of that object.
changed in #1695
Update the Corax documentation for RavenDB 6.0