RDoc-2499_Corax-6.0-updates #1689

Merged: 15 commits, Sep 11, 2023

reebhub (Contributor) commented Aug 29, 2023

Update the Corax documentation for RavenDB 6.0

{INFO Boosting is also available at the query level. You can read more about it [here](../client-api/session/querying/text-search/boost-search-results). /}

{NOTE: }
When using [Corax](../indexes/search-engine/corax) as the search engine, [boosting is available](../indexes/search-engine/corax#supported-features)
Member

Maybe say "indexing-time boosting", because the current wording may suggest that we do not support query-time boosting.

Contributor Author

done

Member

@arekpalinski left a comment

There are some Corax issues marked with the "Documentation Required" tag:
https://issues.hibernatingrhinos.com/issues/RavenDB?q=%23Corax%20AND%20%20%23%7BDocumentation%20Required%7D

@reebhub @maciejaszyk please verify if something needs to be added to the docs. If it's already done then @reebhub please remove the tag.

{NOTE: }

* **Corax** is RavenDB's native search engine, introduced in RavenDB
version 5.4 as an in-house searching alternative for Lucene.
Member

Maybe "introduced in RavenDB version 6.0", since in 5.4 it was an experimental feature that supported a very limited set of features. I'd like to avoid confusing our users into thinking that they can use Corax in 5.4. Corax in 6.0 is a completely rewritten feature.

Contributor Author

done


| Feature | Supported by Corax | Comment
|----------------|--------------------|--------
| Dynamic Fields | `yes` | Corax' handling of dynamic fields is similar to Lucene's.
Member

Why do we have this comment? What does it mean? What was the intention here? CC @maciejaszyk

Contributor Author

removed


Trying to use Corax with an unimplemented method (see
[Supported Features](../../indexes/search-engine/corax#supported-features) above)
will generate a `System.NotImplementedException` exception and end the search.
Member

It will throw NotSupportedInCoraxException. Right @maciejaszyk ?

Contributor Author

done

You can use Corax as your search engine, but explicitly disable the indexing
of complex objects.
When you disable the **indexing** of a field this way, the field's contents
can still be **stored and projected**.
Member

Maybe rephrase to "can still be stored so it might be used in projection queries" with a link to: https://ravendb.net/docs/article-page/5.4/csharp/indexes/querying/projections#projections-and-stored-fields ?

Contributor Author

done

#### If Corax Encounters a Complex Property While Indexing:

* If an auto index exists for the document, Corax will throw
`System.NotSupportedException` to notify the user that a search
Member

@maciejaszyk I checked the code and indeed it throws NotSupportedException. Shouldn't it be changed to NotSupportedInCoraxException?

Member

Yes.

Contributor Author

@arekpalinski @maciejaszyk pls update me when it's implemented and i'll change it here

Member

@reebhub this got changed to NotSupportedInCoraxException

| | [score()](../../indexes/querying/sorting#ordering-by-score) | `yes` |
| | [spatial.distance()](../../client-api/session/querying/how-to-make-a-spatial-query#spatial-sorting) | `yes` |

---
Member

Other unsupported features / cases that I found in code.

Querying:

  • throw new NotSupportedInCoraxException($"{nameof(Corax)} doesn't support {nameof(Explanations)} yet.");
  • throw new NotSupportedInCoraxException($"Corax doesn't support 'Distinct' operation on collection bigger than int32 ({int.MaxValue}).");

Indexing:

  • throw new NotSupportedInCoraxException($"{nameof(Corax)} does not support indexing objects that are not points on a world map.");

CC @maciejaszyk

Contributor Author (@reebhub, Aug 30, 2023)

added Explanations, Distinct and Custom analyzers
Indexing objects is documented (where WKT shapes are mentioned)


* [Indexing.Corax.IncludeSpatialDistance](../../server/configuration/indexing-configuration#indexing.corax.includespatialdistance)
Used to include spatial information in document metadata when sorting by distance.

Member

We have more Corax specific settings. Look for "Indexing.Corax." here: https://github.com/ravendb/ravendb/blob/v6.0/src/Raven.Server/Config/Categories/IndexingConfiguration.cs

Contributor Author

done

reebhub (Contributor Author) commented Sep 1, 2023

"There are some Corax issues marked with Documentation required tag:
https://issues.hibernatingrhinos.com/issues/RavenDB?q=%23Corax%20AND%20%20%23%7BDocumentation%20Required%7D

@reebhub @maciejaszyk please verify if something needs to be added to the docs. If it's already done then @reebhub please remove the tag."


these are the issues, 3 of the 4 are done, the fourth isn't ready for documentation yet.

  • RavenDB-19603 Implement scoring generated by relevance of the documents (boosting)
    Boosting ranking: added a note to the boosting article
  • RavenDB-19990 Allow to skip over int.Max docs in query (for Corax) but limit projection up to int.Max
    Added a “limits” section to the corax page and an info box in the paging page.
  • RavenDB-20481 Failing test on Corax: Oregon.Fails
    Added a note to the “Turn the complex property into a string” section.
  • RavenDB-21021 Include Corax DebugView (expose query structure) in Timings()
    To be documented when the feature is done.

Set the Corax index compression max documents limit used for dictionary creation.

* [Indexing.Corax.MaxMemoizationSizeInMb](../../server/configuration/indexing-configuration#indexing.corax.maxmemoizationsizeinmb)
The maximum amount of memory that Corax can use for a memoization clause during query processing.
Contributor Author (@reebhub, Sep 5, 2023)

added

* The benefits of compression dictionaries are most pronounced for large collections.
* If upon creation there is less than 10000 documents in the collections involved,
it may make sense to manually force an index reset after reaching 100000 documents
to force a retraining.
Member

Note that 10000 docs is configurable via Indexing.Corax.DocumentsLimitForCompressionDictionaryCreation mentioned above. Also now we have Indexing.Corax.MaxAllocationsAtDictionaryTrainingInMb - 2GB on x64 machines and 128MB on 32bits machines

Contributor Author

done

make the index more readable.

* **Adding a Compound Field**
In an index definition, add a compound field using the `CompoundFields.Add` method.
Member

This is still in PR, but we'll have a strongly typed API:

ravendb/ravendb@04d16c6

More specifically see the usage here:
ravendb/ravendb@04d16c6#diff-948bc0c50471bb01c99e3fac74d035edc0a452dd6015dfad197f36dd03ecfc6dR149-R150

Contributor Author

sample updated


{PANEL: Compound Fields}

A compound field is a Corax index field comprised of multiple simple data elements.
Member

Currently a compound field can be composed of exactly 2 fields.

Contributor Author

done

- **Default**: `128`
- **Scope**: Server-wide, per database, or per index

{PANEL/}
Member

Mark this as expert level.

Contributor Author

done

@redknightlois (Member)

LGTM

* If an auto index exists for the document, Corax will throw
`System.NotSupportedException` to notify the user that a search
that makes no sense has been attempted.

Member

that is not supported in Corax

Contributor Author

done

@@ -11,6 +11,14 @@ By default, if the number of returned results exceeds **2048**, the server will
The threshold can be adjusted by changing the `PerformanceHints.MaxNumberOfResults` configuration value.
{INFO/}

{INFO:Limits}
When [Corax](../../indexes/search-engine/corax) is used as the search engine,
indexes of more than `int.MaxValue` documents can be created and used.
Member

Put the number here, that is not always clear to users

Contributor Author

done
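
For readers unsure of the value the reviewers ask to spell out, here is a minimal C# sketch. It is illustrative only, and the variable names are not taken from the docs. It prints `int.MaxValue` and shows that a count past it needs a 64-bit type:

```csharp
using System;

// int.MaxValue is the largest 32-bit signed integer: 2^31 - 1.
int maxInt = int.MaxValue;
Console.WriteLine(maxInt);    // prints 2147483647

// A document count beyond that value needs a 64-bit type such as long.
long beyond = (long)maxInt + 1;
Console.WriteLine(beyond);    // prints 2147483648
```

The same value is `Integer.MAX_VALUE` in Java, which is why stating the literal number 2,147,483,647 reads the same in every client language.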

@@ -11,6 +11,14 @@ By default, if the number of returned results exceeds **2048**, the server will
The threshold can be adjusted by changing the `PerformanceHints.MaxNumberOfResults` configuration value.
{INFO/}

{INFO:Limits}
When [Corax](../../indexes/search-engine/corax) is used as the search engine,
indexes of more than `int.MaxValue` documents can be created and used.
Member

It's Integer.MAX_VALUE for Java, and let's just put the number here.

Contributor Author

done

@@ -11,6 +11,14 @@ By default, if the number of returned results exceeds **2048**, the server will
The threshold can be adjusted by changing the `PerformanceHints.MaxNumberOfResults` configuration value.
{INFO/}

{INFO:Limits}
When [Corax](../../indexes/search-engine/corax) is used as the search engine,
indexes of more than `int.MaxValue` documents can be created and used.
Member

Same

Contributor Author

done

| Searching by [Regex](../../client-api/session/querying/text-search/using-regex) | `yes`
| [Fuzzy Search](../../client-api/session/querying/text-search/fuzzy-search) | **no**
| [Explanations](../../client-api/session/querying/debugging/include-explanations) | **no**
| [Distinct](../../indexes/querying/distinct) operation on a collection with more than int32 ({int.MaxValue}) documents | **no**
Member

This is not supported by Lucene either

Contributor Author (@reebhub, Sep 6, 2023)

removed
not sure though what's better, removing it completely or explaining it isn't supported by both
(?)

Comment on lines 453 to 457
* [Indexing.Corax.IncludeDocumentScore](../../server/configuration/indexing-configuration#indexing.corax.includedocumentscore)
Choose whether to include the score value in document metadata when sorting by score.

* [Indexing.Corax.IncludeSpatialDistance](../../server/configuration/indexing-configuration#indexing.corax.includespatialdistance)
Choose whether to include spatial information in document metadata when sorting by distance.
Member

Why would I want one or the other?
I assume there is a perf cost associated with enabling those?

Member

Yes, IIRC this is related to performance. CC @maciejaszyk

{NOTE: }
Training stops when it reaches either the
[number of documents](../../server/configuration/indexing-configuration#indexing.corax.documentslimitforcompressiondictionarycreation)
threshold (10000 docs by default) or the
Member

Suggested change:
- threshold (10000 docs by default) or the
+ threshold (10,000 docs by default) or the

Contributor Author

done

[number of documents](../../server/configuration/indexing-configuration#indexing.corax.documentslimitforcompressiondictionarycreation)
threshold (10000 docs by default) or the
[amount of memory](../../server/configuration/indexing-configuration#indexing.corax.maxallocationsatdictionarytraininginmb)
threshold (2GB/128MB by default). Both thresholds are configurable.
Member

What is 128MB here? You refer to 32 bits, which can safely be skipped

Contributor Author

removed 128MB

Training will stop when it reaches this limit.

- **Type**: `SizeUnit.Megabytes`
- **Default**: 2 GB for x64, or 128 MB for x86 (32 bits)
Member

Suggested change:
- - **Default**: 2 GB for x64, or 128 MB for x86 (32 bits)
+ - **Default**: 2 GB for 64 bits systems, or 128 MB for 32 bits systems

Contributor Author

done

Training will stop when it reaches this limit.

- **Type**: `int`
- **Default**: `100000`
Member

That says 10K above, which one is right?

Contributor Author (@reebhub, Sep 6, 2023)

right
fixed
(it's 100,000)


| Feature | Supported by Corax
|------------------------------------------------------------------------|--------------------
| [MoreLikeThis](../../indexes/querying/morelikethis) | `yes`
Member

Add that Custom Sorters are Not supported by Corax as well

Contributor Author

added, thanks

select new
{
// Handling the field as a string will allow Corax to index it
Location = JsonConvert.Serialize(order.ShipTo.Location)
Member

Such an index will not compile:

(25,32): error CS0103: The name 'JsonConvert' does not exist in the current context
, IndexDefinitionProperty='', ProblematicText=''   at Raven.Server.Documents.Indexes.Static.IndexCompiler.CompileInternal(String originalName, String cSharpSafeName, MemberDeclarationSyntax class, IndexDefinition definition) in C:\Builds\RavenDB-6.0-Nightly\20230907-0851\src\Raven.Server\Documents\Indexes\Static\IndexCompiler.cs:line 244

Contributor Author

done

@ppekrol ppekrol merged commit ce3635d into ravendb:master Sep 11, 2023
1 of 2 checks passed
[number of documents](../../server/configuration/indexing-configuration#indexing.corax.documentslimitforcompressiondictionarycreation)
threshold (100,000 docs by default) or the
[amount of memory](../../server/configuration/indexing-configuration#indexing.corax.maxallocationsatdictionarytraininginmb)
threshold (2GB by default). Both thresholds are configurable.
Member

Update: We changed that, so now it depends on the amount of memory installed on the machine:
https://github.com/ravendb/ravendb/pull/17298/files#diff-a26823a405dc6540477281200849f5d7918be61c063cd3228b6034fbbc852fd2R44-R72

Maybe just mention that without saying explicitly exact number. It's up to 2GB.

Contributor Author

--> or the amount of memory threshold (up to 2GB).

changed in #1695
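
The stop rule discussed in this thread can be sketched as a small predicate. This is a hypothetical model, not RavenDB code: the helper name `ShouldStopTraining` and its parameters are assumptions made for illustration. The limits come from the thread itself (100,000 documents by default, and a memory cap of up to 2 GB that depends on the machine).

```csharp
using System;

// Hypothetical helper, not part of RavenDB:
// training stops as soon as EITHER threshold is reached.
static bool ShouldStopTraining(long docsSeen, long allocatedMb, long maxDocs, long maxMb)
    => docsSeen >= maxDocs || allocatedMb >= maxMb;

const long DocsLimit = 100_000;   // default document threshold quoted in this thread
const long MemoryLimitMb = 2048;  // upper bound quoted in this thread ("up to 2GB")

Console.WriteLine(ShouldStopTraining(50_000, 512, DocsLimit, MemoryLimitMb));   // prints False
Console.WriteLine(ShouldStopTraining(100_000, 512, DocsLimit, MemoryLimitMb));  // prints True
Console.WriteLine(ShouldStopTraining(50_000, 2048, DocsLimit, MemoryLimitMb));  // prints True
```

Because the two limits are combined with a logical OR, raising only one of them does not guarantee a longer training run; the other limit may still be hit first.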

Corax indexes will not train compression dictionaries if they are created in the
testing studio interface, because it is designed for indexing prototyping and the
training process will add unnecessary overhead.
{NOTE/}
Member

Here we could reference the Test Index feature article (once we have it)

Contributor Author (@reebhub, Sep 25, 2023)

-->
{NOTE: Corax and Test Index}
Corax indexes will not train compression dictionaries if they are created in the Test Index interface, because the testing interface is designed for indexing prototyping and the training process will add unnecessary overhead.
{NOTE/}

changed in #1695

![Index Definition](images/corax-02_index-definition.png "Index Definition")
1. Open the index **Configuration** tab.
2. Select the search engine you prefer for this index.
![Per-Index Search Engine](images/corax-03_index-definition_searcher-select.png "Per-Index Search Engine")
Member

This image shows "Corax - experimental", but Corax is no longer experimental. We need to update the image.

Contributor Author (@reebhub, Sep 25, 2023)

retook all snapshots for the article
changed in #1695

#### If Corax Encounters a Complex Property While Indexing:

* If an auto index exists for the document, Corax will throw
`System.NotSupportedException` to notify the user that a search
Contributor Author

-->

  • If an auto index exists for the document Corax will alert the user:
    The value of '{fieldName}' field is a complex object. Indexing it as a text isn't supported. You should consider querying on individual fields of that object.

changed in #1695

reebhub added a commit to reebhub/docs that referenced this pull request Sep 25, 2023
7 participants