Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos Full Text Search support #35868

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Cosmos Full Text Search support #35868

wants to merge 1 commit into from

Conversation

maumar
Copy link
Contributor

@maumar maumar commented Mar 29, 2025

  • Adding model building API to configure property as full-text search enabled, as well as setup the index for it,
  • Adding model validation (e.g. FTS index not matching FTS property),
  • Adding EF.Functions stubs and translations for FullTextContains, FullTextContainsAll, FullTextContainsAny, FullTextScore and RRF (for hybrid),
  • Adding logic in SelectExpression to produce ORDER BY RANK when necessary,
  • Adding validation when attempting to mix with ORDER BY RANK with regular ORDER BY,
  • Rewrite OFFSET/LIMIT from parameter to constant when ORDER BY RANK is present.

outstanding work:

  • support for FTS Container building using Azure.ResourceManager.CosmosDb (currently blocked)
  • support for Owned types (adjust model in tests and fix paths) - this also needs to happen for vector search
  • add model building support for default language,
  • add more tests for hybrid search,
  • add more model validation tests for invalid scenarios (index on multiple columns, FTS on non-string etc)
  • add xml docs,
  • clean up exception messages and put them in a resource file,

Fixes #35476
Fixes #35853
Fixes #35867

@@ -927,7 +927,19 @@ protected override ShapedQueryExpression TranslateCast(ShapedQueryExpression sou

if (TranslateLambdaExpression(source, keySelector) is SqlExpression translation)
{
((SelectExpression)source.QueryExpression).ApplyOrdering(new OrderingExpression(translation, ascending));
if (translation is SqlFunctionExpression { IsScoringFunction: true })
Copy link
Contributor Author

@maumar maumar Mar 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit magical (using ORDER BY vs ORDER BY RANK based on the type of argument passed to the OrderBy linq method). Initially I had an separate OrderByRank method (on CosmosQueryablEextension), so users could be explicit about wanting to order by scoring function, but hit all sorts of issues in the nav expansion - we need process the navigations inside the scoring function and/or push includes via pending selector/orderby, but Nav expansion doesn't know about the new OrderByRank method and there is no good way to extend it :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, adding a queryable operator to EF is kinda impossible because of our nav expansion design, I've ran into this several times in the past.

Will review when this is out of draft, we'll see what makes sense here...

{
}

internal SqlFunctionExpression(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add to SqlExpressionFactory later

@@ -276,9 +298,26 @@ private static async Task<bool> CreateContainerIfNotExistsOnceAsync(
containerProperties.VectorEmbeddingPolicy = new VectorEmbeddingPolicy(embeddings);
}

if (vectorIndexes.Any())
if (vectorIndexes.Any()/* || fullTextIndexes.Any()*/)
Copy link
Contributor Author

@maumar maumar Mar 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix and clean-up this

@maumar maumar requested a review from Copilot March 29, 2025 02:22
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for full-text search in Cosmos by introducing new full-text index types, translators, model validations, and extensions.

  • Introduces full-text indexing support in container creation and client wrappers.
  • Adds a new CosmosFullTextSearchTranslator and corresponding DbFunctions extensions.
  • Updates model validation and Fluent API extensions to include full-text search configuration.

Reviewed Changes

Copilot reviewed 23 out of 26 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/EFCore.Cosmos/Storage/Internal/CosmosClientWrapper.cs Adds support for full-text indexing in container creation, but the new full-text index assignment may override vector index settings.
src/EFCore.Cosmos/Query/Internal/Translators/CosmosVectorSearchTranslator.cs Modifies method checking logic that may prematurely reject valid method calls by changing && to
src/EFCore.Cosmos/Query/Internal/Translators/CosmosFullTextSearchTranslator.cs Implements a new translator for full-text search functions.
src/EFCore.Cosmos/Query/Internal/Expressions/SqlFunctionExpression.cs Introduces an extra constructor parameter to flag scoring functions.
src/EFCore.Cosmos/Query/Internal/Expressions/SelectExpression.cs Adds a RankOrdering property and methods to apply rank-based ordering.
src/EFCore.Cosmos/Query/Internal/CosmosQueryableMethodTranslatingExpressionVisitor.cs Adjusts ordering logic to support scoring functions and rank ordering.
src/EFCore.Cosmos/Query/Internal/CosmosQuerySqlGenerator.cs Updates SQL generation to include rank ordering handling.
src/EFCore.Cosmos/Metadata/Internal/CosmosAnnotationNames.cs Defines new annotations for full-text search configuration.
src/EFCore.Cosmos/Infrastructure/Internal/CosmosModelValidator.cs Validates full-text index configuration, ensuring only single-property indexes with language annotation are defined.
src/EFCore.Cosmos/Extensions/CosmosPropertyExtensions.cs, CosmosPropertyBuilderExtensions.cs, CosmosIndexExtensions.cs, CosmosIndexBuilderExtensions.cs, CosmosDbFunctionsExtensions.cs Adds Fluent API and extension methods for configuring full-text search on properties and indexes.
Files not reviewed (3)
  • Directory.Packages.props: Language not supported
  • src/EFCore.Cosmos/EFCore.Cosmos.csproj: Language not supported
  • src/EFCore.Cosmos/Properties/CosmosStrings.Designer.cs: Language not supported
Comments suppressed due to low confidence (2)

src/EFCore.Cosmos/Query/Internal/Translators/CosmosVectorSearchTranslator.cs:33

  • Changing the logical operator from '&&' to '||' alters the intended logic and causes the translator to return null for methods declared on CosmosDbFunctionsExtensions that are not named 'VectorDistance'. Revisit the condition to ensure it only returns null when both conditions are unmet.
if (method.DeclaringType != typeof(CosmosDbFunctionsExtensions) || method.Name != nameof(CosmosDbFunctionsExtensions.VectorDistance))

src/EFCore.Cosmos/Extensions/CosmosPropertyExtensions.cs:137

  • [nitpick] The Experimental attribute for full-text search methods is using 'CosmosVectorSearchExperimental' instead of the more appropriate 'CosmosFullTextSearchExperimental'. This may cause confusion regarding the feature's experiment status.
[Experimental(EFDiagnostics.CosmosVectorSearchExperimental)]

@@ -52,6 +52,41 @@ public static T CoalesceUndefined<T>(
T expression2)
=> throw new InvalidOperationException(CoreStrings.FunctionOnClient(nameof(CoalesceUndefined)));

/// <summary>
/// TODO
/// </summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leverage CoPilot to fill these in, and then review quickly

@@ -12,6 +12,7 @@
<NoWarn>$(NoWarn);EF9101</NoWarn> <!-- Metrics is experimental -->
<NoWarn>$(NoWarn);EF9102</NoWarn> <!-- Paging is experimental -->
<NoWarn>$(NoWarn);EF9103</NoWarn> <!-- Vector search is experimental -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still experimental - I remember a discussion to update these, or am I mis-remembering?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right - I think it's no longer experimental. Open an issue for us to make this non-experimental for 10?

@maumar maumar force-pushed the cosmos_fts_take_omega branch 2 times, most recently from 0f39e44 to a836b63 Compare March 31, 2025 23:41
/// any release. You should only use it directly in your code with extreme caution and knowing that
/// doing so can result in application failures when updating to a new Entity Framework Core release.
/// </summary>
public class RankOrderingExpression(SqlExpression expression)
Copy link
Contributor Author

@maumar maumar Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use OrderingExpression and set IsAscending to true, cause we don't generate ASC for it on Cosmos, just DESC for descending. But it feels a bit hacky. Upside is less public surface. Thoughts?

- Adding model building API to configure property as full-text search enabled, as well as setup the index for it,
- Adding model validation (e.g. FTS index not matching FTS property),
- Adding EF.Functions stubs and translations for FullTextContains, FullTextContainsAll, FullTextContainsAny, FullTextScore and RRF (for hybrid),
- Adding logic in SelectExpression to produce ORDER BY RANK when necessary,
- Adding validation when attempting to mix with ORDER BY RANK with regular ORDER BY,
- Rewrite OFFSET/LIMIT from parameter to constant when ORDER BY RANK is present.

outstanding work:
- support for FTS Container building using Azure.ResourceManager.CosmosDb (currently blocked)
- support for Owned types (adjust model in tests and fix paths) - this also needs to happen for vector search
- add model building support for default language,
- add more tests for hybrid search,
- add more model validation tests for invalid scenarios (index on multiple columns, FTS on non-string etc)
- add xml docs,
- clean up exception messages and put them in a resource file,

Fixes #35476
Fixes #35853
Fixes #35867
@maumar maumar force-pushed the cosmos_fts_take_omega branch from a836b63 to 898cc5f Compare April 1, 2025 00:17
@roji
Copy link
Member

roji commented Apr 1, 2025

@maumar just to note that I'm holding back from reviewing since this is still in draft (you seemed to maybe want people to take a look in yesterday's meeting). Let me know if you want a review etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants