[RFC] Introduce Template Query to OpenSearch #2979

mingshl · 2024-09-23T23:00:21Z

Problem Statement:

When using search request processors, users need to send an initial search request with properly constructed query builders. However, if the initial request fails to meet the type constraints of the query builders, it will be rejected, and the search request cannot be processed by the search request processors.

The data flow is as follows:

Initial Search Request -> Search Request Processors -> New Search Request

However, when the initial request is constructed, every query builder enforces type constraints. Take the knn query as an example.

(Happy case) This is a valid query accepted by the knn query builder, so it can be passed to the search request processors:

GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [2, 3, 5, 6], // vector field requires a list of int/float 
        "k": 2
      }
    }
  }
}

(Sad case) This is not a valid query. It throws an exception when the knn query builder is constructed, so it cannot reach the search request processors:

GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": "sunny", // vector field requires a list of int/float 
        "k": 2
      }
    }
  }
}

In the sad case, the "vector" field is provided with a string value ("sunny") instead of the required list of integers or floats, violating the type constraints of the knn query builder. As a result, an exception will be thrown during the construction of the query builder, preventing the search request from reaching the search request processors.
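
The failure mode can be illustrated with a minimal Python sketch. It is not the k-NN plugin's code: `parse_knn_query` is a hypothetical stand-in for an eagerly validating query builder, and the error message is made up for illustration.

```python
from numbers import Number


def parse_knn_query(body: dict) -> dict:
    """Hypothetical stand-in for an eagerly validating knn query builder."""
    (field, params), = body.items()  # expects exactly one field name
    vector = params.get("vector")
    # bool is a Number subclass in Python, so exclude it explicitly
    is_numeric_list = isinstance(vector, list) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in vector
    )
    if not is_numeric_list:
        raise ValueError(f"[knn] 'vector' must be a list of numbers, got {vector!r}")
    return {"field": field, "vector": vector, "k": params["k"]}


happy = {"my_vector": {"vector": [2, 3, 5, 6], "k": 2}}
sad = {"my_vector": {"vector": "sunny", "k": 2}}

parsed = parse_knn_query(happy)  # accepted, so request processors could run next

try:
    parse_knn_query(sad)
    rejected = False
except ValueError:
    rejected = True  # rejected at parse time, before any request processor runs
```

Because the check runs while the request is being parsed, no later stage gets a chance to turn "sunny" into a vector.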

Scope:

The initial processing of the search request by the search request processors is decoupled from the validation and construction of the query builders.
The query body inside the template query type can be validated against the type constraints of the respective query builders at a later stage in the processing pipeline.

Proposed Design:

To allow the initial search request to pass the search request processors, we are introducing a template query type, which contains the query body.

Instead of directly constructing the query (e.g., knn query) in the initial search request, the query body is wrapped inside a template query type.
The template query type acts as a container for the actual query body, allowing the search request processors to accept the initial search request without performing strict type checking or validation on the query body.
The search request processors will process the initial search request as usual, but they will not attempt to construct or validate the query builders based on the query body inside the template query type.
After the search request processors have finished their processing, the query body inside the template query type can be extracted and validated against the type constraints of the respective query builders (e.g., knn query builder).
If the query body inside the template query type is valid and meets the type constraints of the query builders, it can be used to construct the actual query and execute the search.
If the query body inside the template query type is invalid or violates the type constraints of the query builders, appropriate error handling or fallback mechanisms can be implemented.
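
The steps above can be sketched as a small Python model. It only illustrates deferred validation under stated assumptions: `TemplateQuery`, `validate_knn`, and `fake_processor` are hypothetical names, not OpenSearch classes, and the vector returned by the processor is invented.

```python
from typing import Callable


class TemplateQuery:
    """Hypothetical container: holds the query body without validating it."""

    def __init__(self, body: dict):
        self.body = body  # stored as-is; no type checks at construction time

    def to_concrete(self, validate: Callable[[dict], dict]) -> dict:
        # Validation is deferred until the processors are done.
        return validate(self.body)


def validate_knn(body: dict) -> dict:
    """Hypothetical strict check, standing in for the knn query builder."""
    vector = body["knn"]["my_vector"]["vector"]
    if not (isinstance(vector, list) and all(isinstance(v, (int, float)) for v in vector)):
        raise ValueError("'vector' must be a list of numbers")
    return body


def fake_processor(template: TemplateQuery) -> TemplateQuery:
    # Stand-in for a search request processor that fills in the vector.
    new_body = {"knn": {"my_vector": {"vector": [1, 2, 3], "k": 2}}}
    return TemplateQuery(new_body)


# Constructing the template never fails, even with a placeholder string.
initial = TemplateQuery({"knn": {"my_vector": {"vector": "${vector}", "k": 2}}})
rewritten = fake_processor(initial)
concrete = rewritten.to_concrete(validate_knn)  # validation happens only here

# Validating before the processor has run would still fail.
try:
    initial.to_concrete(validate_knn)
    early_ok = True
except ValueError:
    early_ok = False
```

The point of the design is the ordering: the container is accepted first, the processors run second, and the strict check comes last.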

For the same example, here is a sample search request that uses the template query together with query extensions:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": "${vector}", // this is the field generated from ml_inference search request processor 
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
  }
}
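
One way the `${vector}` placeholder could be resolved is sketched below in Python. This is an assumption about the mechanism, not the actual implementation: `substitute` is a hypothetical helper, and it assumes that a string consisting only of a placeholder is replaced by the parameter value with its original type.

```python
import re

# Matches a string that consists of exactly one placeholder, e.g. "${vector}".
PLACEHOLDER = re.compile(r"^\$\{(\w+)\}$")


def substitute(node, params: dict):
    """Recursively replace "${name}" strings with params[name], keeping the value's type."""
    if isinstance(node, dict):
        return {key: substitute(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, params) for item in node]
    if isinstance(node, str):
        match = PLACEHOLDER.match(node)
        if match and match.group(1) in params:
            return params[match.group(1)]  # a list stays a list, not a string
    return node


template_body = {"knn": {"my_vector": {"vector": "${vector}", "k": 2}}}
# Assumed state of ext.ml_inference.params after a processor has added "vector".
params = {"text": "sunny", "vector": [1, 2, 3]}

resolved = substitute(template_body, params)
```

Returning the parameter value itself, rather than formatting it into the string, is what lets the resolved body satisfy the list-of-numbers constraint later.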

Combining this with an ml_inference search request processor and a query extension:

This is a sample search pipeline config:

PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "<model_id>",
        "input_map": [
          {
            "inputs": "ext.ml_inference.params.text"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.params.vector": "response"
          }
        ],
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}
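
As read from this config, `input_map` maps a model input name to a path in the search request, and `output_map` maps a path in the search request to a model output name. The Python sketch below mimics that data flow only. `fake_model`, `get_path`, `set_path`, and `run_ml_inference` are hypothetical, and the returned numbers are invented; a real processor would call the model identified by `model_id`.

```python
def get_path(doc: dict, path: str):
    """Read a value at a dotted path such as 'ext.ml_inference.params.text'."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def set_path(doc: dict, path: str, value) -> None:
    """Write a value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = path.split(".")
    for key in parents:
        doc = doc.setdefault(key, {})
    doc[leaf] = value


def fake_model(inputs: str) -> dict:
    # Stub standing in for the model behind "<model_id>"; the output is made up.
    return {"response": [float(len(inputs)), 0.0, 1.0]}


def run_ml_inference(request: dict, input_map: list, output_map: list) -> dict:
    for in_map, out_map in zip(input_map, output_map):
        # input_map: model input name -> path in the request
        model_args = {name: get_path(request, path) for name, path in in_map.items()}
        result = fake_model(**model_args)
        # output_map: path in the request -> model output name
        for target_path, output_name in out_map.items():
            set_path(request, target_path, result[output_name])
    return request


request = {"ext": {"ml_inference": {"params": {"text": "sunny"}}}}
run_ml_inference(
    request,
    input_map=[{"inputs": "ext.ml_inference.params.text"}],
    output_map=[{"ext.ml_inference.params.vector": "response"}],
)
```

After the call, the request carries both the original text and the generated vector under `ext.ml_inference.params`, which is the state a placeholder substitution step would read from.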

After the ml_inference search request processor runs, it rewrites the search request as follows:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": [1,2,3], // this is the result substituted by the ml_inference search request processor
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny",
        "vector": [1,2,3]
      }
    }
  }
}

By using the template query type, the initial search request can bypass the strict type checking and validation of the query builders during the initial processing by the search request processors. This allows the search request to flow through the search request processors, even if the query body contains invalid or incorrect data types.

After the search request processors have completed their processing, the query body inside the template query type can be validated and processed according to the type constraints of the respective query builders.

This approach separates the initial processing of the search request from the validation and construction of the query builders, allowing for more flexibility and error handling in the overall search request processing pipeline.

Limitations:

A template query cannot be executed (doToQuery) without a search request processor that rewrites the query string, for example the ml_inference search request processor.
If the rewritten search request is invalid, the effort spent rewriting it is wasted.
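
The first limitation can be made concrete with a small Python sketch that refuses to execute a body that still contains placeholders. This is a hypothetical guard, not the behavior of doToQuery: `unresolved_placeholders` and `check_executable` are illustrative names, and the error message is invented.

```python
import re

# Finds every "${name}" occurrence inside a string.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def unresolved_placeholders(node) -> list:
    """Collect the names of "${...}" placeholders still present in a query body."""
    if isinstance(node, dict):
        return [name for value in node.values() for name in unresolved_placeholders(value)]
    if isinstance(node, list):
        return [name for item in node for name in unresolved_placeholders(item)]
    if isinstance(node, str):
        return PLACEHOLDER.findall(node)
    return []


def check_executable(template_body: dict) -> dict:
    missing = unresolved_placeholders(template_body)
    if missing:
        # No processor rewrote the body, so there is nothing to execute.
        raise ValueError(f"template query has unresolved placeholders: {missing}")
    return template_body


untouched = {"knn": {"my_vector": {"vector": "${vector}", "k": 2}}}
rewritten = {"knn": {"my_vector": {"vector": [1, 2, 3], "k": 2}}}

check_executable(rewritten)  # passes: every placeholder was replaced
```

A check like this only catches the missing-processor case. The second limitation remains: an invalid rewritten request is discovered only after the rewrite work has been done.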


yuye-aws · 2024-09-24T02:19:48Z

From your example, it seems that vector is the embedding of the text "sunny". A better option is to use the neural query directly. Can you provide more examples?

austintlee · 2024-09-24T18:00:57Z

Will this build on this feature - https://opensearch.org/docs/latest/api-reference/search-template/?

mingshl · 2024-09-25T18:25:11Z

From your example, it seems that vector is the embedding of the text "sunny". A better option is to use the neural query directly. Can you provide more examples?

Right, this is a limitation of the knn query, which requires a vector (a list of numbers). For string input, we can use the neural query to pass a string to the model input (although it cannot parse different model output formats; that requires post-processing functions).

To extend the use cases: for example, when the user input is

array
image bytes
map
...

these can be sent to a model as input to generate vectors, but they cannot be passed to the existing query builders.

yuye-aws · 2024-09-26T03:08:40Z

Extending use cases, for example, when user input

array

image bytes

map
...

Would you like to provide a few examples to showcase how to use your feature with ml_inference?

mingshl added enhancement New feature or request untriaged labels Sep 23, 2024

mingshl changed the title ~~[FEATURE] Introduce Template Query to OpenSearch~~ [RFC] Introduce Template Query to OpenSearch Sep 24, 2024
