[RFC] Introduce Template Query to OpenSearch #2979

Open
mingshl opened this issue Sep 23, 2024 · 4 comments
Labels
enhancement New feature or request untriaged

Comments

@mingshl
Collaborator

mingshl commented Sep 23, 2024

Problem Statement:

When using search request processors, users need to send an initial search request with properly constructed query builders. However, if the initial request fails to meet the type constraints of the query builders, it will be rejected, and the search request cannot be processed by the search request processors.

The data flow is as follows:

Initial Search Request -> Search Request Processors -> New Search Request

However, when the initial request is constructed, every query builder enforces type constraints. Take the knn query as an example.

(Happy case) this is a valid query accepted by knn query builder, and it can be passed to search request processors:

GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [2, 3, 5, 6], // vector field requires a list of int/float 
        "k": 2
      }
    }
  }
}

(Sad case) this is not a valid query: constructing the knn query builder throws an exception, so the request never reaches the search request processors.

GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": "sunny", // vector field requires a list of int/float 
        "k": 2
      }
    }
  }
}

In the sad case, the "vector" field is provided with a string value ("sunny") instead of the required list of integers or floats, violating the type constraints of the knn query builder. As a result, an exception will be thrown during the construction of the query builder, preventing the search request from reaching the search request processors.
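To make the failure mode concrete, here is a minimal Python sketch of a builder that validates eagerly. `build_knn_query` is a hypothetical stand-in written for this illustration, not the actual knn query builder; it only mirrors the "list of int/float" constraint described above.

```python
from numbers import Real


def build_knn_query(field, body):
    """Hypothetical stand-in for a knn query builder that validates eagerly."""
    vector = body.get("vector")
    is_numeric_list = isinstance(vector, list) and all(
        isinstance(x, Real) and not isinstance(x, bool) for x in vector
    )
    if not is_numeric_list:
        # The request is rejected here, before any search request processor runs.
        raise ValueError(f"[knn] '{field}.vector' must be a list of int/float")
    return {"knn": {field: {"vector": vector, "k": body["k"]}}}


# Happy case: a numeric list is accepted.
happy = build_knn_query("my_vector", {"vector": [2, 3, 5, 6], "k": 2})

# Sad case: a string is rejected during construction.
try:
    build_knn_query("my_vector", {"vector": "sunny", "k": 2})
    sad_error = None
except ValueError as err:
    sad_error = str(err)
```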

Scope:

  1. The initial processing of the search request by the search request processors is decoupled from the validation and construction of the query builders.
  2. The query body inside the template query type can be validated against the type constraints of the respective query builders at a later stage in the processing pipeline.

Proposed Design:

To allow the initial search request to pass the search request processors, we are introducing a template query type, which contains the query body.

  1. Instead of directly constructing the query (e.g., knn query) in the initial search request, the query body is wrapped inside a template query type.
  2. The template query type acts as a container for the actual query body, allowing the search request processors to accept the initial search request without performing strict type checking or validation on the query body.
  3. The search request processors will process the initial search request as usual, but they will not attempt to construct or validate the query builders based on the query body inside the template query type.
  4. After the search request processors have finished their processing, the query body inside the template query type can be extracted and validated against the type constraints of the respective query builders (e.g., knn query builder).
  5. If the query body inside the template query type is valid and meets the type constraints of the query builders, it can be used to construct the actual query and execute the search.
  6. If the query body inside the template query type is invalid or violates the type constraints of the query builders, appropriate error handling or fallback mechanisms can be implemented.
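The six steps above can be sketched roughly as follows in Python. This is an illustration under assumptions, not the proposed implementation: `build_query`, `execute`, and `fake_ml_inference` are hypothetical names, and the stub processor simply writes a fixed vector where a real processor would write a model output.

```python
import copy


def build_query(query_type, body):
    """Hypothetical strict builder; only a knn-like type check is sketched."""
    if query_type == "knn":
        for field, spec in body.items():
            vector = spec.get("vector")
            if not (isinstance(vector, list)
                    and all(isinstance(x, (int, float)) for x in vector)):
                raise ValueError(f"[knn] '{field}.vector' must be a list of int/float")
    return {query_type: body}


def execute(request, processors):
    # Steps 1-3: the template body travels through the processors unvalidated.
    for processor in processors:
        request = processor(request)
    query = request["query"]
    # Steps 4-6: unwrap the template (if any) and validate only now.
    inner = query.get("template", query)
    ((query_type, body),) = inner.items()
    return build_query(query_type, body)


def fake_ml_inference(request):
    """Stub processor (assumption): fills in a fixed vector."""
    request = copy.deepcopy(request)
    request["query"]["template"]["knn"]["my_vector"]["vector"] = [1, 2, 3]
    return request


request = {
    "query": {"template": {"knn": {"my_vector": {"vector": "${vector}", "k": 2}}}}
}
result = execute(request, [fake_ml_inference])
```

Without a processor in the list, the same request still fails, but only at the deferred validation step rather than at parse time.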

For the same example as above, here is a sample search request that uses the template query together with query extensions:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": "${vector}", // this is the field generated from ml_inference search request processor 
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
  }
}

Combined with an ml_inference search request processor and the query extension, a sample search pipeline config looks like this:

PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "<model_id>",
        "input_map": [
          {
            "inputs": "ext.ml_inference.params.text"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.params.vector": "response"
          }
        ],
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

After the ml_inference search request processor runs, it rewrites the search request as follows:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": [1, 2, 3], // this is the result substituted in by the ml_inference search request processor
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny",
        "vector": [1, 2, 3]
      }
    }
  }
}
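The rewrite from `"${vector}"` to `[1, 2, 3]` could be sketched like this in Python. It assumes that a placeholder name resolves against `ext.ml_inference.params` and that a value consisting of exactly one placeholder is replaced by the typed value rather than by its string form. Both are assumptions for illustration, and `substitute` is a hypothetical helper.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute(node, params):
    """Recursively replace whole-string "${name}" values with params[name]."""
    if isinstance(node, dict):
        return {key: substitute(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(value, params) for value in node]
    if isinstance(node, str):
        match = PLACEHOLDER.fullmatch(node)
        if match and match.group(1) in params:
            # Return the typed value (a list here), not a string.
            return params[match.group(1)]
    return node


request = {
    "query": {"template": {"knn": {"my_vector": {"vector": "${vector}", "k": 2}}}},
    # State after the processor has written its output into the extension.
    "ext": {"ml_inference": {"params": {"text": "sunny", "vector": [1, 2, 3]}}},
}
params = request["ext"]["ml_inference"]["params"]
rewritten = {**request, "query": substitute(request["query"], params)}
```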

By using the template query type, the initial search request can bypass the strict type checking and validation of the query builders during the initial processing by the search request processors. This allows the search request to flow through the search request processors, even if the query body contains invalid or incorrect data types.

After the search request processors have completed their processing, the query body inside the template query type can be validated and processed according to the type constraints of the respective query builders.

This approach separates the initial processing of the search request from the validation and construction of the query builders, allowing for more flexibility and error handling in the overall search request processing pipeline.

Limitations:

  1. A template query cannot be executed (doToQuery) without a search request processor that rewrites the query, for example the ml_inference search request processor.
  2. If the rewritten search request is invalid, the effort spent on rewriting it is wasted.
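One way to surface the first limitation early is to check for placeholders that no processor has resolved before attempting to build the query. The Python sketch below is only an illustration; `unresolved_placeholders` and `check_executable` are hypothetical helpers and the `${name}` syntax is taken from the example request above.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def unresolved_placeholders(node):
    """Collect names of "${name}" placeholders still present in a query body."""
    if isinstance(node, dict):
        return [n for value in node.values() for n in unresolved_placeholders(value)]
    if isinstance(node, list):
        return [n for value in node for n in unresolved_placeholders(value)]
    if isinstance(node, str):
        return PLACEHOLDER.findall(node)
    return []


def check_executable(template_body):
    """Fail fast instead of handing an unresolved template to query construction."""
    names = unresolved_placeholders(template_body)
    if names:
        raise ValueError(f"template query has unresolved placeholders: {names}")
    return template_body


pending = {"knn": {"my_vector": {"vector": "${vector}", "k": 2}}}
resolved = {"knn": {"my_vector": {"vector": [1, 2, 3], "k": 2}}}
```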
@mingshl mingshl added enhancement New feature or request untriaged labels Sep 23, 2024
@yuye-aws
Member

From your example, it seems that vector is the embedding of the text sunny. A better option is to use the neural query directly. Can you provide more examples?

@mingshl mingshl changed the title [FEATURE] Introduce Template Query to OpenSearch [RFC] Introduce Template Query to OpenSearch Sep 24, 2024
@austintlee
Collaborator

Will this build on this feature - https://opensearch.org/docs/latest/api-reference/search-template/?

@mingshl
Collaborator Author

mingshl commented Sep 25, 2024

From your example, it seems that vector is the embedding of the text sunny. A better option is to use the neural query directly. Can you provide more examples?

Right, this is a limitation of the knn query, which requires the vector as a list of numbers. For string input, we can use the neural query to pass a string to the model (although it cannot parse different model output formats; those require post-processing functions).

Extending the use cases: for example, when the user input is

  • array
  • image bytes
  • map
    ...

these can be sent to the model as input to generate vectors, but they cannot be passed to the existing query builders.

@yuye-aws
Member

Extending the use cases: for example, when the user input is

  • array
  • image bytes
  • map
    ...

Would you like to provide a few examples to showcase how to use your feature with ml_inference?

Projects
Status: In Progress
Development

No branches or pull requests

3 participants