Add text embedding processor #304

miguel-vila · 2024-05-31T13:52:30Z

Signed-off-by: miguel-vila [email protected]

Description

Adds a definition for the text embedding processor: https://opensearch.org/docs/latest/ingest-pipelines/processors/text-embedding/

Please confirm whether the field_map definition makes sense.
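For reference, the linked documentation describes a pipeline that uses this processor with a `model_id` and a `field_map`. The `field_map` maps each source text field to the field that should receive its embedding vector. The Python sketch below builds such a request body for `PUT /_ingest/pipeline/<pipeline-id>` and simulates the mapping with a toy embedding function. The model ID, the field names `passage_text`/`passage_embedding`, and the helpers `apply_field_map`/`fake_embed` are hypothetical illustrations. They are not part of the spec or of OpenSearch's implementation.

```python
import json

# Hypothetical model ID; a real pipeline needs the ID of a deployed model.
MODEL_ID = "text-embedding-model"

# Request body for PUT /_ingest/pipeline/<pipeline-id>, following the shape
# shown in the OpenSearch text_embedding processor documentation.
pipeline_body = {
    "description": "An example pipeline with a text embedding processor.",
    "processors": [
        {
            "text_embedding": {
                "model_id": MODEL_ID,
                # field_map: source text field -> target vector field
                "field_map": {"passage_text": "passage_embedding"},
            }
        }
    ],
}


def apply_field_map(doc, field_map, embed):
    """Hypothetical simulation of the processor's field_map handling:
    for each source field present in the document, write its embedding
    to the mapped target field. `embed` stands in for the ML model."""
    out = dict(doc)
    for source, target in field_map.items():
        if source in doc:
            out[target] = embed(doc[source])
    return out


def fake_embed(text):
    # Toy stand-in for a real model: a 2-dimensional "vector".
    return [float(len(text)), float(text.count(" "))]


processor = pipeline_body["processors"][0]["text_embedding"]
doc = {"passage_text": "hello opensearch world"}
print(json.dumps(apply_field_map(doc, processor["field_map"], fake_embed)))
```

Under these assumptions, the document keeps its original `passage_text` and gains a `passage_embedding` field holding the vector.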

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


dblock · 2024-05-31T14:06:52Z

Thank you!

We just merged a test framework in #299. Want to try to add a test for this API? Check out https://github.com/opensearch-project/opensearch-api-specification/blob/main/tests/index_lifecycle.yaml for an example. You can run it with npm run test:spec against a live local instance of OpenSearch (documentation incoming).

github-actions · 2024-05-31T14:07:23Z

Changes Analysis

Commit SHA: 32f992a
Comparing To SHA: 6da4a7c

API Changes

Summary

└─┬Components
  ├──[➕] schemas (43686:7)
  ├─┬ingest._common:ProcessorContainer
  │ └──[➕] properties (43570:9)
  └─┬ingest._common:Pipeline
    └──[➖] required (43464:11)❌

Document Element	Total Changes	Breaking Changes
components	3	1

❌ BREAKING Changes: 1 out of 3
Removals: 1
Additions: 2
Breaking Removals: 1

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/9356073256/artifacts/1564049761

API Coverage

Metric	Before	After	Δ
Covered (%)	476 (46.62 %)	476 (46.62 %)	0 (0 %)
Uncovered (%)	545 (53.38 %)	545 (53.38 %)	0 (0 %)
Unknown	24	24	0
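The percentages in this table are consistent with a denominator of 1021 operations (476 covered + 545 uncovered). This suggests that the Unknown row is excluded. That reading is inferred from the numbers, not from the coverage tool's documentation. Here is a quick Python check:

```python
def coverage(covered, uncovered):
    """Coverage percentages, assuming (as the numbers suggest) that the
    denominator is covered + uncovered and 'Unknown' is left out."""
    total = covered + uncovered
    return round(100 * covered / total, 2), round(100 * uncovered / total, 2)


covered_pct, uncovered_pct = coverage(476, 545)
print(covered_pct, uncovered_pct)  # 46.62 53.38
```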

dblock · 2024-05-31T14:14:21Z

@nhtruong @Xtansia spec looks good to you?


miguel-vila · 2024-05-31T16:00:35Z

@dblock I added a test in ee776b4, but it fails locally because the response doesn't seem to include the _meta field. I might look into it later; I'm not sure whether it points to a real mismatch between the spec and the implementation.

dblock · 2024-05-31T17:41:45Z

_meta

So likely a bug! That's why we need tests ;)

nhtruong · 2024-06-03T19:09:57Z

@dblock @miguel-vila Small discrepancies between the spec and the actual implementation of OpenSearch are to be expected right now. Much of the spec for the core features is inherited from Elasticsearch, and changes made to OpenSearch since then have not been reflected in it. We will need help from the OpenSearch core team to review the spec of the core features. For now, @miguel-vila, you can remove _meta as a required property.
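The following sketch shows why dropping the requirement lets the test pass. The changes report shows `required` being removed from `ingest._common:Pipeline`, so a pipeline response without `_meta` no longer fails validation. `missing_required` is a simplified, hypothetical stand-in for JSON Schema `required` validation, not the test framework's actual validator. The response body is made up, and the pre-change `required` list is assumed to contain only `_meta`.

```python
def missing_required(response, schema):
    """Return required property names absent from a response object.

    A simplified, hypothetical stand-in for JSON Schema "required"
    validation; it ignores types and nested schemas."""
    return [name for name in schema.get("required", []) if name not in response]


# Hypothetical pipeline object from a GET /_ingest/pipeline response,
# without the _meta field (as observed when running the test locally).
pipeline = {
    "description": "Pipeline with a text embedding processor.",
    "processors": [
        {"text_embedding": {"model_id": "text-embedding-model",
                            "field_map": {"passage_text": "passage_embedding"}}}
    ],
}

before = {"required": ["_meta"]}  # assumed schema fragment before this PR
after = {}                        # `required` removed, per the changes report

print(missing_required(pipeline, before))  # ['_meta']
print(missing_required(pipeline, after))   # []
```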


dblock

I think we're going to have many ingest pipeline tests. How about we organize things in folders that match the schema?

So this test should probably go into something like tests/ingest/ingest_with_text_embedding_processor.yaml?

It would be great if the story included a GET of the pipeline, maybe even used the processor.

dblock · 2024-06-03T19:22:51Z

tests/text_embedding_processor.yaml

+skip: false
+description: |
+  This test story checks that we can create an ingest pipeline with a text 
+  embedding processor


Let's sweat some small stuff since this is new. Add a period. Maybe shorten, "Create and use an ingest pipeline with a text embedding processor."

dblock · 2024-06-03T19:23:07Z

tests/text_embedding_processor.yaml

+    method: DELETE
+    status: [200, 404]
+chapters:
+  - synopsis: Create ingest pipeline for text embedding


Create an ingest pipeline ... + add a period.

nhtruong · 2024-06-05T14:31:39Z

I think this can be merged now. We can address those minor wording issues in the test later.

miguel-vila · 2024-06-05T15:13:12Z

I could address some of the bigger changes in a separate PR.

In particular, I wanted to test the whole flow of creating a model and running a search against it. (I think the search was failing because the model didn't exist, which makes sense: I used an arbitrary "text-embedding-model" as the model ID.) But then I noticed the test framework would need some extensions:

- allow remembering some IDs (e.g. the model ID) and using them in later requests
- allow retrying some requests (we need to wait for the model to be deployed)
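The two proposed extensions can be sketched in Python as follows. One part substitutes values saved from earlier responses into later requests. The other retries a request until a condition holds. This is a hypothetical illustration, not the framework's code or the approach taken in #315. The `${chapter.field}` placeholder syntax, the helper names, and the canned responses are all assumptions. The model path and `model_state` values are shown only to mirror the "wait for the model to be deployed" case.

```python
import re
import time

# Assumed placeholder syntax: ${<chapter>.<field>}, e.g. ${register_model.model_id}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def substitute(value, saved):
    """Recursively replace placeholders with values saved from earlier responses."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(saved[m.group(1)][m.group(2)]), value)
    if isinstance(value, dict):
        return {k: substitute(v, saved) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, saved) for v in value]
    return value


def retry_until(request, done, attempts=5, delay=0.0):
    """Call request() until done(response) is true or attempts run out."""
    response = None
    for _ in range(attempts):
        response = request()
        if done(response):
            return response
        time.sleep(delay)
    raise TimeoutError(f"condition not met after {attempts} attempts: {response}")


# Usage with canned responses instead of a live cluster.
saved = {"register_model": {"model_id": "abc123"}}
path = substitute("/_plugins/_ml/models/${register_model.model_id}", saved)

states = iter(["DEPLOYING", "DEPLOYING", "DEPLOYED"])
final = retry_until(lambda: {"model_state": next(states)},
                    lambda r: r["model_state"] == "DEPLOYED")
print(path, final["model_state"])
```

Keeping substitution separate from retrying means a story could use either feature independently; a real implementation would also need a way to declare which response fields to save.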

I have some work in progress in that direction, but these might be big or controversial changes, so it would be good to know your thoughts. I might push my WIP PR later.

miguel-vila · 2024-06-05T15:25:48Z

Created #315 to show you the possible changes, it's still in a very rough shape.

Add text embedding processor

eda5009

Signed-off-by: miguel-vila <[email protected]>

miguel-vila requested review from dblock, harshavamsi, sachetalva, nhtruong, Xtansia, VachaShah, Tokesh and aabeshov as code owners May 31, 2024 13:52

miguel-vila mentioned this pull request May 31, 2024

[FEATURE] Support text embedding processor opensearch-project/opensearch-java#1005

Closed

dblock mentioned this pull request May 31, 2024

[FEATURE] Expand API coverage to include parameters and response bodies #305

Open

dblock previously approved these changes May 31, 2024


Add test for pipeline creation with text_embedding processor

ee776b4

Signed-off-by: miguel-vila <[email protected]>

miguel-vila dismissed dblock’s stale review via ee776b4 May 31, 2024 15:58

remove get from test

32f992a

Signed-off-by: miguel-vila <[email protected]>

miguel-vila force-pushed the add-text-embedding-processor branch from 3f3c584 to 32f992a June 3, 2024 19:24

dblock reviewed Jun 3, 2024


nhtruong approved these changes Jun 5, 2024


nhtruong merged commit 7acd0fc into opensearch-project:main Jun 5, 2024
6 checks passed
