@mkhludnev mkhludnev commented Nov 26, 2025

Description

This PR contributes a connector blueprint for Yandex Cloud.

Check List

  • [v] New functionality has been documented.
  • [v] Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Note for reviewers

I've contributed to OpenSearch before.
And thanks for reviewing it!

Summary by CodeRabbit

  • Documentation
    • Added a guide for configuring a Yandex Cloud AI Studio embeddings connector: cluster trust setup, creating a connector with API-key auth, specifying model parameters and predict action, registering and deploying embedding models, sample inference requests/responses, placeholder/token usage and bearer token guidance, required roles, note on separate connectors for query vs. document processing, and guidance on pre/post-processing hooks.

coderabbitai bot commented Nov 26, 2025

Walkthrough

A new documentation file adds a Yandex Cloud AI Studio connector embedding standard blueprint, detailing cluster connection, creating a YC Embeddings connector (parameters, credentials, predict action, hooks), registering/deploying embedding models, and testing inference with sample requests and responses.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Yandex Cloud Embedding Blueprint Documentation: docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md | Added new end-to-end blueprint documenting: cluster connection settings; creating a YC Embeddings connector (name, description, version, protocol, modelUri, folder_id, api_key); predict action payload, headers, and sample _predict request/response; pre/post-processing hook conventions; placeholder substitution and bearer token/role guidance; note on separate connectors for query vs. document processing. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Single new documentation file; review focus: correctness of examples, placeholder substitution, bearer token guidance, and clarity of pre/post processing hook descriptions.
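
To make the placeholder-substitution review focus concrete, here is a minimal Python sketch of how `${parameters.*}` and `${credential.*}` tokens in a connector template might be resolved before a request is sent. It is an illustration only, not the ml-commons implementation: the function name `substitute`, the regular expression, the template string, and the sample values are all hypothetical. Only the field names `folder_id` and `api_key` come from the change summary.

```python
import re

# Matches ${parameters.name} or ${credential.name}. This pattern is an
# assumption made for illustration, not the parser ml-commons uses.
PLACEHOLDER = re.compile(r"\$\{(parameters|credential)\.([A-Za-z0-9_]+)\}")


def substitute(template, parameters, credential):
    """Replace each placeholder with its value; raise on unknown keys."""
    sources = {"parameters": parameters, "credential": credential}

    def lookup(match):
        scope, key = match.group(1), match.group(2)
        if key not in sources[scope]:
            # Failing loudly surfaces typos in placeholder names early.
            raise KeyError(f"unresolved placeholder: {match.group(0)}")
        return str(sources[scope][key])

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical template and values, for illustration only.
resolved = substitute(
    "folder=${parameters.folder_id}; key=${credential.api_key}",
    parameters={"folder_id": "my-folder"},
    credential={"api_key": "secret"},
)
print(resolved)
```

A check like this is useful when reviewing blueprint examples: a placeholder that names a key missing from both `parameters` and `credential` would otherwise reach the remote API as literal text.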

Poem

🐇 I nibbled on a JSON byte,
Sprinkled headers through the night,
Model URIs in a row,
Embeddings hum and softly glow,
Blueprint stitched — the hop's just right.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The PR description is missing the 'Related Issues' section and incomplete checklist items. Only two of four checklist items are checked, and API specification/documentation PRs sections are not addressed. | Add the 'Related Issues' section (even if 'N/A'), complete the remaining checklist items, and clarify whether API specification or public documentation PRs were created. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'add Yandex Cloud embeddings connector blueprint' directly and clearly describes the main change - adding documentation for a Yandex Cloud embeddings connector blueprint. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d8dc2c and 690bc1f.

📒 Files selected for processing (1)
  • docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: spotless


@mkhludnev mkhludnev requested a deployment to ml-commons-cicd-env-require-approval November 26, 2025 22:20 — with GitHub Actions Waiting

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 87e742e and 8a05765.

📒 Files selected for processing (1)
  • docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md

[grammar] ~120-~120: Ensure spelling is correct
Context: ...of life?" } } Sample response of Yadex Cloud AI Studio Embedding: json { ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (1)
docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1)

45-46: Verify pre/post-processing functions are correct for Yandex Cloud.

The connector references bedrock pre/post-processing functions, but this blueprint is for Yandex Cloud. Verify whether these processing functions:

  1. are generic and work correctly with Yandex Cloud API responses, or
  2. should be replaced with Yandex-specific processing functions.

If these are not the correct functions for Yandex Cloud, update them accordingly.

@mkhludnev mkhludnev requested a deployment to ml-commons-cicd-env-require-approval November 26, 2025 22:29 — with GitHub Actions Waiting

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 8a05765 and 5d8dc2c.

📒 Files selected for processing (1)
  • docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1 hunks)
🔇 Additional comments (1)
docs/remote_inference_blueprints/standard_blueprints/yandexcloud_connector_embedding_standard_blueprint.md (1)

45-46: Clarify why Bedrock pre/post-processing functions are used for Yandex Cloud.

The pre/post-processing functions reference bedrock for a Yandex Cloud connector. Clarify whether Yandex's request/response format is compatible with Bedrock's processing, or if Yandex-specific processing functions should be used instead.

If compatibility is intentional, add a brief comment explaining why Bedrock functions are appropriate here. If these should be Yandex-specific, update them accordingly.

Comment on lines 78 to 107
```json
POST /_plugins/_ml/models/_register
{
"name": "yc-embedding",
"function_name": "remote",
"model_group_id": "4THNtZoBdUNOOrVAzj_V",
"description": "YC embedding model",
"connector_id": "CTEou5oBdUNOOrVArUAU"
}
```


```json
POST /_plugins/_ml/models/_register
{
"name": "YC text embedding model",
"function_name": "remote",
"description": "test model",
"connector_id": "nzh9PIsBnGXNcxYpPEcv"
}
```

Sample response:
```json
{
"task_id": "5THZtZoBdUNOOrVAEj_I",
"status": "CREATED",
"model_id": "CzEou5oBdUNOOrVA10Db"
}
```
⚠️ Potential issue | 🟠 Major

Remove or clarify the duplicate model registration (lines 90–98).

The section shows two separate model registrations:

  1. Lines 79–87: yc-embedding with the connector created in section 2 (CTEou5oBdUNOOrVArUAU) and a model_group_id
  2. Lines 90–98: YC text embedding model with a hardcoded, unrelated connector_id (nzh9PIsBnGXNcxYpPEcv) and no model_group_id

The second registration uses a connector_id that does not match the one established earlier, making it inconsistent and confusing. Users will not know which registration to follow. The sample response and test section both reference the second registration's model_id, but the instructions don't explain why two registrations are shown.

Either remove the second registration if it's leftover code, or clearly explain when/why to use both variants.

If the second registration should be removed, apply this diff:

 ```json
 POST /_plugins/_ml/models/_register
 {
     "name": "yc-embedding",
     "function_name": "remote",
     "model_group_id": "4THNtZoBdUNOOrVAzj_V",
     "description": "YC embedding model",
     "connector_id": "CTEou5oBdUNOOrVArUAU"
 }

-```json
-POST /_plugins/_ml/models/_register
-{
-    "name": "YC text embedding model",
-    "function_name": "remote",
-    "description": "test model",
-    "connector_id": "nzh9PIsBnGXNcxYpPEcv"
-}
-```

Sample response:


Then update the sample response and test section to use the first registration's model IDs consistently.
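
As a rough illustration of the consistency concern above, the following Python sketch flags any registration payload whose `connector_id` differs from the one created earlier in the blueprint. The payloads are copied from the commented lines. The helper name `find_inconsistent_registrations` is hypothetical and not part of any OpenSearch tooling.

```python
def find_inconsistent_registrations(requests, expected_connector_id):
    """Return the names of payloads whose connector_id is not the expected one."""
    return [
        body.get("name", "<unnamed>")
        for body in requests
        if body.get("connector_id") != expected_connector_id
    ]


# The two register payloads shown in the commented section of the blueprint.
registrations = [
    {
        "name": "yc-embedding",
        "function_name": "remote",
        "model_group_id": "4THNtZoBdUNOOrVAzj_V",
        "description": "YC embedding model",
        "connector_id": "CTEou5oBdUNOOrVArUAU",
    },
    {
        "name": "YC text embedding model",
        "function_name": "remote",
        "description": "test model",
        "connector_id": "nzh9PIsBnGXNcxYpPEcv",
    },
]

mismatched = find_inconsistent_registrations(registrations, "CTEou5oBdUNOOrVArUAU")
print(mismatched)  # ['YC text embedding model']
```

Only the second registration is reported, which matches the observation that its `connector_id` does not correspond to the connector created in section 2.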

Signed-off-by: Mikhail Khludnev <[email protected]>
@mkhludnev mkhludnev requested a deployment to ml-commons-cicd-env-require-approval November 27, 2025 06:29 — with GitHub Actions Waiting