Merged
16 changes: 15 additions & 1 deletion docs/en/guides/01-configuration.md
@@ -99,6 +99,7 @@ Embedding model configuration for vector search, supporting dense, sparse, and h
{
"embedding": {
"max_concurrent": 10,
"max_retries": 3,
"dense": {
"provider": "volcengine",
"api_key": "your-api-key",
@@ -115,13 +116,16 @@ Embedding model configuration for vector search, supporting dense, sparse, and h
| Parameter | Type | Description |
|-----------|------|-------------|
| `max_concurrent` | int | Maximum concurrent embedding requests (`embedding.max_concurrent`, default: `10`) |
| `max_retries` | int | Maximum retry attempts for transient embedding provider errors (`embedding.max_retries`, default: `3`; `0` disables retry) |
| `provider` | str | `"volcengine"`, `"openai"`, `"vikingdb"`, `"jina"`, `"voyage"`, or `"gemini"` |
| `api_key` | str | API key |
| `model` | str | Model name |
| `dimension` | int | Vector dimension. For Voyage, this maps to `output_dimension` |
| `input` | str | Input type: `"text"` or `"multimodal"` |
| `batch_size` | int | Batch size for embedding requests |

`embedding.max_retries` only applies to transient errors such as `429`, `5xx`, timeouts, and connection failures. Permanent errors such as `400`, `401`, `403`, and `AccountOverdue` are not retried automatically. The backoff strategy is exponential backoff with jitter, starting at `0.5s` and capped at `8s`.
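
As a concrete illustration, the retry shape described above can be sketched as follows. This is a simplified stand-in, not the actual `retry_sync` helper from `openviking.utils.model_retry`; the `TransientError` class and the error-classification logic here are hypothetical:

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for transient failures (429, 5xx, timeouts)."""


def retry_transient(func, max_retries=3, base=0.5, cap=8.0):
    """Retry func on TransientError with capped exponential backoff and jitter.

    max_retries=0 disables retry: one attempt, fail fast.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Ceiling doubles each attempt (0.5, 1, 2, ...) and is capped;
            # the actual sleep is jittered uniformly below the ceiling.
            delay = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

With `max_retries=3`, the worst case is four attempts in total: the initial call plus three retries.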

**Available Models**

| Model | Dimension | Input Type | Notes |
@@ -355,7 +359,8 @@ Vision Language Model for semantic extraction (L0/L1 generation).
"vlm": {
"api_key": "your-api-key",
"model": "doubao-seed-2-0-pro-260215",
"api_base": "https://ark.cn-beijing.volces.com/api/v3"
"api_base": "https://ark.cn-beijing.volces.com/api/v3",
"max_retries": 3
}
}
```
@@ -369,9 +374,12 @@ Vision Language Model for semantic extraction (L0/L1 generation).
| `api_base` | str | API endpoint (optional) |
| `thinking` | bool | Enable thinking mode for VolcEngine models (default: `false`) |
| `max_concurrent` | int | Maximum concurrent semantic LLM calls (default: `100`) |
| `max_retries` | int | Maximum retry attempts for transient VLM provider errors (default: `3`; `0` disables retry) |
| `extra_headers` | object | Custom HTTP headers (for OpenAI-compatible providers, optional) |
| `stream` | bool | Enable streaming mode (for OpenAI-compatible providers, default: `false`) |

`vlm.max_retries` only applies to transient errors such as `429`, `5xx`, timeouts, and connection failures. Permanent authentication, authorization, and billing errors are not retried automatically. The backoff strategy is exponential backoff with jitter, starting at `0.5s` and capped at `8s`.
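
For intuition, the per-attempt backoff ceilings implied by these defaults (`0.5s` start, `8s` cap) can be computed directly. Actual sleeps are jittered below these ceilings; the snippet is illustrative only:

```python
# Backoff ceiling before each retry: 0.5 * 2**attempt, capped at 8 seconds.
ceilings = [min(8.0, 0.5 * 2 ** attempt) for attempt in range(6)]
print(ceilings)  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```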

**Available Models**

| Model | Notes |
@@ -956,6 +964,7 @@ For detailed encryption explanations, see [Data Encryption](../concepts/10-encry
{
"embedding": {
"max_concurrent": 10,
"max_retries": 3,
"dense": {
"provider": "volcengine",
"api_key": "string",
@@ -971,6 +980,7 @@ For detailed encryption explanations, see [Data Encryption](../concepts/10-encry
"api_base": "string",
"thinking": false,
"max_concurrent": 100,
"max_retries": 3,
"extra_headers": {},
"stream": false
},
@@ -1058,7 +1068,9 @@ Error: VLM request timeout

- Check network connectivity
- Increase timeout in config
- For intermittent timeouts, increase `vlm.max_retries` moderately
- Try a smaller model
- For bulk ingestion, consider lowering `vlm.max_concurrent`

### Rate Limiting

@@ -1067,6 +1079,8 @@ Error: Rate limit exceeded
```

Volcengine has rate limits. Consider batch processing with delays or upgrading your plan.
- Lower `embedding.max_concurrent` / `vlm.max_concurrent` first
- Keep a small `max_retries` value for occasional `429`s; set it to `0` if you prefer fail-fast behavior
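
Putting the two knobs together, a conservative combination for rate-limited environments might look like the following `ov.conf` fragment; the specific values are illustrative, not tuned recommendations:

```json
{
  "embedding": { "max_concurrent": 4, "max_retries": 2 },
  "vlm": { "max_concurrent": 20, "max_retries": 2 }
}
```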

## Related Documentation

16 changes: 15 additions & 1 deletion docs/zh/guides/01-configuration.md
@@ -104,6 +104,7 @@ OpenViking uses a JSON configuration file (`ov.conf`) for its settings. The configuration file sup
{
"embedding": {
"max_concurrent": 10,
"max_retries": 3,
"dense": {
"provider": "volcengine",
"api_key": "your-api-key",
@@ -121,13 +122,16 @@ OpenViking uses a JSON configuration file (`ov.conf`) for its settings. The configuration file sup
| Parameter | Type | Description |
|------|------|------|
| `max_concurrent` | int | Maximum concurrent embedding requests (`embedding.max_concurrent`, default: `10`) |
| `max_retries` | int | Maximum retry attempts for transient embedding provider errors (`embedding.max_retries`, default: `3`; `0` disables retry) |
| `provider` | str | `"volcengine"`, `"openai"`, `"vikingdb"`, `"jina"`, `"voyage"`, `"minimax"`, or `"gemini"` |
| `api_key` | str | API key |
| `model` | str | Model name |
| `dimension` | int | Vector dimension |
| `input` | str | Input type: `"text"` or `"multimodal"` |
| `batch_size` | int | Batch size for embedding requests |

`embedding.max_retries` only applies to transient errors such as `429`, `5xx`, timeouts, and connection failures; permanent errors such as `400`, `401`, `403`, and `AccountOverdue` are not retried automatically. The backoff strategy is exponential backoff with jitter, starting at `0.5s` and capped at `8s`.

**Available Models**

| Model | Dimension | Input Type | Notes |
@@ -330,7 +334,8 @@ OpenViking uses a JSON configuration file (`ov.conf`) for its settings. The configuration file sup
"provider": "volcengine",
"api_key": "your-api-key",
"model": "doubao-seed-2-0-pro-260215",
"api_base": "https://ark.cn-beijing.volces.com/api/v3"
"api_base": "https://ark.cn-beijing.volces.com/api/v3",
"max_retries": 3
}
}
```
@@ -344,9 +349,12 @@ OpenViking uses a JSON configuration file (`ov.conf`) for its settings. The configuration file sup
| `api_base` | str | API endpoint (optional) |
| `thinking` | bool | Enable thinking mode (only effective for some VolcEngine models, default: `false`) |
| `max_concurrent` | int | Maximum concurrent LLM calls in the semantic processing stage (default: `100`) |
| `max_retries` | int | Maximum retry attempts for transient VLM provider errors (default: `3`; `0` disables retry) |
| `extra_headers` | object | Custom HTTP headers (available for OpenAI-compatible providers, optional) |
| `stream` | bool | Enable streaming mode (available for OpenAI-compatible providers, default: `false`) |

`vlm.max_retries` only applies to transient errors such as `429`, `5xx`, timeouts, and connection failures; permanent authentication, authorization, and billing errors are not retried automatically. The backoff strategy is exponential backoff with jitter, starting at `0.5s` and capped at `8s`.

**Available Models**

| Model | Notes |
@@ -933,6 +941,7 @@ openviking --account acme --user alice --agent-id assistant-2 ls viking://
{
"embedding": {
"max_concurrent": 10,
"max_retries": 3,
"dense": {
"provider": "volcengine",
"api_key": "string",
@@ -948,6 +957,7 @@ openviking --account acme --user alice --agent-id assistant-2 ls viking://
"api_base": "string",
"thinking": false,
"max_concurrent": 100,
"max_retries": 3,
"extra_headers": {},
"stream": false
},
@@ -1035,7 +1045,9 @@ Error: VLM request timeout

- Check network connectivity
- Increase the timeout in the config
- For intermittent timeouts, increase `vlm.max_retries` moderately
- Try a smaller model
- For bulk ingestion scenarios, also consider lowering `vlm.max_concurrent`

### Rate Limiting

@@ -1044,6 +1056,8 @@ Error: Rate limit exceeded
```

Volcengine enforces rate limits. Consider adding delays during batch processing or upgrading your plan.
- Lower `embedding.max_concurrent` / `vlm.max_concurrent` first
- Keep a small `max_retries` value for occasional `429`s; set it to `0` if you prefer fail-fast behavior

## Related Documentation

13 changes: 12 additions & 1 deletion openviking/models/embedder/base.py
@@ -6,6 +6,8 @@
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, TypeVar

from openviking.utils.model_retry import retry_sync

T = TypeVar("T")


@@ -74,6 +76,7 @@ def __init__(self, model_name: str, config: Optional[Dict[str, Any]] = None):
"""
self.model_name = model_name
self.config = config or {}
self.max_retries = int(self.config.get("max_retries", 3))

@abstractmethod
def embed(self, text: str, is_query: bool = False) -> EmbedResult:
@@ -104,6 +107,14 @@ def close(self):
"""Release resources, subclasses can override as needed"""
pass

def _run_with_retry(self, func: Callable[[], T], *, logger=None, operation_name: str) -> T:
return retry_sync(
func,
max_retries=self.max_retries,
logger=logger,
operation_name=operation_name,
)

@property
def is_dense(self) -> bool:
"""Check if result contains dense vector"""
@@ -255,7 +266,7 @@ def embed_batch(self, texts: List[str], is_query: bool = False) -> List[EmbedRes

return [
EmbedResult(dense_vector=d.dense_vector, sparse_vector=s.sparse_vector)
for d, s in zip(dense_results, sparse_results)
for d, s in zip(dense_results, sparse_results, strict=True)
]

def get_dimension(self) -> int:
39 changes: 32 additions & 7 deletions openviking/models/embedder/gemini_embedders.py
@@ -151,9 +151,9 @@ def __init__(
api_key=api_key,
http_options=HttpOptions(
retry_options=HttpRetryOptions(
attempts=3,
initial_delay=1.0,
max_delay=30.0,
attempts=max(self.max_retries + 1, 1),
initial_delay=0.5,
max_delay=8.0,
exp_base=2.0,
)
),
@@ -207,15 +207,25 @@ def embed(
task_type = self.query_param
elif not is_query and self.document_param:
task_type = self.document_param

# SDK accepts plain str; converts to REST Parts format internally.
try:
def _call() -> EmbedResult:
result = self.client.models.embed_content(
model=self.model_name,
contents=text,
config=self._build_config(task_type=task_type, title=title),
)
vector = truncate_and_normalize(list(result.embeddings[0].values), self._dimension)
return EmbedResult(dense_vector=vector)

try:
if _HTTP_RETRY_AVAILABLE:
return _call()
return self._run_with_retry(
_call,
logger=logger,
operation_name="Gemini embedding",
)
except (APIError, ClientError) as e:
_raise_api_error(e, self.model_name)

@@ -233,7 +243,7 @@ def embed_batch(
if titles is not None:
return [
self.embed(text, is_query=is_query, task_type=task_type, title=title)
for text, title in zip(texts, titles)
for text, title in zip(texts, titles, strict=True)
]
# Resolve effective task_type from is_query when no explicit override
if task_type is None:
Expand All @@ -253,14 +263,29 @@ def embed_batch(
continue

non_empty_texts = [batch[j] for j in non_empty_indices]
try:

def _call_batch(
non_empty_texts: List[str] = non_empty_texts,
config: types.EmbedContentConfig = config,
) -> Any:
response = self.client.models.embed_content(
model=self.model_name,
contents=non_empty_texts,
config=config,
)
return response

try:
if _HTTP_RETRY_AVAILABLE:
response = _call_batch()
else:
response = self._run_with_retry(
_call_batch,
logger=logger,
operation_name="Gemini batch embedding",
)
batch_results = [None] * len(batch)
for j, emb in zip(non_empty_indices, response.embeddings):
for j, emb in zip(non_empty_indices, response.embeddings, strict=True):
batch_results[j] = EmbedResult(
dense_vector=truncate_and_normalize(list(emb.values), self._dimension)
)
22 changes: 20 additions & 2 deletions openviking/models/embedder/jina_embedders.py
@@ -2,6 +2,7 @@
# SPDX-License-Identifier: AGPL-3.0
"""Jina AI Embedder Implementation"""

import logging
from typing import Any, Dict, List, Optional

import openai
@@ -11,6 +12,8 @@
EmbedResult,
)

logger = logging.getLogger(__name__)

# Default dimensions for Jina embedding models
JINA_MODEL_DIMENSIONS = {
"jina-embeddings-v5-text-small": 1024, # 677M params, max seq 32768
@@ -165,7 +168,8 @@ def embed(self, text: str, is_query: bool = False) -> EmbedResult:
Raises:
RuntimeError: When API call fails
"""
try:

def _call() -> EmbedResult:
kwargs: Dict[str, Any] = {"input": text, "model": self.model_name}
if self.dimension:
kwargs["dimensions"] = self.dimension
@@ -178,6 +182,13 @@ def embed(self, text: str, is_query: bool = False) -> EmbedResult:
vector = response.data[0].embedding

return EmbedResult(dense_vector=vector)

try:
return self._run_with_retry(
_call,
logger=logger,
operation_name="Jina embedding",
)
except openai.APIError as e:
self._raise_task_error(e)
raise RuntimeError(f"Jina API error: {e.message}") from e
@@ -200,7 +211,7 @@ def embed_batch(self, texts: List[str], is_query: bool = False) -> List[EmbedRes
if not texts:
return []

try:
def _call() -> List[EmbedResult]:
kwargs: Dict[str, Any] = {"input": texts, "model": self.model_name}
if self.dimension:
kwargs["dimensions"] = self.dimension
@@ -212,6 +223,13 @@ def embed_batch(self, texts: List[str], is_query: bool = False) -> List[EmbedRes
response = self.client.embeddings.create(**kwargs)

return [EmbedResult(dense_vector=item.embedding) for item in response.data]

try:
return self._run_with_retry(
_call,
logger=logger,
operation_name="Jina batch embedding",
)
except openai.APIError as e:
self._raise_task_error(e)
raise RuntimeError(f"Jina API error: {e.message}") from e
19 changes: 17 additions & 2 deletions openviking/models/embedder/litellm_embedders.py
@@ -154,13 +154,21 @@ def embed(self, text: str, is_query: bool = False) -> EmbedResult:
Raises:
RuntimeError: When embedding call fails
"""
try:

def _call() -> EmbedResult:
kwargs = self._build_kwargs(is_query=is_query)
kwargs["input"] = [text]
response = litellm.embedding(**kwargs)
self._update_telemetry_token_usage(response)
vector = response.data[0]["embedding"]
return EmbedResult(dense_vector=vector)

try:
return self._run_with_retry(
_call,
logger=logger,
operation_name="LiteLLM embedding",
)
except Exception as e:
raise RuntimeError(f"LiteLLM embedding failed: {e}") from e

@@ -180,12 +188,19 @@ def embed_batch(self, texts: List[str], is_query: bool = False) -> List[EmbedRes
if not texts:
return []

try:
def _call() -> List[EmbedResult]:
kwargs = self._build_kwargs(is_query=is_query)
kwargs["input"] = texts
response = litellm.embedding(**kwargs)
self._update_telemetry_token_usage(response)
return [EmbedResult(dense_vector=item["embedding"]) for item in response.data]

try:
return self._run_with_retry(
_call,
logger=logger,
operation_name="LiteLLM batch embedding",
)
except Exception as e:
raise RuntimeError(f"LiteLLM batch embedding failed: {e}") from e
