Commit 53594c7

Evenss, Teingi, and Ripcord55 authored
feat: Support embedded seekdb (#393)
* feat: Add native hybird search (#197) * add sparse vector embedding * hybrid search add sparse vector search * add checking version logic * add qwen sparse vector * adjust weight * update sparse vector function * update sparse vector function * fix bug * fix bug * optimise function * optimise function * optimise function * optimise function * optimise function * optimise function * optimise function * fix bug * add migrate function * update alembic function * update alembic function * update alembic function * adjust file struct * update alembic * update version * optimise * fix bug * update * update schema update method * update schema update method * update schema update method * update schema update method * update schema update method * update migrate method * update migrate method * update env.example * update env.example * update migrate sparse vector * update migrate sparse vector * adjust threshold score logic * update remark * add guides and examples * add benchmark param * fix bug * fulltext parsers support * adjust enable sparse vector setting * adjust env.example * adjust docs * update version * fix bug * optimise check * adjust file construct * adjust file construct * add native search * add file * remove log * remove log * fix bug * update pyobvector * add rerank * adjust * add limit * adjust config * feat: User Profile Support native language output (#198) * support native language * support native language * add docs * Reconstruct LLM setting (#200) * feat(llm): enhance configuration management with pydantic-settings - Introduced a unified configuration system for LLM providers using pydantic-settings. - Added provider-specific settings for Anthropic, Azure, DeepSeek, Ollama, OpenAI, Qwen, Vllm, and Zai. - Improved environment variable handling and validation through Field and AliasChoices. - Removed legacy initialization methods in favor of a cleaner, more maintainable structure. 
- Updated LLMFactory to utilize the new provider registration mechanism. * chore: Update LLM configuration management and improve environment variable handling - Refactor LLM configuration imports to use BaseLLMConfig. - Replace direct attribute access with getattr for safer environment variable retrieval. - Remove deprecated LLMConfig and streamline related code for better maintainability. * refactor: unify configuration governance for agent, core, and server modules (#199) * Reconstruct setting in Rerank,Vector,Graph (#202) * feat(llm): enhance configuration management with pydantic-settings - Introduced a unified configuration system for LLM providers using pydantic-settings. - Added provider-specific settings for Anthropic, Azure, DeepSeek, Ollama, OpenAI, Qwen, Vllm, and Zai. - Improved environment variable handling and validation through Field and AliasChoices. - Removed legacy initialization methods in favor of a cleaner, more maintainable structure. - Updated LLMFactory to utilize the new provider registration mechanism. * chore: Update LLM configuration management and improve environment variable handling - Refactor LLM configuration imports to use BaseLLMConfig. - Replace direct attribute access with getattr for safer environment variable retrieval. - Remove deprecated LLMConfig and streamline related code for better maintainability. * feat: Enhance rerank configuration and integration - Introduced BaseRerankConfig for improved configuration management across rerank providers. - Updated rerank integration files to utilize the new base configuration structure. - Added support for additional configuration fields such as api_base_url and top_n. - Refactored rerank factory to accommodate new configuration handling and provider registration. - Removed deprecated RerankConfig and streamlined related code for better maintainability. - Updated API request handling in rerank classes to support custom HTTP clients. 
* * refactor(powermem): remove unused storage configuration management module - Removed `VectorStoreConfig` and `GraphStoreConfig` classes - Deleted associated validation logic and import statements - Streamlined codebase by eliminating unused components * feat(powermem): enhance sparse embedder configuration management - Introduced BaseSparseEmbedderConfig for unified sparse embedding configuration. - Updated MemoryConfig to utilize BaseSparseEmbedderConfig. - Refactored SparseEmbedderFactory to support new configuration handling. - Improved handling of sparse embedder settings across various components. * feat(powermem): enhance user profile storage with provider registration - Added a registry mechanism to UserProfileStoreBase for automatic provider registration. - Implemented class paths for OceanBase and SQLite user profile storage implementations. - Updated UserProfileStoreFactory to utilize the new registry for provider class retrieval. - Refactored imports to trigger auto-registration of user profile storage classes. - Improved handling of provider names in the factory for better compatibility. * feat(powermem): synchronize embedding model dimensions across configurations - Added logic to sync `embedding_model_dims` from the embedder to both `vector_store` and `graph_store` if not already set. - Updated `config_loader.py` and `configs.py` to ensure consistent embedding dimensions across components. * feat(powermem): enhance OceanBase configuration and query handling - Added `enable_native_hybrid` field to `OceanBaseConfig` for native hybrid search support. - Updated query handling in `OceanBaseVectorStore` to use a safe query format, preventing SQL injection risks. 
* oceanbase native language case (#220) * Oceanbase Native Hybrid Search Cases (#223) * oceanbase native language case * Oceanbase Native Hybrid Search Cases * Optimise searching in Intelligent mode And fix SILICONFLOW_LLM_BASE_URL bug (#224) * Enhance memory operations with background threading support - Added a global background thread pool for asynchronous memory updates and deletions in the Memory class. - Updated the handling of memory updates and deletions to submit tasks to the background executor, improving performance and responsiveness. * format * Enhance SiliconFlowConfig API key handling - Updated `SiliconFlowConfig` to improve API key and base URL handling by adding new validation aliases for better compatibility. * add enable_native_hybrid in benchmark * Enhance configuration management for OceanBase in config_loader.py - Added backward compatibility for OceanBase by constructing connection arguments from vector store configuration. - Updated unit tests to verify the inclusion of internal settings in the configuration. * disable env file * Fix unit test issues caused by setting changes (#228) * Enhance configuration management for OceanBase in config_loader.py - Added backward compatibility for OceanBase by constructing connection arguments from vector store configuration. - Updated unit tests to verify the inclusion of internal settings in the configuration. * disable env file * Fixed run failure caused by incorrect folder name * Fixed run failure caused by incorrect folder name (#229) * Enhance configuration management for OceanBase in config_loader.py - Added backward compatibility for OceanBase by constructing connection arguments from vector store configuration. - Updated unit tests to verify the inclusion of internal settings in the configuration. 
* disable env file * Fixed run failure caused by incorrect folder name * feat: Enhance rerank configuration and integration - Introduced new RERANKER_* environment variables for improved configuration management across rerank providers. - Updated .env.example to include new rerank settings and reorganized sections for clarity. - Refactored rerank integration files to utilize the new base configuration structure, ensuring consistency in API key and base URL handling. - Updated error messages in rerank classes to reflect the new environment variable naming convention. * adjust version * add rerank setting in OceanBaseConfig * Enhance OceanBase configuration for embedded SeekDB support - Updated .env.example to clarify connection settings for OceanBase, allowing an empty host for embedded SeekDB. - Modified config_loader.py to include 'ob_path' in connection arguments for OceanBase. - Enhanced CLI prompts in config.py to guide users on using embedded SeekDB. - Updated base.py and oceanbase.py to support 'ob_path' for embedded SeekDB data directory. - Refactored oceanbase_graph.py and oceanbase.py to handle connections based on host presence, supporting both remote and embedded modes. - Introduced utility methods in oceanbase_util.py for safe fetching of results from OceanBase. - Adjusted user_memory and user_profile modules to accommodate new connection parameters. * Add ensure_embedded_database_exists utility method and update OceanBase classes - Introduced `ensure_embedded_database_exists` method in `oceanbase_util.py` to verify and create the target database for embedded SeekDB. - Updated `oceanbase_graph.py`, `oceanbase.py`, and `user_profile.py` to call this method, ensuring the database exists before establishing connections. - Enhanced error handling and logging for database creation failures. 
* Update Python version requirements to 3.11 across project files - Modified `pyproject.toml` to require Python 3.11 and updated target versions for tools like Black and Mypy. - Updated README files in English, Chinese, and Japanese to reflect the new Python version. - Adjusted GitHub workflows to use Python 3.11 in build, publish, regression, and test configurations. - Revised documentation and examples to specify Python 3.11 as the minimum requirement. * Update database provider in .env.example from sqlite to oceanbase * Update OceanBase index type to HNSW and enhance handling for embedded SeekDB - Changed the default index type in .env.example from IVF_FLAT to HNSW. - Added logic in oceanbase.py to automatically switch to HNSW for embedded SeekDB when using IVF-family indexes on small datasets to prevent crashes. - Improved row mapping to support both SQLAlchemy Row objects and plain dicts, ensuring compatibility and stability during data fetching. * fix query bug * Enhance memory update and delete operations for embedded SeekDB support - Added logic to handle updates and deletes synchronously for embedded SeekDB to prevent crashes due to concurrent connections. - Updated the memory management code to check if the storage is an embedded store and adjust the execution of update and delete operations accordingly. * Add limitation note for AsyncMemory with embedded SeekDB - Documented that AsyncMemory cannot be used with embedded SeekDB due to its single-threaded C++ engine, which leads to crashes from concurrent access. - Provided code examples illustrating the correct usage of Memory class for embedded SeekDB and the supported use of AsyncMemory with remote OceanBase. * add pyseekdb dependents * Refactor update method in OceanBaseVectorStore to improve payload merging and preserve existing fields - Updated the update method to fetch all existing columns, ensuring that partial payloads do not overwrite existing data. 
- Enhanced the merging logic for payloads, allowing new keys to override existing ones while preserving necessary fields. - Improved handling of sparse embeddings to prevent accidental data loss during updates. - Streamlined the preparation of update data for better clarity and maintainability. * Update version to 1.0.3 in project files - Incremented version number in pyproject.toml, version.py, and audit/telemetry modules to reflect the new release. - Ensured consistency across all relevant files for the updated version. * Implement service singletons and error handling in FastAPI application - Added an async context manager for initializing and cleaning up service singletons at application startup and shutdown. - Updated service retrieval functions in API routes to fetch services from the app state, with error handling for unavailable services. - Introduced logging for service initialization and shutdown processes, enhancing observability of the application lifecycle. - Added a utility function to check for embedded storage in the CLI server configuration, ensuring proper worker settings for embedded databases. * Update version to 1.1.0 across project files - Incremented version number in pyproject.toml, version.py, and relevant modules to reflect the new release. - Ensured consistency in versioning across all affected files for the updated version. --------- Co-authored-by: jingshun.tq <35712518+Teingi@users.noreply.github.com> Co-authored-by: Chifang <40140008+Ripcord55@users.noreply.github.com>
1 parent a512b5c commit 53594c7
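
The commit message says the reworked `update` method in `OceanBaseVectorStore` merges a partial payload into the stored row, so that new keys override existing ones while omitted fields are preserved. A minimal sketch of that merge rule, assuming plain dictionaries; `merge_payload` and the field names are hypothetical illustrations and are not taken from the powermem source:

```python
from typing import Any, Mapping


def merge_payload(existing: Mapping[str, Any], partial: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new payload in which keys from `partial` override `existing`.

    Keys absent from `partial` keep their stored values, so a partial update
    cannot silently drop fields such as a sparse embedding.
    """
    merged = dict(existing)   # copy, so the stored row is not mutated
    merged.update(partial)    # new keys win over existing ones
    return merged


# Hypothetical stored row and partial update.
stored = {"user_id": "u1", "category": "note", "sparse_embedding": {3: 0.5}}
update = {"category": "task"}
result = merge_payload(stored, update)
```

Only `category` changes in `result`; `user_id` and `sparse_embedding` carry over from the stored row, which is the behavior the commit message describes.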

55 files changed: +547 -246 lines changed

.env.example

Lines changed: 5 additions & 3 deletions
@@ -14,7 +14,7 @@ TIMEZONE=Asia/Shanghai
 # 1. Database Configuration (Required)
 # =============================================================================
 # Choose your database provider: sqlite, oceanbase, postgres
-DATABASE_PROVIDER=sqlite
+DATABASE_PROVIDER=oceanbase

 # -----------------------------------------------------------------------------
 # SQLite Configuration (Default - Recommended for development)
@@ -27,15 +27,17 @@ SQLITE_COLLECTION=memories
 # -----------------------------------------------------------------------------
 # OceanBase Configuration
 # -----------------------------------------------------------------------------
-OCEANBASE_HOST=127.0.0.1
+# Connection mode: set OCEANBASE_HOST for remote, leave empty for embedded SeekDB
+OCEANBASE_HOST=
+OCEANBASE_PATH=./seekdb_data
 OCEANBASE_PORT=2881
 OCEANBASE_USER=root@sys
 OCEANBASE_PASSWORD=your_password
 OCEANBASE_DATABASE=powermem
 OCEANBASE_COLLECTION=memories

 ## Keep the default settings, as modifications are generally not needed.
-OCEANBASE_INDEX_TYPE=IVF_FLAT
+OCEANBASE_INDEX_TYPE=HNSW
 OCEANBASE_VECTOR_METRIC_TYPE=cosine
 OCEANBASE_TEXT_FIELD=document
 OCEANBASE_VECTOR_FIELD=embedding
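
The new `.env.example` comment states the rule for choosing a connection mode: a non-empty `OCEANBASE_HOST` means remote OceanBase, an empty one means embedded SeekDB stored under `OCEANBASE_PATH`. A minimal sketch of that rule follows. `ConnectionPlan` and `plan_connection` are hypothetical names, and using `2881` and `./seekdb_data` as fallback defaults is an assumption based on the example file, not on the actual `config_loader.py`:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ConnectionPlan:
    mode: str                # "remote" or "embedded"
    host: Optional[str]
    port: Optional[int]
    path: Optional[str]


def plan_connection(env: Mapping[str, str]) -> ConnectionPlan:
    """Pick remote or embedded mode from OCEANBASE_* style settings."""
    host = env.get("OCEANBASE_HOST", "").strip()
    if host:
        # Host present: connect to a remote OceanBase server.
        port = int(env.get("OCEANBASE_PORT", "2881"))
        return ConnectionPlan("remote", host, port, None)
    # Host empty or missing: fall back to an embedded data directory.
    path = env.get("OCEANBASE_PATH", "./seekdb_data")
    return ConnectionPlan("embedded", None, None, path)


embedded = plan_connection({"OCEANBASE_HOST": "", "OCEANBASE_PATH": "./seekdb_data"})
remote = plan_connection({"OCEANBASE_HOST": "127.0.0.1", "OCEANBASE_PORT": "2881"})
```

Passing a plain mapping rather than reading `os.environ` directly keeps the sketch easy to test; `plan_connection(os.environ)` would work the same way.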

.github/workflows/build.yml

Lines changed: 2 additions & 3 deletions
@@ -12,10 +12,9 @@ on:
 python_version:
 description: 'Python version to build with'
 required: false
-default: '3.10'
+default: '3.11'
 type: choice
 options:
-- '3.10'
 - '3.11'
 - '3.12'

@@ -53,7 +52,7 @@ jobs:
 needs: build-dashboard
 strategy:
 matrix:
-python-version: ["3.10", "3.11", "3.12"]
+python-version: ["3.11", "3.12"]
 fail-fast: false

 steps:

.github/workflows/publish.yml

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ jobs:
 - name: Set up Python
 uses: actions/setup-python@v4
 with:
-python-version: "3.10"
+python-version: "3.11"

 - name: Inject Frontend and Build Package
 run: |

.github/workflows/regression.yml

Lines changed: 2 additions & 2 deletions
@@ -36,7 +36,7 @@ jobs:
 if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule' || (github.event_name == 'pull_request_target' && github.event.pull_request.head.repo.full_name == github.repository)
 strategy:
 matrix:
-python-version: ["3.10"]
+python-version: ["3.11"]
 fail-fast: false

 steps:
@@ -174,7 +174,7 @@ jobs:
 if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule' || (github.event_name == 'pull_request_target' && github.event.pull_request.head.repo.full_name == github.repository)
 strategy:
 matrix:
-python-version: ["3.10"]
+python-version: ["3.11"]
 fail-fast: false

 steps:

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ jobs:
 runs-on: ubuntu-latest
 strategy:
 matrix:
-python-version: ["3.10", "3.11", "3.12"]
+python-version: ["3.11", "3.12"]
 fail-fast: false

 steps:

README.md

Lines changed: 2 additions & 2 deletions
@@ -27,8 +27,8 @@ One command to add PowerMem memory to OpenClaw: `openclaw plugins install memory
 <a href="https://github.com/oceanbase/powermem/blob/master/LICENSE">
 <img alt="license" src="https://img.shields.io/badge/license-Apache%202.0-green.svg" />
 </a>
-<a href="https://img.shields.io/badge/python%20-3.10.0%2B-blue.svg">
-<img alt="pyversions" src="https://img.shields.io/badge/python%20-3.10.0%2B-blue.svg" />
+<a href="https://img.shields.io/badge/python%20-3.11.0%2B-blue.svg">
+<img alt="pyversions" src="https://img.shields.io/badge/python%20-3.11.0%2B-blue.svg" />
 </a>
 <a href="https://deepwiki.com/oceanbase/powermem">
 <img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg" />

README_CN.md

Lines changed: 2 additions & 2 deletions
@@ -27,8 +27,8 @@
 <a href="https://github.com/oceanbase/powermem/blob/master/LICENSE">
 <img alt="license" src="https://img.shields.io/badge/license-Apache%202.0-green.svg" />
 </a>
-<a href="https://img.shields.io/badge/python%20-3.10.0%2B-blue.svg">
-<img alt="pyversions" src="https://img.shields.io/badge/python%20-3.10.0%2B-blue.svg" />
+<a href="https://img.shields.io/badge/python%20-3.11.0%2B-blue.svg">
+<img alt="pyversions" src="https://img.shields.io/badge/python%20-3.11.0%2B-blue.svg" />
 </a>
 <a href="https://deepwiki.com/oceanbase/powermem">
 <img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg" />

README_JP.md

Lines changed: 2 additions & 2 deletions
@@ -27,8 +27,8 @@
 <a href="https://github.com/oceanbase/powermem/blob/master/LICENSE">
 <img alt="license" src="https://img.shields.io/badge/license-Apache%202.0-green.svg" />
 </a>
-<a href="https://img.shields.io/badge/python%20-3.10.0%2B-blue.svg">
-<img alt="pyversions" src="https://img.shields.io/badge/python%20-3.10.0%2B-blue.svg" />
+<a href="https://img.shields.io/badge/python%20-3.11.0%2B-blue.svg">
+<img alt="pyversions" src="https://img.shields.io/badge/python%20-3.11.0%2B-blue.svg" />
 </a>
 <a href="https://deepwiki.com/oceanbase/powermem">
 <img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg" />

benchmark/server/main.py

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ def create(self, *args: Any, **kwargs: Any) -> Any:
 app = FastAPI(
 title="PowerMem Benchmark REST APIs",
 description="A REST API for managing and searching memories for benchmark testing scenarios.",
-version="1.0.0",
+version="1.1.0",
 docs_url="/docs",
 redoc_url="/redoc",
 )

docs/api/0002-async_memory.md

Lines changed: 20 additions & 0 deletions
@@ -263,16 +263,36 @@ async def batch_process():
 asyncio.run(batch_process())
 ```

+### Limitation: Embedded SeekDB Does Not Support Async
+
+Embedded SeekDB (local file mode with no `host` configured) uses a single-threaded C++ engine that **does not support concurrent multi-threaded access**. `AsyncMemory` internally submits synchronous operations to a `ThreadPoolExecutor`, which causes multiple threads to read and write the same embedded SeekDB instance simultaneously. This leads to C++-level crashes such as `pure virtual method called` or `Segmentation fault`.
+
+**`AsyncMemory` cannot be used with embedded SeekDB.** Use the synchronous `Memory` class instead.
+
+```python
+# ❌ Not supported with embedded SeekDB
+from powermem import AsyncMemory
+async_memory = AsyncMemory(config=embedded_seekdb_config)  # crashes
+
+# ✓ Use the synchronous interface with embedded SeekDB
+from powermem import Memory
+memory = Memory(config=embedded_seekdb_config)
+```
+
+Remote OceanBase (with `host` configured) is not affected by this limitation and fully supports `AsyncMemory`.
+
 ### When to Use AsyncMemory

 Use `AsyncMemory` when:
 - Processing many memories concurrently
 - Building async web applications (FastAPI, aiohttp)
 - Implementing batch processing pipelines
 - Need non-blocking memory operations
+- Using **remote OceanBase** (with `host` configured)

 Use `Memory` when:
 - Simple synchronous scripts
 - Interactive notebooks
 - Simple use cases without concurrency needs
+- Using **embedded SeekDB** (local file mode, no `host`)
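
The limitation documented in this file comes from a multi-worker `ThreadPoolExecutor` entering a single-threaded engine from several threads at once. The sketch below illustrates that failure mode with a stand-in engine and shows how a single-worker executor serializes calls. `FakeSingleThreadedEngine` and `add_all` are hypothetical; this is a general asyncio pattern, it has not been verified against SeekDB, and the documented guidance remains to use the synchronous `Memory` class with embedded SeekDB.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSingleThreadedEngine:
    """Stand-in for an engine that must never be entered by two threads at once."""

    def __init__(self) -> None:
        self._busy = threading.Lock()
        self.rows: list[str] = []

    def add(self, text: str) -> int:
        # A real engine might crash here; the stand-in raises instead.
        if not self._busy.acquire(blocking=False):
            raise RuntimeError("concurrent access detected")
        try:
            self.rows.append(text)
            return len(self.rows)
        finally:
            self._busy.release()


async def add_all(engine: FakeSingleThreadedEngine, texts: list[str],
                  executor: ThreadPoolExecutor) -> list[int]:
    """Submit blocking calls to the executor and await them together."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, engine.add, t) for t in texts]
    return await asyncio.gather(*futures)


engine = FakeSingleThreadedEngine()
# One worker thread means calls run strictly one after another.
with ThreadPoolExecutor(max_workers=1) as single_worker:
    ids = asyncio.run(add_all(engine, ["a", "b", "c"], single_worker))
```

With `max_workers=1` the executor runs queued calls in submission order on one thread, so the stand-in never sees overlapping calls. A larger pool gives no such guarantee, which is the situation the limitation note describes.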