Embedded seekdb #393

Merged
Teingi merged 54 commits into oceanbase:main from Evenss:embedded_seekdb on Apr 2, 2026
Evenss (Member) commented Apr 2, 2026

Summary

Embedded seekdb


Evenss and others added 30 commits January 27, 2026 11:51
* add sparse vector embedding

* hybrid search add sparse vector search

* add checking version logic

* add qwen sparse vector

* adjust weight

* update sparse vector function

* update sparse vector function

* fix bug

* fix bug

* optimise function

* optimise function

* optimise function

* optimise function

* optimise function

* optimise function

* optimise function

* fix bug

* add migrate function

* update alembic function

* update alembic function

* update alembic function

* adjust file struct

* update alembic

* update version

* optimise

* fix bug

* update

* update schema update method

* update schema update method

* update schema update method

* update schema update method

* update schema update method

* update migrate method

* update migrate method

* update env.example

* update env.example

* update migrate sparse vector

* update migrate sparse vector

* adjust threshold score logic

* update remark

* add guides and examples

* add benchmark param

* fix bug

* fulltext parsers support

* adjust enable sparse vector setting

* adjust env.example

* adjust docs

* update version

* fix bug

* optimise check

* adjust file construct

* adjust file construct

* add native search

* add file

* remove log

* remove log

* fix bug

* update pyobvector

* add rerank

* adjust

* add limit

* adjust config
* support native language

* support native language

* add docs
* feat(llm): enhance configuration management with pydantic-settings

- Introduced a unified configuration system for LLM providers using pydantic-settings.
- Added provider-specific settings for Anthropic, Azure, DeepSeek, Ollama, OpenAI, Qwen, Vllm, and Zai.
- Improved environment variable handling and validation through Field and AliasChoices.
- Removed legacy initialization methods in favor of a cleaner, more maintainable structure.
- Updated LLMFactory to utilize the new provider registration mechanism.
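
For readers unfamiliar with alias-based settings, here is a minimal standard-library sketch of the idea behind accepting several environment variable names for one field. It is a stand-in for what `AliasChoices` provides, not the pydantic-settings code from this commit. The class name `OpenAISettings`, the helper `first_env`, and every variable name shown are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


def first_env(
    names: Sequence[str],
    env: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the value of the first alias that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default


@dataclass
class OpenAISettings:
    """Hypothetical provider settings; field and variable names are illustrative."""

    api_key: Optional[str]
    base_url: Optional[str]

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "OpenAISettings":
        env = os.environ if env is None else env
        return cls(
            # A generic name is tried first, then a provider-specific one.
            api_key=first_env(["LLM_API_KEY", "OPENAI_API_KEY"], env),
            base_url=first_env(
                ["OPENAI_LLM_BASE_URL"], env, "https://api.openai.com/v1"
            ),
        )


settings = OpenAISettings.from_env({"OPENAI_API_KEY": "sk-test"})
print(settings.api_key, settings.base_url)
```

Passing the environment as a mapping keeps the lookup easy to test without touching the real process environment.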

* chore: Update LLM configuration management and improve environment variable handling

- Refactor LLM configuration imports to use BaseLLMConfig.
- Replace direct attribute access with getattr for safer environment variable retrieval.
- Remove deprecated LLMConfig and streamline related code for better maintainability.
* feat(llm): enhance configuration management with pydantic-settings

- Introduced a unified configuration system for LLM providers using pydantic-settings.
- Added provider-specific settings for Anthropic, Azure, DeepSeek, Ollama, OpenAI, Qwen, Vllm, and Zai.
- Improved environment variable handling and validation through Field and AliasChoices.
- Removed legacy initialization methods in favor of a cleaner, more maintainable structure.
- Updated LLMFactory to utilize the new provider registration mechanism.

* chore: Update LLM configuration management and improve environment variable handling

- Refactor LLM configuration imports to use BaseLLMConfig.
- Replace direct attribute access with getattr for safer environment variable retrieval.
- Remove deprecated LLMConfig and streamline related code for better maintainability.

* feat: Enhance rerank configuration and integration

- Introduced BaseRerankConfig for improved configuration management across rerank providers.
- Updated rerank integration files to utilize the new base configuration structure.
- Added support for additional configuration fields such as api_base_url and top_n.
- Refactored rerank factory to accommodate new configuration handling and provider registration.
- Removed deprecated RerankConfig and streamlined related code for better maintainability.
- Updated API request handling in rerank classes to support custom HTTP clients.

* refactor(powermem): remove unused storage configuration management module

- Removed `VectorStoreConfig` and `GraphStoreConfig` classes
- Deleted associated validation logic and import statements
- Streamlined codebase by eliminating unused components

* feat(powermem): enhance sparse embedder configuration management

- Introduced BaseSparseEmbedderConfig for unified sparse embedding configuration.
- Updated MemoryConfig to utilize BaseSparseEmbedderConfig.
- Refactored SparseEmbedderFactory to support new configuration handling.
- Improved handling of sparse embedder settings across various components.

* feat(powermem): enhance user profile storage with provider registration

- Added a registry mechanism to UserProfileStoreBase for automatic provider registration.
- Implemented class paths for OceanBase and SQLite user profile storage implementations.
- Updated UserProfileStoreFactory to utilize the new registry for provider class retrieval.
- Refactored imports to trigger auto-registration of user profile storage classes.
- Improved handling of provider names in the factory for better compatibility.
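
One common way to get automatic provider registration is `__init_subclass__`, sketched here under assumptions. Only the name `UserProfileStoreBase` comes from the commit message. The `provider` attribute, the `_registry` dict, the subclass names, and `create_store` are hypothetical, and the actual implementation may use class paths or a decorator instead.

```python
from typing import ClassVar, Dict, Optional, Type


class UserProfileStoreBase:
    """Base class that records every subclass declaring a provider name."""

    _registry: ClassVar[Dict[str, Type["UserProfileStoreBase"]]] = {}
    provider: ClassVar[Optional[str]] = None  # hypothetical attribute name

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.provider:
            # Registration happens when the subclass is defined (imported).
            UserProfileStoreBase._registry[cls.provider.lower()] = cls


class SQLiteUserProfileStore(UserProfileStoreBase):
    provider = "sqlite"


class OceanBaseUserProfileStore(UserProfileStoreBase):
    provider = "oceanbase"


def create_store(provider: str) -> UserProfileStoreBase:
    """Factory lookup: normalise the name, then instantiate the registered class."""
    try:
        store_cls = UserProfileStoreBase._registry[provider.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown user profile provider: {provider!r}") from None
    return store_cls()


print(type(create_store("SQLite")).__name__)
```

Because registration runs at class definition time, importing the module that defines a store is enough to make it available to the factory, which matches the commit's note about imports triggering auto-registration.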

* feat(powermem): synchronize embedding model dimensions across configurations

- Added logic to sync `embedding_model_dims` from the embedder to both `vector_store` and `graph_store` if not already set.
- Updated `config_loader.py` and `configs.py` to ensure consistent embedding dimensions across components.
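
The "fill only if not already set" rule can be sketched as follows. The nested dict layout and the embedder key `embedding_dims` are assumptions for illustration. Only `embedding_model_dims`, `vector_store`, and `graph_store` are named in the commit message.

```python
from typing import Any, Dict


def sync_embedding_dims(config: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the embedder's dimension into stores that have not set their own."""
    dims = config.get("embedder", {}).get("config", {}).get("embedding_dims")
    if dims is None:
        return config  # nothing to propagate
    for section in ("vector_store", "graph_store"):
        store = config.get(section)
        if not store:
            continue  # section not configured at all
        store_cfg = store.setdefault("config", {})
        if store_cfg.get("embedding_model_dims") is None:
            store_cfg["embedding_model_dims"] = dims
    return config


cfg = {
    "embedder": {"config": {"embedding_dims": 1536}},
    "vector_store": {"config": {}},
    "graph_store": {"config": {"embedding_model_dims": 768}},
}
sync_embedding_dims(cfg)
print(cfg["vector_store"]["config"], cfg["graph_store"]["config"])
```

An explicitly configured store value is left untouched, so a deliberate mismatch is not silently overwritten.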

* feat(powermem): enhance OceanBase configuration and query handling

- Added `enable_native_hybrid` field to `OceanBaseConfig` for native hybrid search support.
- Updated query handling in `OceanBaseVectorStore` to use a safe query format, preventing SQL injection risks.
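
The commit message does not show what the safe query format looks like. One widely used approach is bound parameters, sketched here with the standard-library `sqlite3` module as a stand-in for OceanBase. The table, columns, and helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, user_id TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO memories (user_id, content) VALUES (?, ?)",
    [("alice", "likes tea"), ("bob", "likes coffee")],
)


def find_by_user(connection: sqlite3.Connection, user_id: str) -> list:
    """Look up rows with a bound parameter instead of string formatting."""
    rows = connection.execute(
        "SELECT content FROM memories WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


# The hostile string is compared as a literal value, so it matches no rows.
hostile = "alice' OR '1'='1"
print(find_by_user(conn, "alice"))
print(find_by_user(conn, hostile))
```

With string interpolation the hostile value would change the `WHERE` clause. With a bound parameter the driver treats it purely as data.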
* oceanbase native language case

* Oceanbase Native Hybrid Search Cases
…RL bug (oceanbase#224)

* Enhance memory operations with background threading support

- Added a global background thread pool for asynchronous memory updates and deletions in the Memory class.
- Updated the handling of memory updates and deletions to submit tasks to the background executor, improving performance and responsiveness.
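
A minimal sketch of a shared background pool, assuming a simplified `Memory` class with an in-memory dict. The pool size, the thread name prefix, and the private method names are illustrative and not taken from the PR.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# One process-wide pool shared by all Memory instances (size is an assumption).
_BACKGROUND_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="memory-bg")


class Memory:
    """Toy stand-in: updates and deletes are queued instead of run inline."""

    def __init__(self) -> None:
        self._items = {}
        self._lock = threading.Lock()

    def _do_update(self, memory_id: str, content: str) -> None:
        with self._lock:
            self._items[memory_id] = content

    def _do_delete(self, memory_id: str) -> None:
        with self._lock:
            self._items.pop(memory_id, None)

    def update(self, memory_id: str, content: str) -> Future:
        # Returns immediately; the caller may wait on the Future if needed.
        return _BACKGROUND_POOL.submit(self._do_update, memory_id, content)

    def delete(self, memory_id: str) -> Future:
        return _BACKGROUND_POOL.submit(self._do_delete, memory_id)


memory = Memory()
memory.update("m1", "prefers dark mode").result(timeout=5)
print(memory._items)
```

Returning the `Future` lets callers that need read-after-write consistency wait explicitly, while fire-and-forget callers can ignore it.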

* format

* Enhance SiliconFlowConfig API key handling

- Updated `SiliconFlowConfig` to improve API key and base URL handling by adding new validation aliases for better compatibility.
# Conflicts:
#	benchmark/server/main.py
#	pyproject.toml
#	src/powermem/storage/configs.py
#	src/powermem/storage/factory.py
#	src/powermem/user_memory/user_memory.py
- Added backward compatibility for OceanBase by constructing connection arguments from vector store configuration.
- Updated unit tests to verify the inclusion of internal settings in the configuration.
* Enhance configuration management for OceanBase in config_loader.py

- Added backward compatibility for OceanBase by constructing connection arguments from vector store configuration.
- Updated unit tests to verify the inclusion of internal settings in the configuration.

* disable env file
# Conflicts:
#	tests/unit/test_config_loader.py
* Enhance configuration management for OceanBase in config_loader.py

- Added backward compatibility for OceanBase by constructing connection arguments from vector store configuration.
- Updated unit tests to verify the inclusion of internal settings in the configuration.

* disable env file

* Fixed run failure caused by incorrect folder name
- Introduced new RERANKER_* environment variables for improved configuration management across rerank providers.
- Updated .env.example to include new rerank settings and reorganized sections for clarity.
- Refactored rerank integration files to utilize the new base configuration structure, ensuring consistency in API key and base URL handling.
- Updated error messages in rerank classes to reflect the new environment variable naming convention.
Evenss and others added 24 commits February 26, 2026 17:28
- Updated .env.example to clarify connection settings for OceanBase, allowing an empty host for embedded SeekDB.
- Modified config_loader.py to include 'ob_path' in connection arguments for OceanBase.
- Enhanced CLI prompts in config.py to guide users on using embedded SeekDB.
- Updated base.py and oceanbase.py to support 'ob_path' for embedded SeekDB data directory.
- Refactored oceanbase_graph.py and oceanbase.py to handle connections based on host presence, supporting both remote and embedded modes.
- Introduced utility methods in oceanbase_util.py for safe fetching of results from OceanBase.
- Adjusted user_memory and user_profile modules to accommodate new connection parameters.
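
The host-based switch between remote and embedded mode could be sketched like this. `ConnectionPlan`, `plan_connection`, and both default values are assumptions for illustration. Only the empty-host convention and the `ob_path` name come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionPlan:
    mode: str    # "remote" or "embedded"
    target: str  # "host:port" for remote, data directory for embedded


def plan_connection(
    host: Optional[str],
    port: int = 2881,                # illustrative default
    ob_path: str = "./seekdb_data",  # hypothetical default data directory
) -> ConnectionPlan:
    """Choose remote mode when a host is given, embedded mode otherwise."""
    if host and host.strip():
        return ConnectionPlan("remote", f"{host.strip()}:{port}")
    return ConnectionPlan("embedded", ob_path)


print(plan_connection("127.0.0.1"))
print(plan_connection(""))
```

Treating a whitespace-only host the same as an empty one avoids surprising remote connection attempts from a stray space in an environment file.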
…se classes

- Introduced `ensure_embedded_database_exists` method in `oceanbase_util.py` to verify and create the target database for embedded SeekDB.
- Updated `oceanbase_graph.py`, `oceanbase.py`, and `user_profile.py` to call this method, ensuring the database exists before establishing connections.
- Enhanced error handling and logging for database creation failures.
- Modified `pyproject.toml` to require Python 3.11 and updated target versions for tools like Black and Mypy.
- Updated README files in English, Chinese, and Japanese to reflect the new Python version.
- Adjusted GitHub workflows to use Python 3.11 in build, publish, regression, and test configurations.
- Revised documentation and examples to specify Python 3.11 as the minimum requirement.
… SeekDB

- Changed the default index type in .env.example from IVF_FLAT to HNSW.
- Added logic in oceanbase.py to automatically switch to HNSW for embedded SeekDB when using IVF-family indexes on small datasets to prevent crashes.
- Improved row mapping to support both SQLAlchemy Row objects and plain dicts, ensuring compatibility and stability during data fetching.
- Added logic to handle updates and deletes synchronously for embedded SeekDB to prevent crashes due to concurrent connections.
- Updated the memory management code to check if the storage is an embedded store and adjust the execution of update and delete operations accordingly.
- Documented that AsyncMemory cannot be used with embedded SeekDB due to its single-threaded C++ engine, which leads to crashes from concurrent access.
- Provided code examples illustrating the correct usage of Memory class for embedded SeekDB and the supported use of AsyncMemory with remote OceanBase.
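
The embedded-versus-remote dispatch described in these commits can be sketched as a small helper that runs writes inline when the store is embedded and hands them to a thread pool otherwise. `WriteDispatcher` and its parameters are hypothetical. The PR's actual check for an embedded store is not shown in the commit messages.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class WriteDispatcher:
    """Run writes inline for embedded storage, in a pool for remote storage."""

    def __init__(self, embedded: bool, max_workers: int = 4) -> None:
        self.embedded = embedded
        # No pool at all in embedded mode, so no second thread can touch the engine.
        self._pool = None if embedded else ThreadPoolExecutor(max_workers=max_workers)

    def run(self, fn, *args):
        """Return fn's result (embedded) or a Future for it (remote)."""
        if self.embedded:
            return fn(*args)
        return self._pool.submit(fn, *args)

    def close(self) -> None:
        if self._pool is not None:
            self._pool.shutdown(wait=True)


def current_thread_id() -> int:
    return threading.get_ident()


embedded = WriteDispatcher(embedded=True)
remote = WriteDispatcher(embedded=False)
print(embedded.run(current_thread_id) == threading.get_ident())
print(remote.run(current_thread_id).result(timeout=5) == threading.get_ident())
remote.close()
```

The two return types differ on purpose in this sketch. A production version would probably wrap the inline result in a completed `Future` so callers see one interface.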
…ging and preserve existing fields

- Updated the update method to fetch all existing columns, ensuring that partial payloads do not overwrite existing data.
- Enhanced the merging logic for payloads, allowing new keys to override existing ones while preserving necessary fields.
- Improved handling of sparse embeddings to prevent accidental data loss during updates.
- Streamlined the preparation of update data for better clarity and maintainability.
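
A minimal sketch of the merge rule these bullets describe, assuming rows are plain dicts with a `payload` dict and an optional `sparse_embedding`. The function name and row layout are hypothetical.

```python
from typing import Any, Dict, Optional


def merge_for_update(
    existing: Dict[str, Any],
    new_payload: Optional[Dict[str, Any]],
    new_sparse: Optional[Dict[int, float]] = None,
) -> Dict[str, Any]:
    """Start from the stored row, then overlay only what the caller supplied."""
    merged = dict(existing)  # keeps every existing column
    merged["payload"] = {
        **(existing.get("payload") or {}),
        **(new_payload or {}),  # new keys override existing ones
    }
    if new_sparse is not None:
        # Replace the sparse embedding only when a new one is given.
        merged["sparse_embedding"] = new_sparse
    return merged


row = {
    "id": 1,
    "payload": {"user_id": "alice", "category": "pref"},
    "sparse_embedding": {3: 0.5},
}
print(merge_for_update(row, {"category": "habit"}))
```

A partial payload therefore changes `category` while `user_id` and the stored sparse embedding survive the update.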
- Incremented version number in pyproject.toml, version.py, and audit/telemetry modules to reflect the new release.
- Ensured consistency across all relevant files for the updated version.
- Added an async context manager for initializing and cleaning up service singletons at application startup and shutdown.
- Updated service retrieval functions in API routes to fetch services from the app state, with error handling for unavailable services.
- Introduced logging for service initialization and shutdown processes, enhancing observability of the application lifecycle.
- Added a utility function to check for embedded storage in the CLI server configuration, ensuring proper worker settings for embedded databases.
- Incremented version number in pyproject.toml, version.py, and relevant modules to reflect the new release.
- Ensured consistency in versioning across all affected files for the updated version.
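
The service lifecycle commit above describes an async context manager that creates service singletons at startup, stores them on the app state, and cleans them up at shutdown. A framework-free sketch of that pattern follows. `MemoryService`, `get_memory_service`, and the `app.state` object built from `SimpleNamespace` are assumptions for illustration, since the commit message does not name the web framework or the services.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class MemoryService:
    """Hypothetical singleton service with an explicit close step."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: create the singleton and publish it on the app state.
    service = MemoryService()
    app.state.memory_service = service
    try:
        yield
    finally:
        # Shutdown: release resources and mark the service unavailable.
        service.close()
        app.state.memory_service = None


def get_memory_service(app) -> MemoryService:
    """Fetch the service from app state, failing clearly if it is missing."""
    service = getattr(app.state, "memory_service", None)
    if service is None:
        raise RuntimeError("memory service is not available")
    return service


async def main() -> None:
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        print("service ready:", not get_memory_service(app).closed)


asyncio.run(main())
```

Routes that call the accessor outside the lifespan window get an explicit error instead of an `AttributeError` deep inside a handler.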
Teingi (Member) left a comment

LGTM

Teingi merged commit 53594c7 into oceanbase:main on Apr 2, 2026
15 checks passed
Labels

None yet

Projects

None yet

3 participants