@zhcn000000
Contributor

@zhcn000000 commented Jun 26, 2025

Fixes a bug in reindex and drop_index when a schema is specified, and adds a use_jsonb parameter to PGEngine for storing metadata as JSONB. The default value is False.

@averikitsch
Collaborator

Hi @zhcn000000, thank you for this PR. Can you provide more details on the purpose of this change? Currently, we recommend that any metadata that should be indexed and filtered on be specified as dedicated "metadata_columns", which gives even better performance than JSONB. Additionally, the JSON data type has faster insertion performance than JSONB.
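
To make the comparison concrete, here is a minimal sketch that builds the two kinds of filter clause as plain SQL strings. The column names `category` and `langchain_metadata`, the table name `docs`, and both helper functions are hypothetical illustrations, not the library's API, and the identifiers are assumed to be trusted.

```python
def column_filter(column: str) -> str:
    """WHERE clause for a dedicated metadata column.

    A plain column like this can be covered by an ordinary B-tree index.
    """
    return f'"{column}" = %s'


def json_filter(json_column: str, key: str) -> str:
    """WHERE clause for a key stored inside a JSON or JSONB metadata column.

    The ->> operator extracts the value as text and exists for both types.
    Assumes `key` is a trusted identifier (hypothetical helper, no escaping).
    """
    return f"\"{json_column}\" ->> '{key}' = %s"


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(f'SELECT * FROM "docs" WHERE {column_filter("category")}')
    print(f'SELECT * FROM "docs" WHERE {json_filter("langchain_metadata", "category")}')
```

The first form compares a typed column directly, while the second has to extract the value from the metadata document on every row unless an expression or GIN index is defined for it.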

@zhcn000000
Contributor Author

The effect may not be obvious, since JSONB is faster to read and JSON is faster to write. Still, the parameter gives users an additional option, just like the use_jsonb option in the traditional PGVector engine:

    def __init__(
        self,
        embeddings: Embeddings,
        *,
        connection: Union[None, DBConnection, Engine, AsyncEngine, str] = None,
        embedding_length: Optional[int] = None,
        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
        collection_metadata: Optional[dict] = None,
        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
        pre_delete_collection: bool = False,
        logger: Optional[logging.Logger] = None,
        relevance_score_fn: Optional[Callable[[float], float]] = None,
        engine_args: Optional[dict[str, Any]] = None,
        use_jsonb: bool = True,
        create_extension: bool = True,
        async_mode: bool = False,
    ) -> None:
        """Initialize the PGVector store.
        For an async version, use `PGVector.acreate()` instead.

        Args:
            connection: Postgres connection string or (async)engine.
            embeddings: Any embedding function implementing
                `langchain.embeddings.base.Embeddings` interface.
            embedding_length: The length of the embedding vector. (default: None)
                NOTE: This is not mandatory. Defining it will prevent vectors of
                any other size to be added to the embeddings table but, without it,
                the embeddings can't be indexed.
            collection_name: The name of the collection to use. (default: langchain)
                NOTE: This is not the name of the table, but the name of the collection.
                The tables will be created when initializing the store (if not exists)
                So, make sure the user has the right permissions to create tables.
            distance_strategy: The distance strategy to use. (default: COSINE)
            pre_delete_collection: If True, will delete the collection if it exists.
                (default: False). Useful for testing.
            engine_args: SQLAlchemy's create engine arguments.
            use_jsonb: Use JSONB instead of JSON for metadata. (default: True)
                Strongly discouraged from using JSON as it's not as efficient
                for querying.
                It's provided here for backwards compatibility with older versions,
                and will be removed in the future.
            create_extension: If True, will create the vector extension if it
                doesn't exist. disabling creation is useful when using ReadOnly
                Databases.
        """

@zhcn000000
Contributor Author

Setting the default value of use_jsonb to False lets users keep storing metadata in the original format (JSON) by default.
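
As a rough sketch of what such a flag could control, the snippet below picks the metadata column type from a boolean that defaults to False. The function `metadata_column_ddl` and the column name are hypothetical and are not taken from this PR's diff.

```python
def metadata_column_ddl(column: str = "langchain_metadata", use_jsonb: bool = False) -> str:
    """Return the column definition for the metadata column.

    Hypothetical helper: JSON stays the default so existing tables keep
    their current type, and JSONB is opt-in.
    """
    column_type = "JSONB" if use_jsonb else "JSON"
    return f'"{column}" {column_type}'


if __name__ == "__main__":
    print(metadata_column_ddl())                # default: JSON
    print(metadata_column_ddl(use_jsonb=True))  # opt-in: JSONB
```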

@zhcn000000 changed the title from "Add the use_jsonb parameter to PGEngine for storing metadata using JSONB" to "Fixed the bug of reindex and drop_index when specifying the schema" on Jul 23, 2025
@zhcn000000
Contributor Author

Fixed the bug of reindex and drop_index when specifying the schema
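
For context, PostgreSQL resolves an unqualified index name through the search_path, so DROP INDEX and REINDEX INDEX can miss an index that lives in a non-default schema. The following is a minimal sketch of schema-qualified statements. The helper names and the example schema and index names are hypothetical and do not reproduce this PR's actual changes.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_index_sql(index_name: str, schema_name: str = "public") -> str:
    """Build a DROP INDEX statement that names the schema explicitly."""
    return f"DROP INDEX IF EXISTS {quote_ident(schema_name)}.{quote_ident(index_name)};"


def reindex_sql(index_name: str, schema_name: str = "public") -> str:
    """Build a REINDEX INDEX statement that names the schema explicitly."""
    return f"REINDEX INDEX {quote_ident(schema_name)}.{quote_ident(index_name)};"


if __name__ == "__main__":
    # Hypothetical schema and index names, for illustration only.
    print(drop_index_sql("docs_embedding_idx", "app"))
    print(reindex_sql("docs_embedding_idx", "app"))
```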
