Fixed the bug in reindex and drop_index when specifying the schema #222
base: main
Hi @zhcn000000, thank you for this PR. Can you provide more details on the purpose of this change? Currently, we recommend that any metadata to be indexed and filtered on be specified as dedicated "metadata_columns", which performs even better than JSONB. Additionally, the JSON data type has faster insertion performance than JSONB.
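To make the recommendation concrete, here is a minimal sketch that only builds DDL strings and does not call the langchain-postgres API. The helpers `build_table_ddl` and `build_filter_index_ddl`, and the column names `id`, `content`, `embedding`, and `metadata`, are hypothetical. A field that is filtered on gets its own typed column, which a plain B-tree index can cover, while the rest of the metadata goes into a single JSON column.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_table_ddl(
    table: str,
    vector_size: int,
    metadata_columns: Dict[str, str],
    metadata_json_type: str = "JSON",
) -> str:
    """Hypothetical helper: typed metadata columns plus one catch-all JSON column."""
    columns = [
        "id UUID PRIMARY KEY",
        "content TEXT NOT NULL",
        # vector(n) is the column type provided by the pgvector extension.
        f"embedding vector({vector_size}) NOT NULL",
    ]
    # Each field that is filtered on becomes a real, typed column.
    for name, sql_type in metadata_columns.items():
        columns.append(f"{quote_ident(name)} {sql_type}")
    # Everything else stays in a single JSON (or JSONB) column.
    columns.append(f"metadata {metadata_json_type}")
    return f"CREATE TABLE {quote_ident(table)} ({', '.join(columns)});"


def build_filter_index_ddl(table: str, column: str) -> str:
    """Hypothetical helper: a plain B-tree index on a dedicated metadata column."""
    return f"CREATE INDEX ON {quote_ident(table)} ({quote_ident(column)});"


print(build_table_ddl("docs", 768, {"category": "TEXT"}))
print(build_filter_index_ddl("docs", "category"))
```

Filtering on `"category"` then uses an ordinary column comparison instead of extracting a key from JSON on every row.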
Although the effect may not be obvious (JSONB is faster for reads, JSON is faster for writes), this gives users an additional option, just like the use_jsonb option in the original PGVector class:

```python
# Signature and docstring of PGVector.__init__ (method body omitted)
def __init__(
    self,
    embeddings: Embeddings,
    *,
    connection: Union[None, DBConnection, Engine, AsyncEngine, str] = None,
    embedding_length: Optional[int] = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    collection_metadata: Optional[dict] = None,
    distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
    pre_delete_collection: bool = False,
    logger: Optional[logging.Logger] = None,
    relevance_score_fn: Optional[Callable[[float], float]] = None,
    engine_args: Optional[dict[str, Any]] = None,
    use_jsonb: bool = True,
    create_extension: bool = True,
    async_mode: bool = False,
) -> None:
    """Initialize the PGVector store.

    For an async version, use `PGVector.acreate()` instead.

    Args:
        connection: Postgres connection string or (async)engine.
        embeddings: Any embedding function implementing
            `langchain.embeddings.base.Embeddings` interface.
        embedding_length: The length of the embedding vector. (default: None)
            NOTE: This is not mandatory. Defining it will prevent vectors of
            any other size to be added to the embeddings table but, without it,
            the embeddings can't be indexed.
        collection_name: The name of the collection to use. (default: langchain)
            NOTE: This is not the name of the table, but the name of the collection.
            The tables will be created when initializing the store (if not exists)
            So, make sure the user has the right permissions to create tables.
        distance_strategy: The distance strategy to use. (default: COSINE)
        pre_delete_collection: If True, will delete the collection if it exists.
            (default: False). Useful for testing.
        engine_args: SQLAlchemy's create engine arguments.
        use_jsonb: Use JSONB instead of JSON for metadata. (default: True)
            Strongly discouraged from using JSON as it's not as efficient
            for querying.
            It's provided here for backwards compatibility with older versions,
            and will be removed in the future.
        create_extension: If True, will create the vector extension if it
            doesn't exist. disabling creation is useful when using ReadOnly
            Databases.
    """
```
Setting the default value of use_jsonb to False means metadata is still stored using the original scheme (JSON) by default.
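A minimal sketch of the proposed option, assuming it does nothing more than switch the SQL type of the metadata column. `EngineSettings` and `metadata_column_type` are hypothetical names, not the actual PGEngine code from this PR. Because `use_jsonb` defaults to False, callers who do not pass it keep the original JSON storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSettings:
    """Hypothetical stand-in for the engine-level option proposed in this PR."""

    # False keeps the original JSON storage; True opts in to JSONB.
    use_jsonb: bool = False


def metadata_column_type(settings: EngineSettings) -> str:
    """Return the PostgreSQL type to use for the metadata column."""
    # JSONB is stored in a decomposed binary form: slightly slower to write,
    # faster to query, and indexable with GIN.
    # JSON stores the input text verbatim: faster to write, reparsed on each read.
    return "JSONB" if settings.use_jsonb else "JSON"


print(metadata_column_type(EngineSettings()))                # JSON
print(metadata_column_type(EngineSettings(use_jsonb=True)))  # JSONB
```

Keeping the switch opt-in avoids silently changing the column type for tables that users already created with JSON.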
Fixed the bug in reindex and drop_index when specifying the schema, and added the use_jsonb parameter to PGEngine for storing metadata as JSONB (default: False).
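One likely form of the schema bug, assuming the index name was passed unqualified: in PostgreSQL, an unqualified name in `DROP INDEX` or `REINDEX INDEX` is resolved through `search_path`, so an index on a table in a non-default schema is not found (and `DROP INDEX IF EXISTS` silently does nothing). Since an index always lives in the same schema as its table, qualifying it with the table's schema fixes the lookup. The helpers below are hypothetical and are not the diff from this PR.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_index_name(index_name: str, schema_name: Optional[str] = None) -> str:
    """Prefix the index with its schema so search_path is not consulted."""
    if schema_name:
        return f"{quote_ident(schema_name)}.{quote_ident(index_name)}"
    return quote_ident(index_name)


def drop_index_sql(index_name: str, schema_name: Optional[str] = None) -> str:
    """Hypothetical helper: build a schema-aware DROP INDEX statement."""
    return f"DROP INDEX IF EXISTS {qualified_index_name(index_name, schema_name)};"


def reindex_sql(index_name: str, schema_name: Optional[str] = None) -> str:
    """Hypothetical helper: build a schema-aware REINDEX INDEX statement."""
    return f"REINDEX INDEX {qualified_index_name(index_name, schema_name)};"


print(drop_index_sql("docs_embedding_idx", "my_schema"))
# DROP INDEX IF EXISTS "my_schema"."docs_embedding_idx";
print(reindex_sql("docs_embedding_idx", "my_schema"))
# REINDEX INDEX "my_schema"."docs_embedding_idx";
```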