-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[TST] add schema proptest to python #5823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
394633e to
801ec5a
Compare
061ee06 to
fe04c8f
Compare
|
Add Hypothesis-based schema reconciliation tests for collection creation Introduces comprehensive property-based tests covering collection creation and retrieval across local and distributed configurations. The new suites validate reconciliation between metadata, explicit configuration, and schema-provided vector/index settings for both HNSW and SPANN modes, ensuring collection state (configuration blocks, schema defaults/keys, and embedding function metadata) matches expectations derived from randomized inputs. Key Changes• Added Affected Areas• This summary was automatically generated by @propel-code-bot |
46bddae to
148fe32
Compare
fe04c8f to
db3b74e
Compare
This comment has been minimized.
This comment has been minimized.
148fe32 to
ab35cf0
Compare
db3b74e to
f8cfb53
Compare
| for key, value in hnsw_non_none.items(): | ||
| if value is not None and value != HNSW_DEFAULTS[key]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Potential KeyError risk: The code accesses HNSW_DEFAULTS[key] on lines 128 and 197 without checking if the key exists in the dictionary. If a key from hnsw_non_none is not present in HNSW_DEFAULTS, this will raise a KeyError. Add a safety check:
if key in HNSW_DEFAULTS and value is not None and value != HNSW_DEFAULTS[key]:
should_try_metadata = FalseThis prevents crashes when unexpected configuration keys are encountered.
Context for Agents
[**BestPractice**]
Potential KeyError risk: The code accesses `HNSW_DEFAULTS[key]` on lines 128 and 197 without checking if the key exists in the dictionary. If a key from `hnsw_non_none` is not present in `HNSW_DEFAULTS`, this will raise a `KeyError`. Add a safety check:
```python
if key in HNSW_DEFAULTS and value is not None and value != HNSW_DEFAULTS[key]:
should_try_metadata = False
```
This prevents crashes when unexpected configuration keys are encountered.
File: chromadb/test/property/test_schema.py
Line: 128ab35cf0 to
ed59ae9
Compare
f8cfb53 to
2b76659
Compare
ed59ae9 to
fa7744a
Compare
e0d238f to
5fafa38
Compare
fa7744a to
19836ec
Compare
5fafa38 to
7c682b1
Compare
19836ec to
f1b64d9
Compare
f1b64d9 to
95c36fd
Compare
7c682b1 to
8671ea1
Compare
95c36fd to
6cd92fc
Compare
8671ea1 to
c129192
Compare
c129192 to
3d9843b
Compare
6cd92fc to
a021c46
Compare
a021c46 to
f8f6b6f
Compare
| def _compute_expected_config( | ||
| spann_active: bool, | ||
| metadata: Optional[CollectionMetadata], | ||
| configuration: Optional[CreateCollectionConfiguration], | ||
| schema_vector_index_config: Optional[Dict[str, Any]], | ||
| ) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Maintainability] The function _compute_expected_config is quite long and handles two distinct logical paths based on spann_active. To improve readability and maintainability, consider splitting this function into two smaller, focused helpers: one for the SPANN logic and one for the HNSW logic.
This would make the complex reconciliation logic easier to understand and maintain for each vector index type.
Example structure:
def _compute_expected_config_spann(
metadata: Optional[CollectionMetadata],
configuration: Optional[CreateCollectionConfiguration],
schema_vector_index_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
# ... logic from the `if spann_active:` block ...
def _compute_expected_config_hnsw(
metadata: Optional[CollectionMetadata],
configuration: Optional[CreateCollectionConfiguration],
schema_vector_index_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
# ... logic from the `else:` block ...
def _compute_expected_config(
spann_active: bool,
metadata: Optional[CollectionMetadata],
configuration: Optional[CreateCollectionConfiguration],
schema_vector_index_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
if spann_active:
return _compute_expected_config_spann(
metadata, configuration, schema_vector_index_config
)
else:
return _compute_expected_config_hnsw(
metadata, configuration, schema_vector_index_config
)Context for Agents
The function `_compute_expected_config` is quite long and handles two distinct logical paths based on `spann_active`. To improve readability and maintainability, consider splitting this function into two smaller, focused helpers: one for the SPANN logic and one for the HNSW logic.
This would make the complex reconciliation logic easier to understand and maintain for each vector index type.
Example structure:
```python
def _compute_expected_config_spann(
metadata: Optional[CollectionMetadata],
configuration: Optional[CreateCollectionConfiguration],
schema_vector_index_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
# ... logic from the `if spann_active:` block ...
def _compute_expected_config_hnsw(
metadata: Optional[CollectionMetadata],
configuration: Optional[CreateCollectionConfiguration],
schema_vector_index_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
# ... logic from the `else:` block ...
def _compute_expected_config(
spann_active: bool,
metadata: Optional[CollectionMetadata],
configuration: Optional[CreateCollectionConfiguration],
schema_vector_index_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
if spann_active:
return _compute_expected_config_spann(
metadata, configuration, schema_vector_index_config
)
else:
return _compute_expected_config_hnsw(
metadata, configuration, schema_vector_index_config
)
```
File: chromadb/test/property/test_schema.py
Line: 97
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?