-
-
Notifications
You must be signed in to change notification settings - Fork 513
PHPORM-381 Add class metadata for vector search indexes #2820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{ | ||
/** @param list<VectorSearchIndexField> $fields */ | ||
public function __construct( | ||
public array $fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the fields are required, it has to be the 1st parameter of the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for vector search indexes in MongoDB ODM, introducing a new #[VectorSearchIndex]
attribute for creating Atlas Vector Search Indexes. This feature enables efficient similarity search on vector fields, commonly used for machine learning embeddings.
Key changes include:
- New
#[VectorSearchIndex]
attribute for annotating document classes - Extended ClassMetadata to handle vector search indexes alongside regular search indexes
- XML mapping support for vector search index definitions
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
lib/Doctrine/ODM/MongoDB/Mapping/Annotations/VectorSearchIndex.php |
New annotation class for vector search index definition |
lib/Doctrine/ODM/MongoDB/Mapping/ClassMetadata.php |
Extended metadata handling to support vector search indexes with validation |
lib/Doctrine/ODM/MongoDB/Mapping/Driver/AttributeDriver.php |
Added attribute driver support for processing VectorSearchIndex annotations |
lib/Doctrine/ODM/MongoDB/Mapping/Driver/XmlDriver.php |
Added XML driver support for vector-search-indexes configuration |
lib/Doctrine/ODM/MongoDB/Mapping/MappingException.php |
New exception for empty vector search index validation |
tests/Documents/VectorEmbedding.php |
Test document demonstrating vector search index usage |
tests/Doctrine/ODM/MongoDB/Tests/SchemaManagerTest.php |
Updated tests to handle vector search indexes in schema management |
tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataTest.php |
Added tests for vector search index validation and creation |
doctrine-mongo-mapping.xsd |
Updated XSD schema to support vector search index XML definitions |
docs/en/reference/attributes-reference.rst |
Documentation for the new VectorSearchIndex attribute |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
lib/Doctrine/ODM/MongoDB/Mapping/Annotations/VectorSearchIndex.php
Outdated
Show resolved
Hide resolved
} | ||
|
||
if (isset($vectorField['hnswNumEdgeCandidates'])) { | ||
$field['hnswOptions']['numEdgeCandidates'] = (int) $vectorField['hnswNumEdgeCandidates']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my information, when/how do the types specified in the XML schema get checked? I see that the schema defines the types, but does that get validated somewhere before we get here and cast them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is only used when validating the XML file with the XSD. It's never used to cast the node to the correct type.
$this->validateSchema($file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I mean it's cast here on line 778 so I was wondering if we're making sure somewhere else that it actually is a number. But ok cool, so it is validated!
Please hold, I accepted and realized I'd only reviewed one commit. Can't remove the review 💀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good now as far as I'm concerned
- ``fields`` (required): A list of field definitions. Each field definition is an associative array describing a vector or filter field. For vector fields, the following keys are supported: | ||
|
||
- ``type``: Must be set to ``'vector'`` for vector fields or ``'filter'`` for filter fields. | ||
- ``path``: The name of the field in the document to index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it use the class metadata to map the doctrine field name to the mongodb field name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we do not do that for the fields
option on SearchIndex. The values here should correspond to the exact names in the database schema.
This may be something worth clarifying in the documentation files for both, though.
- ``type``: Must be set to ``'vector'`` for vector fields or ``'filter'`` for filter fields. | ||
- ``path``: The name of the field in the document to index. | ||
- ``numDimensions``: (vector fields only) The number of dimensions in the vector. | ||
- ``similarity``: (vector fields only) The vector similarity function to use. Supported values include ``'euclidean'``, ``'cosine'``, and ``'dotProduct'``. Use the constants from ``Doctrine\ODM\MongoDB\Mapping\ClassMetadata::VECTOR_SIMILARITY_*`` for best compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you chose not to use an enum here? I suppose keeping this open makes it easier to be forward-compatible should the server introduce more similarity types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I want to keep it open for new values, and let people use strings from the documentation.
- ``fields`` (required): A list of field definitions. Each field definition is an associative array describing a vector or filter field. For vector fields, the following keys are supported: | ||
|
||
- ``type``: Must be set to ``'vector'`` for vector fields or ``'filter'`` for filter fields. | ||
- ``path``: The name of the field in the document to index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we do not do that for the fields
option on SearchIndex. The values here should correspond to the exact names in the database schema.
This may be something worth clarifying in the documentation files for both, though.
Summary
Add a new
#[ODM\VectorSearchIndex]
attribute to create Atlas Vector Search Indexes.In XML, attribute and annotations, this is a new type of object. But they create Search Indexes in the class metadata with the type
vectorSearch
. The index management is exactly the same for search and vector search indexes.