Merged
Changes from 3 commits
71 changes: 71 additions & 0 deletions docs/en/reference/attributes-reference.rst
@@ -1167,6 +1167,10 @@ Optional arguments:
This attribute is used to specify :ref:`search indexes <search_indexes>` for
`MongoDB Atlas Search <https://www.mongodb.com/docs/atlas/atlas-search/>`__.

.. note::

For vector search indexes, see :ref:`vector_search_index` below.

The arguments correspond to arguments for
`MongoDB\Collection::createSearchIndex() <https://www.mongodb.com/docs/php-library/current/reference/method/MongoDBCollection-createSearchIndex/>`__.
Excluding ``name``, arguments are used to create the
@@ -1397,6 +1401,73 @@ for the related collection.
// rest of the class code...
}

.. _vector_search_index:

#[VectorSearchIndex]
--------------------

The ``#[VectorSearchIndex]`` attribute is used to define a vector search index
on a document class. This enables efficient similarity search on vector fields,
such as those used for machine learning embeddings.

Arguments:

- ``name``: (optional) The name of the vector search index. If omitted, the index is named ``default``.
- ``fields``: (required) A list of field definitions. Each field definition is an associative array describing either a vector field or a filter field. The following keys are supported:

- ``type``: Must be set to ``'vector'`` for vector fields or ``'filter'`` for filter fields.
- ``path``: The name of the field to index, exactly as it is stored in the database (not the PHP property name).
Member Author:

Should it use the class metadata to map the doctrine field name to the mongodb field name?

Member:

IIRC, we do not do that for the fields option on SearchIndex. The values here should correspond to the exact names in the database schema.

This may be something worth clarifying in the documentation files for both, though.

- ``numDimensions``: (vector fields only) The number of dimensions in the vector.
- ``similarity``: (vector fields only) The vector similarity function to use. Supported values include ``'euclidean'``, ``'cosine'``, and ``'dotProduct'``. The ``Doctrine\ODM\MongoDB\Mapping\ClassMetadata::VECTOR_SIMILARITY_*`` constants provide these values; plain strings are also accepted, so values introduced by newer server versions remain usable.
Member:

Any reason you chose not to use an enum here? I suppose keeping this open makes it easier to be forward-compatible should the server introduce more similarity types.

Member Author:

Yes, I want to keep it open for new values, and let people use strings from the documentation.

- ``quantization``: (vector fields only, optional) The quantization method, e.g., ``'scalar'``.
- ``hnswOptions``: (vector fields only, optional) Options for the HNSW algorithm: ``maxEdges`` and ``numEdgeCandidates``.

For filter fields, only ``type: 'filter'`` and ``path`` are required.


Example:

.. code-block:: php

<?php
use Doctrine\ODM\MongoDB\Mapping\Annotations\Document;
use Doctrine\ODM\MongoDB\Mapping\Annotations\Field;
use Doctrine\ODM\MongoDB\Mapping\Annotations\Id;
use Doctrine\ODM\MongoDB\Mapping\Annotations\VectorSearchIndex;
use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
use Doctrine\ODM\MongoDB\Types\Type;

#[Document(collection: 'vector_embeddings')]
#[VectorSearchIndex(
fields: [
[
'type' => 'vector',
'path' => 'plotEmbeddingVoyage3Large',
'numDimensions' => 2048,
'similarity' => ClassMetadata::VECTOR_SIMILARITY_DOT_PRODUCT,
'quantization' => ClassMetadata::VECTOR_QUANTIZATION_SCALAR,
],
[
'type' => 'filter',
'path' => 'category',
],
],
)]
class VectorEmbedding
{
#[Id]
public ?string $id = null;

/** @var list<float> */
#[Field(type: Type::COLLECTION)]
public array $plotEmbeddingVoyage3Large = [];

#[Field]
public string $category;
}

For more details, see the MongoDB documentation on `Atlas Vector Search <https://www.mongodb.com/docs/atlas/atlas-vector-search/>`_.

#[Version]
----------

30 changes: 30 additions & 0 deletions doctrine-mongo-mapping.xsd
@@ -157,6 +157,7 @@
<xs:element name="also-load-methods" type="odm:also-load-methods" minOccurs="0" />
<xs:element name="indexes" type="odm:indexes" minOccurs="0" />
<xs:element name="search-indexes" type="odm:search-indexes" minOccurs="0" />
<xs:element name="vector-search-indexes" type="odm:vector-search-indexes" minOccurs="0" />
<xs:element name="shard-key" type="odm:shard-key" minOccurs="0" />
<xs:element name="read-preference" type="odm:read-preference" minOccurs="0" />
<xs:element name="schema-validation" type="odm:schema-validation" minOccurs="0" />
@@ -640,6 +641,35 @@
</xs:restriction>
</xs:simpleType>

<!-- https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/#atlas-vector-search-index-fields -->
<xs:complexType name="vector-search-indexes">
<xs:choice maxOccurs="unbounded">
<xs:element name="vector-search-index" type="odm:vector-search-index" maxOccurs="unbounded" />
</xs:choice>
</xs:complexType>

<xs:complexType name="vector-search-index">
<xs:choice maxOccurs="unbounded">
<xs:element name="vector-field" type="odm:vector-search-vector-field" />
<xs:element name="filter-field" type="odm:vector-search-filter-field" minOccurs="0" maxOccurs="unbounded" />
</xs:choice>

<xs:attribute name="name" type="xs:string" />
</xs:complexType>

<xs:complexType name="vector-search-vector-field">
<xs:attribute name="path" type="xs:string" use="required" />
<xs:attribute name="numDimensions" type="xs:int" use="required" />
<xs:attribute name="similarity" type="xs:string" use="required" />
<xs:attribute name="quantization" type="xs:string" />
<xs:attribute name="hnswMaxEdges" type="xs:int" />
<xs:attribute name="hnswNumEdgeCandidates" type="xs:int" />
</xs:complexType>

<xs:complexType name="vector-search-filter-field">
<xs:attribute name="path" type="xs:string" use="required" />
</xs:complexType>

<xs:complexType name="shard-key">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="key" type="odm:shard-key-key" maxOccurs="unbounded" />
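The attribute types declared in the schema above (for example `xs:int` on `numDimensions`) only take effect when a mapping file is validated against the XSD; a plain PHP cast does not reject bad input. The following is a minimal sketch of that difference. The inline schema is a trimmed, namespace-free stand-in for the real `doctrine-mongo-mapping.xsd`, and `isValidVectorField()` is a hypothetical helper, not part of the ODM.

```php
<?php

declare(strict_types=1);

// Trimmed stand-in for the vector-search-vector-field type (assumption: no namespace, one root element).
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="vector-field">
    <xs:complexType>
      <xs:attribute name="path" type="xs:string" use="required" />
      <xs:attribute name="numDimensions" type="xs:int" use="required" />
      <xs:attribute name="similarity" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

/** Hypothetical helper: validates one XML snippet against an XSD string. */
function isValidVectorField(string $xml, string $xsd): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $valid    = $document->loadXML($xml) && $document->schemaValidateSource($xsd);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $valid;
}

$validXml   = '<vector-field path="embedding" numDimensions="2048" similarity="cosine" />';
$invalidXml = '<vector-field path="embedding" numDimensions="abc" similarity="cosine" />';

$validResult   = isValidVectorField($validXml, $xsd);
$invalidResult = isValidVectorField($invalidXml, $xsd);

// Without schema validation, the cast silently turns the non-numeric value into 0.
$element   = new SimpleXMLElement($invalidXml);
$castValue = (int) (string) $element['numDimensions'];

var_dump($validResult, $invalidResult, $castValue);
```

The point of the sketch is that schema validation and casting are separate steps: a driver that casts attributes relies on validation having happened earlier.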
27 changes: 27 additions & 0 deletions lib/Doctrine/ODM/MongoDB/Mapping/Annotations/VectorSearchIndex.php
@@ -0,0 +1,27 @@
<?php

declare(strict_types=1);

namespace Doctrine\ODM\MongoDB\Mapping\Annotations;

use Attribute;
use Doctrine\Common\Annotations\Annotation\NamedArgumentConstructor;
use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;

/**
* Defines a vector search index on a class.
*
* @Annotation
* @NamedArgumentConstructor
* @phpstan-import-type VectorSearchIndexField from ClassMetadata
*/
#[Attribute(Attribute::TARGET_CLASS | Attribute::IS_REPEATABLE)]
class VectorSearchIndex implements Annotation
{
/** @param list<VectorSearchIndexField> $fields */
public function __construct(
public array $fields,
Member Author:

Since the fields are required, it has to be the 1st parameter of the constructor.

public ?string $name = null,
) {
}
}
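Because the attribute is declared repeatable and takes `fields` as its first, required constructor parameter, callers can pass arguments by name in any order and attach several indexes to one class. A minimal sketch of that behaviour follows. `VectorSearchIndexSketch` and `ArticleSketch` are hypothetical stand-ins that only copy the constructor shape from this file, so the example runs without the ODM installed.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in with the same constructor shape as the attribute above. */
#[Attribute(Attribute::TARGET_CLASS | Attribute::IS_REPEATABLE)]
final class VectorSearchIndexSketch
{
    /** @param list<array<string, mixed>> $fields */
    public function __construct(
        public array $fields,        // required, so it comes first
        public ?string $name = null, // optional parameters follow required ones
    ) {
    }
}

// Two indexes on one class; named arguments may appear in any order.
#[VectorSearchIndexSketch(
    fields: [['type' => 'vector', 'path' => 'embedding', 'numDimensions' => 3, 'similarity' => 'cosine']],
)]
#[VectorSearchIndexSketch(
    name: 'by_summary',
    fields: [['type' => 'vector', 'path' => 'summaryEmbedding', 'numDimensions' => 3, 'similarity' => 'euclidean']],
)]
final class ArticleSketch
{
}

// Read the attributes back through reflection, roughly as an attribute driver would.
$indexes = array_map(
    static fn (ReflectionAttribute $attribute): VectorSearchIndexSketch => $attribute->newInstance(),
    (new ReflectionClass(ArticleSketch::class))->getAttributes(VectorSearchIndexSketch::class),
);

foreach ($indexes as $index) {
    printf("%s: %s\n", $index->name ?? '(default name)', $index->fields[0]['path']);
}
```

PHP deprecates declaring a required parameter after an optional one, which is why `fields` precedes `name` in the signature.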
35 changes: 32 additions & 3 deletions lib/Doctrine/ODM/MongoDB/Mapping/ClassMetadata.php
@@ -35,6 +35,7 @@
use ReflectionNamedType;
use ReflectionProperty;

use function array_column;
use function array_filter;
use function array_key_exists;
use function array_keys;
@@ -262,6 +263,17 @@
* name: string,
* definition: SearchIndexDefinition
* }
* @phpstan-type VectorSearchIndexField array{
* type: 'vector'|'filter',
* path: string,
* numDimensions?: int,
* similarity?: self::VECTOR_SIMILARITY_*,
* quantization?: self::VECTOR_QUANTIZATION_*,
* hnswOptions?: array{maxEdges?: int, numEdgeCandidates?: int}
* }
* @phpstan-type VectorSearchIndexDefinition array{
* fields: list<VectorSearchIndexField>
* }
* @phpstan-type ShardKeys array<string, mixed>
* @phpstan-type ShardOptions array<string, mixed>
* @phpstan-type ShardKey array{
@@ -459,6 +471,13 @@
*/
public const DEFAULT_SEARCH_INDEX_NAME = 'default';

public const VECTOR_SIMILARITY_EUCLIDEAN = 'euclidean';
public const VECTOR_SIMILARITY_COSINE = 'cosine';
public const VECTOR_SIMILARITY_DOT_PRODUCT = 'dotProduct';
public const VECTOR_QUANTIZATION_NONE = 'none';
public const VECTOR_QUANTIZATION_SCALAR = 'scalar';
public const VECTOR_QUANTIZATION_BINARY = 'binary';

private const ALLOWED_GRIDFS_FIELDS = ['_id', 'chunkSize', 'filename', 'length', 'metadata', 'uploadDate'];

/**
@@ -1243,19 +1262,29 @@ public function hasIndexes(): bool
/**
* Add a search index for this Document.
*
- * @phpstan-param SearchIndexDefinition $definition
+ * @phpstan-param SearchIndexDefinition|VectorSearchIndexDefinition $definition
+ * @phpstan-param 'search'|'vectorSearch' $type
*/
- public function addSearchIndex(array $definition, ?string $name = null): void
+ public function addSearchIndex(array $definition, ?string $name = null, string $type = 'search'): void
{
$name ??= self::DEFAULT_SEARCH_INDEX_NAME;

- if (empty($definition['mappings']['dynamic']) && empty($definition['mappings']['fields'])) {
if ($type !== 'search' && $type !== 'vectorSearch') {
throw new InvalidArgumentException(sprintf('Search index type must be either "search" or "vectorSearch", "%s" given.', $type));
}

if ($type === 'search' && empty($definition['mappings']['dynamic']) && empty($definition['mappings']['fields'])) {
throw MappingException::emptySearchIndexDefinition($this->name, $name);
}

if ($type === 'vectorSearch' && ! in_array('vector', array_column($definition['fields'] ?? [], 'type'), true)) {
throw MappingException::emptyVectorSearchIndexDefinition($this->name, $name);
}

$this->searchIndexes[] = [
'definition' => $definition,
'name' => $name,
'type' => $type,
];
}
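The `vectorSearch` branch above rejects any definition that lacks a field of type `vector`, so a definition made only of filter fields is treated as empty. Below is a minimal, stand-alone sketch of that check. `hasVectorField()` is a hypothetical helper name; it reuses the same `array_column()` / `in_array()` expression as the diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: true when at least one field has type "vector".
 *
 * @param array{fields?: list<array<string, mixed>>} $definition
 */
function hasVectorField(array $definition): bool
{
    // array_column() collects every "type" value; entries without that key are skipped.
    $types = array_column($definition['fields'] ?? [], 'type');

    // Strict comparison, as in the diff.
    return in_array('vector', $types, true);
}

$empty      = [];
$filterOnly = ['fields' => [['type' => 'filter', 'path' => 'category']]];
$withVector = ['fields' => [
    ['type' => 'vector', 'path' => 'embedding', 'numDimensions' => 3, 'similarity' => 'cosine'],
    ['type' => 'filter', 'path' => 'category'],
]];

var_dump(hasVectorField($empty));      // bool(false)
var_dump(hasVectorField($filterOnly)); // bool(false)
var_dump(hasVectorField($withVector)); // bool(true)
```

In the diff, a `false` result for a `vectorSearch` index leads to `MappingException::emptyVectorSearchIndexDefinition()`.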

19 changes: 16 additions & 3 deletions lib/Doctrine/ODM/MongoDB/Mapping/Driver/AttributeDriver.php
@@ -8,7 +8,6 @@
use Doctrine\ODM\MongoDB\Events;
use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;
use Doctrine\ODM\MongoDB\Mapping\Annotations\AbstractIndex;
- use Doctrine\ODM\MongoDB\Mapping\Annotations\SearchIndex;
use Doctrine\ODM\MongoDB\Mapping\Annotations\ShardKey;
use Doctrine\ODM\MongoDB\Mapping\Annotations\TimeSeries;
use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
@@ -108,6 +107,10 @@ public function loadMetadataForClass($className, PersistenceClassMetadata $metad
$this->addSearchIndex($metadata, $attribute);
}

if ($attribute instanceof ODM\VectorSearchIndex) {
$this->addVectorSearchIndex($metadata, $attribute);
}

if ($attribute instanceof ODM\Indexes) {
trigger_deprecation(
'doctrine/mongodb-odm',
@@ -370,7 +373,7 @@ private function addIndex(ClassMetadata $class, AbstractIndex $index, array $key
}

/** @param ClassMetadata<object> $class */
- private function addSearchIndex(ClassMetadata $class, SearchIndex $index): void
+ private function addSearchIndex(ClassMetadata $class, ODM\SearchIndex $index): void
{
$definition = [];

@@ -386,7 +389,17 @@ private function addSearchIndex(ClassMetadata $class, SearchIndex $index): void
}
}

- $class->addSearchIndex($definition, $index->name ?? null);
+ $class->addSearchIndex($definition, $index->name ?? null, 'search');
}

/** @param ClassMetadata<object> $class */
private function addVectorSearchIndex(ClassMetadata $class, ODM\VectorSearchIndex $index): void
{
$definition = [
'fields' => $index->fields,
];

$class->addSearchIndex($definition, $index->name ?? null, 'vectorSearch');
}

/**
45 changes: 45 additions & 0 deletions lib/Doctrine/ODM/MongoDB/Mapping/Driver/XmlDriver.php
@@ -210,6 +210,12 @@ public function loadMetadataForClass($className, \Doctrine\Persistence\Mapping\C
}
}

if (isset($xmlRoot->{'vector-search-indexes'})) {
foreach ($xmlRoot->{'vector-search-indexes'}->{'vector-search-index'} as $searchIndex) {
$this->addVectorSearchIndex($metadata, $searchIndex);
}
}

if (isset($xmlRoot->{'shard-key'})) {
$this->setShardKey($metadata, $xmlRoot->{'shard-key'}[0]);
}
@@ -748,6 +754,45 @@ private function getSearchIndexFieldDefinition(SimpleXMLElement $field): array
return $fieldDefinition;
}

/** @param ClassMetadata<object> $class */
private function addVectorSearchIndex(ClassMetadata $class, SimpleXMLElement $searchIndex): void
{
$definition = ['fields' => []];

foreach ($searchIndex->{'vector-field'} as $vectorField) {
$field = [
'type' => 'vector',
'path' => (string) $vectorField['path'],
'numDimensions' => (int) $vectorField['numDimensions'],
'similarity' => (string) $vectorField['similarity'],
];
if (isset($vectorField['quantization'])) {
$field['quantization'] = (string) $vectorField['quantization'];
}

if (isset($vectorField['hnswMaxEdges'])) {
$field['hnswOptions']['maxEdges'] = (int) $vectorField['hnswMaxEdges'];
}

if (isset($vectorField['hnswNumEdgeCandidates'])) {
$field['hnswOptions']['numEdgeCandidates'] = (int) $vectorField['hnswNumEdgeCandidates'];
Contributor:

Just for my information, when/how do the types specified in the XML schema get checked? I see that the schema defines the types, but does that get validated somewhere before we get here and cast them?

Member Author:

The type is only used when validating the XML file with the XSD. It's never used to cast the node to the correct type.

Contributor:

Sorry, I mean it's cast here on line 778 so I was wondering if we're making sure somewhere else that it actually is a number. But ok cool, so it is validated!

}

$definition['fields'][] = $field;
}

foreach ($searchIndex->{'filter-field'} as $filterField) {
$definition['fields'][] = [
'type' => 'filter',
'path' => (string) $filterField['path'],
];
}

$name = isset($searchIndex['name']) ? (string) $searchIndex['name'] : null;

$class->addSearchIndex($definition, $name, 'vectorSearch');
}

/** @return array<string, array<string, mixed>|scalar|null> */
private function getPartialFilterExpression(SimpleXMLElement $fields): array
{
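To make the XML-to-array conversion above easier to follow, here is a minimal stand-alone sketch that applies the same steps to a small mapping fragment. The function name `buildVectorSearchIndex()`, the sample XML, and the returned array shape are assumptions for illustration; the fragment is namespace-free, and the real driver passes the result to `ClassMetadata::addSearchIndex()` instead of returning it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical re-statement of the conversion in the diff, outside the driver.
 *
 * @return array{name: ?string, definition: array{fields: list<array<string, mixed>>}}
 */
function buildVectorSearchIndex(SimpleXMLElement $index): array
{
    $fields = [];

    // Vector fields are collected first, regardless of their position in the XML.
    foreach ($index->{'vector-field'} as $vectorField) {
        $field = [
            'type'          => 'vector',
            'path'          => (string) $vectorField['path'],
            'numDimensions' => (int) $vectorField['numDimensions'],
            'similarity'    => (string) $vectorField['similarity'],
        ];

        if (isset($vectorField['quantization'])) {
            $field['quantization'] = (string) $vectorField['quantization'];
        }

        // The flat hnsw* attributes are folded into a nested hnswOptions array.
        if (isset($vectorField['hnswMaxEdges'])) {
            $field['hnswOptions']['maxEdges'] = (int) $vectorField['hnswMaxEdges'];
        }

        if (isset($vectorField['hnswNumEdgeCandidates'])) {
            $field['hnswOptions']['numEdgeCandidates'] = (int) $vectorField['hnswNumEdgeCandidates'];
        }

        $fields[] = $field;
    }

    // Filter fields only carry a path.
    foreach ($index->{'filter-field'} as $filterField) {
        $fields[] = ['type' => 'filter', 'path' => (string) $filterField['path']];
    }

    return [
        'name'       => isset($index['name']) ? (string) $index['name'] : null,
        'definition' => ['fields' => $fields],
    ];
}

$xml = <<<'XML'
<vector-search-index name="plot_vectors">
    <filter-field path="category" />
    <vector-field path="plotEmbedding" numDimensions="2048" similarity="dotProduct" hnswMaxEdges="16" />
</vector-search-index>
XML;

$result = buildVectorSearchIndex(new SimpleXMLElement($xml));

print_r($result);
```

One consequence visible in the sketch: because the two loops run one after the other, all vector fields precede all filter fields in the resulting definition, whatever their order in the mapping file.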
5 changes: 5 additions & 0 deletions lib/Doctrine/ODM/MongoDB/Mapping/MappingException.php
@@ -297,6 +297,11 @@ public static function emptySearchIndexDefinition(string $className, string $ind
return new self(sprintf('%s search index "%s" must be dynamic or specify a field mapping', $className, $indexName));
}

public static function emptyVectorSearchIndexDefinition(string $className, string $indexName): self
{
return new self(sprintf('%s vector search index "%s" must have a vector field', $className, $indexName));
}

public static function timeSeriesFieldNotFound(string $className, string $fieldName, string $field): self
{
return new self(sprintf(
12 changes: 12 additions & 0 deletions phpstan-baseline.neon
@@ -1866,6 +1866,18 @@ parameters:
count: 1
path: tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataLoadEventTest.php

-
message: '#^Method Doctrine\\ODM\\MongoDB\\Tests\\Mapping\\ClassMetadataTest\:\:testEmptyVectorSearchIndexDefinition\(\) has parameter \$definition with no value type specified in iterable type array\.$#'
identifier: missingType.iterableValue
count: 1
path: tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataTest.php

-
message: '#^Method Doctrine\\ODM\\MongoDB\\Tests\\Mapping\\ClassMetadataTest\:\:testSearchIndexDefinition\(\) has parameter \$definition with no value type specified in iterable type array\.$#'
identifier: missingType.iterableValue
count: 1
path: tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataTest.php

-
message: '#^Property DoctrineGlobal_User\:\:\$email is unused\.$#'
identifier: property.unused