Merged
138 changes: 60 additions & 78 deletions modules/ROOT/pages/embeddings.adoc
@@ -11,9 +11,6 @@ This page shows how these embeddings can be created and stored as properties on
[TIP]
For a hands-on guide on how to use the GenAI plugin on a Neo4j database, see link:https://neo4j.com/docs/genai/tutorials/embeddings-vector-indexes/[Embeddings & Vector Indexes Tutorial -> Create embeddings with cloud AI providers].

[TIP]
To learn more about using embeddings in combination with vector indexes, see link:{neo4j-docs-base-uri}/cypher-manual/25/indexes/semantic-indexes/vector-indexes/#embeddings[Cypher -> Vector indexes -> Vectors and embeddings in Neo4j].


[[example-graph]]
== Example graph
@@ -30,13 +27,36 @@ Dump files can be imported for both link:{neo4j-docs-base-uri}/aura/auradb/impor
The embeddings on this page are generated using the link:https://platform.openai.com/docs/guides/embeddings[OpenAI] model `text-embedding-ada-002` (1536-dimensional vectors).


[[single-embedding]]
== Generate a single embedding and store it

Use the `ai.text.embed()` function to generate a vector embedding for a single value.

.Signature for `ai.text.embed()` label:function[]
[source,syntax]
----
ai.text.embed(resource :: STRING, provider :: STRING, configuration :: MAP = {}) :: VECTOR
----

resource (`STRING`):: The string to transform into an embedding.
provider (`STRING`):: Case-insensitive identifier of the AI provider to use.
See xref:reference/ai-providers.adoc[] for supported options.
configuration (`MAP`):: Provider-specific options.
See xref:reference/ai-providers.adoc[] for details of each supported provider.
Note that because this argument may contain sensitive data, it is obfuscated in the link:https://neo4j.com/docs/operations-manual/current/monitoring/logging/[query.log].
However, if the function call is misspelled or the query is otherwise malformed, it may be logged without being obfuscated.

[NOTE]
This function sends one API request every time it is called, which may result in a lot of overhead in terms of both network traffic and latency.
If you want to generate many embeddings at once, use xref:multiple-embeddings[].
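
As a minimal sketch of the signature above, the following query embeds a literal string and returns the result without storing it.
It assumes that the `$openaiToken` parameter holds a valid OpenAI API key, as in the other examples on this page; the input text is made up for illustration, and any further provider-specific configuration options are not shown.

.Generate an embedding for a literal string (illustrative)
[source,cypher]
----
// Each call to ai.text.embed() sends one request to the provider.
RETURN ai.text.embed(
  'A crime drama about an Italian-American family', // resource: the text to embed (made up)
  'OpenAI',                                         // provider: case-insensitive identifier
  { token: $openaiToken }                           // configuration: provider-specific options
) AS embedding
----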

[.tabbed-example]
====
[.include-with-Store-embedding-as-a-vector]
======
label:enterprise-edition[]

`genai.vector.encode()` returns a `LIST<FLOAT>`.
To convert and store this value as a link:https://neo4j.com/docs/cypher-manual/25/values-and-types/vector/[`VECTOR`], use the link:https://neo4j.com/docs/cypher-manual/25/functions/vector/#functions-vector[`vector()`] function.
`ai.text.embed()` returns a link:https://neo4j.com/docs/cypher-manual/25/values-and-types/vector/[`VECTOR`].
Storing `VECTOR` values on self-managed instances requires Enterprise Edition and link:{neo4j-docs-base-uri}/operations-manual/current/database-internals/store-formats/#store-format-overview[block format].

.Create a `VECTOR` embedding property for the Godfather
@@ -45,14 +65,14 @@ Storing `VECTOR` values on self-managed instances requires Enterprise Edition an
MATCH (m:Movie {title:'Godfather, The'})
WHERE m.plot IS NOT NULL AND m.title IS NOT NULL
WITH m, m.title || ' ' || m.plot AS titleAndPlot // <1>
WITH m, genai.vector.encode(titleAndPlot, 'OpenAI', { token: $openaiToken }) AS vector // <2>
SET m.embedding = vector(vector, 1536, FLOAT32) // <3>
WITH m, ai.text.embed(titleAndPlot, 'OpenAI', { token: $openaiToken }) AS vector // <2>
SET m.embedding = vector // <3>
RETURN m.embedding AS embedding
----

<1> Concatenate the `title` and `plot` of the `Movie` into a single `STRING`.
<2> Create a 1536 dimensional embedding from `titleAndPlot`.
<3> Store the `propertyVector` as a new `VECTOR` `embedding` property on `The Godfather` node.
<2> Create a 1536-dimensional embedding from `titleAndPlot`.
<3> Store the `vector` value in an `embedding` property (type `VECTOR`) on the `The Godfather` node.

.Result (output capped after 4 entries)
[source]
@@ -70,42 +90,23 @@ RETURN m.embedding AS embedding
[.include-with-Store-embedding-as-a-list]
======

Use the `db.create.setNodeVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a node property.

.Signature for `db.create.setNodeVectorProperty()` label:procedure[]
[source,syntax]
----
db.create.setNodeVectorProperty(node :: NODE, key :: STRING, vector :: ANY)
----

Use the `db.create.setRelationshipVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a relationship property.

.Signature for `db.create.setRelationshipVectorProperty()` label:procedure[]
[source,syntax]
----
db.create.setRelationshipVectorProperty(relationship :: RELATIONSHIP, key :: STRING, vector :: ANY)
----

* `node` or `relationship` is the entity in which the new property will be stored.
* `key` (a `STRING`) is the name of the new property containing the embedding.
* `vector` is the object containing the embedding.

The embeddings are stored as properties on nodes or relationships with the type `LIST<INTEGER | FLOAT>`.
`ai.text.embed()` returns a link:https://neo4j.com/docs/cypher-manual/25/values-and-types/vector/[`VECTOR`], which can be converted into a list with link:https://neo4j.com/docs/cypher-manual/25/functions/list/#functions-tofloatlist[`toFloatList()`].

.Create a `LIST<FLOAT>` embedding property for the Godfather
[source,cypher]
----
MATCH (m:Movie {title:'Godfather, The'})
WHERE m.plot IS NOT NULL AND m.title IS NOT NULL
WITH m, m.title || ' ' || m.plot AS titleAndPlot // <1>
WITH m, genai.vector.encode(titleAndPlot, 'OpenAI', { token: $openaiToken }) AS vector // <2>
CALL db.create.setNodeVectorProperty(m, 'embedding', vector) // <3>
WITH m, ai.text.embed(titleAndPlot, 'OpenAI', { token: $openaiToken }) AS vector // <2>
CALL db.create.setNodeVectorProperty(m, 'embedding', toFloatList(vector)) // <3>
RETURN m.embedding AS embedding
----

<1> Concatenate the `title` and `plot` of the `Movie` into a single `STRING`.
<2> Create a 1536-dimensional embedding from `titleAndPlot`.
<3> Store the `propertyVector` as a new `LIST<FLOAT>` `embedding` property on `The Godfather` node.
<3> Convert `vector` into a list of floats and store it in an `embedding` property (type `LIST<FLOAT>`) on the `The Godfather` node.
The procedures link:https://neo4j.com/docs/operations-manual/current/procedures/#procedure_db_create_setNodeVectorProperty[`db.create.setNodeVectorProperty`] and link:https://neo4j.com/docs/operations-manual/current/procedures/#procedure_db_create_setRelationshipVectorProperty[`db.create.setRelationshipVectorProperty`] store the list with a more space-efficient representation.

.Result (output capped after 4 entries)
[source]
@@ -124,42 +125,42 @@ RETURN m.embedding AS embedding
[[multiple-embeddings]]
== Generate a batch of embeddings and store them

Use the `genai.vector.encodeBatch` procedure to generate many vector embeddings with a single API request.
Use the `ai.text.embedBatch` procedure to generate many vector embeddings with a single API request.
This procedure takes a list of resources as an input, and returns the same number of result rows.

[IMPORTANT]
====
This procedure attempts to generate embeddings for all supplied resources in a single API request.
Check the respective provider's documentation for details on, for example, the maximum number of embeddings that can be generated per request.
Providing too many resources may cause the AI provider to time out or to reject the request.
====
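
One way to stay under a provider's per-request limit is to split the input list into fixed-size slices and send each slice as its own request.
The following sketch shows only the slicing step, with made-up values for `resources` and `batchSize` and no call to a provider.

.Split a list into batches of at most `batchSize` elements (illustrative)
[source,cypher]
----
WITH ['a', 'b', 'c', 'd', 'e'] AS resources, 2 AS batchSize
// range() yields the start offset of each batch: 0, 2, 4
UNWIND range(0, size(resources) - 1, batchSize) AS batchStart
// The slice end is exclusive; a slice that runs past the end of the list is truncated
RETURN batchStart, resources[batchStart .. batchStart + batchSize] AS batch
----

The query returns three rows, with the batches `['a', 'b']`, `['c', 'd']`, and `['e']`.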

.Signature for `genai.vector.encodeBatch` label:procedure[]
.Signature for `ai.text.embedBatch` label:procedure[]
[source,syntax]
----
genai.vector.encodeBatch(resources :: LIST<STRING>, provider :: STRING, configuration :: MAP = {}) :: (index :: INTEGER, resource :: STRING, vector :: LIST<FLOAT>)
ai.text.embedBatch(resources :: LIST<STRING>, provider :: STRING, configuration :: MAP = {}) :: (index :: INTEGER, resource :: STRING, vector :: VECTOR)
----

* The `resources` (a `LIST<STRING>`) parameter is the list of objects to transform into embeddings, such as chunks of text.
* The `provider` (a `STRING`) is the case-insensitive identifier of the provider to use.
resources (`LIST<STRING>`):: The strings to transform into embeddings.
provider (`STRING`):: Case-insensitive identifier of the provider to use.
See xref:reference/ai-providers.adoc[] for supported options.
* The `configuration` (a `MAP`) specifies provider-specific settings such as which model to invoke, as well as any required API credentials.
configuration (`MAP`):: Provider-specific options.
See xref:reference/ai-providers.adoc[] for details of each supported provider.
Note that because this argument may contain sensitive data, it is obfuscated in the link:https://neo4j.com/docs/operations-manual/current/monitoring/logging/[query.log].
However, if the procedure call is misspelled or the query is otherwise malformed, it will be logged without being obfuscated.

Each returned row contains the following columns:

* The `index` (an `INTEGER`) is the index of the corresponding element in the input list, to aid in correlating results back to inputs.
* The `resource` (a `STRING`) is the name of the input resource.
* The `vector` (a `LIST<FLOAT>`) is the generated vector embedding for this resource.
index (`INTEGER`):: The index of the corresponding element in the input list, to correlate results back to inputs.
resource (`STRING`):: The given input resource.
vector (`VECTOR`):: The generated vector embedding for this resource.
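
The `index` column can be used to look up the input that produced each row.
As a sketch, assuming that `$openaiToken` holds a valid API key and using two made-up input strings:

.Correlate result rows with the input list (illustrative)
[source,cypher]
----
WITH ['first text', 'second text'] AS resources
CALL ai.text.embedBatch(resources, 'OpenAI', { token: $openaiToken }) YIELD index, resource, vector
RETURN index,
       resource,
       resources[index] = resource AS matchesInput, // expected to be true if `resource` echoes the input
       size(toFloatList(vector)) AS dimensions      // number of components in the embedding
----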

[.tabbed-example]
====
[.include-with-Store-embeddings-as-vectors]
======
label:enterprise-edition[]

`genai.vector.encode()` returns a `LIST<FLOAT>`.
To convert and store this value as a link:https://neo4j.com/docs/cypher-manual/25/values-and-types/vector/[`VECTOR`], use the link:https://neo4j.com/docs/cypher-manual/25/functions/vector/#functions-vector[`vector()`] function.
`ai.text.embedBatch()` returns a `VECTOR` for each input resource.
Storing `VECTOR` values on an on-prem instance requires Enterprise Edition and link:{neo4j-docs-base-uri}/operations-manual/current/database-internals/store-formats/#store-format-overview[block format].

.Create embeddings from a limited number of properties and store them as `VECTOR` properties
@@ -170,15 +171,15 @@ WITH m
LIMIT 20
WITH collect(m) AS moviesList // <1>
WITH moviesList, [movie IN moviesList | movie.title || ': ' || movie.plot] AS batch // <2>
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
WITH moviesList, index, vector
MATCH (toUpdate:Movie {title: moviesList[index]['title']})
SET toUpdate.embedding = vector(vector, 1536, FLOAT32) // <3>
SET toUpdate.embedding = vector // <3>
----

<1> link:https://neo4j.com/docs/cypher-manual/25/functions/aggregating/#functions-collect[Collect] all 20 `Movie` nodes into a `LIST<NODE>`.
<2> A link:https://neo4j.com/docs/cypher-manual/25/expressions/list-expressions/#list-comprehension[list comprehension] (`[]`) extracts the `title` and `plot` properties of the movies in `moviesList` into a new `LIST<STRING>`.
<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch()`, and stores that vector as a property named `embedding` on the corresponding node.
<3> `SET` is run for each `vector` returned by `ai.text.embedBatch()`, and stores that vector as a property named `embedding` on the corresponding node.

.Create embeddings from a large number of properties and store them as `VECTOR` properties
[source, cypher]
@@ -190,9 +191,9 @@ WITH collect(m) AS moviesList, // <1>
UNWIND range(0, total-1, batchSize) AS batchStart // <3>
CALL (moviesList, batchStart, batchSize) { // <4>
WITH [movie IN moviesList[batchStart .. batchStart + batchSize] | movie.title || ': ' || movie.plot] AS batch // <5>
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
MATCH (toUpdate:Movie {title: moviesList[batchStart + index]['title']})
SET toUpdate.embedding = vector(vector, 1536, FLOAT32) // <6>
SET toUpdate.embedding = vector // <6>
} IN CONCURRENT TRANSACTIONS OF 1 ROW // <7>
----

@@ -207,7 +208,7 @@ Note that this `CALL` subquery uses a link:https://neo4j.com/docs/cypher-manual/
<5> `batch` is a list of strings, each being the concatenation of `title` and `plot` of one movie.
<6> `SET` stores `vector` as the value of the `embedding` property on the node at position `batchStart + index` in `moviesList`.
<7> Each transaction processes one row, that is, one batch.
For more information on concurrency in transactions, see link:https://neo4j.com/docs/cypher-manual/25/subqueries/subqueries-in-transactions/#concurrent-transactions[`CALL` subqueries -> Concurrent transactions]).
For more information on concurrency in transactions, see link:https://neo4j.com/docs/cypher-manual/25/subqueries/subqueries-in-transactions/#concurrent-transactions[`CALL` subqueries -> Concurrent transactions].

[NOTE]
This example may not scale to larger datasets, as `collect(m)` requires the whole result set to be loaded in memory.
@@ -219,27 +220,7 @@ For an alternative method more suitable to processing large amounts of data, see
[.include-with-Store-embeddings-as-lists]
======

Use the `db.create.setNodeVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a node property.

.Signature for `db.create.setNodeVectorProperty` label:procedure[]
[source,syntax]
----
db.create.setNodeVectorProperty(node :: NODE, key :: STRING, vector :: ANY)
----

Use the `db.create.setRelationshipVectorProperty` procedure to store an embedding as a `LIST<FLOAT>` value to a relationship property.

.Signature for `db.create.setRelationshipVectorProperty` label:procedure[]
[source,syntax]
----
db.create.setRelationshipVectorProperty(relationship :: RELATIONSHIP, key :: STRING, vector :: ANY)
----

* `node` or `relationship` is the entity in which the new property will be stored.
* `key` (a `STRING`) is the name of the new property containing the embedding.
* `vector` is the object containing the embedding.

The embeddings are stored as properties on nodes or relationships with the type `LIST<INTEGER | FLOAT>`.
`ai.text.embedBatch()` returns a `VECTOR` for each input resource, which can be converted into a list with link:https://neo4j.com/docs/cypher-manual/25/functions/list/#functions-tofloatlist[`toFloatList()`].

.Create embeddings from a limited number of properties and store them as `LIST<FLOAT>` properties
[source, cypher]
@@ -249,14 +230,15 @@ WITH m
LIMIT 20
WITH collect(m) AS moviesList // <1>
WITH moviesList, [movie IN moviesList | movie.title || ': ' || movie.plot] AS batch // <2>
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
WITH moviesList, index, vector
CALL db.create.setNodeVectorProperty(moviesList[index], 'embedding', vector) // <3>
CALL db.create.setNodeVectorProperty(moviesList[index], 'embedding', toFloatList(vector)) // <3>
----

<1> link:https://neo4j.com/docs/cypher-manual/25/functions/aggregating/#functions-collect[Collect] all 20 `Movie` nodes into a `LIST<NODE>`.
<2> A link:https://neo4j.com/docs/cypher-manual/25/expressions/list-expressions/#list-comprehension[list comprehension] (`[]`) extracts the `title` and `plot` properties of the movies in `moviesList` into a new `LIST<STRING>`.
<3> `db.create.setNodeVectorProperty` is run for each `vector` returned by `genai.vector.encodeBatch()`, and stores that vector as a property named `embedding` on the corresponding node.
<3> Each vector is converted into a list of floats and stored as a property named `embedding` (type `LIST<FLOAT>`) on the corresponding node.
The procedures link:https://neo4j.com/docs/operations-manual/current/procedures/#procedure_db_create_setNodeVectorProperty[`db.create.setNodeVectorProperty`] and link:https://neo4j.com/docs/operations-manual/current/procedures/#procedure_db_create_setRelationshipVectorProperty[`db.create.setRelationshipVectorProperty`] store the list with a more space-efficient representation.

.Create embeddings from a large number of properties and store them as `LIST<FLOAT>` values
[source, cypher]
@@ -268,8 +250,8 @@ WITH collect(m) AS moviesList, // <1>
UNWIND range(0, total-1, batchSize) AS batchStart // <3>
CALL (moviesList, batchStart, batchSize) { // <4>
WITH [movie IN moviesList[batchStart .. batchStart + batchSize] | movie.title || ': ' || movie.plot] AS batch // <5>
CALL genai.vector.encodeBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
CALL db.create.setNodeVectorProperty(moviesList[batchStart + index], 'embedding', vector) // <6>
CALL ai.text.embedBatch(batch, 'OpenAI', { token: $openaiToken }) YIELD index, vector
CALL db.create.setNodeVectorProperty(moviesList[batchStart + index], 'embedding', toFloatList(vector)) // <6>
} IN CONCURRENT TRANSACTIONS OF 1 ROW // <7>
----

@@ -284,7 +266,7 @@ Note that this `CALL` subquery uses a link:https://neo4j.com/docs/cypher-manual/
<5> `batch` is a list of strings, each being the concatenation of `title` and `plot` of one movie.
<6> The procedure stores the converted list as the value of the `embedding` property on the node at position `batchStart + index` in `moviesList`.
<7> Each transaction processes one row, that is, one batch.
For more information on concurrency in transactions, see link:https://neo4j.com/docs/cypher-manual/25/subqueries/subqueries-in-transactions/#concurrent-transactions[`CALL` subqueries -> Concurrent transactions]).
For more information on concurrency in transactions, see link:https://neo4j.com/docs/cypher-manual/25/subqueries/subqueries-in-transactions/#concurrent-transactions[`CALL` subqueries -> Concurrent transactions].

[NOTE]
This example may not scale to larger datasets, as `collect(m)` requires the whole result set to be loaded in memory.