Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate zero vector when using cosine metric #1501

Merged

Conversation

bugmakerrrrrr
Copy link
Contributor

@bugmakerrrrrr bugmakerrrrrr commented Mar 2, 2024

Description

Validate the vector passed by the user during querying or indexing, if the vector is zero vector and the space type of index is cosine, an exception will be thrown.

Examples

  1. When we use cosine space type and index a zero vector through bulk api, the response returned will be as follows:
{
    "took": 2,
    "errors": true,
    "items": [
        {
            "index": {
                "_index": "products-shirts",
                "_id": "1",
                "status": 400,
                "error": {
                    "type": "mapper_parsing_exception",
                    "reason": "failed to parse field [item_vector] of type [knn_vector] in document with id '1'. Preview of field's value: 'null'",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "zero vector is not supported when space type is [cosinesimil]"
                    }
                }
            }
        }
    ]
}
  1. If we use cosine metric and query index is a zero vector, the response returned will be as following:
{
    "error": {
        "root_cause": [
            {
                "type": "query_shard_exception",
                "reason": "failed to create query: zero vector is not supported when space type is [cosinesimil]",
                "index": "products-shirts",
                "index_uuid": "5ccs3V8rRceXFOaMAvDtMw"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "products-shirts",
                "node": "Fq1_A7lHSAu25Beapfb5-A",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "failed to create query: zero vector is not supported when space type is [cosinesimil]",
                    "index": "products-shirts",
                    "index_uuid": "5ccs3V8rRceXFOaMAvDtMw",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "zero vector is not supported when space type is [cosinesimil]"
                    }
                }
            }
        ]
    },
    "status": 400
}
  1. When we use euclidean space type and index a zero vector through bulk api, the response returned will be as follows:
{
    "took": 301,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "products-shirts",
                "_id": "12",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 11,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}
  1. If we use euclidean metric and query index is a zero vector, the response returned will be as following:
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "products-shirts",
                "_id": "12",
                "_score": 1.0,
                "_source": {
                    "item_vector": [
                        0,
                        0,
                        0
                    ],
                    "size": "large",
                    "rating": 3
                }
            },
            {
                "_index": "products-shirts",
                "_id": "11",
                "_score": 0.055555556,
                "_source": {
                    "item_vector": [
                        3,
                        2,
                        2
                    ],
                    "size": "large",
                    "rating": 6
                }
            }
        ]
    }
}

Issues Resolved

#1461

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Mar 2, 2024

Codecov Report

Attention: Patch coverage is 91.78082% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 85.16%. Comparing base (089db16) to head (7635cc6).

Files Patch % Lines
.../main/java/org/opensearch/knn/index/SpaceType.java 80.00% 1 Missing and 1 partial ⚠️
...rg/opensearch/knn/index/query/KNNQueryBuilder.java 66.66% 1 Missing and 1 partial ⚠️
...a/org/opensearch/knn/common/KNNValidationUtil.java 95.23% 0 Missing and 1 partial ⚠️
...nsearch/knn/index/mapper/KNNVectorFieldMapper.java 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1501   +/-   ##
=========================================
  Coverage     85.16%   85.16%           
- Complexity     1291     1300    +9     
=========================================
  Files           169      171    +2     
  Lines          5258     5293   +35     
  Branches        498      505    +7     
=========================================
+ Hits           4478     4508   +30     
- Misses          570      574    +4     
- Partials        210      211    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jmazanec15
Copy link
Member

Thanks @bugmakerrrrrr could you update the Changelog as well? https://github.com/opensearch-project/k-NN/blob/main/CHANGELOG.md

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. Thanks for this! Will wait to approve on changelog verification

*/
public static boolean isZeroVector(byte[] vector) {
for (byte e : vector) {
if (e != (byte) 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: Do we need 0 cast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary, it's just my habit. I can remove it.

@bugmakerrrrrr bugmakerrrrrr force-pushed the validate_zero_vector branch 2 times, most recently from e4abb8e to 9b86b35 Compare March 7, 2024 12:08
@jmazanec15 jmazanec15 added Enhancements Increases software capabilities beyond original client specifications backport 2.x labels Mar 7, 2024
jmazanec15
jmazanec15 previously approved these changes Mar 7, 2024
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for the change!

@navneet1v
Copy link
Collaborator

@bugmakerrrrrr please add the successful indexing and query I/O either in description or reply to this message to provide information what is the experience now.

martin-gaievski
martin-gaievski previously approved these changes Mar 7, 2024
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you

Comment on lines 284 to 288
new KNNVectorFieldType(buildFullName(context), metaValue, -1, knnMethodContext, modelIdAsString),
new KNNVectorFieldType(buildFullName(context), metaValue, -1, modelIdAsString),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we are making this change? and is this required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knnMethodContext is always null for ModelFieldMapper, I believe that this change will make it more clear

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend keeping it as is. This will ensure that changes are minimal for this PR.

Comment on lines 38 to 40
import static org.opensearch.knn.index.mapper.KNNVectorFieldMapperUtil.validateByteVector;
import static org.opensearch.knn.index.mapper.KNNVectorFieldMapperUtil.validateByteVectorValue;
import static org.opensearch.knn.index.mapper.KNNVectorFieldMapperUtil.validateFloatVector;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems these utility class should be renamed or these functions which are shared between mapper and Query should be put in KNNValidationUtil to ensure that these functions can be shared between Query and Mapper classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. I can pull these out into a separate KNNValidationUtil class

Comment on lines 85 to 99
public static void validateByteVector(byte[] vector, SpaceType spaceType) {
if (spaceType == SpaceType.COSINESIMIL && isZeroVector(vector)) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "zero vector is not supported when space type is [%s]", spaceType.getValue())
);
}
}

/**
* Validate if the given float vector is supported by the given space type
*
* @param vector the given vector
* @param spaceType the given space type
*/
public static void validateFloatVector(float[] vector, SpaceType spaceType) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make these parts of SpaceType enum class? Also, does these validations true for all engines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the zero vector validation is same for all engines, but maybe not true for other validations introduced in the future. For example, the dot product metric in Lucene requires all vectors must be normalized, but Faiss doesn't, so I tend to keep these part in this class, any thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its true for all engines that it makes more sense to add as part of enum class.

package org.opensearch.knn.common;

public class KNNVectorUtil {
private KNNVectorUtil() {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use @NoArgsConstructor(access = Access.PRIVATE)

Comment on lines 17 to 36
public static boolean isZeroVector(byte[] vector) {
for (byte e : vector) {
if (e != 0) {
return false;
}
}
return true;
}

/**
* Check if all the elements of a given vector are zero
*
* @param vector the vector
* @return true if yes; otherwise false
*/
public static boolean isZeroVector(float[] vector) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets validate that these arrays are no null


if (fieldDimension == -1) {
// If dimension is not set, the field uses a model and the information needs to be retrieved from there
assert spaceType == null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Putting assert here is dangerous as this will lead to exception messages not understood by users. Lets make sure that we are making sure proper exceptions are thrown here.

  2. Due to this change the comment location have been moved. Lets ensure the comments is at right position.

@@ -308,6 +314,9 @@ protected Query doToQuery(QueryShardContext context) {
validateByteVectorValue(vector[i]);
byteVector[i] = (byte) vector[i];
}
validateByteVector(byteVector, spaceType);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this validation while reading the vectors in above loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot reuse the validation function or short circuit like VectorUtil.isZeroVector if we implement it in the loop. I prefer not to do so

@navneet1v
Copy link
Collaborator

@bugmakerrrrrr can we add a BWC test where we try to add 0 vectors in older version and then zero vectors in upgraded version.

Same for the query too.

@jmazanec15 jmazanec15 self-requested a review March 7, 2024 18:11
@jmazanec15 jmazanec15 dismissed their stale review March 7, 2024 19:46

blockers from navneets comments

@bugmakerrrrrr
Copy link
Contributor Author

@bugmakerrrrrr can we add a BWC test where we try to add 0 vectors in older version and then zero vectors in upgraded version.

Same for the query too.

why do we need a BWC test? I think this a breaking change

@navneet1v
Copy link
Collaborator

@bugmakerrrrrr can we add a BWC test where we try to add 0 vectors in older version and then zero vectors in upgraded version.
Same for the query too.

why do we need a BWC test? I think this a breaking change

if this is a breaking change, then what are the steps that users needs to take so that they can safely upgrade?

cc: @jmazanec15

@bugmakerrrrrr
Copy link
Contributor Author

if this is a breaking change, then what are the steps that users needs to take so that they can safely upgrade?

If the user upgrades the cluster, any request containing a zero vector will be rejected and fail when using cosine metric. And during the upgrade process, requests containing zero vectors will succeed on the old nodes but will be rejected on the upgraded nodes.

If the user uses non-zero vectors under cosine metric, there will be no impact. However, if the user uses zero vectors, it may be considered as an abuse or misconfiguration, and request failure may be acceptable.

@jmazanec15
Copy link
Member

I agree with @bugmakerrrrrr . From #1461 (comment), the error is:

org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:665) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:373) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:704) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:473) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:295) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.transport.TransportService$6.handleException(TransportService.java:884) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1504) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1618) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1592) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:79) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:71) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:102) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.9.0.jar:2.9.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.opensearch.OpenSearchException$3: Index 2147483647 out of bounds for length 12
	at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:685) ~[opensearch-core-2.9.0.jar:2.9.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:371) [opensearch-2.9.0.jar:2.9.0]
	... 23 more
Caused by: java.lang.IndexOutOfBoundsException: Index 2147483647 out of bounds for length 12
	at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) ~[?:?]
	at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) ~[?:?]
	at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) ~[?:?]
	at java.util.Objects.checkIndex(Objects.java:372) ~[?:?]
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:252) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:404) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:404) ~[lucene-core-9.7.0.jar:9.7.0 ccf4b198ec328095d45d2746189dc8ca633e8bcf - 2023-06-21 11:48:16]
	at org.opensearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:563) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.search.fetch.FetchPhase.prepareNonNestedHitContext(FetchPhase.java:337) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:298) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.search.fetch.FetchPhase.execute(FetchPhase.java:169) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.search.SearchService.executeFetchPhase(SearchService.java:628) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:603) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:565) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) ~[opensearch-2.9.0.jar:2.9.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
	... 8 more

The Caused by: java.lang.IndexOutOfBoundsException: Index 2147483647 out of bounds for length 12 is concerning because it looks like it is returning garbage which can cause worse exceptions. Of more concern, this could cause a divide by 0 exception in native code, which would lead to node drop.

In terms of BWC, if a shard containing a 0 vector is moved to a new node, it wont fail to migrate - we dont check. It will only fail if a user attempts to reindex. That being said, I am not concerned about breaking BWC.

jmazanec15
jmazanec15 previously approved these changes Mar 13, 2024
@navneet1v
Copy link
Collaborator

So what we are suggesting here is, if a user has add a zero vector already in the index after upgrade we are okay to break the customer search query as zero vector has potential to go with divide by 0 exceptions.

@navneet1v
Copy link
Collaborator

@bugmakerrrrrr Code looks good to me. Please resolve the conflicts in Changelog.md file.

@jmazanec15
Copy link
Member

So what we are suggesting here is, if a user has add a zero vector already in the index after upgrade we are okay to break the customer search query as zero vector has potential to go with divide by 0 exceptions.

We would fail the query only if query vector is 0 (not if index vectors are 0). If it is 0, it will have meaningless results anyway.

@navneet1v
Copy link
Collaborator

So what we are suggesting here is, if a user has add a zero vector already in the index after upgrade we are okay to break the customer search query as zero vector has potential to go with divide by 0 exceptions.

We would fail the query only if query vector is 0 (not if index vectors are 0). If it is 0, it will have meaningless results anyway.

Makes sense. @jmazanec15 we should have a documentation issue also created for this. We need to capture this detail that providing zero vector for Cosine space type will start throwing exception going forward.

@jmazanec15
Copy link
Member

@navneet1v added doc issue: opensearch-project/documentation-website#6678.

@jmazanec15 jmazanec15 merged commit b7bdda4 into opensearch-project:main Mar 14, 2024
51 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1501-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b7bdda4f7d05b3e3e4248020ce01b2c888359a7e
# Push it to GitHub
git push --set-upstream origin backport/backport-1501-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1501-to-2.x.

jmazanec15 pushed a commit to jmazanec15/k-NN-1 that referenced this pull request Mar 14, 2024
Ensure zero vector is not used when using functionality with cosine similarity metric.

Signed-off-by: panguixin <[email protected]>
(cherry picked from commit b7bdda4)
jmazanec15 pushed a commit to jmazanec15/k-NN-1 that referenced this pull request Mar 14, 2024
Ensure zero vector is not used when using functionality with cosine similarity metric.

Signed-off-by: panguixin <[email protected]>
(cherry picked from commit b7bdda4)
Signed-off-by: John Mazanec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Enhancements Increases software capabilities beyond original client specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants