Qdrant Internal Server Error on recreate timeout for > 40M vectors: Waiting for Consensus Operation Commit Failed #162

Closed
filipecosta90 opened this issue Jun 12, 2024 · 1 comment · May be fixed by #163

Comments

@filipecosta90
Contributor

filipecosta90 commented Jun 12, 2024

Encountered an unexpected 500 (Internal Server Error) while running the Qdrant vector benchmark tool against a remote Qdrant setup.
This happens when we run multiple experiments with different index definitions one after the other (run benchmark -> recreate -> run benchmark -> ...).

The raw response content is as follows:

UnexpectedResponse: Unexpected Response: 500 (Internal Server Error)
Raw response content:
b'{"status":{"error":"Service internal error: Waiting for consensus operation commit failed. Timeout set at: 300 seconds"},"time":300.000856421}'
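Since the raw body is plain JSON, the error message and elapsed time can be pulled out with the standard library. This is only a minimal sketch: the helper names `summarize_error` and `is_consensus_timeout` are made up for illustration and are not part of qdrant_client or the benchmark tool.

```python
import json
from typing import Tuple

# Raw body copied verbatim from the response above.
raw = b'{"status":{"error":"Service internal error: Waiting for consensus operation commit failed. Timeout set at: 300 seconds"},"time":300.000856421}'


def summarize_error(body: bytes) -> Tuple[str, float]:
    """Return (error message, elapsed seconds) from an error body shaped like the one above."""
    payload = json.loads(body)
    return payload["status"]["error"], payload["time"]


def is_consensus_timeout(body: bytes) -> bool:
    """Heuristic (an assumption): match on the message text seen in this issue."""
    message, _ = summarize_error(body)
    return "consensus operation commit failed" in message.lower()


message, elapsed = summarize_error(raw)
print(message)
print(f"elapsed: {elapsed:.1f}s, consensus timeout: {is_consensus_timeout(raw)}")
```

The elapsed time matches the 300 second timeout passed in `collection_params`, so the server waited for the full timeout before answering.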

Unwinding of error:

│ /root/vector-db-benchmark/engine/base_client/client.py:107 in run_experiment                     │
│                                                                                                  │
│   104 │   │                                                                                      │
│   105 │   │   if not skip_upload:                                                                │
│   106 │   │   │   print("Experiment stage: Configure")                                           │
│ ❱ 107 │   │   │   self.configurator.configure(dataset)                                           │
│   108 │   │   │                                                                                  │
│   109 │   │   │   print("Experiment stage: Upload")                                              │
│   110 │   │   │   upload_stats = self.uploader.upload(                                           │
│                                                                                                  │
│ ╭──────────────────────────────────────── locals ────────────────────────────────────────╮       │
│ │          dataset = <benchmark.dataset.Dataset object at 0x7da84cfdfc10>                │       │
│ │ execution_params = {}                                                                  │       │
│ │ existing_results = []                                                                  │       │
│ │        parallels = []                                                                  │       │
│ │           reader = <dataset_reader.ann_h5_reader.AnnH5Reader object at 0x7da84cfdfee0> │       │
│ │             self = <engine.base_client.client.BaseClient object at 0x7da84cd653c0>     │       │
│ │   skip_if_exists = True                                                                │       │
│ │      skip_search = False                                                               │       │
│ │      skip_upload = False                                                               │       │
│ ╰────────────────────────────────────────────────────────────────────────────────────────╯       │
│                                                                                                  │
│ /root/vector-db-benchmark/engine/base_client/configure.py:22 in configure                        │
│                                                                                                  │
│   19 │                                                                                           │
│   20 │   def configure(self, dataset: Dataset) -> Optional[dict]:                                │
│   21 │   │   self.clean()                                                                        │
│ ❱ 22 │   │   return self.recreate(dataset, self.collection_params) or {}                         │
│   23 │                                                                                           │
│   24 │   def execution_params(self, distance, vector_size) -> dict:                              │
│   25 │   │   return {}                                                                           │
│                                                                                                  │
│ ╭──────────────────────────────────────── locals ─────────────────────────────────────────╮      │
│ │ dataset = <benchmark.dataset.Dataset object at 0x7da84cfdfc10>                          │      │
│ │    self = <engine.clients.qdrant.configure.QdrantConfigurator object at 0x7da84cd65a20> │      │
│ ╰─────────────────────────────────────────────────────────────────────────────────────────╯      │
│                                                                                                  │
│ /root/vector-db-benchmark/engine/clients/qdrant/configure.py:43 in recreate                      │
│                                                                                                  │
│   40 │   │   res = self.client.delete_collection(collection_name=QDRANT_COLLECTION_NAME)         │
│   41 │                                                                                           │
│   42 │   def recreate(self, dataset: Dataset, collection_params):                                │
│ ❱ 43 │   │   self.client.recreate_collection(                                                    │
│   44 │   │   │   collection_name=QDRANT_COLLECTION_NAME,                                         │
│   45 │   │   │   vectors_config=rest.VectorParams(                                               │
│   46 │   │   │   │   size=dataset.config.vector_size,                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ collection_params = {                                                                        │ │
│ │                     │   'timeout': 300,                                                      │ │
│ │                     │   'optimizers_config': {'memmap_threshold': 25000000},                 │ │
│ │                     │   'hnsw_config': {'m': 16, 'ef_construct': 512}                        │ │
│ │                     }                                                                        │ │
│ │           dataset = <benchmark.dataset.Dataset object at 0x7da84cfdfc10>                     │ │
│ │              self = <engine.clients.qdrant.configure.QdrantConfigurator object at            │ │
│ │                     0x7da84cd65a20>                                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/qdrant_client/qdrant_client.py:1824 in                   │
│ recreate_collection                                                                              │
│                                                                                                  │
│   1821 │   │   │   stacklevel=2,                                                                 │
│   1822 │   │   )                                                                                 │
│   1823 │   │                                                                                     │
│ ❱ 1824 │   │   return self._client.recreate_collection(                                          │
│   1825 │   │   │   collection_name=collection_name,                                              │
│   1826 │   │   │   vectors_config=vectors_config,                                                │
│   1827 │   │   │   shard_number=shard_number,                                                    │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │          collection_name = 'benchmark'                                                       │ │
│ │              hnsw_config = {'m': 16, 'ef_construct': 512}                                    │ │
│ │                init_from = None                                                              │ │
│ │                   kwargs = {}                                                                │ │
│ │          on_disk_payload = None                                                              │ │
│ │        optimizers_config = {'memmap_threshold': 25000000}                                    │ │
│ │      quantization_config = None                                                              │ │
│ │       replication_factor = None                                                              │ │
│ │                     self = <qdrant_client.qdrant_client.QdrantClient object at               │ │
│ │                            0x7da84cd65b10>                                                   │ │
│ │             shard_number = None                                                              │ │
│ │          sharding_method = None                                                              │ │
│ │    sparse_vectors_config = None                                                              │ │
│ │                  timeout = 300                                                               │ │
│ │           vectors_config = VectorParams(                                                     │ │
│ │                            │   size=512,                                                     │ │
│ │                            │   distance=<Distance.COSINE: 'Cosine'>,                         │ │
│ │                            │   hnsw_config=None,                                             │ │
│ │                            │   quantization_config=None,                                     │ │
│ │                            │   on_disk=None,                                                 │ │
│ │                            │   datatype=None                                                 │ │
│ │                            )                                                                 │ │
│ │               wal_config = None                                                              │ │
│ │ write_consistency_factor = None                                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/qdrant_client/qdrant_remote.py:2288 in                   │
│ recreate_collection                                                                              │
│                                                                                                  │
│   2285 │   │   sharding_method: Optional[types.ShardingMethod] = None,                           │
│   2286 │   │   **kwargs: Any,                                                                    │
│   2287 │   ) -> bool:                                                                            │
│ ❱ 2288 │   │   self.delete_collection(collection_name, timeout=timeout)                          │
│   2289 │   │                                                                                     │
│   2290 │   │   return self.create_collection(                                                    │
│   2291 │   │   │   collection_name=collection_name,                                              │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │          collection_name = 'benchmark'                                                       │ │
│ │              hnsw_config = {'m': 16, 'ef_construct': 512}                                    │ │
│ │                init_from = None                                                              │ │
│ │                   kwargs = {}                                                                │ │
│ │          on_disk_payload = None                                                              │ │
│ │        optimizers_config = {'memmap_threshold': 25000000}                                    │ │
│ │      quantization_config = None                                                              │ │
│ │       replication_factor = None                                                              │ │
│ │                     self = <qdrant_client.qdrant_remote.QdrantRemote object at               │ │
│ │                            0x7da84cd65210>                                                   │ │
│ │             shard_number = None                                                              │ │
│ │          sharding_method = None                                                              │ │
│ │    sparse_vectors_config = None                                                              │ │
│ │                  timeout = 300                                                               │ │
│ │           vectors_config = VectorParams(                                                     │ │
│ │                            │   size=512,                                                     │ │
│ │                            │   distance=<Distance.COSINE: 'Cosine'>,                         │ │
│ │                            │   hnsw_config=None,                                             │ │
│ │                            │   quantization_config=None,                                     │ │
│ │                            │   on_disk=None,                                                 │ │
│ │                            │   datatype=None                                                 │ │
│ │                            )                                                                 │ │
│ │               wal_config = None                                                              │ │
│ │ write_consistency_factor = None                                                              │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/qdrant_client/qdrant_remote.py:2156 in delete_collection │
│                                                                                                  │
│   2153 │   │   │   │   timeout=self._timeout,                                                    │
│   2154 │   │   │   ).result                                                                      │
│   2155 │   │                                                                                     │
│ ❱ 2156 │   │   result: Optional[bool] = self.http.collections_api.delete_collection(             │
│   2157 │   │   │   collection_name, timeout=timeout                                              │
│   2158 │   │   ).result                                                                          │
│   2159 │   │   assert result is not None, "Delete collection returned None"                      │
│                                                                                                  │
│ ╭─────────────────────────────────────── locals ────────────────────────────────────────╮        │
│ │ collection_name = 'benchmark'                                                         │        │
│ │          kwargs = {}                                                                  │        │
│ │            self = <qdrant_client.qdrant_remote.QdrantRemote object at 0x7da84cd65210> │        │
│ │         timeout = 300                                                                 │        │
│ ╰───────────────────────────────────────────────────────────────────────────────────────╯        │
│ 

Expected behaviour:

I expected the timeout to happen, but not for the DB server to return a 500 error.
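As a stopgap on the benchmark side, the delete/create step could be retried with a backoff. This is a rough sketch under assumptions: `call_with_retries` is a hypothetical helper, not something that exists in vector-db-benchmark or qdrant_client, and whether retrying helps at all depends on why the consensus commit is stalling.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 5.0,
    should_retry: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff on retryable errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors or once the last attempt has failed.
            if not should_retry(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In the benchmark, `operation` could wrap the call seen in the traceback, e.g. `lambda: client.delete_collection(collection_name=QDRANT_COLLECTION_NAME, timeout=300)`, with `should_retry` checking for "consensus" in the exception text.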

@KShivendu
Member

KShivendu commented Aug 5, 2024

Thanks for creating this issue. This timeout is expected, assuming your cluster was in an unhealthy state. Maybe we could have returned something other than a 500, but it's not much of a problem.

Closing this issue for now. Let me know if you have anything to add :)
