[docs] Update Storage Documentation #1392

Open

davidmyriel wants to merge 8 commits into master from collection

Contributor

davidmyriel commented Jan 16, 2025

No description provided.


          add architecture and cleaner description

ffc7370

netlify bot commented Jan 16, 2025

✅ Deploy Preview for condescending-goldwasser-91acf0 ready!

Name	Link
🔨 Latest commit	`048c521`
🔍 Latest deploy log	https://app.netlify.com/sites/condescending-goldwasser-91acf0/deploys/67b4d65ee70a6800088fb473
😎 Deploy Preview	https://deploy-preview-1392--condescending-goldwasser-91acf0.netlify.app
generall reviewed

qdrant-landing/static/documentation/concepts/storage/collection_architecture.png (outdated, resolved)

timvisee reviewed

qdrant-landing/content/documentation/concepts/storage.md (outdated, resolved)

davidmyriel and others added 7 commits on February 18, 2025 11:23


          Update qdrant-landing/content/documentation/concepts/storage.md

945a957

Co-authored-by: Tim Visée


          fix foc

d66e5bb


          remoe image

69ce660


          Update storage.md

28f4528


          add shard description and architecture

4749e23


          Update storage.md

7f84a6c


          Update storage.md

048c521

timvisee reviewed

qdrant-landing/content/documentation/concepts/storage.md

-              The configuration of the segments in the collection can be different and independent of one another, but at least one `appendable' segment must be present in a collection.
+              Segments are made up of:
+              - Vector and Payload Storage

Member

timvisee Feb 19, 2025

Also quantized vectors (if configured)

qdrant-landing/content/documentation/concepts/storage.md

-              ## Vector storage
+              Segments can be either `appendable` or `non-appendable`:
+              - **Appendable segments**: You can add, delete, and query data freely.
+              - **Non-appendable segments**: You can only read and delete data.

Member

timvisee Feb 19, 2025

Payload data can also be updated in non-appendable segments. But maybe it's better not to mention that here.

qdrant-landing/content/documentation/concepts/storage.md

-              Depending on the requirements of the application, Qdrant can use one of the data storage options.
-              The choice has to be made between the search speed and the size of the RAM used.
+              Each collection can have segments that are configured differently, but it must include at least one `appendable` segment.

Member

timvisee Feb 19, 2025

Suggested change

-              Each collection can have segments that are configured differently, but it must include at least one `appendable` segment.
+              Each collection can have segments that are configured differently, but each replica has at least one `appendable` segment to support incoming writes.
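
To make the appendable / non-appendable distinction in this thread concrete, here is a minimal toy sketch. It is not Qdrant's real data structure: the `Segment` class, `pick_write_target`, and the "first appendable segment receives the write" routing are all assumptions made for illustration. It only models what the thread states: non-appendable segments accept reads and deletes, and at least one appendable segment must exist to take incoming writes.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a segment (hypothetical, not Qdrant's implementation)."""
    appendable: bool
    points: dict = field(default_factory=dict)

    def upsert(self, point_id, vector):
        # New data may only be added to appendable segments.
        if not self.appendable:
            raise PermissionError("non-appendable segment: reads and deletes only")
        self.points[point_id] = vector

    def delete(self, point_id):
        # Deletes are allowed in both kinds of segment.
        self.points.pop(point_id, None)


def pick_write_target(segments):
    """Return the first appendable segment (assumed routing rule for this sketch)."""
    for segment in segments:
        if segment.appendable:
            return segment
    raise RuntimeError("no appendable segment available for incoming writes")


segments = [
    Segment(appendable=False, points={1: [0.1, 0.2]}),
    Segment(appendable=True),
]
pick_write_target(segments).upsert(2, [0.3, 0.4])  # lands in the appendable segment
segments[0].delete(1)                              # delete works on the read-only one
print([sorted(s.points) for s in segments])
```

Without an appendable segment, `pick_write_target` has nowhere to send a write, which is the reason the suggested wording ties the requirement to "incoming writes".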

qdrant-landing/content/documentation/concepts/storage.md

+              - **In-Memory storage**:
+                - Stores all vectors in RAM.
+                - Offers the highest speed because disk access is only needed for saving data.

Member

timvisee Feb 19, 2025

Suggested change

-                - Offers the highest speed because disk access is only needed for saving data.
+                - Offers the highest speed because disk access is only needed for loading data on start, or for saving data if changed.

qdrant-landing/content/documentation/concepts/storage.md

Comment on lines +164 to +167

    
              To configure when segments switch to **On-Disk** storage, use the `memmap_threshold` option. You can set this threshold in two ways:

              1. You can set the threshold globally in the [configuration file](/documentation/guides/configuration/). The parameter is called `memmap_threshold` (previously `memmap_threshold_kb`).

              2. You can set the threshold for each collection separately during [creation](/documentation/concepts/collections/#create-collection) or [update](/documentation/concepts/collections/#update-collection-parameters).

              1. **Globally**: Adjust the `memmap_threshold` parameter in the [configuration file](/documentation/guides/configuration/).

              2. **Per Collection**: Set the threshold during the [creation](/documentation/concepts/collections/#create-collection) or [update](/documentation/concepts/collections/#update-collection-parameters) of each collection.

Member

timvisee Feb 19, 2025

Let's drop this, since the plan is to deprecate this parameter anyway.

Instead, we can say to explicitly set `on_disk: true` for:

- vectors (vector storage)
- hnsw
- payload (`on_disk_payload: true`)
- payload indices
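
As a rough illustration of the four settings listed above, the sketch below builds the JSON bodies one might send to Qdrant's REST API: one for creating a collection (`PUT /collections/{collection_name}`) and one for creating a payload index (`PUT /collections/{collection_name}/index`). The vector size, distance, and the `city` field are hypothetical placeholders, and the exact field names should be checked against the current API reference before this goes into the docs.

```python
import json

# Hypothetical values chosen only for illustration.
VECTOR_SIZE = 768

# Body for creating a collection with vectors, HNSW index, and payload on disk.
create_collection_body = {
    "vectors": {
        "size": VECTOR_SIZE,
        "distance": "Cosine",
        "on_disk": True,           # vector storage on disk
    },
    "hnsw_config": {"on_disk": True},  # HNSW index on disk
    "on_disk_payload": True,           # payload storage on disk
}

# Body for creating an on-disk payload index on a hypothetical "city" field.
create_payload_index_body = {
    "field_name": "city",
    "field_schema": {"type": "keyword", "on_disk": True},
}

print(json.dumps(create_collection_body, indent=2))
print(json.dumps(create_payload_index_body, indent=2))
```

Setting each flag explicitly keeps the storage choice visible in the collection definition instead of depending on a threshold-driven switch.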

qdrant-landing/content/documentation/concepts/storage.md

Comment on lines +293 to +298

    
              The rule of thumb for setting the `memmap_threshold` is straightforward:

              - if you have a balanced use scenario - set memmap threshold the same as `indexing_threshold` (default is 20000). In this case the optimizer will not make any extra runs and will optimize all thresholds at once.

              - if you have a high write load and low RAM - set memmap threshold lower than `indexing_threshold` to e.g. 10000. In this case the optimizer will convert the segments to memmap storage first and will only apply indexing after that.

              - **Balanced Use**: Set `memmap_threshold` equal to `indexing_threshold` (default is 20000). This way, the optimizer handles all thresholds together without extra runs.

              - **High Write Load & Low RAM**: Set `memmap_threshold` lower than `indexing_threshold`, e.g., 10000. This prioritizes converting segments to Memmap storage before indexing.

              In addition, you can use memmap storage not only for vectors, but also for HNSW index.

              To enable this, you need to set the `hnsw_config.on_disk` parameter to `true` during collection [creation](/documentation/concepts/collections/#create-a-collection) or [updating](/documentation/concepts/collections/#update-collection-parameters).

              Additionally, **On-Disk** storage can be used for the HNSW index. To enable this, set the `hnsw_config.on_disk` parameter to `true` during collection [creation](/documentation/concepts/collections/#create-a-collection) or [updating](/documentation/concepts/collections/#update-collection-parameters).

Member

timvisee Feb 19, 2025

Same as the above, let's drop `memmap_threshold` altogether and focus on `on_disk`.

qdrant-landing/content/documentation/concepts/storage.md

Comment on lines +449 to +454

    
              - **On-Disk Storage**:

                - Reads and writes payloads directly to [Gridstore](/articles/gridstore-key-value-storage/).

                - Requires less RAM but has higher access latency.

                - If querying vectors with payload-based conditions, create a payload index for each field to avoid disk access. Indexed fields are kept in RAM.

              You can specify the desired type of payload storage with [configuration file](/documentation/guides/configuration/) or with collection parameter `on_disk_payload` during [creation](/documentation/concepts/collections/#create-collection) of the collection.

              You can choose the type of payload storage in the [configuration file](/documentation/guides/configuration/) or by setting the `on_disk_payload` parameter when [creating](/documentation/concepts/collections/#create-collection) a collection.

Member

timvisee Feb 19, 2025

Let's put on-disk on top, since it's the default now.

qdrant-landing/content/documentation/concepts/storage.md

+. **Segment Updates**:
+                 - Changes are then applied to segments.
+                 - Each segment keeps the latest version of changes and the version of each point.
+                 - If a new change has a lower sequential number than the current version, it is ignored.

Member

timvisee Feb 19, 2025

Suggested change

-                 - If a new change has a lower sequential number than the current version, it is ignored.
+                 - If a new change has a lower (older) sequential number than the current version, it is ignored.

qdrant-landing/content/documentation/concepts/storage.md

-              Each segment stores the last version of the change applied to it as well as the version of each individual point.
-              If the new change has a sequential number less than the current version of the point, the updater will ignore the change.
-              This mechanism allows Qdrant to safely and efficiently restore the storage from the WAL in case of an abnormal shutdown.
+              This process allows Qdrant to restore storage safely from the WAL in case of an unexpected shutdown.

Member

timvisee Feb 19, 2025

Suggested change

-              This process allows Qdrant to restore storage safely from the WAL in case of an unexpected shutdown.
+              This process allows Qdrant to restore storage safely from the WAL in case of an unexpected shutdown, by replaying everything from the WAL.
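
The reason replaying everything from the WAL is safe follows from the version check discussed in this thread: changes older than a point's current version are skipped. The toy sketch below illustrates that idea only. `ToySegment`, `replay`, the tuple layout of WAL entries, and sequence numbers starting at 1 are assumptions for this example, not Qdrant's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical segment that tracks a version per point and for itself."""
    version: int = 0  # latest change applied to the segment (0 = none yet)
    point_versions: dict = field(default_factory=dict)
    points: dict = field(default_factory=dict)

    def apply(self, seq, point_id, value):
        """Apply a change unless it is older than the point's current version."""
        if seq < self.point_versions.get(point_id, 0):
            return False  # lower (older) sequence number: ignored
        self.points[point_id] = value
        self.point_versions[point_id] = seq
        self.version = max(self.version, seq)
        return True


def replay(segment, wal):
    """Re-apply every WAL entry in order; return which entries took effect."""
    return [segment.apply(seq, point_id, value) for seq, point_id, value in wal]


# WAL entries as (sequence number, point id, value).
wal = [(1, "p1", "a"), (2, "p1", "b"), (3, "p2", "c")]

segment = ToySegment()
segment.apply(1, "p1", "a")
segment.apply(2, "p1", "b")
# Simulated unexpected shutdown: change 3 is in the WAL but never reached the segment.

applied = replay(segment, wal)
print(applied)
print(segment.points)
```

In this sketch the replay skips change 1 for `p1` because the point is already at version 2, re-applies change 2 with no visible effect, and applies the missing change 3, so the segment ends up in the state the WAL describes.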
