Is there a way to build an index while keeping it on disk (GraphIndexBuilder + OnDiskGraphIndex ?) #125
Comments
Mostly, yes. PR #117 adds save() and load() methods to OnHeapGraphIndex so that you can checkpoint the index to disk and still continue modifying it.
Great, #117 also unblocks DELETEs (and UPDATEs).
Technically yes, although updates are still expensive, since you have to call cleanup() before reusing the node id, and that is O(N). Better to use a new id if possible.
Sorry, I wasn't clear: by UPDATE I meant updating the value of the vector in a database row. In that case I would unregister the previous value and create a new node id with the new vector. I have another problem (one that deserves its own GH issue): linking the "node id" to the physical id of the row in the DB, which is actually the primary key of the record. For now I am going to use a separate struct to keep track of this link.
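For illustration, the "separate struct" mentioned above could look roughly like the sketch below. It is only a sketch: the class name `NodeIdMapping` and its methods are hypothetical and are not part of JVector or HerdDB. It assumes node ids are plain ints handed out by a counter and never reused, so an UPDATE is a delete followed by an insert and nothing has to be cleaned up before the new id is assigned. Marking the old node as deleted in the graph itself is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical helper (not a JVector or HerdDB class): links primary keys to graph node ids. */
final class NodeIdMapping<K> {
    private final Map<K, Integer> nodeByKey = new HashMap<>();
    private final Map<Integer, K> keyByNode = new HashMap<>();
    private int nextNodeId = 0; // assumption: ids are only ever allocated, never reused

    /** INSERT: allocate a fresh node id for a row that is not yet indexed. */
    int insert(K primaryKey) {
        if (nodeByKey.containsKey(primaryKey)) {
            throw new IllegalStateException("already indexed: " + primaryKey);
        }
        int nodeId = nextNodeId++;
        nodeByKey.put(primaryKey, nodeId);
        keyByNode.put(nodeId, primaryKey);
        return nodeId;
    }

    /** DELETE: drop the link and return the node id the caller should retire in the graph. */
    OptionalInt delete(K primaryKey) {
        Integer nodeId = nodeByKey.remove(primaryKey);
        if (nodeId == null) {
            return OptionalInt.empty();
        }
        keyByNode.remove(nodeId);
        return OptionalInt.of(nodeId);
    }

    /** Resolve a search hit (node id) back to the row's primary key, or null if retired. */
    K primaryKeyOf(int nodeId) {
        return keyByNode.get(nodeId);
    }

    public static void main(String[] args) {
        NodeIdMapping<String> mapping = new NodeIdMapping<>();
        int first = mapping.insert("row-1");
        // UPDATE of the vector in row-1: retire the old node id, then take a new one.
        OptionalInt retired = mapping.delete("row-1");
        int replacement = mapping.insert("row-1");
        System.out.println("old=" + first + " retired=" + retired.isPresent()
                + " new=" + replacement + " key=" + mapping.primaryKeyOf(replacement));
    }
}
```

The mapping itself lives on the heap, so it would have to be persisted or rebuilt along with the index.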
I am writing a POC to integrate JVector into HerdDB.
This is my work, for reference: diennea/herddb#814
This issue asks whether there is a good way to have a GraphIndexBuilder backed by an OnDiskGraphIndex.
In HerdDB the index is always "open for writes", and it seems that GraphIndexBuilder currently keeps everything on the heap.
My current plan is to "flush" the index to disk periodically (during a checkpoint), but that doesn't seem efficient and it will lead to unwanted behaviour of the service (big writes to disk). Usually a checkpoint in HerdDB is cheap: it amounts to flushing a bunch of metadata with the list of "active pages".
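To make the "flush during a checkpoint" plan concrete, here is a minimal sketch of one common way to write a full snapshot safely: write to a temporary file, then rename it over the previous checkpoint. The names `AtomicCheckpoint` and `SnapshotWriter` are hypothetical, and the callback stands in for whatever actually serializes the index; no JVector API is assumed. Note that this does not reduce the amount of data written, which is the real concern above. It only keeps a half-written file from replacing a good checkpoint.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: writes a snapshot to a temp file, then swaps it into place. */
final class AtomicCheckpoint {

    /** Hypothetical callback that serializes the index to the given stream. */
    interface SnapshotWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    static void checkpoint(Path target, SnapshotWriter writer) throws IOException {
        // The temp file sits next to the target so the rename stays on one file system.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            writer.writeTo(out);
        }
        // With ATOMIC_MOVE, replacing an existing target is implementation specific;
        // on typical POSIX file systems the rename replaces the old checkpoint.
        Files.move(tmp, target,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoint-demo");
        Path target = dir.resolve("index.bin");
        checkpoint(target, out -> out.write(new byte[] {1, 2, 3}));
        System.out.println("checkpoint size: " + Files.size(target));
    }
}
```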