Is there a way to build an index while keeping it on disk (GraphIndexBuilder + OnDiskGraphIndex ?) #125

eolivelli · 2023-10-13T08:12:45Z

I am writing a POC to integrate JVector into HerdDB.

This is my work, for reference: diennea/herddb#814

This issue is to ask whether there is a good way to have a GraphIndexBuilder backed by an OnDiskGraphIndex.
In HerdDB the index is always "open for writes", and it seems that GraphIndexBuilder currently keeps everything on the heap.

My current plan is to "flush" the index to disk periodically (during a checkpoint), but that doesn't seem efficient and it would cause unwanted behaviour in the service (big writes to disk). Usually a checkpoint in HerdDB only flushes a bunch of metadata along with the list of "active pages".

jbellis · 2023-10-13T12:39:58Z

Mostly, yes. PR #117 adds save() and load() methods to OnHeapGraphIndex so that you can checkpoint to disk and still continue modifying the index.
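
For readers following along, here is a minimal sketch of what a checkpoint step built on such a save method could look like on the caller's side. It does not use JVector's API: `GraphWriter` is a hypothetical stand-in for whatever serializes the in-memory graph, and `IndexCheckpointer` is an illustrative name. The only idea shown is writing to a temporary file and then renaming it over the previous checkpoint, so that a crash mid-write does not corrupt the last good copy.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class IndexCheckpointer {
    /** Hypothetical stand-in for the code that serializes the in-memory graph. */
    interface GraphWriter {
        void writeTo(DataOutputStream out) throws IOException;
    }

    /**
     * Writes the graph to "<target>.tmp" and then moves it over the target.
     * The index itself stays in memory and remains writable afterwards.
     * Note: this sketch does not fsync, so it is not fully durable, and
     * whether an atomic move replaces an existing file is platform-specific
     * (it does on POSIX file systems).
     */
    static void checkpoint(GraphWriter graph, Path target) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            graph.writeTo(out);
        }
        Files.move(tmp, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }
}
```

This still rewrites the whole graph at every checkpoint, so it does not address the concern about big writes to disk; it only makes each checkpoint safe to interrupt.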

eolivelli · 2023-10-13T14:40:29Z

Great, #117 also unblocks DELETEs (and UPDATEs).

jbellis · 2023-10-13T14:45:09Z

Technically yes, although updates are still expensive, since you have to call cleanup() before reusing the node id, which is O(N). Better to use a new id if possible.

eolivelli · 2023-10-13T15:10:02Z

Sorry, I wasn't clear: by UPDATE I meant updating the value of the vector in a database row. In that case I would unregister the previous value and create a new node id with the new vector.

I have another problem (that deserves another GH issue) about linking the "node id" to the physical id of the row in the DB (actually it is the primary key of the record). For now I am going to use a separate struct to keep track of this link.
It would be great to have a "metadata" (byte array) field attached to the "node" and have the GraphSearcher return it (together with the node id).
I will open a new discussion for this.
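
As an illustration of the "separate struct" idea, here is a minimal sketch of a two-way mapping between primary keys and node ids. It is not HerdDB or JVector code; `NodeIdMapping`, `register`, `retire`, and `primaryKeyOf` are hypothetical names. An update is modelled as `retire` followed by `register`, so the row always receives a fresh node id and the old id is never reused, in line with the advice above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

final class NodeIdMapping {
    // ByteBuffer is used as the map key because its equals/hashCode
    // compare content, unlike a raw byte[].
    private final Map<ByteBuffer, Integer> nodeByKey = new HashMap<>();
    private final Map<Integer, ByteBuffer> keyByNode = new HashMap<>();
    private int nextNodeId = 0;

    /** Allocates a fresh node id for a primary key that has no live node. */
    synchronized int register(byte[] primaryKey) {
        ByteBuffer key = ByteBuffer.wrap(primaryKey.clone());
        if (nodeByKey.containsKey(key)) {
            throw new IllegalStateException("key already registered; retire it first");
        }
        int nodeId = nextNodeId++;
        nodeByKey.put(key, nodeId);
        keyByNode.put(nodeId, key);
        return nodeId;
    }

    /** Drops the link for a key and returns the node id to mark as deleted, if any. */
    synchronized OptionalInt retire(byte[] primaryKey) {
        Integer old = nodeByKey.remove(ByteBuffer.wrap(primaryKey));
        if (old == null) {
            return OptionalInt.empty();
        }
        keyByNode.remove(old);
        return OptionalInt.of(old);
    }

    /** Resolves a node id from a search result back to its primary key, or null. */
    synchronized byte[] primaryKeyOf(int nodeId) {
        ByteBuffer key = keyByNode.get(nodeId);
        return key == null ? null : key.array().clone();
    }
}
```

The caller would add the vector to the graph under the id returned by `register`, and mark the id returned by `retire` as deleted in the graph. A metadata field on the node, as proposed above, would make the reverse lookup unnecessary.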

eolivelli changed the title ~~Is there a way to build and index while keeping it on disk (GraphIndexBuilder + OnDiskGraphIndex ?)~~ Is there a way to build an index while keeping it on disk (GraphIndexBuilder + OnDiskGraphIndex ?) Oct 13, 2023

jbellis closed this as completed Oct 13, 2023
