-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add: Add index_gt::merge() #572
base: main
Are you sure you want to change the base?
Conversation
This just focus on index_gt. index_dense_gt is out of scope of this PR. If we get consensus of implementation approach, we'll be able to implement index_dense_gt::merge() too. This adds mutable `memory_mapped_file_t` and you can create a mutable memory-mapped index with it. You can merge multiple indexes to the mutable memory-mapped index without allocating all data on RAM.
@@ -3272,7 +3449,6 @@ class index_gt { | |||
|
|||
// We are loading an empty index, no more work to do | |||
if (!header.size) { | |||
reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related to this PR but I add this PR because I need similar change in view()
. If this should be a separated PR, I'll open a separated PR.
We don't need to call reset()
here because we don't change anything after the above reset()
.
serialization_result_t stream_result = load_from_stream( | ||
[&](void* buffer, std::size_t length) { | ||
if (offset + length > file.size()) | ||
return false; | ||
std::memcpy(buffer, file.data() + offset, length); | ||
offset += length; | ||
return true; | ||
}, | ||
std::forward<progress_at>(progress)); | ||
|
||
return stream_result; | ||
is_mutable_ = true; | ||
return {}; | ||
} else { | ||
serialization_result_t io_result = file.open_if_not(); | ||
if (!io_result) | ||
return io_result; | ||
|
||
serialization_result_t stream_result = load_from_stream( | ||
[&](void* buffer, std::size_t length) { | ||
if (offset + length > file.size()) | ||
return false; | ||
std::memcpy(buffer, file.data() + offset, length); | ||
offset += length; | ||
return true; | ||
}, | ||
std::forward<progress_at>(progress)); | ||
return stream_result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is only changed because this is moved to else
.
if (!header.size) { | ||
reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to call reset()
because we don't change anything after the above reset()
.
// Add all values in `index` to this index. | ||
add_result_t merge_result; | ||
for (const auto& member : index) { | ||
auto& value = get_value(member); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to filter the target values as suggested in #84 (comment) , we can do it here.
Thank you, @kou! That's a good start! I think our key goal for the "merge" operation is to reduce the asymptotic complexity from ~$O(N \log N)$ to something closer to |
OK. Should we work on the key goal in this PR? Or can we work on it as a follow-up task? For |
#84
This focus only on
index_gt
.index_dense_gt
is out of scope of this PR. If we get consensus of implementation approach, we'll be able to implementindex_dense_gt::merge()
too.This adds mutable
memory_mapped_file_t
and you can create a mutable memory-mapped index with it. You can merge multiple indexes to the mutable memory-mapped index without allocating all data on RAM.index_gt::merge()
does just the followings:The important changes in this PR are the followings:
memory_mapped_file_t
index_gt
withmemory_mapped_file_t
For the former:
memory_mapped_file_t::is_writable
andmemory_mapped_file_t::reserve()
are added.memory_mapped_file_t::open_if_not()
opens a file with write mode whenis_writable
.For the latter: There are some change in the following methods:
index_gt::reset()
: Write header and levels before the memory-mapped file is closdindex_gt::reserve()
: Extend the memory-mapped file and adjustnodes_
after itindex_gt::node_malloc_()
: Allocate a memory from the memory-mapped file