CORE-7718 metadata #23508
Force push: 61550be to fee45fc
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55205#01922eaa-a60b-4646-9e5a-f5d9ade2c89b
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55205#01922eaa-a60e-43d8-a530-eeba765811c2
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55254#01923047-8e82-4028-a01d-98427cb5f0f0
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55254#01923047-8e7a-4b6d-bf0f-4f030444378e
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55403#01924068-0272-407f-a7b0-944f81f080f0
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55453#019243ca-9db5-4aac-87f2-931eb80e32f5
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55553#019248db-a030-4473-b2ae-d6c335d4e7cc
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55553#019248f5-a158-4dc3-9234-4b7befbe21dc
Force push: fee45fc to 7ea5560
Not finished my review, but it's moving underneath me.
@@ -250,7 +292,7 @@ ss::future<result<void>> service::initiate_rpk_debug_bundle_collection(
       job_id,
       co_await external_process::external_process::create_external_process(
         std::move(args)),
-      form_debug_bundle_file_path(_debug_bundle_dir, job_id));
+      form_debug_bundle_file_path(output_dir, job_id));
note: I see this gets changed to `debug_bundle_file_path` later, but this seems like an opportune moment.
Force push: 7ea5560 to 4cda960
Force push: 4cda960 to 21db2f8
There's a commit that has a cherry-pick message with a sha that won't be useful.
looks good but want to resolve the kvstore question before merging.
src/v/serde/rw/vector.h
Outdated
@@ -35,6 +35,7 @@ concept Vector = requires(T t) {
     t.begin();
     t.end();
     { t.size() } -> std::convertible_to<std::size_t>;
+    t.shrink_to_fit();
the commit message doesn't seem to indicate that this was a problem prior to the commit, so i'm wondering: is this related to the previous commit that added shrink_to_fit to the bytes type? if so, we may have an issue.
Yes it's related. I'm storing the sha256 checksum as bytes into the kvstore. I can switch to use fragmented_vector if that's an issue.
> Yes it's related. I'm storing the sha256 checksum as bytes into the kvstore. I can switch to use fragmented_vector if that's an issue.
but `bytes` should not be handled by the vector serde encoder. i think the issue is that the serde encoder is working with "things that look like vectors". i think you can drop the shrink-to-fit and the rw/vector.h and include rw/bytes.h?
separately i'll take a todo item to make this stricter, or maybe they should be combined if their on-wire formats are identical.
> include rw/bytes.h?
oh my bad, yeah good call
I will say that the change in `rw/vector.h` I think is still good (though probably no longer relevant to this PR), since the `tag_invoke(tag_t<read_tag>)` function calls `shrink_to_fit()`.
ssx::spawn_with_gate(_gate, [this, job_id]() {
    return _rpk_process->wait()
      .then([this, job_id](auto) { return handle_wait_result(job_id); })
      .handle_exception_type([](const std::exception& e) {
did you mean `handle_exception`?
yeh
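For context on why the distinction matters: Seastar's `handle_exception_type` only handles exceptions matching the handler's parameter type and lets everything else propagate, whereas `handle_exception` receives a `std::exception_ptr` for any failure. The sketch below is a plain C++ analogue using `try`/`catch`, not Seastar code; `custom_error`, `run_with_typed_handler`, and `run_with_catch_all` are hypothetical names for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical error type that does NOT derive from std::exception.
struct custom_error {
    std::string what;
};

// Analogue of a typed handler: only std::exception and its subclasses are
// handled; anything else escapes to the caller.
template<typename Fn>
std::string run_with_typed_handler(Fn fn) {
    try {
        fn();
        return "ok";
    } catch (const std::exception& e) {
        return std::string("handled: ") + e.what();
    }
}

// Analogue of a catch-all handler: every exception is captured as a
// std::exception_ptr, regardless of its type.
template<typename Fn>
std::string run_with_catch_all(Fn fn) {
    try {
        fn();
        return "ok";
    } catch (...) {
        std::exception_ptr eptr = std::current_exception();
        (void)eptr; // a real handler would log or rethrow via eptr
        return "handled: any exception";
    }
}
```

With the typed handler, a thrown `custom_error` escapes to the caller; with the catch-all handler, it is absorbed. In a background task, an escaping exception is typically the case one wants to avoid.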
      _rpk_process->cout().copy(),
      _rpk_process->cerr().copy(),
      job_id,
      debug_bundle_file,
      std::move(sha256_checksum),
      _rpk_process->get_wait_result());

    iobuf buf;
    serde::write(buf, std::move(md));
kvstore isn't really designed to handle large data. so storing stdout/stderr is a bit concerning here.
how big is too big? I want to say this is a few hundred bytes in the common case, but I'm not too sure.
what if go crashed and produced a large backtrace? or what if there is a bug in the collector and it dumped a bunch of stuff out?
> how big is too big?
yeh good question--it is stored in ram for the lifetime of the key, and there is no option not to store it in ram.
could store it on disk as a file with the same key
So I was planning on running `rpk debug bundle` with the `--verbose` flag, which would probably increase the amount of data in stdout/stderr, so I guess it could produce a fair bit of output.
Why keep around stdout/stderr?
In case a previous run of `rpk debug bundle` failed and the broker reset, we may want to know why it failed. On success, there may be something interesting in the verbose output that helps with debugging a problem.
Why kvstore?
Originally I was going to output a JSON metadata file with all this information in it, but I was reminded of the existence of the kvstore, so I switched to that.
So to Oren's question, how big is too big?
> So to Oren's question, how big is too big?
we don't have any formal guidelines, but the intention of the kvstore in its current state is to store small metadata.
seems you could keep the metadata in the kvstore, and the stdout/stderr on disk with the same key name?
Happy to move it out of the kvstore and onto disk. If doing so I wonder if it should be JSON or it's fine to remain in a serde format.
> seems you could keep the metadata in the kvstore, and the stdout/stderr on disk with the same key name?
Yeah that's probably fine, something like `<uuid>.stdout` and `<uuid>.stderr`.
> If doing so I wonder if it should be JSON or it's fine to remain in a serde format.
binary is probably fine. it's presumably ascii or utf8?
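As a rough illustration of the layout discussed in this thread (small metadata stays in the kvstore, process output goes to files named after the job), here is a minimal sketch using only `std::filesystem` and `<fstream>`. It is not the PR's implementation: `process_output_path`, `write_process_output`, and `read_process_output` are hypothetical helpers, and the job ID is treated as an opaque string.

```cpp
#include <filesystem>
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: builds <dir>/<job_id>.<stream>, for example
// <dir>/<uuid>.stdout or <dir>/<uuid>.stderr.
inline fs::path process_output_path(
  const fs::path& dir, const std::string& job_id, const std::string& stream) {
    return dir / (job_id + "." + stream);
}

// Writes stdout and stderr as raw bytes next to each other, keyed by job ID.
inline void write_process_output(
  const fs::path& dir,
  const std::string& job_id,
  const std::string& out,
  const std::string& err) {
    fs::create_directories(dir);
    auto write_one = [&](const std::string& stream, const std::string& data) {
        std::ofstream f(
          process_output_path(dir, job_id, stream),
          std::ios::binary | std::ios::trunc);
        if (!f) {
            throw std::runtime_error("cannot open output file for " + stream);
        }
        f.write(data.data(), static_cast<std::streamsize>(data.size()));
    };
    write_one("stdout", out);
    write_one("stderr", err);
}

// Reads one stream back in full; returns an empty string if the file is missing.
inline std::string read_process_output(
  const fs::path& dir, const std::string& job_id, const std::string& stream) {
    std::ifstream f(process_output_path(dir, job_id, stream), std::ios::binary);
    return std::string(
      std::istreambuf_iterator<char>{f}, std::istreambuf_iterator<char>{});
}
```

Writing the output as raw bytes matches the "binary is probably fine" remark: no extra encoding is needed if the content is ASCII or UTF-8 text, and the metadata record only has to carry the job ID to locate both files.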
lgtm, few random comments but nothing pressing
3b0aaa6
Force push: 5b048ec to 44857e2
Force push: 44857e2 to 53d077b
Force push: 798fc07 to 1464c52
d129734 to ad7106c
Removed friendship between service and process and added accessors and helper methods. Signed-off-by: Michael Boquard <[email protected]>
Signed-off-by: Michael Boquard <[email protected]> (cherry picked from commit c483162)
Signed-off-by: Michael Boquard <[email protected]> (cherry picked from commit 55b414b)
Signed-off-by: Michael Boquard <[email protected]>
Signed-off-by: Michael Boquard <[email protected]>
These structures are used to store metadata and process output to the kvstore and disk in order to maintain the state of the service between application restarts. Signed-off-by: Michael Boquard <[email protected]>
Now when a new process has been kicked off, the previous run will be cleaned up. Added tests to verify new functionality. Signed-off-by: Michael Boquard <[email protected]>
Signed-off-by: Michael Boquard <[email protected]>
Generating and storing metadata into kvstore and added tests to validate it. Signed-off-by: Michael Boquard <[email protected]>
Added functionality that will reload metadata from the kvstore after the service restarts. Signed-off-by: Michael Boquard <[email protected]>
I didn't touch it so no idea how this snuck in. Signed-off-by: Michael Boquard <[email protected]>
Force push: ad7106c to d20e80c
lgtm. largely taking your word for it on the changes to unit tests
ssx::spawn_with_gate(_gate, [this, job_id]() {
    return _rpk_process->wait()
      .then([this, job_id](auto) {
          auto hold = _gate.hold();
I'm not totally sure I get why you need to hold the gate here. is it because the background task completes when you return the `_process_control_mutex.get...` future below?
I don't write enough continuation style code 😅
Thanks to some merge ordering shenanigans, these tests started failing when redpanda-data#23557 merged after redpanda-data#23508. This change addresses the test bug by properly obtaining the configuration property and handling the case where the debug bundle directory configuration is empty. Signed-off-by: Michael Boquard <[email protected]>
Backports Required
Release Notes