-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cli): Handle reload based on referenced file change #22539
Conversation
Co-authored-by: Pavlos Rontidis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few nits but overall this looks great. Thank you @gllb
Co-authored-by: Pavlos Rontidis <[email protected]>
Co-authored-by: Pavlos Rontidis <[email protected]>
I'm not able to run |
@pront, is there anything I need to do in order to make it move forward ? |
@@ -543,6 +547,11 @@ impl RunningTopology { | |||
.is_some() | |||
})) | |||
.collect::<Vec<_>>(); | |||
|
|||
if let Some(mut components) = components_to_reload { | |||
sinks_to_change.append(&mut components) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few more thoughts on this implementation. What you have now is good and works for the HTTP sink. However, we can go further.
- We should support reloading any component (sources, transforms, sinks)
- Looking at
files_to_watch
, it ishttp
sink specific. We have many components that support the same TLS settings, it should be easy to write a reusable util and expose these paths for all components that use them. (We can start by duplicating code if needed and think about DRY later.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you want to this PR to also implement this for all components ?
Or just make sure the code is able to be extended in future change, so make sure this shutdown_diff
is able to handle source and transforms change as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gllb, I am looking at this ticket #10877.
So we have a few choices here:
- accept PR as is and follow up with multiple PRs (it doesn't have to be you btw)
- enhance this PR to handle all components (sources, transforms, sinks)
- choice 2 and handling all components that expose TLS settings
Perhaps choice (2) is good here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for choice 2 and that means in fact : integrate with ConfigDiff
instead of patching shutdown_diff
as stated in your other comment, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In order to simplify further reviews, I went ahead and enabled merging for this PR. I would appreciate it if you follow up with a PR to handle all components and a smarter config diff. But it's understandable if you don't have time for that. I think this PR is a big step towards the right direction.
@@ -336,9 +336,32 @@ async fn handle_signal( | |||
allow_empty_config: bool, | |||
) -> Option<SignalTo> { | |||
match signal { | |||
Ok(SignalTo::ReloadComponents(component_keys)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some implementation details:
- We can add
component_keys
(or whatever struct we end up with) as a newRunningTopology
field to avoid passing them around to all these functions. - Then, we can use this new field in
async fn shutdown_diff(...)
- We can probably add the notion of changed watched paths to
src/config/diff.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the component_keys
here are passed when a change is detected from the watcher. If we use it as a field of RunningTopology
then it supposed to be empty almost all the time, except when a change is detected. Which will still imply passing it in reload_config_and_respawn
anyway, right ? or maybe I don't get it ?
For the notion of watched paths change in configDiff I think I see what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pront So I did implement this logic in configDiff, and it does what it should. Only this part : https://github.com/gllb/vector/blob/master/src/topology/running.rs#L490-L509 is causing the sink to be re-used and therefore it is not reloaded.
On the implementation where all is done inside shutdown_diff
its easy to work around it as I can append to sink_to_change
after so the reuse buffer code doesnt even know about it. But since its in configDiff now, I dont see how to work around it, any idea ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could remove this part, but I dont know if its safe to do or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad we agree on the src/config/diff.rs
enhancement.
What is more important is how to keep track of the component type. I don't think we have an enum or something to model currently. But we probably to keep associate a ComponentKey
with a new ComponentType
. Talking about this, I realize it might be better to do this in a followup PR since it might need some discussion.
If we use it as a field of RunningTopology then it supposed to be empty almost all the time, except when a change is detected.
Correct. It will be empty until the watcher detects a file change, then it will push it there. When we process/reload, it will clear the whole thing. I am a bit ambivalent on this, so feel free to TIOLI.
Which will still imply passing it in reload_config_and_respawn anyway, right ?
To be clear, I was proposing adding it here:
vector/src/topology/running.rs
Line 41 in 7644658
pub struct RunningTopology { |
The reload_config_and_respawn will then have access to it via self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I can create follow up PR for it but not sure I will have a lot of time for it, lets see. I still have no idea about the struggle with reuse buffer code I pointed out in shutdown_diff
but I will open a PR with what I have, so anybody can contribute as well.
A failed test prevented this from merging: https://github.com/vectordotdev/vector/actions/runs/13790530146/job/38656922831?pr=22539 |
Head branch was pushed to by a user without write access
Hi @pront, hopefully this last fix I push solve it. I struggle a bit to run the test locally as they are very demanding on memory. |
@gllb thank you for the fast iterations on this. If possible, avoid force pushing so I can do incremental reviews. I approved the workflows again. |
Test failures persist. You can iterate faster locally with |
@pront edit : I manage to find option in nextest to reduce build and test jobs, so now the changed test pass successfuly |
@@ -269,8 +268,23 @@ mod tests { | |||
{ | |||
panic!("Test timed out"); | |||
} | |||
} | |||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding a test and fixing the failure. I realize that these tests have several problems (existing, not introduced by this PR):
- they are disabled on macOS
- they have arbitrary delays e.g. 3 seconds and then 3*5
- they panic instead of returning an error
But again, this is out of scope for this PR.
Summary
The TLS
crt_file
andkey_file
fromhttp
sinks are now part of the watcher list and therefore they are reloaded on Vector restart.Ref #17283
Change Type
Is this a breaking change?
How did you test this PR?
adding this sink to
config/vector.yaml
:then execute :

vector -vvv -c config/vector.yaml -w
in parallel run :
cd /etc/ssl/tls-cert/ && openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes -subj "/C=XX/ST=StateName/L=CityName/O=CompanyName/OU=CompanySectionName/CN=CommonNameOrHostname"
And vector logs :
Does this PR include user facing changes?
Checklist
make check-all
is a good command to run locally. This check isdefined here. Some of these
checks might not be relevant to your PR. For Rust changes, at the very least you should run:
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace
(alternatively, you can runcargo test --all
)Cargo.lock
), pleaserun
dd-rust-license-tool write
to regenerate the license inventory and commit the changes (if any). More details here.References
Ref: #10877