Implementing backup of broken config files #462

cr · 2016-05-03T18:55:39Z

This addresses #390.

The new behavior is to move the broken config file to foxbox.conf_BROKEN before continuing with an empty config that would ultimately have overwritten the broken one.

cr · 2016-05-03T18:56:37Z

r? @dhylands

dhylands · 2016-05-03T19:22:58Z

src/config_store.rs

+                let mut backup_name = file_name.to_owned();
+                backup_name.push_str("_BROKEN");
+                warn!("Moving broken config file to {}", backup_name);
+                let _ = fs::rename(file_name, &backup_name);


nit: I think you should test the return code and instead of printing the "moving" warning, print a "Moved" or "Failed to move" message (since there are lots of reasons why the rename could fail).

r=me with that change.

And as fabrice suggests, adding tests would be good.

cr · 2016-05-03T20:06:08Z

I implemented a test and an error check around moving the config file, but now I am having second thoughts. Now I tend to panicing the main thread when the config can't be parsed.

The problem is that I don't know how to properly test that. I can write a simple test like

#[test]
#[should_panic]
fn config_service_should_panic_with_broken_config() {
    use std::fs;
    use std::io::Write;
    use std::path::Path;
    // Create broken config file
    let broken_conf = "{\"foo\":\"bar\",}".to_owned();
    let mut file = fs::File::create(&Path::new(&config_file_name)).ok().unwrap();
    let _ = file.write_all(&broken_conf.as_bytes());
    // Should panic:
    let config = ConfigService::new(&config_file_name);
    // Will litter temp file, because this is never reached:
    fs::remove_file(config_file_name).unwrap_or(());
}

but this will litter the temp file because the removal line is never reached. How to work around this?

dhylands · 2016-05-03T20:09:35Z

You can set a panic handler which does the cleanup.
https://doc.rust-lang.org/std/panic/fn.set_handler.html
Disclaimer: I haven't actually used this.

cr · 2016-05-03T20:14:44Z

I don't know how tests are organized internally, but this might lead to the same problem in disguise: You might need to remove the handler to prevent interference with other should_panic tests.

cr · 2016-05-03T20:30:36Z

@dhylands is raising a good point here:

Normally panicing is bad since the user (who only accesses through a browser) won't be able to see anything. Although in this case, the only reason the config SHOULD fail is because it was hand edited. Although it could fail for other reasons (like it didn't get written properly due to a full disk)

I don't think that we can reasonably recover from the full disk / incomplete config write scenario and low-level user intervention is unavoidable. In such a case it may be better to fail really hard vs. fail just subtly. Problem is that loading the config happens so early in the startup process that we can't even inform the clients unless we implement a few ugly workarounds. I really see three options:

Panic the main thread and let the user deal with it. Either the disk was full and must be manually cleaned (in this case a copy of the last known good config is still around), or the user has been editing the config by hand and can be expected to figure this out from the logs.
Backup the broken config, continue with the last known-good config if still around, or else an empty config, write errors to the log, and let the user figure out that things got funky.
Continue with the last known-good config, push a "taint" key to the config that will be later be queried from somewhere, alerting the user that something went wrong.

The second option is close to what was in place so far, plus some rather complex lookup logic looking for good copies of the config. The third option seems desirable, but it requires major changes to several parts of foxbox and likely a good deal of new architectural feats that we don't have yet. For now, the first solution may be the cleanest.

dhylands · 2016-05-03T20:33:05Z

I don't know how tests are organized internally, but this might lead to the same problem in disguise: You might need to remove the handler to prevent interference with other should_panic tests.

take_handler allows you to retrieve the previous handler which could be reinstalled after you've done your test.

You could also use RAII and create an object whose drop method does the cleanup.

This addresses issue fxbox#390.

cr · 2016-05-04T14:11:04Z

I implemented behavior number 2 as outlined above. That way foxbox can run and potentially start some form of diagnostic mode in the future. I guess we'll need a getter for the logs soon?

r? @fabrice

dhylands reviewed May 3, 2016
View reviewed changes

cr force-pushed the cr-configbackup branch from 7f8b38a to 7641536 Compare May 3, 2016 19:52

Implementing backup of broken config files

a36d84b

This addresses issue fxbox#390.

cr force-pushed the cr-configbackup branch from 7641536 to a36d84b Compare May 4, 2016 14:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementing backup of broken config files #462

Implementing backup of broken config files #462

cr commented May 3, 2016

cr commented May 3, 2016

dhylands May 3, 2016

cr commented May 3, 2016

dhylands commented May 3, 2016

cr commented May 3, 2016

cr commented May 3, 2016

dhylands commented May 3, 2016

cr commented May 4, 2016

Implementing backup of broken config files #462

Are you sure you want to change the base?

Implementing backup of broken config files #462

Conversation

cr commented May 3, 2016

cr commented May 3, 2016

dhylands May 3, 2016

Choose a reason for hiding this comment

cr commented May 3, 2016

dhylands commented May 3, 2016

cr commented May 3, 2016

cr commented May 3, 2016

dhylands commented May 3, 2016

cr commented May 4, 2016