[Bug]: config file corruption seen twice on a wm1110 board #4184

Open
geeksville opened this issue Jun 27, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@geeksville
Member

Category

Other

Hardware

Other

Firmware Version

2.3.14.c67a9dfe

Description

I bet it is not wm1110 specific. Occurred while doing hundreds of power cycles.

I bet the best way to find/fix it is to turn off our "if the config read fails or the config is corrupted, do a complete factory reset and try again" behavior. Instead, we should spin in place so the failure can be inspected in the ICE debugger.

Notes from chat:

@thebentern re: the mystery "lora." config got toasted thing that happened to my wm1110 board: it happened again. Just fyi, I'll keep an eye on it and add more instrumentation while I'm doing my other stuff, but possibly there is some badness somewhere. Possibly also I'm just inadvertently stress testing because I'm cycling this board through >100 power cycles in different configs (but none of that should have led us to corrupt our flash fs). I only noticed because my power.powermon_enables field also got toasted.
thebentern — Today at 6:26 PM
I covet your experienced eyes on that issue, because so far it's been elusive and seemingly random
geeksville — Today at 6:37 PM
hmm - rather than mystery corruption I wonder if there is a bug in the adafruit nrf52 fake filesystem stuff. after I finish power crap (about another week?) i'll try to make a robust stress test and leave it running while ICEd.

Relevant log output

No response

@geeksville geeksville added the bug Something isn't working label Jun 27, 2024
@geeksville geeksville self-assigned this Jun 27, 2024
@geeksville
Member Author

I might (hopefully will) look into this in a week or two.

@geeksville
Member Author

This same corruption probably exposed #4167 a couple of days ago.

@geeksville
Member Author

hmm this is less than perfect (though quite possibly unrelated to the problem):

        // brief window of risk here ;-)
        if (FSCom.exists(filename) && !FSCom.remove(filename)) {
            LOG_WARN("Can't remove old pref file\n");
        }
        if (!renameFile(filenameTmp.c_str(), filename)) {
            LOG_ERROR("Error: can't rename new pref file\n");
        }

We could eliminate this window of risk by renaming file.new to file.good, then removing file, then renaming file.good to filename (a 3-stage commit). Then at load time, if we ever see a file.good, we know that we lost power during that window, and file.good should be used instead of file (and file should be deleted at that point).

But this might not actually be the bug, so I'll wait until I look into this and somehow make a reproducible failure.
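
For reference, the 3-stage commit described above could be sketched roughly as follows. This is a desktop model only: it uses C++17 `std::filesystem` as a stand-in for the firmware's `FSCom` layer, and `commitPrefs`, `recoverPrefs`, `withSuffix`, and `readAll` are hypothetical names chosen for illustration, not existing firmware functions. Discarding a leftover `.new` file at load time is an extra assumption that the comment above does not spell out.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: "prefs.proto" + ".good" -> "prefs.proto.good".
fs::path withSuffix(const fs::path &p, const char *suffix)
{
    fs::path out = p;
    out += suffix;
    return out;
}

// Run once at load time, before reading `file`.
// A surviving file.good means power was lost after stage 1, so file.good
// holds the complete new contents and replaces whatever is left of `file`.
bool recoverPrefs(const fs::path &file)
{
    std::error_code ec;
    // Assumption: a leftover .new may be partially written, so never trust it.
    fs::remove(withSuffix(file, ".new"), ec);

    const fs::path good = withSuffix(file, ".good");
    if (!fs::exists(good, ec))
        return true; // no interrupted commit to repair

    fs::remove(file, ec);       // the old copy may already be gone
    fs::rename(good, file, ec); // finish the interrupted commit
    return !ec;
}

// Write `contents` to file.new, then commit it in three stages.
bool commitPrefs(const fs::path &file, const std::string &contents)
{
    std::error_code ec;
    const fs::path tmp = withSuffix(file, ".new");
    const fs::path good = withSuffix(file, ".good");

    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
        out.flush(); // a real device would also need the FS to sync to flash
        if (!out)
            return false;
    }

    fs::rename(tmp, good, ec); // stage 1: the new data is now the trusted copy
    if (ec)
        return false;
    fs::remove(file, ec);      // stage 2: drop the old copy (ok if it is absent)
    if (ec)
        return false;
    fs::rename(good, file, ec); // stage 3: publish under the final name
    return !ec;
}

// Small helper used to check results: read a whole file into a string.
std::string readAll(const fs::path &p)
{
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

With this ordering, every point between stage 1 and stage 3 leaves either a complete file or a complete file.good on disk, so calling `recoverPrefs` before each load should always find one intact copy. Whether the underlying flash filesystem makes each individual rename atomic is a separate question that this sketch does not address.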
