Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features and fixes for release 1.8.0 #242

Merged
merged 38 commits into from
Oct 26, 2024
Merged

Conversation

dkerr64
Copy link
Collaborator

@dkerr64 dkerr64 commented Oct 17, 2024

v1.8.0 (2024-10-??)

What's Changed

Known Issues

Copy link
Collaborator

@jgstroud jgstroud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You sure you want to open this can of worms again by enabling the IRAM heap?

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 17, 2024

You sure you want to open this can of worms again by enabling the IRAM heap?

Having discovered that there is some impact from where the device is located, I want to ask themoneyish and pritchey over on Discord to try again, moving the device around to see if it makes a difference.

That said... I have consistently been seeing MDNSresponder OOM crashes. Often they occur immediately after a reboot, it crashes and reboots again and stabilizes. But then I found myself needing to un-pair and re-pair to HomeKit... in un-paired status the HomeKit server appears to need more memory, I was crashing every time I booted (still in MDNSresponder), it never got up to a stable state. Turning on IRAM heap immediately solved the problem, and I have not seen a OOM crash since (admittedly only a few days).

I think, for the vast majority of users, running with IRAM heap is going to be more stable.

@jgstroud
Copy link
Collaborator

I think we had already done the move the device around experiment. I wonder what's different during that reboot that's causing an OOM. I tried hitting it with way more mdns traffic than it could handle and I couldn't get it to crash. Something about doing that during a reboot. maybe having homekit initializing at the same time. I was thinking about adding a heap check down in the network stack and temporarily disabling RX when we get low on memory. However, if we can safely enable the IRAM, that's by far the easier solution.

@jgstroud
Copy link
Collaborator

One reason why I never implemented this was because I wasn't sure how to properly handle recovery when you can't connect to your wifi. Should it automatically go into AP mode? This leads to a lot of error conditions especially after power cycles. Instead I was thinking about a way to manually force it. I just pushed a commit where if you press the light button 5 times in a 3 second window it will automatically reboot into AP mode.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 17, 2024

One reason why I never implemented this was because I wasn't sure how to properly handle recovery when you can't connect to your wifi. Should it automatically go into AP mode? This leads to a lot of error conditions especially after power cycles. Instead I was thinking about a way to manually force it. I just pushed a commit where if you press the light button 5 times in a 3 second window it will automatically reboot into AP mode.

That is a good idea. I wasn't sure what to do about this myself... I do not want to automatically send it into soft AP mode because as you state, there are times where WiFi may go offline that has nothing to do with ratgdo (power failures, etc.). Your solution is quite elegant.

I'll update the README file as well.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 17, 2024

Before we merge this, there is one other change I want to make... and that is move all (or as much as possible) config settings into a single file. Loading them all from separate files is taking an awfully long time every reboot and I think I have come up with a way to store them all in one place.

Non-config data (like rolling code) would remain as it is in their own files.

@jgstroud
Copy link
Collaborator

I've deliberately avoided this whole patch because I didn't want to deal with all the possible failure modes. The idea to use the light switch just hit me. I think I might make it flash the lights couple times as well before it reboots as a sort of confirmation. I also need to add a similar fix to the sec+1 code.

@jgstroud
Copy link
Collaborator

Ok, I cleaned up the recovery logic some. I reused your TTC timer to flash the lights letting you know the command was accepted.

@jgstroud
Copy link
Collaborator

Is this still relevant? Seems like with all the logging changes you've made, this setting doesn't seem necessary anymore.
image

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 18, 2024

Is this still relevant? Seems like with all the logging changes you've made, this setting doesn't seem necessary anymore.

Yes, I think we can remove this.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 18, 2024

I made significant change to how we save user configuration settings. There are 19 individual settings that have been stored in 19 separate files. This significantly slows down boot time. By saving them all in a single file boot time improves by approx 13 seconds... such that the device now completes initialization in about 5 seconds.

It did require extensive changes. But the result is actually much more readable code.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 18, 2024

I made significant change to how we save user configuration settings. There are 19 individual settings that have been stored in 19 separate files. This significantly slows down boot time. By saving them all in a single file boot time improves by approx 13 seconds... such that the device now completes initialization in about 5 seconds.

It did require extensive changes. But the result is actually much more readable code.

Wd should, of course, do some thorough testing before we release this!

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 19, 2024

Also, @jgstroud I am thinking of changing the reboot countdown timer from 30 seconds to 15 seconds. My devices are consistently up and stable in less than 15 seconds now that config settings are all in one file.

@jgstroud
Copy link
Collaborator

Cool. I added syslog earlier today but wanted to wait for your settings patch. I'll merge it with your code later. I'm really glad you made the settings change. the code was getting messy and hard to read and modify.

@jgstroud
Copy link
Collaborator

1.7.1 seems pretty stable. We might want to remove the prerelease tag on that before we merge this.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 19, 2024

1.7.1 seems pretty stable. We might want to remove the prerelease tag on that before we merge this.

I did that a few days ago.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 19, 2024

Cool. I added syslog earlier today but wanted to wait for your settings patch. I'll merge it with your code later. I'm really glad you made the settings change. the code was getting messy and hard to read and modify.

Syslog is great, thanks.

For the settings, I think I over engineered it... I think use of std:map, std:any, std:string may be using way more memory than I would like. I think I'll change it to a simple struct which gives us control over memory use (potentially putting it into IRAM as well).

@jgstroud
Copy link
Collaborator

I really like the settings file, but the one thing I didn't like was trying to understand the first.second.second naming. That was confusing, but that's probably just me.

@jgstroud
Copy link
Collaborator

Oh I also added a comment on the code. You added code to migrate settings from old files to new. Should we remove those old files in the process to free up flash?

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 22, 2024

@jgstroud just a heads up that I need to make use of IRAM heap more granular around HomeKit server. I un-paired my door to test initial setup and it fails. On investigation it looks like HomeKit uses a lot (and I mean a lot) of memory to initialize itself if it is not already paired. I had to reduce the size of my message log buffer all the way down and while that got it going, it was very slow to create the initial crypto keys.

There are two big malloc's in the server initialization which is all we really need to wrap in IRAM, so I am going to do that and I think that will get us going again.

In the meantime, don't un-pair !!!

@jgstroud
Copy link
Collaborator

Those initial pairing crypto operations are very slow and I spent a good deal of time on fixing those so you don't get timeouts. After you make the change, we need to do a bunch of unpair / repair and make sure we don't see the timeouts come back.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 22, 2024

We're good now. I have tested un-pairing and pairing several times. Consistent memory usage and timing. It takes about 10-12 seconds to generate the crypto keys on boot if no HomeKit paired.

I've also added comments in the source file every place that IRAM is used so that any future maintainer knows to keep an eye on usage/allocation during startup process.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 25, 2024

I have added an advanced mode to the soft AP page... which shows all available WiFi networks with the unique hardware BSSID of the access points. This allows us to lock the device to a specific AP. Appropriate warnings have been added to the readme, soft AP page and confirmation pop-up.

@dkerr64 dkerr64 force-pushed the next-release branch 2 times, most recently from f69fed5 to 31472f7 Compare October 25, 2024 01:33
@jgstroud
Copy link
Collaborator

I tried the AP Locking and I assume it's working, but there is nothing in the logs to indicate it has been locked to an AP. I think we should fix that before merging.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 25, 2024

I tried the AP Locking and I assume it's working, but there is nothing in the logs to indicate it has been locked to an AP. I think we should fix that before merging.

Yes, agree. I have coded this up including reporting it on the main web page so very visible. I just haven't committed and pushed yet as I want to take a screenshot to include in readme.

@jgstroud
Copy link
Collaborator

I tried the AP Locking and I assume it's working, but there is nothing in the logs to indicate it has been locked to an AP. I think we should fix that before merging.

Yes, agree. I have coded this up including reporting it on the main web page so very visible. I just haven't committed and pushed yet as I want to take a screenshot to include in readme.

Ok, great. Otherwise I have no issues.

@dkerr64
Copy link
Collaborator Author

dkerr64 commented Oct 26, 2024

Fixed the bug with empty hostname.

Also put a test for zero length accessory name into the arduino_homekit_server source. And discovered that, shockingly, all the initialization functions are declared with void returns... so while there is a bunch of testing for error conditions, the functions do not return true/false to indicate whether setup succeeded... so calling code cannot itself abort.

Changing those functions to return a bool would be good idea, but also fairly significant change, as implies also figuring out how to handle a failure. I'm not up to that right now.

wait for a status command to be processes to properly set the initial state of
all homekit characteristics.  Also timeout if we don't receive a status in
a reasonable amount of time.  This prevents unintentional state changes if
a home hub reads the state before we initialize everything
Note, secplus1 doesnt have a status command so it will just timeout

This specifically addresses an issue where we get spurious door unlock
notifications, but will also address incorrect door states immediately
after booting.
@jgstroud
Copy link
Collaborator

I fixed the issue with the random lock notifications. Due to a race condition on boot. If homekit queries the status of the door before we receive our first status message then we have a bunch of default values set and those immediately change state when we get our first status response. I added a slight delay to starting homekit loop so we have a chance to populate the door state first.

I think I'm done for now. We can merge.

@dkerr64 dkerr64 merged commit 64e8642 into ratgdo:main Oct 26, 2024
@dkerr64 dkerr64 deleted the next-release branch October 26, 2024 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants