-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Features and fixes for release 1.8.0 #242
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You sure you want to open this can of worms again by enabling the IRAM heap?
Having discovered that there is some impact from where the device is located, I want to ask themoneyish and pritchey over on Discord to try again, moving the device around to see if it makes a difference. That said... I have consistently been seeing MDNSresponder OOM crashes. Often they occur immediately after a reboot, it crashes and reboots again and stabilizes. But then I found myself needing to un-pair and re-pair to HomeKit... in un-paired status the HomeKit server appears to need more memory, I was crashing every time I booted (still in MDNSresponder), it never got up to a stable state. Turning on IRAM heap immediately solved the problem, and I have not seen a OOM crash since (admittedly only a few days). I think, for the vast majority of users, running with IRAM heap is going to be more stable. |
I think we had already done the move the device around experiment. I wonder what's different during that reboot that's causing an OOM. I tried hitting it with way more mdns traffic than it could handle and I couldn't get it to crash. Something about doing that during a reboot. maybe having homekit initializing at the same time. I was thinking about adding a heap check down in the network stack and temporarily disabling RX when we get low on memory. However, if we can safely enable the IRAM, that's by far the easier solution. |
One reason why I never implemented this was because I wasn't sure how to properly handle recovery when you can't connect to your wifi. Should it automatically go into AP mode? This leads to a lot of error conditions especially after power cycles. Instead I was thinking about a way to manually force it. I just pushed a commit where if you press the light button 5 times in a 3 second window it will automatically reboot into AP mode. |
That is a good idea. I wasn't sure what to do about this myself... I do not want to automatically send it into soft AP mode because as you state, there are times where WiFi may go offline that has nothing to do with ratgdo (power failures, etc.). Your solution is quite elegant. I'll update the README file as well. |
Before we merge this, there is one other change I want to make... and that is move all (or as much as possible) config settings into a single file. Loading them all from separate files is taking an awfully long time every reboot and I think I have come up with a way to store them all in one place. Non-config data (like rolling code) would remain as it is in their own files. |
I've deliberately avoided this whole patch because I didn't want to deal with all the possible failure modes. The idea to use the light switch just hit me. I think I might make it flash the lights couple times as well before it reboots as a sort of confirmation. I also need to add a similar fix to the sec+1 code. |
Ok, I cleaned up the recovery logic some. I reused your TTC timer to flash the lights letting you know the command was accepted. |
Yes, I think we can remove this. |
I made significant change to how we save user configuration settings. There are 19 individual settings that have been stored in 19 separate files. This significantly slows down boot time. By saving them all in a single file boot time improves by approx 13 seconds... such that the device now completes initialization in about 5 seconds. It did require extensive changes. But the result is actually much more readable code. |
Wd should, of course, do some thorough testing before we release this! |
Also, @jgstroud I am thinking of changing the reboot countdown timer from 30 seconds to 15 seconds. My devices are consistently up and stable in less than 15 seconds now that config settings are all in one file. |
Cool. I added syslog earlier today but wanted to wait for your settings patch. I'll merge it with your code later. I'm really glad you made the settings change. the code was getting messy and hard to read and modify. |
1.7.1 seems pretty stable. We might want to remove the prerelease tag on that before we merge this. |
I did that a few days ago. |
Syslog is great, thanks. For the settings, I think I over engineered it... I think use of std:map, std:any, std:string may be using way more memory than I would like. I think I'll change it to a simple struct which gives us control over memory use (potentially putting it into IRAM as well). |
I really like the settings file, but the one thing I didn't like was trying to understand the first.second.second naming. That was confusing, but that's probably just me. |
Oh I also added a comment on the code. You added code to migrate settings from old files to new. Should we remove those old files in the process to free up flash? |
@jgstroud just a heads up that I need to make use of IRAM heap more granular around HomeKit server. I un-paired my door to test initial setup and it fails. On investigation it looks like HomeKit uses a lot (and I mean a lot) of memory to initialize itself if it is not already paired. I had to reduce the size of my message log buffer all the way down and while that got it going, it was very slow to create the initial crypto keys. There are two big malloc's in the server initialization which is all we really need to wrap in IRAM, so I am going to do that and I think that will get us going again. In the meantime, don't un-pair !!! |
Those initial pairing crypto operations are very slow and I spent a good deal of time on fixing those so you don't get timeouts. After you make the change, we need to do a bunch of unpair / repair and make sure we don't see the timeouts come back. |
We're good now. I have tested un-pairing and pairing several times. Consistent memory usage and timing. It takes about 10-12 seconds to generate the crypto keys on boot if no HomeKit paired. I've also added comments in the source file every place that IRAM is used so that any future maintainer knows to keep an eye on usage/allocation during startup process. |
I have added an advanced mode to the soft AP page... which shows all available WiFi networks with the unique hardware BSSID of the access points. This allows us to lock the device to a specific AP. Appropriate warnings have been added to the readme, soft AP page and confirmation pop-up. |
f69fed5
to
31472f7
Compare
I tried the AP Locking and I assume it's working, but there is nothing in the logs to indicate it has been locked to an AP. I think we should fix that before merging. |
Yes, agree. I have coded this up including reporting it on the main web page so very visible. I just haven't committed and pushed yet as I want to take a screenshot to include in readme. |
Ok, great. Otherwise I have no issues. |
…og html and javascript
…by syslog feature
…Kit server initialization
…WiFi Access Point to connect to.
Fixed the bug with empty hostname. Also put a test for zero length accessory name into the arduino_homekit_server source. And discovered that, shockingly, all the initialization functions are declared with void returns... so while there is a bunch of testing for error conditions, the functions do not return true/false to indicate whether setup succeeded... so calling code cannot itself abort. Changing those functions to return a bool would be good idea, but also fairly significant change, as implies also figuring out how to handle a failure. I'm not up to that right now. |
2ca0ffc
to
f25cc7d
Compare
wait for a status command to be processes to properly set the initial state of all homekit characteristics. Also timeout if we don't receive a status in a reasonable amount of time. This prevents unintentional state changes if a home hub reads the state before we initialize everything Note, secplus1 doesnt have a status command so it will just timeout This specifically addresses an issue where we get spurious door unlock notifications, but will also address incorrect door states immediately after booting.
f25cc7d
to
14f376c
Compare
I fixed the issue with the random lock notifications. Due to a race condition on boot. If homekit queries the status of the door before we receive our first status message then we have a bunch of default values set and those immediately change state when we get our first status response. I added a slight delay to starting homekit loop so we have a chance to populate the door state first. I think I'm done for now. We can merge. |
v1.8.0 (2024-10-??)
What's Changed
Known Issues