Retries on API or wifi fail #29
Comments
Sounds like you are understanding everything correctly. You are looking in the right places, most of the logic you are looking for is in main.cpp.
First off, that is an excellent idea and definitely possible. Let me share some more context about the project. One of the primary goals and motivations of this project is that it is low power enough to run on battery for extended periods (months). For this reason, much effort was taken to reduce time spent awake as much as possible. One thing to keep in mind is that when the esp32 wakes up from deep sleep, it enters the setup() function. This can make it somewhat less trivial to keep track of the state of the weather display. However, there are solutions: ideally we store the state of the project to non-volatile storage (nvs) before entering deep sleep, then when we wake we can read it back from nvs. An example of this is low battery behavior. When low battery happens we want the display to show the low battery warning and then no longer update the display until the battery level is high enough again. So if the display wakes up when it has low battery, it will check nvs to see if the low battery screen is already displayed. If so, it will immediately sleep. In regards to wifi, attempts are made to connect for 10s; after that it's considered a lost cause, so we give up. You may have seen this option already but I thought I'd mention it cause it is related to this discussion, -Luke |
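The low-battery flow described above can be sketched as a small decision function. This is not the project's code: `WakeAction`, `WakeDecision`, `decideOnWake`, and the millivolt threshold parameter are hypothetical names chosen for illustration. The `lowBatScreenShown` flag stands in for a value that would be read from nvs at the top of setup() and written back before deep sleep (on Arduino-ESP32 the `Preferences` library is one way to do that, which is an assumption about how the storage would be wired up).

```cpp
#include <cstdint>

// Hypothetical names, not taken from main.cpp.
enum class WakeAction {
    SleepImmediately,      // warning is already on screen, nothing to redraw
    DrawLowBatteryScreen,  // first low-battery wake: draw the warning once
    UpdateDisplay          // battery is fine: do the normal refresh
};

struct WakeDecision {
    WakeAction action;
    bool lowBatScreenShown;  // value to persist for the next wake
};

// Decide what to do on wake from the current battery reading and the
// flag that was persisted before the previous deep sleep.
WakeDecision decideOnWake(uint32_t batteryMilliVolts,
                          uint32_t lowBatteryThresholdMilliVolts,
                          bool lowBatScreenShown) {
    const bool lowBattery = batteryMilliVolts <= lowBatteryThresholdMilliVolts;
    if (lowBattery) {
        if (lowBatScreenShown) {
            return {WakeAction::SleepImmediately, true};
        }
        return {WakeAction::DrawLowBatteryScreen, true};
    }
    // Battery recovered (or was never low): clear the flag and refresh.
    return {WakeAction::UpdateDisplay, false};
}
```

Keeping the decision separate from the storage calls means the logic can be checked on a desktop compiler before it goes onto the board.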
Thanks for the reply!
I just found most of this logic :-) What I am actually thinking right now is maybe I've given myself an xy problem - perhaps I should start with why the board doesn't connect to wifi every wake first. FYI:
For the log above, the display does NOT go to the API error screen. I have yet to capture a log where the error screen gets called. When it does, it seems to then go to sleep until the next interval (currently one hour) so the display would have that error in place for an hour. I want to add some retries, or leave the previous forecast on screen. |
This appears to be the expected behavior, correct? |
Yes - sorry, I was mid-edit on my previous post, please see that one. |
If you are having issues with your board connecting to wifi on first wake, are you referring to the first wake after you flashed new code? Particularly when you flash new code it is not uncommon to fail to connect to wifi for whatever reason. I would simply hit the reset button. |
You might also try increasing the timeout for the http GET request to openweathermap if you are having issues with it timing out. |
Unfortunately no, it's after a few successful updates e.g. I have this test running on my workbench and yesterday it was fine all day (1 hr updates) then the one at 2pm failed - 3pm fixed it. The wifi connect is less worrisome, as above you confirmed it tries for 10s and I could increase that to say 30 but beyond that I need to look at my wifi setup. What about as a quick change, if I comment out this under each of the air quality and Onecall blocks?
|
An unusual "complaint" but you are responding so fast I am getting out of sync :-) :-) :-)
Have done so on wifi connect as well as the GETs, thanks. |
This code only draws the error to the display. Maybe I am misunderstanding what the goal of commenting this out is? |
I was hoping to leave the previous forecast on screen until the next wake attempt - at the moment I see it pushes the error to the screen; Old info on the screen would be less jarring to me than an error message being there for an hour - I see the pain that it also hides the fact there was an error so maybe 3 attempts a few mins apart, leaving the old forecast on screen, and finally displaying the error? For now I am increasing the timeout 10s --> 20s on wifi connect each wakeup, and the attempts for the GETs under API calls from 3-->5 and will run it a few hours. I suspect most of this is a manifestation of being on a slow internet connection with long ping times to the OpenWeather servers half a world away, apart from the wifi connection failures which are mine to own. |
Oh then, yes comment out the drawError messages and it should have the effect you desire. |
Thanks - for now will run with the increased connection attempts and report back in a few days! Also on the wifi connection attempts - totally my fault, once I bothered to look at RSSI it was way too low, made no sense, until I realised ESP only uses 2.4GHz and I am well covered with 5GHz all over the house but 2.4GHz is only served by one access point which not surprisingly is not close to where I had the weather station. |
Update me if increasing those timeout times works, I could increase the default for the project. Admittedly this is a hard thing for me to control for when I am testing so the default times picked are somewhat arbitrary. |
Absolutely will update, I got a whole weather forecast e-paper project for no effort on my part, will contribute back anything I can find ! reminder set for a couple of days :-) |
So here's the update - tl;dr is this is now a great fixture on my wall and I am immensely grateful to you. I made the change in the drawError routine to not redraw the display on error - keeping the previous hour's display (I have it set for hourly updates). I also changed the retries in a couple of places for GETting the data from 3 to 5. I also investigated the underlying cause, and apart from the no connect on first reboot after programming which as you mentioned is just a thing, my internet has several hundred ms pings to the openweathermap servers, with occasional (rough estimate 25% of the time) complete drops of the connection when using a web browser. I don't have the best internet but usually don't notice webpages having a problem, but manually browsing OpenWeather I do occasionally have to manually refresh after getting a 40x error. So - a few more retries and keeping the last hour's refresh on fail has made the display from a common user perspective (read that as "wife perspective") pretty dang good. No one cares if the weather data is an hour or two out of date, but a big honkin' error screen draws attention. What would be ideal is say after 3 hours of retries we give up and show an error, but that is a bit beyond me at this stage. I got my e-ink display and built the project into a small picture frame to hang on the wall - cutting the border paper mat that came with the frame to the size of the display is a very easy way to get a clean looking build. Thanks again - now please provide code to make the project produce weather instead of predicting :-) |
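The "keep the old forecast, then give up and show an error after a few hours" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the project: `FetchState`, `ScreenAction`, and `onFetchResult` are hypothetical names, and the failure counter is assumed to be saved somewhere that survives deep sleep (nvs, for example) and restored on each wake.

```cpp
#include <cstdint>

// Hypothetical state; it would need to survive deep sleep (e.g. in nvs).
struct FetchState {
    uint8_t consecutiveFailures = 0;
};

enum class ScreenAction {
    DrawForecast,  // fetch succeeded: draw the new data
    KeepPrevious,  // fetch failed, but not often enough to show an error yet
    DrawError      // too many failed wakes in a row: show the error screen
};

// Call once per wake, after the API requests have either succeeded or
// exhausted their in-wake retries.
ScreenAction onFetchResult(FetchState &state, bool fetchOk,
                           uint8_t maxSilentFailures) {
    if (fetchOk) {
        state.consecutiveFailures = 0;
        return ScreenAction::DrawForecast;
    }
    if (state.consecutiveFailures < UINT8_MAX) {
        ++state.consecutiveFailures;  // saturate instead of wrapping to 0
    }
    return state.consecutiveFailures > maxSilentFailures
               ? ScreenAction::DrawError
               : ScreenAction::KeepPrevious;
}
```

With hourly wakes and `maxSilentFailures` set to 3, the first three failed wakes leave the old forecast on screen and the fourth draws the error. One successful fetch resets the count.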
Thanks for looking into this. I'll keep this issue open until then, and I'll tag it when I push updates related to your feedback. |
I've been looking at this, as well, and have a theory on what's going on. I'm only seeing the api connect errors on the Air Quality request, never on the main weather update request that occurs first. Sometimes it's a time out, and others it's an outright connection refused, as in maybe it's hammering on the connection too quickly after the weather api request. I'm thinking if you can add in a small (or increased) delay before it attempts to grab the Air Quality api info that should alleviate or remove the errors. The logs show that it usually catches on the 2nd or 3rd retry, and I probably see the API error msg on the display about 2 or 3 times a week. |
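A rough sketch of the "pause before the Air Quality request, then retry with a gap" suggestion, written so it can be compiled off-device. `fetchWithRetries` and its parameters are hypothetical names, not functions from the project. `request` stands for one HTTP GET that returns true on success, and `wait` stands for whatever blocking delay is used on the board (Arduino's `delay()` would be the usual choice).

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper. Returns the attempt number that succeeded
// (1-based), or 0 if every attempt failed.
int fetchWithRetries(const std::function<bool()> &request,
                     const std::function<void(uint32_t)> &wait,
                     uint32_t initialDelayMs,
                     uint32_t retryDelayMs,
                     int maxAttempts) {
    // Give the connection a moment to settle after the previous request.
    wait(initialDelayMs);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (request()) {
            return attempt;
        }
        // Pause between attempts, but not after the final failure.
        if (attempt < maxAttempts) {
            wait(retryDelayMs);
        }
    }
    return 0;
}
```

Every millisecond spent waiting is time awake on battery, so the delays are worth keeping small and measuring against how often the error actually shows up.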
@ChiefPoints good idea |
@ChiefPoints now that I think about it, I'm fairly sure I did see the AQI error more often than any other. Note my distance from the servers and general internet connection probably still has a lot to do with it - I was seeing "connection refused" errors even in a web browser, though seemingly not as often as the ESP got. |
Just popping in again what...4 months later to say the project is still on my wall, still getting compliments, and with the hack I put in place has had zero issues to date. Thanks again Luke! and anyone else wanting a nice easy mounting system for this - try a small picture frame :-) |
Thanks for following up :). I'm glad to hear it is working well after your patches. It's been on my todo list to properly address this for quite some time and I just want to reiterate my intent to fix this in the main project sometime in the medium-term future. @YouCanNotBeSerious you are welcome to open a pull request if you want with your fixes and I would be happy to review them and get them merged into the main project. PS - the picture frame sounds like a very sleek solution! |
WOW! That looks really good! Thank you so much for sharing O_O |
Ditto what everyone has said about how great this project is -- many thanks! I have a consistent problem with time synchronization via NTP failing (at least the initial synch). I believe that the initial synch takes about 30 seconds (asynchronous to the main process), so it's likely that after startup synchronization won't have succeeded before getLocalTime() is called. I addressed this by watching the synch status for completion in the setup:
This usually succeeds on the third time through the loop as expected and fixes the missing time problem for me. Also, I'm not sure you need to setup NTP every time you awaken, but I haven't had a chance to test that yet. On a related note, this loop in printLocalTime:
Probably doesn't do much; I think the only way getLocalTime() can fail is if the time has not been set and that's not likely to change during three calls in rapid succession. But maybe I'm missing something there. |
These are all great finds @srt19170. Feel free to create a pull request with these fixes, otherwise, I will try and get to it this weekend. |
Upon further investigation, it appears that the SNTP process is lost during deep sleep, so you do need to configure and start it again upon wakeup. It appears that the system keeps track of time pretty well across the deep sleep cycles. Timezone has to be corrected after deep sleep, but after that the time seems to be accurate. Given that the time being off even (say) five minutes wouldn't have much impact on the display, you could probably optimize and only do an SNTP update every couple of hours, or even just once a day. But that might be an unnecessary complication, given that it will only save a small amount of power. |
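If someone did want to try the "only sync every couple of hours" optimization mentioned above, the check itself is small. This is a sketch under assumptions, not part of the pull request: `shouldResyncNtp` is a hypothetical name, and `lastSyncEpochSec` is assumed to be stored somewhere that survives deep sleep, with 0 meaning "never synced".

```cpp
#include <cstdint>

// Hypothetical helper: decide on wake whether to start SNTP this cycle.
bool shouldResyncNtp(int64_t nowEpochSec,
                     int64_t lastSyncEpochSec,
                     int64_t resyncIntervalSec) {
    if (lastSyncEpochSec <= 0) {
        return true;  // never synced, so the clock cannot be trusted
    }
    if (nowEpochSec < lastSyncEpochSec) {
        return true;  // clock went backwards, e.g. after a power loss
    }
    return nowEpochSec - lastSyncEpochSec >= resyncIntervalSec;
}
```

The timezone would still have to be set on every wake, as noted above, regardless of what this returns.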
I've submitted a pull request with the proposed fix for more reliable NTP synchronization. (Hopefully I did it correctly; not my forte!) |
Since @YouCanNotBeSerious posted his photo in this thread I'll do the same for my finished version: Thanks again for documenting your project so thoroughly and helping me out. I certainly could not have done this on my own! |
WOW! That is beautiful! 10/10. |
Hi -
VERY NICE PROJECT for a start - thanks for sharing it with us all.
I have flashed the code onto a spare board to play with before committing to buying the display .... what I see so far (debug terminal) is that on a wifi fail, or API request fail we wait until the next wake time to try again. I am examining the code (as an amateur!) now, but if this is something I have misconfigured or is not expected plz let me know :-)
I'll try to work out in code how to try say 3 times 1 minute apart on a fail before failing until the next wake time - already I see:
display.powerOff();
beginDeepSleep(startTime, &timeInfo);
on a fail of fetching the time so will start there.