MQTT Configuration vanished without any reason #1930

Open
zaphood1967 opened this issue Aug 10, 2024 · 31 comments

@zaphood1967

zaphood1967 commented Aug 10, 2024

log.txt

PROBLEM DESCRIPTION

While I was on holiday, the MQTT configuration was removed for no obvious reason. MQTT was disabled when I checked. After I re-entered the broker address, username and password and enabled MQTT Discovery, everything went back to normal. As I had been away, I am absolutely sure that nothing changed during the last two weeks. This is now the second time this problem has occurred on my system.

REQUESTED INFORMATION

This is the same effect I described in #1882.

EMS-ESP is on 3.6.5.

The log is attached, though I am not sure it helps.

@proddy
Contributor

proddy commented Aug 10, 2024

I've had this once. I need to look into the code to find out why it's happening. It seems to happen after a restart.

@proddy added the bug ("Something isn't working") label Aug 10, 2024
@zaphood1967
Author

Yes, you said so already. But I wasn't on site, and according to the log the device seems to have been up the entire time, so I am not sure what happened this time (again). Last time there was no reboot before the problem either; I only rebooted while analysing why the device no longer responded. Furthermore, I have been using this device for roughly one year and have only seen this happen twice, both times within the last few weeks. So might something have changed in the last firmware update?

@zaphood1967
Author

And it went away again just now. Log attached.
Uploading log (1).txt…

@proddy
Contributor

proddy commented Aug 10, 2024

Are you building the firmware manually? The 3.6.5 from the GitHub releases page hasn't changed since it was created back in March, so I don't expect it's a firmware update.

I think it must be something in the code, like the MQTT settings are not loaded for some reason. I'll take a look with Michael.

@zaphood1967
Author

Nope, no self-building involved ;-) After I purchased the device, I think only one update was offered. I downloaded the compiled file from GitHub, installed it, and that's it. That must have been quite a while ago.

@proddy
Contributor

proddy commented Aug 10, 2024

There are some nice new features coming in the next version (3.7) you may be interested in. There's a demo at https://demo.emsesp.org

@zaphood1967
Author

Looks nice and modern, and there is even Modbus now? Wow... ;-) What I am mainly missing for a Buderus RC310 is the option to modify the heating (and ww) programs. Currently I can only select the program; changing the schedule is only possible at the physical controller. I am not sure whether this is a restriction of the EMS interface or just not implemented in EMS-ESP, but it would be a very welcome feature.

@proddy
Contributor

proddy commented Aug 10, 2024

We've had this feature request a few times, also for heat pumps. See the discussion in #1918.

@MichaelDvP would it be an idea to write the switch times to a .csv/.txt file and then upload it using the Upload/Download page? The data could be written to an internal LittleFS file and executed as a command.

@MichaelDvP
Contributor

I don't know why the MQTT config was deleted. Are other configs affected as well? Please post the support info.

2024-08-08 23:41:33.000 INFO 2: [emsesp] Last system reset reason Core0: RTC watch dog reset: CPU+RTC, Core1: APP CPU reset by PRO CPU
It seems something was blocking; a WDT reset is very rare.
2024-08-09 06:01:27.715 WARNING 17: [emsesp] WiFi disconnected. Reason: assoc leave (8)
There was also a WiFi disconnect initiated by the AP. Does this happen often?

As for the switch programs: maybe we should start a separate discussion about this topic, since #1918 went too far off-topic.

An RC300 switch program has 6 switch points per day of the week, each with a time and a temperature. This gives 84 (6 * 7 * 2) entities per program. Each hc has two programs and each dhw has a program and a circulation program, which gives around 1000 entities in total.
That is too much for the given data structure and impossible to include in the thermostat_data MQTT topic.
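
The arithmetic above can be checked with a short sketch. The thread does not say how many heating circuits and dhw circuits are counted; the values 4 and 2 used below are assumptions chosen only for illustration.

```python
# Rough count of entities needed if every switch point value became its own entity.
SWITCHPOINTS_PER_DAY = 6
DAYS_PER_WEEK = 7
VALUES_PER_SWITCHPOINT = 2  # time and temperature


def entities_per_program() -> int:
    """Entities for a single switch program (6 * 7 * 2)."""
    return SWITCHPOINTS_PER_DAY * DAYS_PER_WEEK * VALUES_PER_SWITCHPOINT


def total_entities(heating_circuits: int, dhw_circuits: int) -> int:
    """Each hc has two programs; each dhw has a program plus a circulation program."""
    programs = heating_circuits * 2 + dhw_circuits * 2
    return programs * entities_per_program()


if __name__ == "__main__":
    print(entities_per_program())  # 84
    # ASSUMPTION: 4 heating circuits and 2 dhw circuits (not stated in the thread).
    print(total_entities(4, 2))    # 1008
```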

We could read the telegram data and store it internally in raw form, which costs about 1k of RAM, and generate JSON (or CSV) on demand.

  • API: easy, we can generate the reply on an API query.
  • MQTT: maybe publish rarely (once a minute) in an extra topic such as thermostat_hc1_switchprog_a_data.
  • Web page: I don't have an idea yet. Extra pages?
  • Commands: maybe a command to set a single switch point. As a string or as JSON?

We need a concept that fits into web/console/MQTT/API and does not blow the memory of the ESP.
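
As a rough sketch of the "store raw, generate JSON on demand" idea: the byte layout below (two bytes per switch point, time in 15-minute steps and temperature in 0.5 °C steps) is purely hypothetical and is not taken from the EMS telegram definition or from the EMS-ESP code. It only illustrates that one program fits in 84 bytes and can be decoded when a query arrives.

```python
import json

SWITCHPOINTS_PER_DAY = 6
DAYS = ["mo", "tu", "we", "th", "fr", "sa", "su"]
# HYPOTHETICAL layout: byte 0 = time in 15-minute steps, byte 1 = temperature in 0.5 C steps.
BYTES_PER_SWITCHPOINT = 2
PROGRAM_SIZE = len(DAYS) * SWITCHPOINTS_PER_DAY * BYTES_PER_SWITCHPOINT  # 84 bytes


def program_to_json(raw: bytes) -> str:
    """Decode one raw switch program into a JSON string, only when it is requested."""
    if len(raw) != PROGRAM_SIZE:
        raise ValueError(f"expected {PROGRAM_SIZE} bytes, got {len(raw)}")
    result = {}
    for d, day in enumerate(DAYS):
        points = []
        for s in range(SWITCHPOINTS_PER_DAY):
            offset = (d * SWITCHPOINTS_PER_DAY + s) * BYTES_PER_SWITCHPOINT
            quarter_hours, half_degrees = raw[offset], raw[offset + 1]
            hours, quarters = divmod(quarter_hours, 4)
            points.append({"time": f"{hours:02d}:{quarters * 15:02d}",
                           "temp": half_degrees / 2})
        result[day] = points
    return json.dumps(result)


if __name__ == "__main__":
    # Example data: every switch point set to 06:00 at 21.0 C.
    example = bytes([24, 42]) * (PROGRAM_SIZE // BYTES_PER_SWITCHPOINT)
    print(program_to_json(example)[:70])
```

Under this assumed layout, only the raw bytes stay resident; the larger JSON text exists only for the duration of an API reply or an MQTT publish.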

@zaphood1967
Author

zaphood1967 commented Aug 10, 2024

> Don't know why the MQTT config was deleted. Are other configs affected also? Please post the support info.

Not 100% sure. Maybe the language setting was also reverted to English, but I am not sure whether it was set to German before. No other settings seem to be affected.

> 2024-08-08 23:41:33.000 INFO 2: [emsesp] Last system reset reason Core0: RTC watch dog reset: CPU+RTC, Core1: APP CPU reset by PRO CPU
> Seems there was something blocking, WDT reset is very rare.
> 2024-08-09 06:01:27.715 WARNING 17: [emsesp] WiFi disconnected. Reason: assoc leave (8)
> There was also a WiFi disconnect initiated by the AP. Does this happen often?

You mean a disconnect from the WLAN? Not that I am aware of, at least. Usually, when that happens, devices just reconnect, so it would not be noticed easily. I am using an AVM Fritz Repeater 1700 in AP mode (LAN input, so it acts as an AP rather than as a repeater). The AP for the basement is literally on the other side of the wall from where the ESP is located. As this is a fairly old building, the wall is built of bricks, not concrete and steel, so connectivity should be very solid.

Please find the requested logs attached.

@zaphood1967
Author

zaphood1967 commented Aug 10, 2024

> For the switch programs. Maybe we should start a discussion about this topic, #1918 went too much OT.

Agreed.

> The RC300 switchprog have 6 switchpoints a day of week with time and temperature. This gives 84 (6 * 7 * 2) entities for a program. Each hc has two programs and each dhw has a program and a circulation program. This gives 1000 entities. Too much for the given data structure, impossible to include in the thermostat_data MQTT topic.

Yikes... understandable...

> We need a concept that fits in web/console/mqtt/api and does not blow the memory of the ESP.

If the problem is that "large", I would totally understand if this is not going to be implemented at all.

@MichaelDvP
Contributor

> Please find the requested logs attached.

Never post the settings file (OK, you have removed the personal information). The support info is the first button on the Help and Download pages. It contains no personal information, but many more states and measurements that help with debugging.

> I would totally understand if this is not going to be implemented at all.

There is always a way, but it is not easy and it is a lot of work.

@zaphood1967
Author

zaphood1967 commented Aug 10, 2024

> > Please find the requested logs attached.
>
> Never post the settings file (OK, you have removed the personal information). The support info is the first button on the Help and Download pages. It contains no personal information, but many more states and measurements that help with debugging.

Did I post the wrong one? Maybe the attached file is the correct one? (I always clean logs before posting, but thanks for the reminder anyway ;-))

emsesp_system_info.txt

@MichaelDvP
Contributor

Thanks for the system info. It's an S32 with 16M flash and 2 large OTA partitions. Heap/free alloc space looks OK. Maybe reduce the log buffer, but this should happen automatically if memory runs low.

I have no idea what happened.

  1. A watchdog reset: it seems something was blocking. Maybe something happened in the (sync) mqtt.loop? @proddy Should we make MQTT async for systems with PSRAM? But that would not help here.
  2. The MQTT settings reset: I cannot find anything in the code that removes only these settings.

Sorry.

@zaphood1967
Author

  1. I only increased the log buffer after we started to debug. I will dial it back to 50 though.
    The MQTT server is a Mosquitto that runs as an add-on on my Home Assistant machine. I am not sure whether this causes any problems; HA runs the add-ons as containers, so NAT is involved on the server. Before this, the Mosquitto was even in a remote location and the ESP talked to it via a VPN, which did not cause any hiccups.

  2. Weird. Maybe a part of the flash is defective, so not all data gets saved cleanly?

@MichaelDvP
Contributor

> Maybe a part of the flash is defective, so not all data gets saved cleanly?

I don't think so. The filesystem has wear leveling: on every change the data is stored in a different place. The MQTT settings are a single file, which is only written when there is a change via the web interface.

On startup the file is read, and on a read error or if the JSON cannot be deserialized, the defaults are set. Maybe there is something (a special character) in the settings that can cause a deserialization error? But in that case there would be a reset to defaults on every reboot.
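
The startup behaviour described above can be sketched as follows. This is not the EMS-ESP implementation; the field names and default values are hypothetical and only illustrate the "read, try to deserialize, otherwise fall back to defaults" pattern.

```python
import json
from pathlib import Path
from typing import Dict, Tuple

# HYPOTHETICAL defaults and field names, for illustration only.
DEFAULT_MQTT_SETTINGS: Dict[str, object] = {
    "enabled": False,
    "host": "",
    "username": "",
    "password": "",
}


def load_mqtt_settings(path: Path) -> Tuple[Dict[str, object], bool]:
    """Return (settings, used_defaults).

    Falls back to the defaults on a read error or if the JSON cannot be parsed.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("settings root must be a JSON object")
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError are subclasses of ValueError.
        return dict(DEFAULT_MQTT_SETTINGS), True
    # Keys missing from the file keep their default value.
    return {**DEFAULT_MQTT_SETTINGS, **data}, False
```

With this pattern, a missing, truncated or otherwise unparsable file looks the same to the user as a fresh install: MQTT disabled and all fields empty.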

@proddy
Contributor

proddy commented Aug 11, 2024

I've seen this behaviour when I was flashing/restarting an ESP32 during development - some of the MQTT settings would be reset. But it's happened like twice in 2 years and I can't reproduce it. Neither can I see anything in the code that would suggest why. I think we need to wait until it happens again.

@zaphood1967
Author

> I've seen this behaviour when I was flashing/restarting an ESP32 during development - some of the MQTT settings would be reset. But it's happened like twice in 2 years and I can't reproduce it. Neither can I see anything in the code that would suggest why. I think we need to wait until it happens again.

Is there anything I can do to prepare in case this occurs again? Log level or anything like that? I have now experienced this 3 times, and the logs I provided were apparently no help to you. So if I can prepare anything that would help you guys, just let me know.

@proddy
Contributor

proddy commented Aug 11, 2024

From what I understand, something is causing your EMS-ESP to restart, and when it does, some of the MQTT settings get reset. So, possibly two separate issues. A few ideas:

  • turning on SysLog to capture the logs would help, since the logs are not persisted in EMS-ESP after it restarts
  • do you know if HA or anything around the MQTT broker is changed around the same time?
  • I assume the EMS-ESP is powered adequately and has a strong WiFi signal (as that can cause restarts)
  • do you think you can easily reproduce it? Like restarting HA, if you're using HA's embedded Mosquitto broker
  • I would go to 3.7.0. We're not doing fixes on 3.6.5 and there's a chance it works in the latest dev version

@zaphood1967
Author

> • turning on SysLog to capture the logs would help, since the logs are not persisted in EMS-ESP after it restarts

OK, I will need to set up a syslog target then.

> • do you know if HA or anything around the MQTT broker is changed around the same time?

No, nothing happened. As I said, I was on vacation when it happened, so there was definitely no update or change anywhere.

> • I assume the EMS-ESP is powered adequately and has a strong WiFi signal (as that can cause restarts)

It is powered by the connection to the Buderus controller.

> • do you think you can easily reproduce it? Like restarting HA, if you're using HA's embedded Mosquitto broker

I rebooted the entire HA VM just now (including Mosquitto) -> no effect. Then I rebooted the ESP -> no effect either; the MQTT settings are still there. Weird.

> • I would go to 3.7.0. We're not doing fixes on 3.6.5 and there's a chance it works in the latest dev version

As I am not on site, this is something I can only do in approx. two weeks' time.

@MichaelDvP
Contributor

BTW: When going to 3.7.0-dev, use the ESP32 version, **NOT the ESP32-16M!**
I know your S32 has 16M flash, but it has no PSRAM. The 16M version defaults to E32V2 hardware with PSRAM.
@proddy The update page should also check for PSRAM.

@proddy
Contributor

proddy commented Aug 11, 2024

> BTW: When going to 3.7.0-dev, use the ESP32 version, **NOT the ESP32-16M!**
> I know your S32 has 16M flash, but no PSRAM. The 16M version defaults to E32V2 hardware with PSRAM.
> @proddy Also for the update page, check PSRAM.

I think it's fixed in the latest PR (#1931). See getPlatform() in UploadDownload.tsx. I just remove 16M if it's an S3.

@MichaelDvP
Contributor

> I just remove 16M if it's an S3.

Yes, I've seen it, but the S32 gateways (not S3) have 16M flash without PSRAM, so they can't handle the 16M file.
The 16M file is compiled with -DEMSESP_DEFAULT_BOARD_PROFILE="E32V2" and, when flashed to an S32, defaults to the wrong board pins on a factory reset.

@proddy removed the bug ("Something isn't working") label Aug 12, 2024
@proddy changed the title from "MQTT Configuration vanished without an reason" to "MQTT Configuration vanished without any reason" Sep 9, 2024
@zaphood1967
Author

And here we go again. The ESP vanished from HA once again. No power loss, no thunderstorm. It just quit sending MQTT data. The web interface did not respond either. Funnily enough, ping worked. After a reboot the web server was working again, but the MQTT config was, once again, gone. I needed to key in the server IP, credentials and HA discovery again.

I am now assuming that something on the ESP's memory chip might be faulty...

@proddy
Contributor

proddy commented Oct 14, 2024

I swear this bug will haunt me forever!

Can you remember what was happening before EMS-ESP went into this dead state?

@zaphood1967
Author

Well, then you won't be alone at least....

It stopped sending data on Monday around 11 o'clock. The last event that might be of any significance took place on Saturday, two days earlier: I was working on the mains power and had switched off the entire house for an hour or so. After that, everything came up without problems, including the EMS-ESP. So unfortunately there is nothing I can directly link to the failure on Monday.

Strangely enough, it kept the fixed IP. For whatever reason, it only seems to dislike the MQTT configuration.
I have now upgraded to 3.7.0-dev45, as you recommended in the last post.

@zaphood1967
Author

zaphood1967 commented Oct 19, 2024

And once again the ESP died. It worked for one day and has not responded since the 15th. Guys, I think the ESP is plainly defective. I have no other explanation, as there was, once again, no particular event that could have triggered the problem.

Edit: It seems that this time it does not come back at all; the blue LED is blinking, nothing else. I assume it has now deleted the WLAN configuration as well? Bummer, nobody is on site to put it back into the WLAN :-(

I would like to take a more structured approach here.

First point: Can somebody point me to an open-source syslog server, preferably for Debian? Graylog seemed promising, but the docs are totally outdated. I am not really a Linux guy, so a simple one would be great.

Second point: Is there any way I can make the device reboot once a day?

Third point: Is there a way to just swap out the ESP for a new one and then flash the image to it? An ESP is quite cheap, so if that is an option, please tell me which exact model to get.

Edit 2: Maybe the power supplied by the Buderus controller is not stable enough? I am thinking of switching to the LAN version of the gateway. Can it be powered by an external power supply while connected to the EMS bus without damaging the EMS interface on the controller?

@zaphood1967
Author

Does nobody have any feedback?

@proddy
Contributor

proddy commented Oct 23, 2024

Of the 10,000 users, you're the only one to report this strange behaviour, so I suspect it's either a hardware/board issue or a power issue where the ESP32 doesn't get enough current to run all the services, such as writing to the filesystem.

I would recommend first powering by an external power supply, which is always recommended. Leave it for a few weeks to see if that fixes the problem.

Also look at the doc to see what your memory usage is in the EMS-ESP, and follow the recommendations to reduce power (in WiFi).

Syslog comes with Debian. Google rsyslog and you'll find hundreds of articles on how to install it within minutes.
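
For a quick check that syslog messages are arriving at all, before rsyslog is set up, a throwaway UDP receiver can be sketched in a few lines. This is a stand-in for testing, not a replacement for rsyslog. The port 5514 and the log file name are assumptions chosen for illustration; the standard syslog port 514 normally requires root.

```python
import re
import socketserver

PRI_RE = re.compile(r"^<(\d{1,3})>")


def split_priority(message: str):
    """Split the leading <PRI> header of a syslog message.

    Returns (facility, severity, rest), or (None, None, message) if there is no header.
    """
    match = PRI_RE.match(message)
    if not match:
        return None, None, message
    pri = int(match.group(1))
    return pri // 8, pri % 8, message[match.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    """Appends every received datagram to a local log file."""

    def handle(self):
        data, _sock = self.request  # for UDP servers: (datagram bytes, socket)
        text = data.decode("utf-8", errors="replace").rstrip()
        facility, severity, rest = split_priority(text)
        # ASSUMPTION: file name chosen for this example only.
        with open("emsesp-syslog.log", "a", encoding="utf-8") as log:
            log.write(f"{self.client_address[0]} fac={facility} sev={severity} {rest}\n")


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for syslog datagrams until interrupted (blocks)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the receiver; the syslog target on the device would then need to point at this machine's IP and the chosen port.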

You can reboot EMS-ESP daily using the Scheduler module. Check the wiki/docs for the command. I'm not sure that will help, though.

If all that fails, I'm sure BBQKees can send you a replacement version.

@bbqkees
Contributor

bbqkees commented Oct 23, 2024

In general, the firmware forgetting its settings is not a hardware failure.
The settings are kept in a part of the flash storage. If the flash failed, the whole thing would probably stop functioning.

It appears you have the S32 Gateway. This model does not have additional PSRAM on board. The S32 struggles having to deal with larger amounts of entities, especially when MQTT Discovery is turned on.

Can you do the following:
Flash the firmware again via USB: https://bbqkees-electronics.nl/wiki/gateway/firmware-update-and-downgrade.html#uploading-the-firmware-via-ems-esp-flasher-flashtool
Either select 'Erase Flash' or, with the latest version of the tool, deselect 'Keep settings'.
Use the following bin file: https://github.com/emsesp/EMS-ESP32/releases/download/v3.6.5/EMS-ESP-3_6_5-ESP32-16MB.bin
This method puts the Gateway into the 'real' factory condition: the whole flash storage is erased and set up as new.

When loaded, reconfigure the Gateway as new. (So do not upload a previous config).

Then turn off all the firmware features you do not need. Like NTP, Syslog etc.
Then create a customization where you remove all entities from memory that you do not need.
Then save and reboot. This should keep the most memory available.

If it still forgets the settings after this then contact me again via email for a swap or upgrade.

@zaphood1967
Author

zaphood1967 commented Oct 23, 2024 via email
