MQTT curtailment source stays connected after brokers are gone #565

Description

@rongxin-liu

Describe the bug

The MQTT curtailment source can continue to show as connected in Proto Fleet after both configured MQTT brokers are deliberately killed.

From code inspection, the runtime status appears to be latched once the initial broker connect/subscribe succeeds. The worker in server/internal/domain/curtailment/mqttingest/worker.go reports Connected: true / Subscribed: true a single time. The Paho client in server/internal/infrastructure/mqttclient/client.go has auto-reconnect enabled, but it does not appear to propagate connection-lost, reconnecting, or on-connect callbacks back into RuntimeStatusUpdate. Because the frontend maps the runtime state RUNNING to the green Connected label, the UI can keep reporting the source as connected even after the broker sockets are gone.
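To make the suspected gap concrete, here is a minimal, stdlib-only sketch of what "propagating callbacks into RuntimeStatusUpdate" could look like. The field set of RuntimeStatusUpdate, the brokerCallbacks adapter, and its method names are assumptions for illustration and are not taken from the repository. If the client is the Eclipse Paho Go client, methods like these would presumably be registered through its ClientOptions handlers (SetOnConnectHandler, SetConnectionLostHandler, SetReconnectingHandler).

```go
package main

import (
	"errors"
	"fmt"
)

// errBrokerGone stands in for whatever error the client reports on a lost socket.
var errBrokerGone = errors.New("broker gone")

// RuntimeStatusUpdate borrows the name used in the issue; the fields are assumed.
type RuntimeStatusUpdate struct {
	Broker     string
	Connected  bool
	Subscribed bool
	Err        error
}

// brokerCallbacks is a hypothetical adapter. Each method would be registered as
// the matching client callback and forwards a fresh status update every time,
// instead of reporting status only once after the initial connect.
type brokerCallbacks struct {
	broker  string
	updates chan<- RuntimeStatusUpdate
}

// OnConnect fires on the initial connect and on every successful reconnect.
func (b brokerCallbacks) OnConnect() {
	b.updates <- RuntimeStatusUpdate{Broker: b.broker, Connected: true}
}

// OnSubscribed fires once the topic subscription has been (re)established.
func (b brokerCallbacks) OnSubscribed() {
	b.updates <- RuntimeStatusUpdate{Broker: b.broker, Connected: true, Subscribed: true}
}

// OnConnectionLost clears both flags so downstream counts can decrement.
func (b brokerCallbacks) OnConnectionLost(err error) {
	b.updates <- RuntimeStatusUpdate{Broker: b.broker, Err: err}
}

func main() {
	updates := make(chan RuntimeStatusUpdate, 4)
	cb := brokerCallbacks{broker: "primary", updates: updates}

	cb.OnConnect()
	cb.OnSubscribed()
	cb.OnConnectionLost(errBrokerGone)
	close(updates)

	for u := range updates {
		fmt.Printf("%s connected=%t subscribed=%t err=%v\n", u.Broker, u.Connected, u.Subscribed, u.Err)
	}
}
```

The point of the sketch is only that a post-start disconnect produces an update with Connected and Subscribed both false; how the real worker consumes such updates is not confirmed here.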

Preconditions

  • Proto Fleet has an enabled MaestroOS MQTT curtailment source configured with two brokers.
  • The brokers can be started with deployment-files/scripts/mqtt-curtailment-loop.sh or an equivalent two-broker Mosquitto setup.
  • Source page is open under Settings > Curtailment sources.

Steps to reproduce

  1. Start two MQTT brokers and configure the source with the printed primary/secondary broker hosts, port, topic, username, and password.
  2. Confirm Proto Fleet shows the source as connected after initial subscription and signal receipt.
  3. Deliberately kill both MQTT broker containers/processes.
  4. Wait for at least one source-list poll interval.
  5. Observe the source health in Proto Fleet.

Expected behavior

The source should stop showing as connected once the runtime MQTT clients lose broker connectivity. At minimum, broker runtime counts should decrement and the source should transition away from RUNNING / green Connected when both broker sockets are disconnected.
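The expected behavior can be sketched as a pure function over per-broker status. This is an illustration under assumptions: the brokerStatus type, the counts and deriveState helpers, and the NOT_CONNECTED state name are hypothetical; only RUNNING comes from the issue.

```go
package main

import "fmt"

// brokerStatus is an assumed per-broker view of the runtime.
type brokerStatus struct {
	Connected  bool
	Subscribed bool
}

// counts recomputes the running/subscribed broker counts from scratch, so a
// disconnect is reflected as soon as the per-broker status changes.
func counts(brokers map[string]brokerStatus) (running, subscribed int) {
	for _, b := range brokers {
		if b.Connected {
			running++
			if b.Subscribed {
				subscribed++
			}
		}
	}
	return running, subscribed
}

// deriveState returns RUNNING only while at least one broker is connected and
// subscribed. NOT_CONNECTED is a placeholder name for "anything but RUNNING".
func deriveState(running, subscribed int) string {
	if running > 0 && subscribed > 0 {
		return "RUNNING"
	}
	return "NOT_CONNECTED"
}

func main() {
	brokers := map[string]brokerStatus{
		"primary":   {Connected: true, Subscribed: true},
		"secondary": {Connected: true, Subscribed: true},
	}
	fmt.Println(deriveState(counts(brokers)))

	// Both brokers killed: every per-broker flag is cleared.
	brokers["primary"] = brokerStatus{}
	brokers["secondary"] = brokerStatus{}
	fmt.Println(deriveState(counts(brokers)))
}
```

Whether losing a single broker should also change the displayed health is a separate design decision; this sketch keeps RUNNING as long as one broker remains subscribed, which matches the "at minimum" wording above.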

Proto Fleet version

Current local commit observed while filing: db5bbbffd.

Environment

Observed in local development on macOS with Docker-hosted Mosquitto brokers.

Logs

Not captured yet. Suggested follow-up: reproduce while logging Paho connection-lost/reconnect events and the mqttingest.RuntimeStatusUpdate stream.

Screenshots & screen recordings

Not captured yet.

Additional context

Likely implementation areas:

  • server/internal/infrastructure/mqttclient/client.go: wire Paho connection lost / reconnecting / reconnected callbacks into the ingest runtime.
  • server/internal/domain/curtailment/mqttingest/worker.go: update per-broker runtime status after post-start disconnects, not only after initial connect/subscribe.
  • server/internal/domain/curtailment/mqttingest/subscriber.go: verify RunningBrokerCount / SubscribedBrokerCount recompute correctly when one or both brokers disconnect.
  • client/src/protoFleet/api/useMqttCurtailmentSources.ts: once server status is accurate, the existing health mapping should stop showing Connected for non-running sources.

Acceptance test idea: simulate an MQTT client that connects successfully, then emits a connection-lost event; assert the source runtime status transitions from RUNNING with two brokers to a non-connected state when both brokers are lost.
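A rough shape for that acceptance test, using only the standard library. Everything named here (fakeClient, sourceRuntime, attach, State, and the NOT_CONNECTED label) is hypothetical scaffolding, not the project's actual types; a real test would presumably substitute the mqttclient interface and assert on the mqttingest runtime status instead.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errBrokerKilled = errors.New("broker killed")

// fakeClient is a stand-in MQTT client that lets a test trigger
// connect and connection-lost events on demand.
type fakeClient struct {
	onConnect func()
	onLost    func(error)
}

func (c *fakeClient) Connect()                 { c.onConnect() }
func (c *fakeClient) DropConnection(err error) { c.onLost(err) }

// sourceRuntime tracks which brokers are currently connected.
type sourceRuntime struct {
	mu        sync.Mutex
	connected map[string]bool
}

func newSourceRuntime() *sourceRuntime {
	return &sourceRuntime{connected: make(map[string]bool)}
}

// attach returns a fake client whose callbacks update this runtime.
func (s *sourceRuntime) attach(broker string) *fakeClient {
	return &fakeClient{
		onConnect: func() { s.set(broker, true) },
		onLost:    func(error) { s.set(broker, false) },
	}
}

func (s *sourceRuntime) set(broker string, up bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.connected[broker] = up
}

// State reports the derived state and the number of connected brokers.
func (s *sourceRuntime) State() (string, int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := 0
	for _, up := range s.connected {
		if up {
			n++
		}
	}
	if n == 0 {
		return "NOT_CONNECTED", 0
	}
	return "RUNNING", n
}

func main() {
	src := newSourceRuntime()
	primary, secondary := src.attach("primary"), src.attach("secondary")

	primary.Connect()
	secondary.Connect()
	fmt.Println(src.State()) // RUNNING 2

	primary.DropConnection(errBrokerKilled)
	secondary.DropConnection(errBrokerKilled)
	fmt.Println(src.State()) // NOT_CONNECTED 0
}
```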

Metadata

Labels

bug: Something isn't working
