Add software update capability #709

Open
wants to merge 25 commits into
base: main

Conversation

agners
Collaborator

@agners agners commented May 18, 2024

This adds software update capability to the Python Matter Server.

As it stands, the implementation does the following things:

  1. Check the DCL for software updates for a particular device
  2. If an update is available, it downloads the firmware locally
  3. Prepares the OTA Provider on the Matter fabric (using the Linux OTA Provider example app)
    3a. On first use, the OTA Provider App is commissioned to the fabric
  4. Notifies the node about the availability of the OTA Provider and the update

The node then communicates directly with the OTA Provider. Progress of the update can be observed in the OTA Software Update Requestor Cluster of the device being updated.

There are a few things which still need to be sorted out:

  • Lifecycle management of the OTA Provider App needs improvement (termination not working/adding more updates)
  • Candidate updates need stricter checks to confirm they are actually compatible (minApplicableSoftwareVersion/maxApplicableSoftwareVersion)
  • Checksum of the update files needs to be checked after download
  • Split between update check and update apply
  • Handle multiple update calls for the same node
  • Support updates from testnet DCL
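The compatibility check from the list above could look roughly like the following. This is a minimal sketch, not the code in this PR: it assumes a candidate is a plain dict using DCL-style field names (`softwareVersion`, `softwareVersionValid`, `minApplicableSoftwareVersion`, `maxApplicableSoftwareVersion`), and the function name `is_update_applicable` is hypothetical.

```python
from typing import Any


def is_update_applicable(current_version: int, candidate: dict[str, Any]) -> bool:
    """Return True if the candidate may be installed over current_version."""
    # Assumption: a missing validity flag is treated as "valid".
    if not candidate.get("softwareVersionValid", True):
        return False
    # Only strictly newer versions count as updates.
    if candidate["softwareVersion"] <= current_version:
        return False
    # Assumption: missing bounds mean "no restriction".
    min_applicable = candidate.get("minApplicableSoftwareVersion", 0)
    max_applicable = candidate.get("maxApplicableSoftwareVersion", 0xFFFFFFFF)
    return min_applicable <= current_version <= max_applicable


candidate = {
    "softwareVersion": 3,
    "softwareVersionValid": True,
    "minApplicableSoftwareVersion": 1,
    "maxApplicableSoftwareVersion": 2,
}
```

With this candidate, a node running version 2 would be offered the update, while nodes running version 0 (below the minimum) or version 3 (already current) would not.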

@agners agners marked this pull request as draft May 18, 2024 07:54
@olavt

olavt commented May 18, 2024

Will there be some new API-commands available to control this?

@olavt

olavt commented May 19, 2024

It would be nice if these steps were totally independent:

  1. Check the DCL for software updates for a particular device
  2. If an update is available, it downloads the firmware locally

I would like them to be:

  1. Check the DCL for software updates for a particular device
  2. If an update is available go to step 3
  3. Download a firmware file from a given location (either one from the one specified by the DCL or any other location)

The reason I want this is to be able to test firmware updates for devices in development that are not yet published to the DCL. Steps 1 and 2 would then be bypassed.

It would be nice to have a separate API function for each of the steps.

@marcelveldt
Contributor

TODO:

  • add separate api command to check for updates
  • periodic check for node updates - set flag on matterNodeData
  • Show "update available" in Matter UI
  • HA update entity attached to the flag + manual service call to check for updates
  • The check for updates should implement throttling (HA has helpers for that)

@agners agners force-pushed the add-software-update-capability branch from 745ff7d to b43b024 Compare May 24, 2024 09:25
@agners
Collaborator Author

agners commented May 24, 2024

Will there be some new API-commands available to control this?

Yes.

It would be nice if these steps were totally independent:

1. Check the DCL for software updates for a particular device

2. If an update is available, it downloads the firmware locally

Yes, I intend to split these two operations. The open question is which reference to use when applying the update. I intend to use the software version number (in integer format), as this should be unique. So the commands will be:

  {
    "command": "check_node_update",
    "args": {
      "node_id": 50
    }
  }

This will return information about the software; the exact format is TBD (currently a raw dump of the JSON data returned by the DCL).

Then with this command the update will get applied:

  {
    "command": "update_node",
    "args": {
      "node_id": 50,
      "software_version": 123
    }
  }

This will then start the node update.
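For illustration, a client could build the two command messages along these lines. This is only a sketch: the `message_id` envelope field and the `build_command` helper are assumptions, and the actual transport (sending the string over the WebSocket connection) is left out.

```python
import json
from itertools import count

# Assumption: each message carries a unique message_id so replies can be matched.
_message_ids = count(1)


def build_command(command: str, **args: object) -> str:
    """Serialize a command and its arguments as a JSON message."""
    message = {
        "message_id": str(next(_message_ids)),
        "command": command,
        "args": args,
    }
    return json.dumps(message)


# First check for an update, then apply a specific software version.
check_msg = build_command("check_node_update", node_id=50)
update_msg = build_command("update_node", node_id=50, software_version=123)
```

Keeping the check and the apply step as separate messages lets a client show the available version to the user before anything is downloaded.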

I would like them to be:

1. Check the DCL for software updates for a particular device

2. If an update is available go to step 3

3. Download a firmware file from a given location (either one from the one specified by the DCL or any other location)

The reason I want this is to be able to test firmware updates for devices in development that are not yet published to the DCL. Steps 1 and 2 would then be bypassed.

Agreed, we'll need to have a way to support non-DCL updates. In fact, for testing I already need this (I've now added hardcoded updates for the OTA Requestor example).

There are various use cases, from updates that are online but not in the DCL to an update file obtained from somewhere else. Each of these cases needs some metadata to match the update file with the right devices. The metadata should probably closely follow the DCL format.

For online ones, we could probably just pass it through the WebSocket API as well, essentially how you suggest. But I don't think this will cover all use cases. 🤔

@agners agners added the new-feature New feature or request label May 24, 2024
@agners agners force-pushed the add-software-update-capability branch 2 times, most recently from c6e19e2 to bb700de Compare June 5, 2024 14:01
@agners
Collaborator Author

agners commented Jun 5, 2024

image

image

agners added 17 commits June 24, 2024 23:57
Check if there is a software update available using DCL software update
information.
The OTA provider downloads the updates and prepares them so Matter
devices can consume them.
Use the OTA Provider example app to implement an OTA provider. The
example app supports a JSON update descriptor file to manage update
metadata. Tested with the OTA Requestor app.
Start and commission OTA Provider App when necessary. Use random
discriminator and passcode. Store the Node ID of the OTA Provider
App once setup for fast re-use.
Verify that the DCL software update is indeed applicable to the
software currently running on the device. Add test coverage as
well.
Add global check for update where we can insert hardcoded updates or
updates from other sources in the future.
Make check_node_update a separate WebSocket command which only
checks for updates. The update_node command then will download
and actually apply the update.
Add Update specific exceptions and raise them where appropriate.
Add capability to verify the checksum of the OTA file while
downloading it.
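A minimal sketch of checking a checksum while the file is being downloaded, rather than in a second pass. It is not the code from this commit: the helper name `verify_chunks` is hypothetical, and it assumes the expected checksum is a base64-encoded digest and that the hash algorithm is SHA-256 unless stated otherwise.

```python
import base64
import hashlib
from collections.abc import Iterable


def verify_chunks(
    chunks: Iterable[bytes], expected_b64: str, algorithm: str = "sha256"
) -> bytes:
    """Hash chunks as they arrive and return the payload if the digest matches."""
    hasher = hashlib.new(algorithm)
    received = bytearray()
    for chunk in chunks:
        hasher.update(chunk)  # incremental hashing, no second pass over the data
        received.extend(chunk)
    actual_b64 = base64.b64encode(hasher.digest()).decode("ascii")
    if actual_b64 != expected_b64:
        raise ValueError("checksum mismatch")
    return bytes(received)


# Stand-in for chunks read from an HTTP response.
chunks = [b"ota-", b"image"]
expected = base64.b64encode(hashlib.sha256(b"ota-image").digest()).decode("ascii")
payload = verify_chunks(chunks, expected)
```

A real implementation would more likely write each chunk to disk instead of buffering it in memory, and delete the file on a mismatch.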
Add two new client commands to check for updates and trigger the
update.
Create log files for each OTA Provider run. Improve setup and
commissioning of the OTA Provider.
Most update logic is related to the external OTA provider (like
commissioning and configuring it). This commit moves most of the
update logic into the ExternalOtaProvider class.
Make use of the new ChipDeviceControllerWrapper to communicate with
the device directly. This avoids unnecessary node interviewing and
showing up on the controller side.
Support updating to a specific software version by string. This requires
fetching all versions one by one, but is much more user friendly. The
code still checks that all restrictions are met (min/max version, etc).
Instead of trying to reuse the same OTA Provider instance for multiple
OTA requests, create a new instance for each request. This is easier
to implement and also allows parallel updates.

Because the OTA Provider is now ephemeral, we need to commission it on
every update. But this is quick and reliable, so not a big deal.

To support multiple updates at once, we need to make sure the OTA
Providers use a distinct Matter port (hence passing 0) and distinct
node ids. The current implementation simply uses the target node id
plus a fixed offset. Since a single node can only run one update at a
time, this is sufficient.

Furthermore, for some updates the versionNumberString value reported
in the DCL differs from what is actually in the OTA metadata. Eve
updates from the Testnet DCL are one example (e.g. 3.2.0 vs
3.2.6705). When using the OTA Provider with the --otaImageList
option, this discrepancy causes the OTA Provider to abort.

Using the single OTA update per OTA Provider instance allows us to use
--filepath, which doesn't check the versionNumberString.
When the node's UpdateStateEnum changes from Querying to Idle it
means the update file did not get processed. This could be due to
temporary network issues or the update file not being honored by the
target node.
Instead of using an Event use a Future to mark completion of the update.
This allows setting an exception when we see unexpected update state
transitions (specifically kQuerying -> kIdle).
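The Future-based approach could be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `UpdateState` is a local stand-in for the cluster's UpdateStateEnum with illustrative values, `UpdateTracker` and `finish()` are hypothetical names, and how a successful update is detected is deliberately left to the caller.

```python
import asyncio
from enum import IntEnum


class UpdateState(IntEnum):
    """Local stand-in for UpdateStateEnum; the values are illustrative."""

    IDLE = 1
    QUERYING = 2
    DOWNLOADING = 4
    APPLYING = 5


class UpdateTracker:
    """Tracks state changes and exposes completion as a Future."""

    def __init__(self) -> None:
        self._previous = UpdateState.IDLE
        # Unlike an Event, a Future can carry an exception to the waiter.
        self.done = asyncio.get_running_loop().create_future()

    def on_state_change(self, new_state: UpdateState) -> None:
        previous, self._previous = self._previous, new_state
        if self.done.done():
            return
        # Querying -> Idle means the update file did not get processed.
        if previous is UpdateState.QUERYING and new_state is UpdateState.IDLE:
            self.done.set_exception(RuntimeError("update file was not processed"))

    def finish(self) -> None:
        """Mark success; a no-op if a failure was already recorded."""
        if not self.done.done():
            self.done.set_result(None)


async def run(states: list[UpdateState]) -> str:
    tracker = UpdateTracker()
    for state in states:
        tracker.on_state_change(state)
    tracker.finish()
    try:
        await tracker.done
    except RuntimeError as err:
        return f"failed: {err}"
    return "ok"
```

With an Event, the waiter would only learn that something finished; with a Future, the failure reason travels with the completion signal and surfaces as a normal exception at the `await`.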
Running multiple updates on the same node doesn't make sense. This
should be mostly handled by the client/UX. But we do check on server
side for the critical path.
Don't raise an exception when there is no software version info on DCL.
It is perfectly fine to operate such nodes. An informational message is
good enough for this case.
@agners agners force-pushed the add-software-update-capability branch from 21ea899 to 0bf4bc5 Compare June 24, 2024 21:58
Use a new model MatterSoftwareVersion to store the software version
information typically fetched from DCL for the Matter nodes.
@agners agners force-pushed the add-software-update-capability branch from 0bf4bc5 to 7f5460b Compare June 24, 2024 21:58
@agners agners changed the title WIP: Add software update capability Add software update capability Jun 24, 2024
@agners agners marked this pull request as ready for review June 24, 2024 21:59
We've added new commands, so a schema version bump is needed.
@agners agners force-pushed the add-software-update-capability branch from 8c22aa5 to 0764463 Compare June 25, 2024 07:53
@@ -25,6 +26,21 @@ RUN \

ARG PYTHON_MATTER_SERVER

ENV chip_example_url "https://github.com/agners/matter-linux-example-apps/releases/download/v1.3.0.0"
Contributor

Did you manage to move it to the libs org?

Comment on lines 98 to +117
DESCRIPTOR_PARTS_LIST_ATTRIBUTE_PATH = create_attribute_path_from_attribute(
    0, Clusters.Descriptor.Attributes.PartsList
)
BASIC_INFORMATION_VENDOR_ID_ATTRIBUTE_PATH = create_attribute_path_from_attribute(
    0, Clusters.BasicInformation.Attributes.VendorID
)
BASIC_INFORMATION_PRODUCT_ID_ATTRIBUTE_PATH = create_attribute_path_from_attribute(
    0, Clusters.BasicInformation.Attributes.ProductID
)
BASIC_INFORMATION_SOFTWARE_VERSION_ATTRIBUTE_PATH = (
    create_attribute_path_from_attribute(
        0, Clusters.BasicInformation.Attributes.SoftwareVersion
    )
)
BASIC_INFORMATION_SOFTWARE_VERSION_STRING_ATTRIBUTE_PATH = (
    create_attribute_path_from_attribute(
        0, Clusters.BasicInformation.Attributes.SoftwareVersionString
    )
)

Contributor

In a follow-up iteration we should maybe start moving all these constants into the constants module

Contributor

Well, hmm, maybe not, as these are not plain constants but are created by a util function.

Labels
new-feature New feature or request

3 participants