Hide TTS filename behind random token #131192

synesthesiam · 2024-11-21T16:23:41Z

Breaking change

TTS URLs of the form /api/tts_proxy/{filename} no longer map to {filename} directly in the TTS cache. This means that TTS URLs will change every time HA is restarted.

Proposed change

The text-to-speech (TTS) cache stores audio files using a SHA1 hash of the text as part of the file name. The filename is currently used directly in the web API, where /api/tts_proxy/{filename} maps directly to {filename} in the TTS cache.

This presents a small security issue when an HA instance is exposed publicly, as a malicious actor could try to retrieve files with a known SHA1 to determine whether or not a particular message was spoken.

A simple fix is provided in this PR: the TTS SpeechManager now keeps a mapping between cache file names and random tokens generated with the secrets library. This ensures there is no relationship between the URL used to retrieve a TTS audio file and its message.
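
To illustrate the difference, here is a minimal sketch rather than the code from this PR. The message text, the `token_to_filename` dictionary, and the filename layout are assumptions made up for the example; only `hashlib.sha1` and `secrets.token_urlsafe` are real standard-library calls.

```python
import hashlib
import secrets

message = "The front door is open"

# Deterministic: anyone who guesses the message can recompute this value.
guessable = hashlib.sha1(message.encode("utf-8")).hexdigest()

# Random: independent of the message, different on every call.
token = secrets.token_urlsafe(16)

# Hypothetical in-memory lookup from token to cache filename.
# The filename layout below is invented for this example.
token_to_filename: dict[str, str] = {token: f"{guessable}_example.mp3"}

# The public URL carries only the token, never the hash.
url = f"/api/tts_proxy/{token}"
```

Because a mapping like this lives only in memory, it would be rebuilt with fresh tokens after a restart, which is consistent with the breaking change noted above.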

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass.
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

home-assistant · 2024-11-21T16:23:49Z

Hey there @home-assistant/core, mind taking a look at this pull request as it has been labeled with an integration (tts) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of tts can trigger bot actions by commenting:

@home-assistant close Closes the pull request.
@home-assistant rename Awesome new title Renames the pull request.
@home-assistant reopen Reopen the pull request.
@home-assistant unassign tts Removes the current integration label and assignees on the pull request, add the integration domain after the command.
@home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
@home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

balloob · 2024-11-21T16:40:56Z

homeassistant/components/tts/__init__.py

+            self.filename_token[filename] = token
+            self.filename_token[token] = filename


Can we track this in 2 different dictionaries? Feels weird to reuse.

Also, shouldn't _async_get_tts_audio() just return a data class that contains filename and token?

Maybe that conflates concerns.

I split the dictionaries for now. A larger refactoring here where the token and cache key are the same should be done in the future.
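
As a rough sketch of the two ideas discussed in this thread (one dictionary per lookup direction, and a data class pairing filename and token), something like the following could work. `AudioRef`, `TokenRegistry`, and `register` are hypothetical names for illustration and are not taken from the PR.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioRef:
    """Hypothetical return value pairing a cache filename with its token."""

    filename: str
    token: str


class TokenRegistry:
    """Hypothetical helper that keeps one dictionary per lookup direction."""

    def __init__(self) -> None:
        self.filename_to_token: dict[str, str] = {}
        self.token_to_filename: dict[str, str] = {}

    def register(self, filename: str) -> AudioRef:
        """Return the existing token for filename, or create a new one."""
        token = self.filename_to_token.get(filename)
        if token is None:
            token = secrets.token_urlsafe(16)
            self.filename_to_token[filename] = token
            self.token_to_filename[token] = filename
        return AudioRef(filename=filename, token=token)


registry = TokenRegistry()
ref = registry.register("example_cache_file.mp3")
```

Keeping the directions separate means a token can never be mistaken for a filename key, which is the ambiguity the single shared dictionary allowed.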

balloob · 2024-11-21T16:42:06Z

homeassistant/components/tts/__init__.py

        if not (record := _RE_VOICE_FILE.match(filename.lower())) and not (
            record := _RE_LEGACY_VOICE_FILE.match(filename.lower())


This is no longer needed now.

It's still needed to reconstruct the cache key correctly right below. I'll save this cleanup for a future PR.

balloob · 2024-11-21T16:42:49Z

homeassistant/components/tts/__init__.py

        """Read a voice file and return binary.

        This method is a coroutine.
        """
+        filename = self.filename_token.get(token)
+        if not filename:
+            raise HomeAssistantError(f"{token} was not recognized!")


This should raise a 401, as we shouldn't expose whether the key exists or not.

Wouldn't raising a different error here than in the other conditions (404) make it obvious that the key doesn't exist?
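
The concern in the reply can be illustrated with a small sketch in which every failure path produces the same response, so a caller cannot tell an unknown token from a missing file. This is not the handler from the PR: `fetch`, the token strings, and the in-memory `cache` are assumptions for the example.

```python
from http import HTTPStatus

# Hypothetical state: one token whose file exists, one whose file is gone.
token_to_filename = {"known-token": "present.mp3", "stale-token": "missing.mp3"}
cache = {"present.mp3": b"audio-bytes"}


def fetch(token: str) -> tuple[HTTPStatus, bytes]:
    """Return (status, body); every failure looks the same to the caller."""
    filename = token_to_filename.get(token)
    data = cache.get(filename) if filename is not None else None
    if data is None:
        # Unknown token and missing file are deliberately indistinguishable.
        return HTTPStatus.NOT_FOUND, b""
    return HTTPStatus.OK, data
```

Here `fetch("stale-token")` and `fetch("never-issued")` return identical results, while only `fetch("known-token")` returns audio.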

Hide TTS filename behind random token

3730c8d

synesthesiam marked this pull request as ready for review November 21, 2024 16:23

home-assistant bot added breaking-change cla-signed core has-tests labels Nov 21, 2024

synesthesiam requested a review from a team as a code owner November 21, 2024 16:23

home-assistant bot added integration: tts new-feature small-pr PRs with less than 30 lines. Quality Scale: internal labels Nov 21, 2024

balloob reviewed Nov 21, 2024

synesthesiam added 3 commits November 21, 2024 13:44

Clean up and fix test snapshots

be6946e

Fix tests

1f253cb

Fix cloud tests

d8eb9d2
