fix(backend): handle UnicodeDecodeError and validate str_replace count #2665

Open
zhoufengen wants to merge 1 commit into bytedance:main from zhoufengen:fix/backend-robustness-improvements
@zhoufengen

Summary

Improve robustness in two backend components:

1. Artifact serving: handle UnicodeDecodeError for non-UTF-8 text files

File: backend/app/gateway/routers/artifacts.py

The get_artifact endpoint calls actual_path.read_text(encoding="utf-8") after is_text_file_by_content() returns True. However, is_text_file_by_content() only checks for null bytes — a file can pass this check while still containing non-UTF-8 byte sequences (e.g., Latin-1, Shift-JIS encoded text files). This causes an unhandled UnicodeDecodeError resulting in a 500 error.
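
To make the failure mode concrete, here is a minimal sketch. It assumes a null-byte heuristic like the one the description attributes to is_text_file_by_content(); the helper `looks_like_text` and its `sample_size` parameter are hypothetical stand-ins, not the project's actual function.

```python
def looks_like_text(data: bytes, sample_size: int = 8192) -> bool:
    """Hypothetical null-byte heuristic: treat data as text if the sample has no NUL byte."""
    return b"\x00" not in data[:sample_size]


# "café" encoded as Latin-1 contains the byte 0xE9, which is not valid UTF-8 here.
latin1_bytes = "café".encode("latin-1")

# The heuristic accepts the bytes because they contain no NUL byte.
passes_heuristic = looks_like_text(latin1_bytes)

# Strict UTF-8 decoding of the same bytes fails.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

The same bytes pass the text check and fail the decode, which is the gap between the two steps that the endpoint falls into.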

The .skill archive code path in the same file (lines 152-155) already handles this correctly with a try/except UnicodeDecodeError fallback to binary response.

Fix: Wrap both read_text calls in try/except UnicodeDecodeError blocks that fall through to the binary Response path, consistent with the existing .skill archive handling.
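
The fallback pattern can be sketched outside the web framework as follows. This is an illustration under assumptions, not the router's code: `read_text_or_bytes` is a hypothetical helper, and returning raw bytes stands in for the endpoint's binary Response path.

```python
import tempfile
from pathlib import Path
from typing import Union


def read_text_or_bytes(path: Path) -> Union[str, bytes]:
    """Hypothetical helper: return decoded text, or raw bytes if the file is not valid UTF-8."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Fall back to the binary representation instead of letting the error propagate.
        return path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "utf8.txt"
    utf8_file.write_bytes("café".encode("utf-8"))

    latin1_file = Path(tmp) / "latin1.txt"
    latin1_file.write_bytes("café".encode("latin-1"))

    utf8_result = read_text_or_bytes(utf8_file)      # decoded str
    latin1_result = read_text_or_bytes(latin1_file)  # raw bytes fallback
```

In the endpoint itself the `except` branch uses `pass`, so control continues to the code that builds the binary response instead of returning from a helper.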

2. str_replace tool: enforce single-occurrence when replace_all=False

File: backend/packages/harness/deerflow/sandbox/tools.py

The str_replace_tool docstring states: "the substring to replace must appear exactly once in the file" when replace_all is False. However, the implementation uses content.replace(old_str, new_str, 1) which silently replaces only the first occurrence regardless of how many exist. This can lead to unexpected edits when the agent intends to make a precise single replacement.

Fix: Add a count check before replacing. If content.count(old_str) > 1, return an error message telling the agent the string appears multiple times and suggesting either replace_all=True or a more specific string.
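
A minimal sketch of the count check, assuming a standalone function: `replace_in_content` is a hypothetical name, and it raises ValueError where the PR returns an error string to the agent. Handling of a string that does not occur at all is left out of the sketch.

```python
def replace_in_content(content: str, old_str: str, new_str: str, replace_all: bool = False) -> str:
    """Hypothetical sketch: refuse an ambiguous single replacement."""
    if replace_all:
        return content.replace(old_str, new_str)

    # str.count returns the number of non-overlapping occurrences.
    count = content.count(old_str)
    if count > 1:
        raise ValueError(
            f"The string to replace appears {count} times. "
            "Use replace_all=True or provide a more specific string."
        )
    return content.replace(old_str, new_str, 1)


source = "x = 1\ny = 1\n"

single = replace_in_content(source, "y = 1", "y = 2")                    # unique match
everything = replace_in_content(source, "= 1", "= 0", replace_all=True)  # all matches

try:
    replace_in_content(source, "= 1", "= 0")  # ambiguous: two matches
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
```

Rejecting the ambiguous call makes the agent restate its intent, so an edit never lands silently on the first of several matches.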

Testing

  • All 2792 existing tests pass
  • ruff check passes
  • ruff format --check passes

…tool

- Handle UnicodeDecodeError in artifact serving for non-UTF-8 text files,
  falling back to binary response instead of raising 500 error
- Add occurrence count validation in str_replace tool when replace_all is
  False, returning clear error when substring appears multiple times

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
@CLAassistant

CLAassistant commented Apr 30, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

Copilot AI left a comment

Pull request overview

Improves backend robustness by preventing unhandled decode errors when serving artifact files and by aligning the str_replace sandbox tool’s behavior with its “single occurrence” contract.

Changes:

  • Catch UnicodeDecodeError when serving artifacts that look like text but aren’t valid UTF-8, falling back to a binary response.
  • In str_replace (when replace_all=False), count occurrences and return an error if the target string appears more than once.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| backend/packages/harness/deerflow/sandbox/tools.py | Enforces single-occurrence replacement when replace_all=False by adding a count check and error message. |
| backend/app/gateway/routers/artifacts.py | Wraps UTF-8 read_text() calls with UnicodeDecodeError handling to avoid 500s on non-UTF-8 “text-like” files. |

Comment on lines +178 to +187
+    try:
+        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    except UnicodeDecodeError:
+        pass  # Fall through to binary response

 if is_text_file_by_content(actual_path):
-    return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    try:
+        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    except UnicodeDecodeError:
+        pass  # Fall through to binary response

Comment on lines 177 to +180
 if mime_type and mime_type.startswith("text/"):
-    return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    try:
+        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    except UnicodeDecodeError:

Comment on lines 1571 to 1575
 else:
+    count = content.count(old_str)
+    if count > 1:
+        return f"Error: The string to replace appears {count} times in {requested_path}. Use replace_all=True to replace all occurrences, or provide a more specific string that appears exactly once."
     content = content.replace(old_str, new_str, 1)

Comment on lines +1572 to +1574
+    count = content.count(old_str)
+    if count > 1:
+        return f"Error: The string to replace appears {count} times in {requested_path}. Use replace_all=True to replace all occurrences, or provide a more specific string that appears exactly once."

Comment on lines 177 to +187
 if mime_type and mime_type.startswith("text/"):
-    return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    try:
+        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    except UnicodeDecodeError:
+        pass  # Fall through to binary response

 if is_text_file_by_content(actual_path):
-    return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    try:
+        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+    except UnicodeDecodeError:
+        pass  # Fall through to binary response

@WillemJiang
Collaborator

@zhoufengen, thanks for your contribution. Please take a look at the review comments from Copilot.

@WillemJiang added the question (Further information is requested) label on May 4, 2026