fix(backend): handle UnicodeDecodeError and validate str_replace count by zhoufengen · Pull Request #2665 · bytedance/deer-flow

zhoufengen · 2026-04-30T04:25:20Z

Summary

Improve robustness in two backend components:

1. Artifact serving: handle UnicodeDecodeError for non-UTF-8 text files

File: backend/app/gateway/routers/artifacts.py

The get_artifact endpoint calls actual_path.read_text(encoding="utf-8") after is_text_file_by_content() returns True. However, is_text_file_by_content() only checks for null bytes — a file can pass this check while still containing non-UTF-8 byte sequences (e.g., Latin-1, Shift-JIS encoded text files). This causes an unhandled UnicodeDecodeError resulting in a 500 error.
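The gap described above can be reproduced in isolation. This is a minimal sketch, assuming a null-byte-only heuristic like the one the description attributes to is_text_file_by_content(); `looks_like_text` is a hypothetical stand-in, not the project's function.

```python
def looks_like_text(data: bytes) -> bool:
    """Hypothetical stand-in: treat data as text if it has no null bytes."""
    return b"\x00" not in data


# "café" in Latin-1 is b"caf\xe9": no null bytes, but not valid UTF-8.
latin1_bytes = "café".encode("latin-1")

passes_check = looks_like_text(latin1_bytes)

try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

Here the heuristic accepts the bytes while the UTF-8 decode fails, which is the combination that reaches read_text(encoding="utf-8") unguarded.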

The .skill archive code path in the same file (lines 152-155) already handles this correctly with a try/except UnicodeDecodeError fallback to binary response.

Fix: Wrap both read_text calls in try/except UnicodeDecodeError blocks that fall through to the binary Response path, consistent with the existing .skill archive handling.
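The fallback pattern can be sketched without the web framework. `load_artifact_body` below is a hypothetical helper that returns a tagged tuple; the endpoint in this PR returns response objects instead, so treat this as an illustration of the control flow only.

```python
import tempfile
from pathlib import Path
from typing import Tuple, Union


def load_artifact_body(path: Path) -> Tuple[str, Union[str, bytes]]:
    """Return ("text", str) for valid UTF-8, otherwise ("binary", bytes).

    Hypothetical helper for illustration; not part of the PR.
    """
    try:
        return "text", path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Fall through to the binary path instead of letting the error propagate.
        return "binary", path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "ok.txt"
    utf8_file.write_text("héllo", encoding="utf-8")

    # Shift-JIS text: contains no null bytes but is not valid UTF-8.
    sjis_file = Path(tmp) / "legacy.txt"
    sjis_file.write_bytes("こんにちは".encode("shift_jis"))

    utf8_result = load_artifact_body(utf8_file)
    sjis_result = load_artifact_body(sjis_file)
```

Serving the raw bytes keeps the file downloadable without guessing its encoding.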

2. str_replace tool: enforce single-occurrence when replace_all=False

File: backend/packages/harness/deerflow/sandbox/tools.py

The str_replace_tool docstring states: "the substring to replace must appear exactly once in the file" when replace_all is False. However, the implementation uses content.replace(old_str, new_str, 1) which silently replaces only the first occurrence regardless of how many exist. This can lead to unexpected edits when the agent intends to make a precise single replacement.

Fix: Add a count check before replacing. If content.count(old_str) > 1, return an error message telling the agent the string appears multiple times and suggesting either replace_all=True or a more specific string.
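A minimal sketch of the count check, assuming a standalone function rather than the tool itself. `replace_once` is a hypothetical name, and it raises ValueError where the PR returns an error string to the agent.

```python
def replace_once(content: str, old_str: str, new_str: str, replace_all: bool = False) -> str:
    """Hypothetical helper: refuse an ambiguous single replacement."""
    if replace_all:
        return content.replace(old_str, new_str)

    # str.count counts non-overlapping occurrences.
    count = content.count(old_str)
    if count > 1:
        # The PR returns an error message instead of raising.
        raise ValueError(
            f"The string to replace appears {count} times. "
            "Use replace_all=True or provide a more specific string."
        )
    # With zero occurrences, str.replace leaves the content unchanged.
    return content.replace(old_str, new_str, 1)


single = replace_once("a b a", "b", "c")
everything = replace_once("a b a", "a", "x", replace_all=True)

try:
    replace_once("a b a", "a", "x")
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
```

Without the check, `"a b a".replace("a", "x", 1)` would silently edit only the first match.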

Testing

All 2792 existing tests pass
ruff check passes
ruff format --check passes

…tool

- Handle UnicodeDecodeError in artifact serving for non-UTF-8 text files, falling back to binary response instead of raising 500 error
- Add occurrence count validation in str_replace tool when replace_all is False, returning clear error when substring appears multiple times

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>

CLAassistant · 2026-04-30T04:25:26Z

All committers have signed the CLA.

Copilot

Pull request overview

Improves backend robustness by preventing unhandled decode errors when serving artifact files and by aligning the str_replace sandbox tool’s behavior with its “single occurrence” contract.

Changes:

Catch UnicodeDecodeError when serving artifacts that look like text but aren’t valid UTF-8, falling back to a binary response.
In str_replace (when replace_all=False), count occurrences and return an error if the target string appears more than once.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `backend/packages/harness/deerflow/sandbox/tools.py` | Enforces single-occurrence replacement when `replace_all=False` by adding a count check and error message. |
| `backend/app/gateway/routers/artifacts.py` | Wraps UTF-8 `read_text()` calls with `UnicodeDecodeError` handling to avoid 500s on non-UTF-8 “text-like” files. |

    if is_text_file_by_content(actual_path):
-        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+        try:
+            return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+        except UnicodeDecodeError:
+            pass  # Fall through to binary response


            else:
+                count = content.count(old_str)
+                if count > 1:
+                    return f"Error: The string to replace appears {count} times in {requested_path}. Use replace_all=True to replace all occurrences, or provide a more specific string that appears exactly once."
                content = content.replace(old_str, new_str, 1)


    if mime_type and mime_type.startswith("text/"):
-        return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+        try:
+            return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type)
+        except UnicodeDecodeError:
+            pass  # Fall through to binary response


WillemJiang · 2026-05-01T14:31:09Z

@zhoufengen, thanks for your contribution. Please take a look at the review comments from Copilot.

WillemJiang requested a review from Copilot May 1, 2026 07:53

Copilot started reviewing on behalf of WillemJiang May 1, 2026 07:53

Copilot AI reviewed May 1, 2026

WillemJiang added the "question" label (Further information is requested) May 4, 2026

zhoufengen wants to merge 1 commit into bytedance:main from zhoufengen:fix/backend-robustness-improvements

4 participants
