fix(backend): handle UnicodeDecodeError and validate str_replace count#2665
Open
zhoufengen wants to merge 1 commit intobytedance:mainfrom
Open
fix(backend): handle UnicodeDecodeError and validate str_replace count#2665zhoufengen wants to merge 1 commit intobytedance:mainfrom
zhoufengen wants to merge 1 commit intobytedance:mainfrom
Conversation
…tool - Handle UnicodeDecodeError in artifact serving for non-UTF-8 text files, falling back to binary response instead of raising 500 error - Add occurrence count validation in str_replace tool when replace_all is False, returning clear error when substring appears multiple times Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>
Contributor
There was a problem hiding this comment.
Pull request overview
Improves backend robustness by preventing unhandled decode errors when serving artifact files and by aligning the str_replace sandbox tool’s behavior with its “single occurrence” contract.
Changes:
- Catch
UnicodeDecodeErrorwhen serving artifacts that look like text but aren’t valid UTF-8, falling back to a binary response. - In
str_replace(whenreplace_all=False), count occurrences and return an error if the target string appears more than once.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
backend/packages/harness/deerflow/sandbox/tools.py |
Enforces single-occurrence replacement when replace_all=False by adding a count check and error message. |
backend/app/gateway/routers/artifacts.py |
Wraps UTF-8 read_text() calls with UnicodeDecodeError handling to avoid 500s on non-UTF-8 “text-like” files. |
Comment on lines
+178
to
+187
| try: | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| except UnicodeDecodeError: | ||
| pass # Fall through to binary response | ||
|
|
||
| if is_text_file_by_content(actual_path): | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| try: | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| except UnicodeDecodeError: | ||
| pass # Fall through to binary response |
Comment on lines
177
to
+180
| if mime_type and mime_type.startswith("text/"): | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| try: | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| except UnicodeDecodeError: |
Comment on lines
1571
to
1575
| else: | ||
| count = content.count(old_str) | ||
| if count > 1: | ||
| return f"Error: The string to replace appears {count} times in {requested_path}. Use replace_all=True to replace all occurrences, or provide a more specific string that appears exactly once." | ||
| content = content.replace(old_str, new_str, 1) |
Comment on lines
+1572
to
+1574
| count = content.count(old_str) | ||
| if count > 1: | ||
| return f"Error: The string to replace appears {count} times in {requested_path}. Use replace_all=True to replace all occurrences, or provide a more specific string that appears exactly once." |
Comment on lines
177
to
+187
| if mime_type and mime_type.startswith("text/"): | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| try: | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| except UnicodeDecodeError: | ||
| pass # Fall through to binary response | ||
|
|
||
| if is_text_file_by_content(actual_path): | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| try: | ||
| return PlainTextResponse(content=actual_path.read_text(encoding="utf-8"), media_type=mime_type) | ||
| except UnicodeDecodeError: | ||
| pass # Fall through to binary response |
Collaborator
|
@zhoufengen, thanks for your contribution. Please take a look at the review comments for Copilot. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Improve robustness in two backend components:
1. Artifact serving: handle UnicodeDecodeError for non-UTF-8 text files
File:
backend/app/gateway/routers/artifacts.pyThe
get_artifactendpoint callsactual_path.read_text(encoding="utf-8")afteris_text_file_by_content()returnsTrue. However,is_text_file_by_content()only checks for null bytes — a file can pass this check while still containing non-UTF-8 byte sequences (e.g., Latin-1, Shift-JIS encoded text files). This causes an unhandledUnicodeDecodeErrorresulting in a 500 error.The
.skillarchive code path in the same file (lines 152-155) already handles this correctly with atry/except UnicodeDecodeErrorfallback to binary response.Fix: Wrap both
read_textcalls intry/except UnicodeDecodeErrorthat fall through to the binaryResponsepath, consistent with the existing.skillarchive handling.2. str_replace tool: enforce single-occurrence when replace_all=False
File:
backend/packages/harness/deerflow/sandbox/tools.pyThe
str_replace_tooldocstring states: "the substring to replace must appear exactly once in the file" whenreplace_allisFalse. However, the implementation usescontent.replace(old_str, new_str, 1)which silently replaces only the first occurrence regardless of how many exist. This can lead to unexpected edits when the agent intends to make a precise single replacement.Fix: Add a count check before replacing. If
content.count(old_str) > 1, return an error message telling the agent the string appears multiple times and suggesting eitherreplace_all=Trueor a more specific string.Testing
ruff checkpassesruff format --checkpasses