
sagemaker.local bug when inputs are binary files #4996

Open
vojavocni opened this issue Jan 17, 2025 · 1 comment
vojavocni commented Jan 17, 2025

Describe the bug
Hello, I think I encountered a bug in sagemaker.local. I'm trying to test a batch transform with images as input, but I get the following error before execution even reaches the input_fn of my custom inference script:

│   345 │   │   for element in self.splitter.split(file):
│ ❱ 346 │   │   │   if _payload_size_within_limit(buffer + element, size):
│   347 │   │   │   │   buffer += element
│   348 │   │   │   else:
│   349 │   │   │   │   tmp = buffer
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "bytes") to str

I am not using a splitter (the splitter type is None), since splitting isn't needed for images.

I believe the problem is in line 343 of the MultiRecordStrategy class:

class MultiRecordStrategy(BatchStrategy):
    """Feed multiple records at a time for batch inference.

    Will group up as many records as possible within the payload specified.
    """

    def pad(self, file, size=6):
        """Group together as many records as possible to fit in the specified size.

        Args:
            file (str): file path to read the records from.
            size (int): maximum size in MB that each group of records will be
                fitted to. passing 0 means unlimited size.

        Returns:
            generator of records
        """
        buffer = ""
        for element in self.splitter.split(file):
            if _payload_size_within_limit(buffer + element, size):
                buffer += element
            else:
                tmp = buffer
                buffer = element
                yield tmp
        if _validate_payload_size(buffer, size):
            yield buffer

We can see that the buffer variable is initialized as a string, which implicitly assumes the file never contains binary data. Binary inputs should be supported, though: as soon as the splitter yields bytes, the concatenation buffer + element raises the TypeError above.
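To illustrate, here is a minimal, self-contained sketch of the type mismatch (split_binary is a stand-in for the SDK's splitter, not its real implementation): a splitter that yields raw bytes fed into a str-initialized buffer.

```python
def split_binary(payload):
    """Stand-in for a no-op splitter: yields the whole binary record."""
    yield payload

buffer = ""  # the SDK initializes the buffer as str
try:
    for element in split_binary(b"\x89PNG..."):
        buffer + element  # str + bytes: fails before any concatenation happens
except TypeError as e:
    print(e)  # can only concatenate str (not "bytes") to str
```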

To reproduce
Just run a local batch transform with a single image as input. I don't think the model matters; the job fails before any prediction or any interaction between the data and the model takes place.

Expected behavior
I would expect the buffer to be sensitive to whether the file is a text format such as JSON or CSV, or a binary type such as PNG.

Screenshots or logs
See above.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.237.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch, custom inference and model
  • Framework version: 2.5.1
  • Python version: 3.11
  • CPU or GPU: Both
  • Custom Docker image (Y/N): Y, extending the pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker image


@vojavocni vojavocni added the bug label Jan 17, 2025
vojavocni (Author) commented:
I have a fix in my local environment after which the whole batch transform job works:

buffer = b"" if isinstance(next(self.splitter.split(file)), bytes) else ""

So I can confirm this is the issue. I can open a small PR if that seems OK.
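For reference, here is the fix applied to a simplified, self-contained version of pad() (stand-in names and a plain byte-count limit, not the SDK's actual MB-based helpers): initialize the buffer to match the type of the first record the splitter yields, so both text and binary records concatenate cleanly.

```python
def pad(records, size_limit):
    """Group records into chunks whose length stays within size_limit.

    Works for both str and bytes records by choosing a matching buffer type.
    """
    records = list(records)
    # The proposed fix: pick b"" or "" based on the first record's type.
    buffer = b"" if records and isinstance(records[0], bytes) else ""
    for element in records:
        if len(buffer) + len(element) <= size_limit:
            buffer += element
        else:
            tmp = buffer
            buffer = element
            yield tmp
    if buffer:
        yield buffer

print(list(pad(["ab", "cd", "ef"], 4)))      # text records: ['abcd', 'ef']
print(list(pad([b"\x00\x01", b"\x02"], 4)))  # binary records: [b'\x00\x01\x02']
```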
