Conversation

@sbassam sbassam commented Nov 7, 2025

Have you read the Contributing Guidelines?

Issue #

Describe your changes

Clearly and concisely describe what's in this pull request. Include screenshots, if necessary.


Note

Increases max file size to 50.1GB, switches multipart target part size to 250MB with sliding-window concurrency, adds progress bars to validation, and sets a download request timeout.

  • Uploads:
    • Multipart: Set TARGET_PART_SIZE_MB to 250; keep MAX_MULTIPART_PARTS at 250 (a sizing sketch follows this note).
    • Implement sliding-window concurrency in MultipartUploadManager._upload_parts_concurrent to respect max_concurrent_parts while continuously feeding new parts.
    • Increase max supported file size to MAX_FILE_SIZE_GB = 50.1.
  • Download:
    • Add request_timeout=3600 to raw GET in DownloadManager.download.
  • File validation:
    • Wrap JSONL iteration with tqdm for progress; import tqdm.
    • Improve UTF-8 check to read in chunks.
  • Tests:
    • Update expectations for part sizing/count (e.g., 500MB → 2×250MB, 50GB → ~205 parts) and size-limit error message.
    • Adjust as_completed mocking to align with sliding-window behavior.

Written by Cursor Bugbot for commit 7ac3d44. This will update automatically on new commits.
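
For illustration, a minimal sketch of the part-sizing rules summarized in the note above; plan_parts is a hypothetical helper written for this description, not the actual upload code:

import math

TARGET_PART_SIZE_MB = 250
MAX_MULTIPART_PARTS = 250
MAX_FILE_SIZE_GB = 50.1

def plan_parts(file_size_mb: float) -> tuple[int, float]:
    """Return (part_count, part_size_mb) for a multipart upload."""
    if file_size_mb > MAX_FILE_SIZE_GB * 1024:
        raise ValueError(f"File exceeds the {MAX_FILE_SIZE_GB}GB limit")
    part_count = max(1, math.ceil(file_size_mb / TARGET_PART_SIZE_MB))
    # Never exceed the maximum part count; grow the part size instead.
    part_count = min(part_count, MAX_MULTIPART_PARTS)
    return part_count, file_size_mb / part_count

# e.g. plan_parts(500) -> (2, 250.0); plan_parts(50 * 1024) -> 205 parts of ~250MB each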

@sbassam sbassam marked this pull request as ready for review November 7, 2025 03:21
@sbassam sbassam requested a review from vorobyov01 November 8, 2025 03:38

@nikita-smetanin nikita-smetanin left a comment

Hi Soroush, PR looks nice, I left a few suggestions :)

+chunk_size = 8192
 with file.open(encoding="utf-8") as f:
-    f.read()
+    for chunk in iter(lambda: f.read(chunk_size), ""):

I think just doing for line in f: pass would be simpler, and you won't have to specify the chunk size manually (it'll use line buffering, but you can override that if you think we should use larger buffers). I'd also add a comment noting that this is just a dry run to let the file reader decode the file as UTF-8.
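
For reference, a minimal sketch of that suggestion; check_utf8 is a placeholder name, not necessarily the real validator:

from pathlib import Path

def check_utf8(path: Path) -> None:
    # Dry run: iterate lines so the text reader decodes the whole file as
    # UTF-8; a UnicodeDecodeError is raised if the file is not valid UTF-8.
    with path.open(encoding="utf-8") as f:
        for _ in f:
            pass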


# Submit next part if available
if part_index < len(parts):
    part_info = parts[part_index]

It would be great to rewrite this to deduplicate it with the piece above. I think you can either use a single for loop that submits tasks and waits on results once enough are already in flight, or use executor.map with buffersize to limit concurrent tasks.
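
A rough sketch of the first option, assuming a standalone upload_one_part callable rather than the real MultipartUploadManager internals:

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def upload_parts_sliding_window(parts, upload_one_part, max_concurrent_parts=8):
    # Single submit loop that keeps at most max_concurrent_parts uploads in flight.
    results = []
    in_flight = set()
    with ThreadPoolExecutor(max_workers=max_concurrent_parts) as executor:
        for part_info in parts:
            if len(in_flight) >= max_concurrent_parts:
                # Window is full: wait for at least one part to finish.
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            in_flight.add(executor.submit(upload_one_part, part_info))
        # Drain whatever is still in flight.
        done, _ = wait(in_flight)
        results.extend(f.result() for f in done)
    return results

On a new enough Python (3.14+), executor.map(upload_one_part, parts, buffersize=max_concurrent_parts) gives a similar bound with less code, though results come back in submission order.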

-TARGET_PART_SIZE_MB = 100  # Target part size for optimal performance
-MAX_MULTIPART_PARTS = 250  # Maximum parts per upload (S3 limit)
+TARGET_PART_SIZE_MB = 250  # Target part size
+MAX_MULTIPART_PARTS = 250  # Maximum parts per upload

I see you dropped the "S3 limit" mention; is that still the case?
