@codeflash-ai codeflash-ai bot commented Nov 19, 2025

📄 16% (0.16x) speedup for check_disk_space in ultralytics/utils/downloads.py

⏱️ Runtime : 3.46 milliseconds → 2.99 milliseconds (best of 129 runs)

📝 Explanation and details

The optimized code achieves a 16% performance improvement through several micro-optimizations focused on reducing Python overhead and attribute lookups:

Key optimizations:

  1. Eliminated generator/tuple unpacking overhead: Instead of using (x / gib for x in shutil.disk_usage(path)) which creates a generator and then unpacks it, the code now directly indexes the tuple returned by shutil.disk_usage(). This saves the generator creation cost and tuple unpacking overhead.

  2. Reduced attribute lookups: The Content-Length header is retrieved once into a local variable (data_header) before processing, rather than calling r.headers.get() twice in the original code's arithmetic expression.

  3. Pre-calculated required space: The value data * sf is computed once and stored in required_space, eliminating repeated multiplication operations in both the comparison and error message formatting.

  4. Optimized None handling: Added explicit None check for Content-Length header to avoid unnecessary int(0) conversion when the header is missing.
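Taken together, the four changes can be sketched as follows. This is a simplified, hypothetical reconstruction, not the actual Ultralytics implementation: it takes the `Content-Length` value as a parameter instead of issuing the `requests.head()` call (which also subsumes optimization 2, the `data_header` local), and the name `check_disk_space_sketch` is illustrative only.

```python
import shutil


def check_disk_space_sketch(content_length, path=".", sf=1.5, hard=True):
    """Hypothetical sketch of the optimized logic; the real function reads
    content_length from a requests.head() call on the download URL."""
    gib = 1 << 30
    # Optimization 4: explicit None check instead of int(headers.get(..., 0))
    data = int(content_length) if content_length is not None else 0
    # Optimization 3: compute the required space once, reuse it below
    required_space = data * sf
    # Optimization 1: index the named tuple directly, no generator unpacking
    free = shutil.disk_usage(path)[2]
    if required_space < free:
        return True
    text = (
        f"Insufficient free disk space: {required_space / gib:.1f} GB required, "
        f"{free / gib:.1f} GB available"
    )
    if hard:
        raise MemoryError(text)
    return False
```

The comparison and the error message both reuse `required_space`, so the multiplication happens exactly once per call regardless of which branch is taken.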

Performance impact: The line profiler shows the most significant gains in the disk usage processing (lines that went from 5.49ms to 2.71ms combined) and header processing sections. The optimizations are particularly effective for scenarios with:

  • Sufficient disk space cases (16-19% faster in tests)
  • Missing Content-Length headers (16% faster)
  • Repeated calls (19% faster for 100 iterations)

Hot path relevance: Since check_disk_space is called from safe_download() before every file download in the Ultralytics framework, these micro-optimizations will accumulate meaningful time savings across multiple downloads, model loading, and dataset preparation workflows where the function may be invoked hundreds of times.
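To see where the disk-usage portion of the gain comes from, the two access patterns can be timed in isolation. This is a standalone micro-benchmark written for illustration, not code from the PR; absolute numbers will vary by machine.

```python
import shutil
import timeit

GIB = 1 << 30


def free_via_generator(path="."):
    # Original pattern: generator expression plus tuple unpacking
    total, used, free = (x / GIB for x in shutil.disk_usage(path))
    return free


def free_via_indexing(path="."):
    # Optimized pattern: index the named tuple returned by disk_usage directly
    return shutil.disk_usage(path)[2] / GIB


if __name__ == "__main__":
    n = 20_000
    t_gen = timeit.timeit(free_via_generator, number=n)
    t_idx = timeit.timeit(free_via_indexing, number=n)
    print(f"generator: {t_gen:.3f}s  indexing: {t_idx:.3f}s over {n} calls")
```

Both functions return the same free-space figure; the indexed version simply skips creating a generator object and driving it three times during unpacking.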

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   2747 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      🔘 None Found
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests and Runtime
import shutil
from pathlib import Path

# imports
import pytest

# function to test
import requests
from ultralytics.utils.downloads import check_disk_space

# ===========================
# UNIT TESTS FOR check_disk_space
# ===========================


# Helper: Patch requests.head and shutil.disk_usage for controlled tests
class DummyResponse:
    def __init__(self, status_code=200, headers=None, reason="OK"):
        self.status_code = status_code
        self.headers = headers or {}
        self.reason = reason


@pytest.mark.parametrize(
    "file_size_bytes,free_bytes,sf,expected",
    [
        # Basic: file size 1GB, free 10GB, sf=1.5 -> True
        (1 << 30, 10 << 30, 1.5, True),
        # Basic: file size 2GB, free 5GB, sf=1.5 -> True
        (2 << 30, 5 << 30, 1.5, True),
        # Basic: file size 2GB, free 2GB, sf=1.5 -> False (hard=False)
        (2 << 30, 2 << 30, 1.5, False),
        # Edge: file size 0, free 1GB, sf=1.5 -> True (zero size)
        (0, 1 << 30, 1.5, True),
        # Edge: file size 1GB, free 1.5GB, sf=1.5 -> False (exactly at limit)
        (1 << 30, int(1.5 * (1 << 30)), 1.5, False),
        # Edge: file size 1GB, free just above requirement, sf=1.5 -> True
        (1 << 30, int(1.51 * (1 << 30)), 1.5, True),
        # Large: file size 500GB, free 1000GB, sf=1.5 -> True (750GB required < 1000GB free)
        (500 << 30, 1000 << 30, 1.5, True),
        # Large: file size 100GB, free 1000GB, sf=1.5 -> True
        (100 << 30, 1000 << 30, 1.5, True),
    ],
)
def test_check_disk_space_basic_and_edge(monkeypatch, file_size_bytes, free_bytes, sf, expected):
    """Test basic and edge scenarios with various file sizes and free disk space."""

    # Patch requests.head to return controlled Content-Length
    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    # Patch shutil.disk_usage to return controlled free space
    def dummy_disk_usage(path):
        # total, used, free
        return (free_bytes * 2, free_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=sf, hard=False
    )  # 103μs -> 97.4μs (6.49% faster)
    assert codeflash_output is expected


def test_check_disk_space_hard_raises(monkeypatch):
    """Test that MemoryError is raised when hard=True and insufficient space."""

    file_size_bytes = 2 << 30  # 2GB
    free_bytes = 2 << 30  # 2GB

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (free_bytes * 2, free_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    with pytest.raises(MemoryError) as excinfo:
        check_disk_space(
            url="http://example.com/file", path="/tmp", sf=1.5, hard=True
        )  # 7.12μs -> 6.29μs (13.2% faster)


def test_check_disk_space_url_error(monkeypatch):
    """Test that function returns True if requests.head fails (network error)."""

    def dummy_head(url):
        raise requests.ConnectionError("Network down")

    monkeypatch.setattr(requests, "head", dummy_head)

    # Should return True regardless of disk space, since URL check failed
    codeflash_output = check_disk_space(
        url="http://badurl", path="/tmp", sf=1.5, hard=True
    )  # 4.49μs -> 4.51μs (0.311% slower)


def test_check_disk_space_http_error(monkeypatch):
    """Test that function returns True if requests.head returns error status."""

    def dummy_head(url):
        return DummyResponse(status_code=404, headers={}, reason="Not Found")

    monkeypatch.setattr(requests, "head", dummy_head)

    # Should return True because of HTTP error
    codeflash_output = check_disk_space(
        url="http://badurl", path="/tmp", sf=1.5, hard=True
    )  # 3.15μs -> 3.02μs (4.33% faster)


def test_check_disk_space_no_content_length(monkeypatch):
    """Test that function handles missing Content-Length gracefully."""

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={})

    def dummy_disk_usage(path):
        # Simulate 10GB free
        return (20 << 30, 10 << 30, 10 << 30)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    # Content-Length missing, so file size is 0, always enough space
    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=1.5, hard=True
    )  # 3.57μs -> 3.07μs (16.3% faster)


def test_check_disk_space_path_types(monkeypatch):
    """Test that function accepts both str and Path for path argument."""

    file_size_bytes = 1 << 30  # 1GB

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (10 << 30, 5 << 30, 5 << 30)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    # Path as string
    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=1.5, hard=False
    )  # 4.06μs -> 3.59μs (12.9% faster)
    # Path as Path object
    codeflash_output = check_disk_space(
        url="http://example.com/file", path=Path("/tmp"), sf=1.5, hard=False
    )  # 2.46μs -> 2.18μs (12.8% faster)


def test_check_disk_space_large_scale(monkeypatch):
    """Test performance/scalability with large file and disk sizes."""

    file_size_bytes = 900 << 30  # 900GB
    free_bytes = 950 << 30  # 950GB

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (free_bytes * 2, free_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    # 900GB * 1.5 = 1350GB required, but only 950GB free, should fail
    codeflash_output = check_disk_space(
        url="http://example.com/largefile", path="/tmp", sf=1.5, hard=False
    )  # 37.0μs -> 35.7μs (3.75% faster)


def test_check_disk_space_large_number_of_calls(monkeypatch):
    """Test function stability under repeated calls (scalability)."""

    file_size_bytes = 1 << 30  # 1GB
    free_bytes = 10 << 30  # 10GB

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (free_bytes * 2, free_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    for _ in range(100):  # up to 100 calls to simulate load
        codeflash_output = check_disk_space(
            url="http://example.com/file", path="/tmp", sf=1.5, hard=False
        )  # 125μs -> 104μs (19.4% faster)


def test_check_disk_space_custom_sf(monkeypatch):
    """Test that custom safety factor is respected."""

    file_size_bytes = 1 << 30  # 1GB
    free_bytes = int(1.1 * (1 << 30))  # 1.1GB

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (free_bytes * 2, free_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    # sf=1.0 (should pass)
    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=1.0, hard=False
    )  # 3.77μs -> 3.29μs (14.5% faster)
    # sf=1.2 (should fail)
    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=1.2, hard=False
    )  # 32.5μs -> 31.7μs (2.66% faster)


def test_check_disk_space_zero_free(monkeypatch):
    """Test when disk has zero free space."""

    file_size_bytes = 1 << 30  # 1GB
    free_bytes = 0

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (file_size_bytes, file_size_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=1.5, hard=False
    )  # 29.3μs -> 28.3μs (3.42% faster)


def test_check_disk_space_negative_sf(monkeypatch):
    """Test negative safety factor (should always succeed unless file size is negative)."""

    file_size_bytes = 1 << 30  # 1GB
    free_bytes = 0

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size_bytes)})

    def dummy_disk_usage(path):
        return (file_size_bytes, file_size_bytes, free_bytes)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    # Negative sf means required space is negative, so always enough
    codeflash_output = check_disk_space(
        url="http://example.com/file", path="/tmp", sf=-1.0, hard=False
    )  # 3.52μs -> 3.35μs (5.20% faster)


def test_check_disk_space_non_integer_content_length(monkeypatch):
    """Test non-integer Content-Length header."""

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": "notanumber"})

    def dummy_disk_usage(path):
        return (10 << 30, 5 << 30, 5 << 30)

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    # Should raise ValueError when trying to convert Content-Length
    with pytest.raises(ValueError):
        check_disk_space(
            url="http://example.com/file", path="/tmp", sf=1.5, hard=True
        )  # 4.12μs -> 4.88μs (15.6% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import shutil
from pathlib import Path

# imports
import pytest
import requests
from ultralytics.utils.downloads import check_disk_space

# --- Unit Tests ---


# Helper: monkeypatch requests.head and shutil.disk_usage
class DummyResponse:
    def __init__(self, status_code=200, headers=None, reason="OK"):
        self.status_code = status_code
        self.headers = headers or {}
        self.reason = reason


@pytest.fixture
def monkeypatch_requests_head(monkeypatch):
    def _patch(status_code=200, content_length=1024 * 1024 * 10, reason="OK"):
        def dummy_head(url):
            return DummyResponse(
                status_code=status_code, headers={"Content-Length": str(content_length)}, reason=reason
            )

        monkeypatch.setattr(requests, "head", dummy_head)

    return _patch


@pytest.fixture
def monkeypatch_disk_usage(monkeypatch):
    def _patch(total, used, free):
        def dummy_disk_usage(path):
            return (total, used, free)

        monkeypatch.setattr(shutil, "disk_usage", dummy_disk_usage)

    return _patch


# --- Basic Test Cases ---


def test_sufficient_space(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 10MB, Disk free: 1GB, sf=1.5
    monkeypatch_requests_head(content_length=10 * 1024 * 1024)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)  # 1GB free
    codeflash_output = check_disk_space(sf=1.5, hard=True)  # 4.76μs -> 4.34μs (9.72% faster)


def test_insufficient_space_hard(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 1GB, Disk free: 1GB, sf=2.0, hard=True
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)  # 1GB free
    with pytest.raises(MemoryError) as excinfo:
        check_disk_space(sf=2.0, hard=True)  # 6.89μs -> 6.18μs (11.4% faster)


def test_insufficient_space_soft(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 1GB, Disk free: 1GB, sf=2.0, hard=False
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=2.0, hard=False)  # 36.5μs -> 34.8μs (4.85% faster)


def test_url_not_found(monkeypatch, monkeypatch_disk_usage):
    # requests.head raises a ConnectionError, which the except block turns into True
    def dummy_head(url):
        raise requests.ConnectionError("Not found")

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=1.5, hard=True)  # 4.56μs -> 4.50μs (1.31% faster)


def test_url_status_code_error(monkeypatch, monkeypatch_disk_usage):
    # A 404 status code fails the internal status assertion; the except block returns True
    def dummy_head(url):
        return DummyResponse(status_code=404, headers={"Content-Length": "1024"}, reason="Not Found")

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=1.5, hard=True)  # 3.24μs -> 3.49μs (7.05% slower)


def test_zero_file_size(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 0, Disk free: 1GB, should always pass
    monkeypatch_requests_head(content_length=0)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=1.5, hard=True)  # 4.90μs -> 4.20μs (16.7% faster)


def test_zero_disk_space(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 1GB, Disk free: 0GB, should fail
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=2 << 30, free=0)
    with pytest.raises(MemoryError):
        check_disk_space(sf=1.5, hard=True)  # 7.53μs -> 6.79μs (10.8% faster)


def test_non_default_path(monkeypatch_requests_head, monkeypatch_disk_usage):
    # Test with a custom path
    monkeypatch_requests_head(content_length=10 * 1024 * 1024)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(path="/tmp", sf=1.5, hard=True)  # 4.62μs -> 4.19μs (10.3% faster)


# --- Edge Test Cases ---


def test_large_file_small_disk(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 10GB, Disk free: 5GB, sf=1.0
    monkeypatch_requests_head(content_length=10 << 30)
    monkeypatch_disk_usage(total=20 << 30, used=15 << 30, free=5 << 30)
    with pytest.raises(MemoryError):
        check_disk_space(sf=1.0, hard=True)  # 6.98μs -> 6.38μs (9.40% faster)


def test_large_file_large_disk(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 10GB, Disk free: 100GB, sf=1.0
    monkeypatch_requests_head(content_length=10 << 30)
    monkeypatch_disk_usage(total=200 << 30, used=100 << 30, free=100 << 30)
    codeflash_output = check_disk_space(sf=1.0, hard=True)  # 4.46μs -> 3.84μs (16.2% faster)


def test_missing_content_length(monkeypatch, monkeypatch_disk_usage):
    # No Content-Length header, should treat as size 0
    def dummy_head(url):
        return DummyResponse(status_code=200, headers={}, reason="OK")

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=1.5, hard=True)  # 4.58μs -> 3.93μs (16.5% faster)


def test_negative_content_length(monkeypatch, monkeypatch_disk_usage):
    # Negative Content-Length, should treat as negative size
    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": "-1024"}, reason="OK")

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=1.5, hard=True)  # 4.27μs -> 3.67μs (16.1% faster)


def test_non_numeric_content_length(monkeypatch, monkeypatch_disk_usage):
    # Non-numeric Content-Length: int() raises ValueError, which propagates uncaught
    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": "abc"}, reason="OK")

    monkeypatch.setattr(requests, "head", dummy_head)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    # Should raise ValueError on int("abc"), but function doesn't catch so will propagate
    with pytest.raises(ValueError):
        check_disk_space(sf=1.5, hard=True)  # 4.65μs -> 5.22μs (10.9% slower)


def test_path_as_path_object(monkeypatch_requests_head, monkeypatch_disk_usage):
    # Path argument as pathlib.Path
    monkeypatch_requests_head(content_length=10 * 1024 * 1024)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(path=Path("/tmp"), sf=1.5, hard=True)  # 4.83μs -> 4.18μs (15.8% faster)


def test_path_as_string(monkeypatch_requests_head, monkeypatch_disk_usage):
    # Path argument as string
    monkeypatch_requests_head(content_length=10 * 1024 * 1024)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(path="/tmp", sf=1.5, hard=True)  # 4.46μs -> 3.87μs (15.1% faster)


def test_zero_safety_factor(monkeypatch_requests_head, monkeypatch_disk_usage):
    # sf=0, always enough space
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=0, hard=True)  # 4.59μs -> 3.92μs (17.1% faster)


def test_negative_safety_factor(monkeypatch_requests_head, monkeypatch_disk_usage):
    # sf<0, always enough space
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=-1, hard=True)  # 4.35μs -> 3.72μs (16.9% faster)


# --- Large Scale Test Cases ---


@pytest.mark.parametrize("n_files", [10, 100, 500, 999])
def test_large_scale_many_files(monkeypatch, monkeypatch_disk_usage, n_files):
    # Simulate checking space for many files in sequence (scalability)
    file_size = 10 * 1024 * 1024  # 10MB
    free_space = 20 << 30  # 20GB
    monkeypatch_disk_usage(total=40 << 30, used=20 << 30, free=free_space)

    # Patch requests.head to always return the same file size
    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size)}, reason="OK")

    monkeypatch.setattr(requests, "head", dummy_head)
    for i in range(n_files):
        codeflash_output = check_disk_space(
            url=f"http://test.com/file{i}.zip", sf=1.0, hard=True
        )  # 1.84ms -> 1.58ms (16.4% faster)


def test_large_file_low_safety_factor(monkeypatch_requests_head, monkeypatch_disk_usage):
    # Large file, low safety factor, should pass if free >= file size
    monkeypatch_requests_head(content_length=900 << 20)  # ~900MB
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    codeflash_output = check_disk_space(sf=1.0, hard=True)  # 4.67μs -> 4.11μs (13.4% faster)


def test_large_file_high_safety_factor(monkeypatch_requests_head, monkeypatch_disk_usage):
    # Large file, high safety factor, should fail if free < file_size * sf
    monkeypatch_requests_head(content_length=900 << 20)  # ~900MB
    monkeypatch_disk_usage(total=2 << 30, used=1 << 30, free=1 << 30)
    with pytest.raises(MemoryError):
        check_disk_space(sf=2.0, hard=True)  # 7.40μs -> 6.82μs (8.40% faster)


def test_many_small_files(monkeypatch, monkeypatch_disk_usage):
    # Simulate many small files, each check should pass
    file_size = 1 * 1024 * 1024  # 1MB
    free_space = 999 * file_size * 2  # Enough for 999 files with sf=2.0
    monkeypatch_disk_usage(total=free_space * 2, used=free_space, free=free_space)

    def dummy_head(url):
        return DummyResponse(status_code=200, headers={"Content-Length": str(file_size)}, reason="OK")

    monkeypatch.setattr(requests, "head", dummy_head)
    for i in range(999):
        codeflash_output = check_disk_space(
            url=f"http://test.com/file{i}.zip", sf=2.0, hard=True
        )  # 1.13ms -> 966μs (16.9% faster)


def test_large_file_exact_limit(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 1GB, sf=1.5, free just above the 1.5GB requirement, should pass
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=int(0.5 * (1 << 30)), free=int(1.5 * (1 << 30)) + 1)
    codeflash_output = check_disk_space(sf=1.5, hard=True)


def test_large_file_just_below_limit(monkeypatch_requests_head, monkeypatch_disk_usage):
    # File size: 1GB, sf=1.5, free=1.499GB, should fail
    monkeypatch_requests_head(content_length=1 << 30)
    monkeypatch_disk_usage(total=2 << 30, used=int(0.501 * (1 << 30)), free=int(1.499 * (1 << 30)))
    with pytest.raises(MemoryError):
        check_disk_space(sf=1.5, hard=True)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-check_disk_space-mi5v6ldg` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 19, 2025 10:33
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 19, 2025