Skip to content

Conversation

@cyppe
Copy link
Contributor

@cyppe cyppe commented Jun 20, 2025

This is a new PR due to invalid merge in PR #1005.

I also try to handle the review feedback. However I removed the cache property, as I forget to remove it. I have a proof of concept of cache but it becomes pretty messy as I need to ignore a few rules that has dependencies to data from being cached. I will open a PR with that later for feedback.

Original PR text below:

Optimize Validation for Large Nested Collections

Problem:

Validating large arrays (e.g., 5,000+ items) of nested Data objects using Data::validate() can be significantly slower than native Laravel validation when the nested Data objects do not define a dynamic rules() method. This is due to the overhead of Illuminate\Validation\Rule::forEach being used internally even for static nested rules.

Benchmark Setup:

To demonstrate the issue and the improvement, the following self-contained benchmark test code was used (within the package's test environment):

<?php

namespace Spatie\LaravelData\Tests\Benchmarks;

// Required imports for Data classes and Test
use Illuminate\Support\Facades\Validator;
use PHPUnit\Framework\Attributes\Test;
use Spatie\LaravelData\Attributes\Validation\Max; // For BenchmarkIdData
use Spatie\LaravelData\Data;
use Spatie\LaravelData\Tests\TestCase;

// --- Data Class Definitions ---

class BenchmarkIdData extends Data
{
    public function __construct(
        public int $id,
        #[Max(100)]
        public string $name,
    ) {
    }
}

class BenchmarkArrayData extends Data
{
    /** @param BenchmarkIdData[] $array */
    public function __construct(
        public array $array,
    ) {
    }
}

// --- Test Class Definition ---

class SimpleValidationBenchmarkTest extends TestCase
{
    #[Test]
    public function simple_benchmark_performance_comparison()
    {
        $count = 5000;
        $array = array_map(fn (int $id) => [
            'id' => $id,
            'name' => 'Item Name ' . $id // Add name field
        ], range(1, $count));
        $payload = ['array' => $array];

        // --- Manual Timing ---
        $runs = 3;
        $nativeTimes = [];
        $dataTimes = [];

        // Updated native rules
        $nativeRules = [
            'array' => 'required|array',
            'array.*.id' => 'required|integer',
            'array.*.name' => 'required|string|max:100', // Add rules for name
        ];

        for ($i = 0; $i < $runs; $i++) {
            $start = microtime(true);
            Validator::validate($payload, $nativeRules);
            $nativeTimes[] = microtime(true) - $start;

            $start = microtime(true);
            BenchmarkArrayData::validate($payload);
            $dataTimes[] = microtime(true) - $start;
        }

        $nativeAvg = array_sum($nativeTimes) / $runs;
        $dataAvg = array_sum($dataTimes) / $runs;
        // --- End Manual Timing ---

        // Output the results (will appear during test execution)
        $nativeMs = round($nativeAvg * 1000, 3);
        $dataMs = round($dataAvg * 1000, 3);

        echo "\n--- Simple Benchmark Results ---\n";
        echo "Native Laravel Validator (avg of {$runs} runs): {$nativeMs} ms\n";
        echo "Spatie Laravel Data (avg of {$runs} runs):    {$dataMs} ms\n";
        echo "----------------------------------\n";

        $this->assertTrue(true);
    }
}

Benchmark Results (Avg of 3 runs):

Before Optimization:

--- Simple Benchmark Results ---
Native Laravel Validator (avg of 3 runs): 719.38 ms
Spatie Laravel Data (avg of 3 runs):    24514.464 ms
----------------------------------

After Optimization:

--- Simple Benchmark Results ---
Native Laravel Validator (avg of 3 runs): 703.961 ms
Spatie Laravel Data (avg of 3 runs):    258.938 ms
----------------------------------

Performance Summary:

Scenario Native Validator (ms) Spatie Laravel Data (ms) Improvement (Spatie Data)
Before Optimization ~719.38 24514.464 -
After Optimization ~703.96 258.938 ~95x faster

Bonus: Impact of Upstream Laravel Arr::dot Optimization

It's worth noting that recent versions of Laravel (the changes from PR #55495) significantly optimized the Arr::dot method, which is used internally by the validator when handling wildcard rules.

Running the same benchmark test against a Laravel version incorporating this Arr::dot optimization shows improved performance for the Native Laravel Validator baseline, while our laravel-data optimization continues to provide a substantial benefit:

Benchmark Results (With Optimized Arr::dot):

--- Simple Benchmark Results ---
Native Laravel Validator (avg of 3 runs): 467.153 ms
Spatie Laravel Data (avg of 3 runs):    260.651 ms
----------------------------------

This demonstrates that even with the upstream improvements to Laravel's validator internals, directly bypassing Rule::forEach for static nested collections in laravel-data offers further significant performance gains for this specific use case.

Proposed Solution:

This PR modifies DataValidationRulesResolver::resolveDataCollectionSpecificRules:

  1. It checks if the nested data class (e.g., BenchmarkIdData) has a rules() method using `method_exists()$.
  2. If no rules() method exists, it iterates the collection payload and directly calls $this->execute() for each item, adding the generated rules to the main ruleset. This bypasses Rule::forEach.
  3. If a rules() method does exist, it falls back to the original Rule::forEach logic to ensure dynamic rules are handled correctly.

Testing:

All existing package tests pass with this change.

Impact:

This provides a significant performance improvement (~95x faster in the benchmark case) for validating large collections of nested data objects with static rules (defined via typehints and attributes), without affecting validation for objects using dynamic rules() methods.

Feedback:
Let me know if you have any feedback on this. I am no expert in all internal processes in this package but I am running this PR in my project and it seems fine.

@cyppe
Copy link
Contributor Author

cyppe commented Jun 21, 2025

@rubenvanassche for your review

@aa-dragosmocrii
Copy link

aa-dragosmocrii commented Jul 21, 2025

While observing the behavior of Spatie Data with an array containing 4000 items, I noticed that validating all the items at once takes about 4m:10s. If i chunk the data to contain only 10 items per chunk. it takes about 2s to validate and create data objects from all the chunks. (this is not related to this PR, just something that I thought to share in case it's helpful here)

@rubenvanassche
Copy link
Member

Great PR thanks!

@rubenvanassche rubenvanassche merged commit 6d867f7 into spatie:main Oct 16, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants