Describe the bug
The use of in_array() in MigrateExecutable::handleMissingSourceRows() is very inefficient for migrations with a very large number of rows.
To Reproduce
Run any migration with a very large number of rows (e.g. 10,000+).
While the actual migration shows a progress bar and reports when it has finished, the logic in handleMissingSourceRows() then makes the process appear frozen for an indeterminate amount of time.
Actual behavior
Running a migration with many rows (in my case over 300,000 for upgrade_d7_file_private) took roughly 20-30 minutes for the actual migration, but then hung in MigrateExecutable::handleMissingSourceRows() for multiple hours until I had to stop the process manually.
Workaround
Instead of using in_array(), the $allSourceIdValues property should be keyed by a unique ID so that isset() can be used for lookups.
A dedicated method that builds the key from the source ID values could be used both when writing to the $allSourceIdValues property in MigrateExecutable::onPrepareRow() and when reading it in handleMissingSourceRows().
With this change, the post-migration logic in the 300k-row example above finished within a few minutes instead of multiple hours.
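For illustration, here is a minimal standalone sketch of the keyed lookup next to the in_array() pattern. It is not the actual patch: buildSourceIdKey(), the variable names, and the sample rows are all hypothetical, and the key format (JSON of the stringified ID values) is just one assumption about how a unique key could be built.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds a string key from a row's source ID values.
 * Values are cast to strings so that 5 and "5" yield the same key
 * (an assumption made here to mirror in_array()'s loose comparison).
 */
function buildSourceIdKey(array $sourceIdValues): string
{
    return json_encode(
        array_map('strval', array_values($sourceIdValues)),
        JSON_THROW_ON_ERROR
    );
}

// Slow pattern: a list of ID arrays, searched linearly with in_array().
$seenList = [];
// Fast pattern: the same IDs stored as array keys, checked with isset().
$seenKeyed = [];

// Simulates recording each row's source IDs while the migration runs.
foreach ([['fid' => '1'], ['fid' => '2'], ['fid' => '4']] as $sourceIds) {
    $seenList[] = $sourceIds;
    $seenKeyed[buildSourceIdKey($sourceIds)] = true;
}

// Simulates walking the ID map afterwards to find rows missing from the source.
$mapRows = [['fid' => '1'], ['fid' => '3'], ['fid' => '4']];
$missingSlow = [];
$missingFast = [];
foreach ($mapRows as $mapSourceId) {
    // Compares against every recorded array until one matches.
    if (!in_array($mapSourceId, $seenList)) {
        $missingSlow[] = $mapSourceId;
    }
    // Single hash lookup on the precomputed key.
    if (!isset($seenKeyed[buildSourceIdKey($mapSourceId)])) {
        $missingFast[] = $mapSourceId;
    }
}
```

Both loops identify the same missing row, but with n recorded rows and m map entries the in_array() version performs up to n array comparisons per entry (n × m overall), while the isset() version performs one hash lookup per entry.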
mdolnik pushed a commit to mdolnik/drush that referenced this issue on Sep 25, 2023
Using in_array() can be very inefficient here: it has to compare array values one by one until it finds a match, and the current logic is searching for an array within an array of arrays.