Migrations are primarily controlled from the Thrall dashboard, an HTTP page exposed on Thrall's domain.
Running a migration requires a lot of computation: each image in your library
must be projected, which involves downloading and reprocessing the original
image from scratch. For this reason, we suggest running a second pool of
image-loader instances reserved specifically for projection. These are usually
hosted at the `loader-projection.media.` domain prefix, though you can of course
reuse your primary pool of image-loader instances by setting the
`hosts.projectionPrefix` configuration option to the same value as the
`hosts.loaderPrefix` option (which defaults to `loader.media.`). Be aware,
though, that doing so may cause slowdown or disruption for users uploading
images. Take care to scale whichever pool of image-loader instances you use to
an appropriate size.
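As a rough illustration, the two host options above might look like this in configuration. This is a minimal sketch, assuming an HOCON-style config file; the values shown are the defaults and example names described above, not a verbatim copy of any real deployment:

```
# Hypothetical configuration fragment (illustrative values only)
hosts.loaderPrefix = "loader.media."                  # primary upload pool (the default)
hosts.projectionPrefix = "loader-projection.media."   # dedicated projection pool
```

Setting `hosts.projectionPrefix` equal to `hosts.loaderPrefix` instead would route projection traffic through the primary pool, with the upload-disruption trade-off noted above.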
The throughput of the migration process is determined not only by how many
image-loader instances are in the projection pool (and the CPU, RAM, and other
resources available to each instance), but also by how many parallel projection
requests Thrall is allowed to make. This parallelism is controlled by the
`thrall.projection.parallelism` configuration setting, which defaults to 1. In
other words, you will almost certainly want to increase this value to a level
that makes good use of the available image-loader projection instances.
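To get a feel for how parallelism affects total migration time, a back-of-envelope estimate can help. This is a sketch only: the per-image projection time and parallelism figures below are hypothetical placeholders, not measured values, and real throughput also depends on instance sizing and index load:

```python
def estimated_migration_hours(total_images: int,
                              parallelism: int,
                              seconds_per_image: float) -> float:
    """Rough wall-clock estimate, assuming projection requests fully
    saturate the configured parallelism and each image takes a roughly
    constant time to project."""
    total_seconds = total_images * seconds_per_image / parallelism
    return total_seconds / 3600


# Hypothetical example: 40 million images, parallelism of 40,
# half a second per projection.
print(round(estimated_migration_hours(40_000_000, 40, 0.5)))  # ~139 hours
```

The estimate scales inversely with parallelism, which is why leaving `thrall.projection.parallelism` at its default of 1 makes a large migration impractically slow.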
You will also see increased usage of your DynamoDB tables and Elasticsearch cluster, so make sure to watch their performance and scale both to match. We recommend enabling autoscaling on all DynamoDB tables and indices where possible.
The size of the image projection ASGs is dictated by the CloudFormation parameters `ProjectionServiceAutoscalingMinSize` and `ProjectionServiceAutoscalingMaxSize` – alter these to scale the service.
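For example, these parameters could be overridden via a CloudFormation parameters file (the JSON shape below is the standard format accepted by `aws cloudformation update-stack --parameters file://...`); the values here are illustrative, not a recommendation:

```json
[
  { "ParameterKey": "ProjectionServiceAutoscalingMinSize", "ParameterValue": "7" },
  { "ParameterKey": "ProjectionServiceAutoscalingMaxSize", "ParameterValue": "14" }
]
```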
As a baseline: on 18/05/22, with an index of ~40,000,000 images, 4 Elasticsearch nodes and 7 loader-projection instances were a good starting point for our configuration.
A migration can be started by going to the Thrall dashboard and following the prompt to press the 'Start Migration' button. This will create a new index using the latest version of the mappings (see Mappings.scala) and then assign the "Images_Migration" alias. Thrall will then automatically begin searching for and queueing images for migration.
While a migration is running, you can track progress on the Thrall dashboard, which will display a count of images that exist in each index. The form to start a migration has been replaced with a form that will allow you to manually queue an image for migration, regardless of whether Thrall has attempted to migrate it previously.
While the migration is in progress, a form will be present on the Thrall dashboard with the option of completing the migration. You should only do this once the number of images in Images_Migration is equal to the number in Images_Current. You may optionally choose to leave some images that have failed; these will remain available for review in the list of errored images (see below).
When you submit the migration completion form, the Images_Current alias will be moved to the new index, the Images_Migration alias will be removed, and the Images_Historical alias will be added pointing to the old index. This should all happen seamlessly without impacting any concurrent uploads or edits.
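Conceptually, this kind of seamless cutover maps onto Elasticsearch's atomic `_aliases` API, which applies a batch of alias actions as a single operation. The index names below are hypothetical placeholders for the old and new indices; this is a sketch of the underlying mechanism, not a command you need to run yourself:

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "images_old", "alias": "Images_Current" } },
    { "add":    { "index": "images_new", "alias": "Images_Current" } },
    { "remove": { "index": "images_new", "alias": "Images_Migration" } },
    { "add":    { "index": "images_old", "alias": "Images_Historical" } }
  ]
}
```

Because all actions in one `_aliases` request take effect together, there is no window in which `Images_Current` points at neither index.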
Errors may occur while migrating an image: projection may fail, insertion into the new index may fail, or something else may go wrong. The list of failures is available on the Thrall dashboard, behind the "View images that have failed to migrate" link.
On this page, you can see an overview of the images that have failed to migrate, grouped by the failure message. You can click through into the groups to get a full list of failed images and a button to easily retry them.
Caveat: currently the failure messages may not be very descriptive, due to how error messages are passed between Grid services. Be aware that one group of errors in the dashboard may have multiple different root causes. Try searching the logs for the image ID to find the original error, whichever service it originates from.