
Parallel Requests - rake workarea:cache:prime_images #188

Open
GesJeremie opened this issue Oct 25, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@GesJeremie (Contributor) commented Oct 25, 2019

Is your feature request related to a problem? Please describe.
I'm currently running the rake task workarea:cache:prime_images on a few thousand products and I'm frustrated by how slow it is.

Describe the solution you'd like
The current implementation (workarea-core 3.4.16) is the following:

namespace :workarea do
  namespace :cache do
    desc 'Prime images cache'
    task prime_images: :environment do
      include Rails.application.routes.url_helpers
      include Workarea::Storefront::ProductsHelper
      include Workarea::Core::Engine.routes.url_helpers

      built_in_jobs = [:thumb, :gif, :jpg, :png, :strip, :convert, :optimized]

      jobs = Dragonfly.app(:workarea).processor_methods.reject do |job|
        built_in_jobs.include?(job)
      end

      Workarea::Catalog::Product.all.each_by(50) do |product|
        product.images.each do |image|
          jobs.each do |job|
            url = URI.join(
              "https://#{Workarea.config.host}",
              dynamic_product_image_url(
                image.product.slug,
                image.option,
                image.id,
                job,
                only_path: true
              )
            ).to_s

            begin
              `curl #{url}`
              puts "Downloaded image #{url}"
            rescue StandardError => e
              puts e.inspect
            end
          end
        end
      end
    end
  end
end

It's basically a loop that requests a URL through curl, waits for the result, and moves on to the next record.
The obvious optimization would be to run the curl requests in parallel.

In my side projects I usually use https://github.com/typhoeus/typhoeus and its Hydra "engine", but I'm pretty sure we could come up with some bash magic and call it a day.
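A parallel version doesn't necessarily need an extra gem either. Here is a minimal sketch using only stdlib threads; the `prime_urls` helper, the 8-thread default, and the queue-draining approach are illustrative choices of mine, not anything that exists in workarea-core:

```ruby
# A minimal sketch, not workarea-core code: drain a queue of URLs from a
# fixed number of worker threads, so at most `workers` requests are in
# flight at any one time.
def prime_urls(urls, workers: 8)
  queue = Queue.new
  urls.each { |url| queue << url }

  workers.times.map do
    Thread.new do
      loop do
        # Non-blocking pop raises ThreadError once the queue is empty,
        # which is our signal that this worker is done.
        url = begin
          queue.pop(true)
        rescue ThreadError
          break
        end
        yield url
      end
    end
  end.each(&:join)
end
```

The rake task's loop would then collect the URLs into an array first and call something like `prime_urls(urls) { |url| \`curl #{url}\` }`, or better, `Net::HTTP.get(URI(url))` to avoid spawning one process per image.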

@GesJeremie GesJeremie added the enhancement New feature or request label Oct 25, 2019
@bencrouse (Contributor)

Thanks for the issue @GesJeremie.

I'm hesitant to add another library just for this one use case, but two options come to mind for parallelizing:

  • Use a Sidekiq worker
  • Use normal Ruby threads in this task

@eric-pigeon (Contributor)

concurrent-ruby is a dependency of ActiveSupport. There's already a CachedThreadPool available as Concurrent.global_io_executor, although a FixedThreadPool might be a better fit to throttle the number of active requests.
