
Parallel Requests - rake workarea:cache:prime_images #188

Open
GesJeremie opened this issue Oct 25, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@GesJeremie (Contributor) commented Oct 25, 2019

Is your feature request related to a problem? Please describe.
I'm currently running the rake task workarea:cache:prime_images on a few thousand products and I'm frustrated by how slow it is.

Describe the solution you'd like
The current implementation (workarea-core 3.4.16) is the following:

namespace :workarea do
  namespace :cache do
    desc 'Prime images cache'
    task prime_images: :environment do
      include Rails.application.routes.url_helpers
      include Workarea::Storefront::ProductsHelper
      include Workarea::Core::Engine.routes.url_helpers

      built_in_jobs = [:thumb, :gif, :jpg, :png, :strip, :convert, :optimized]

      jobs = Dragonfly.app(:workarea).processor_methods.reject do |job|
        built_in_jobs.include?(job)
      end

      Workarea::Catalog::Product.all.each_by(50) do |product|
        product.images.each do |image|
          jobs.each do |job|
            url = URI.join(
              "https://#{Workarea.config.host}",
              dynamic_product_image_url(
                image.product.slug,
                image.option,
                image.id,
                job,
                only_path: true
              )
            ).to_s

            begin
              `curl #{url}`
              puts "Downloaded image #{url}"
            rescue StandardError => e
              puts e.inspect
            end
          end
        end
      end
    end
  end
end

It's basically a loop that requests a URL through curl, waits for the result, and moves on to the next record.
The obvious optimization would be to run the curl requests in parallel.

In my side projects I usually use https://github.com/typhoeus/typhoeus and its Hydra "engine", but I'm pretty sure we could come up with some bash magic and call it a day.
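A parallel version doesn't necessarily need an extra gem either. Here is a minimal sketch using only stdlib threads; the `prime_urls` helper, the 8-thread default, and the queue-draining approach are illustrative choices of mine, not anything that exists in workarea-core:

```ruby
# A minimal sketch, not workarea-core code: drain a queue of URLs from a
# fixed number of worker threads, so at most `workers` requests are in
# flight at any one time.
def prime_urls(urls, workers: 8)
  queue = Queue.new
  urls.each { |url| queue << url }

  workers.times.map do
    Thread.new do
      loop do
        # Non-blocking pop raises ThreadError once the queue is empty,
        # which is our signal that this worker is done.
        url = begin
          queue.pop(true)
        rescue ThreadError
          break
        end
        yield url
      end
    end
  end.each(&:join)
end
```

The rake task's loop would then collect the URLs into an array first and call something like `prime_urls(urls) { |url| \`curl #{url}\` }`, or better, `Net::HTTP.get(URI(url))` to avoid spawning one process per image.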

@GesJeremie GesJeremie added the enhancement New feature or request label Oct 25, 2019
@bencrouse (Contributor)

Thanks for the issue @GesJeremie.

I'm hesitant to add another library just for this one use case, but two options come to mind for parallelizing:

  • Use a Sidekiq worker
  • Use normal Ruby threads in this task

@eric-pigeon (Contributor)

concurrent-ruby is a dependency of ActiveSupport. There's already a CachedThreadPool available as Concurrent.global_io_executor, although a FixedThreadPool might be a better fit to throttle the number of active requests.
