Scraper detail admin page loads too slowly #90
Hi @bezkos, what do you mean by "change one of them"? That is, what are you changing or doing that makes the detail pages load so slowly?
On the scrapers page I have around 100 scrapers. When I change the XPath in one of them, for example, the page takes around 35 seconds to load, with 5790 SQL queries, 5780 of them duplicates. For example:

```sql
SELECT "dynamic_scraper_scrapedobjclass"."id",
       "dynamic_scraper_scrapedobjclass"."name",
       "dynamic_scraper_scrapedobjclass"."scraper_scheduler_conf",
       "dynamic_scraper_scrapedobjclass"."checker_scheduler_conf",
       "dynamic_scraper_scrapedobjclass"."comments"
FROM "dynamic_scraper_scrapedobjclass"
WHERE "dynamic_scraper_scrapedobjclass"."id" = 104
```

and the same query repeats with `"id" = 89`, `"id" = 27`, and so on.
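The query pattern above is a classic N+1: every row rendered in the admin dropdowns dereferences its ScrapedObjClass foreign key separately, issuing one SELECT per label. A minimal pure-Python sketch of the effect (the class counts and names here are illustrative, not taken from DDS internals):

```python
# Illustrative simulation of the N+1 pattern (not DDS code):
# each dropdown entry's label dereferences a lazy foreign key,
# so every rendered row costs one extra SELECT.
class FakeDB:
    def __init__(self):
        self.query_count = 0

    def fetch_class_name(self, class_id):
        self.query_count += 1          # one SELECT per call
        return "class-%d" % class_id

db = FakeDB()

def attr_label(attr_id, class_id):
    # Mimics a __str__ like "name (obj_class.name)" evaluated on an
    # unloaded foreign key: each label triggers its own lookup.
    return "attr-%d (%s)" % (attr_id, db.fetch_class_name(class_id))

# ~100 object classes with ~58 attributes each, all rendered in one form
labels = [attr_label(a, c) for c in range(100) for a in range(58)]
print(db.query_count)  # 5800
```

With one query per label the count lands in the same ballpark as the 5790 queries measured above, and nearly all of them hit the same handful of rows, which is why almost everything shows up as a duplicate.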
Ah, I thought you meant the detail pages of the websites you are going to scrape. By "detail page" do you mean the edit form of a scraper in the admin, or the overview page with all the scrapers? Do you have one Scraped Obj Class for every scraper? And how many Scraped Obj Classes do you have?
Can you run a test: edit the C:\venv27\lib\site-packages\dynamic_scraper/models.py file and remove the part of the returned name that is in parentheses, both in line 55 and line 203? So just leave …
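The suggested edit boils down to returning the bare name from the model's string method, so that rendering a label no longer touches the foreign key. A hedged sketch with a stand-in class (the exact original return statement is assumed, not quoted from DDS):

```python
# Hedged sketch of the suggested change (stand-in class, not DDS code):
class ScrapedObjAttr:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Before (assumed): return "%s (%s)" % (self.name, self.obj_class.name)
        # -- the obj_class dereference cost one SELECT per rendered label.
        return self.name  # no foreign key touched, no extra query per row

print(str(ScrapedObjAttr("title")))  # title
```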
Yes, I did the test and it fixes the problem:

```sql
SELECT "dynamic_scraper_scrapedobjattr"."id",
       "dynamic_scraper_scrapedobjattr"."name",
       "dynamic_scraper_scrapedobjattr"."order",
       "dynamic_scraper_scrapedobjattr"."obj_class_id",
       "dynamic_scraper_scrapedobjattr"."attr_type",
       "dynamic_scraper_scrapedobjattr"."id_field",
       "dynamic_scraper_scrapedobjattr"."save_to_db"
FROM "dynamic_scraper_scrapedobjattr"
ORDER BY "dynamic_scraper_scrapedobjattr"."order" ASC
```

I have around 100 obj classes and 110 scrapers.
This was actually trickier than I thought; I experimented with 2-3 different things, none of them completely satisfying (I thought I could quickly fix this since I'm doing a minor release today anyhow).

I actually need the complete names, otherwise users get confused when selecting the scraped object attributes for a scraper, so simplifying the naming is not an option. I also experimented with simple caching of the name, which didn't work either.

Limiting the choices to only the attributes of the corresponding scraped object class is also trickier than one might think, since the object class is not yet determined when adding a new scraper or adding new scraper elems. I have now added such a limitation, but it only works for already saved scrapers with already added attributes.

Let me know if this improves the performance situation for you. Otherwise you will have to monkey-patch this yourself in your installed DDS version. Greetings
OK @holgerd77, I found a way to reduce the load time and query count by 75%.
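The comment's original code snippet did not survive the page export, so the exact patch is unknown. One plausible way to get a reduction of this order is memoising the related-class lookup so duplicate SELECTs collapse to one per distinct id; this is a guess at the general approach, not the reporter's actual patch:

```python
# Hypothetical sketch (not the reporter's patch): cache the related-class
# lookup so repeated SELECTs for the same id collapse to a single query.
from functools import lru_cache

class FakeDB:
    def __init__(self):
        self.query_count = 0

    def fetch_class_name(self, class_id):
        self.query_count += 1          # one SELECT per cache miss
        return "class-%d" % class_id

db = FakeDB()

@lru_cache(maxsize=None)
def class_name(class_id):
    # First call per id hits the DB; repeats are served from the cache.
    return db.fetch_class_name(class_id)

for class_id in range(100):
    for _ in range(58):                # every attribute repeats its lookup
        class_name(class_id)

print(db.query_count)  # 100
```

A full cache like this leaves one query per distinct class (100 instead of ~5800); the 75% figure above suggests the real change was a partial variant of the same idea.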
And my last update, with no duplicates and <1 sec load (down from 35 secs): @python_2_unicode_compatible
If you have many scrapers (around 100) and you try to change one of them, the detail page loads too slowly (around 35 seconds).
I tried to inspect the problem with DjDT (Django Debug Toolbar) and saw that 5790 SQL queries are issued to load this page, 5780 of them duplicates. I know you can't use select_related or prefetch_related here the way you would in plain Django to eliminate this problem, but I think it is important to fix, because DDS with many scrapers is unusable at the moment.