Pop from empty list, crash in /webscrape #108

KastanDay · 2023-10-09T22:31:52Z

I got it by scraping this: https://ncsa-delta-doc.readthedocs-hosted.com/en/latest/index.html

Heads up, a new bug:
File "/app/ai_ta_backend/web_scrape.py", line 450, in breadth_crawler
url = self.queue[depth].pop(0)
IndexError: pop from empty list

Full error:

2023-10-09 22:29:57,249:ERROR - Exception on /web-scrape [GET]
Traceback (most recent call last):
  File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/venv/lib/python3.8/site-packages/flask_cors/extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/app/ai_ta_backend/main.py", line 349, in scrape
    success_fail_dict = scraper.main_crawler(url, course_name, max_urls, max_depth, timeout, stay_on_baseurl, depth_or_breadth)
  File "/app/ai_ta_backend/web_scrape.py", line 532, in main_crawler
    self.breadth_crawler(url=url, course_name=course_name, timeout=timeout, base_url_on=base_url_str, max_depth=max_depth)
  File "/app/ai_ta_backend/web_scrape.py", line 450, in breadth_crawler
    url = self.queue[depth].pop(0)
IndexError: pop from empty list


jkmin3 · 2023-10-11T02:54:24Z

Ahh I see. I have a catch for this error now, but should we maybe add a base URL input for cases like this? For example, for this site someone might want to enter https://ncsa as the base URL.
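
To make the base URL idea concrete, here is a rough sketch of how a user-supplied base URL could be used to decide which links stay in scope. The helper name is_within_base is hypothetical and is not taken from web_scrape.py; it assumes "in scope" means same host and a path at or below the base path, which may not match what the scraper actually does.

```python
from urllib.parse import urlparse


def is_within_base(url: str, base_url: str) -> bool:
    """Hypothetical helper: True if url is on base_url's host and under its path."""
    target, base = urlparse(url), urlparse(base_url)
    # Different host means the link is out of scope.
    if target.netloc.lower() != base.netloc.lower():
        return False
    # Compare paths on a segment boundary so "/docs" does not match "/docsify".
    base_path = base.path.rstrip("/")
    return target.path == base_path or target.path.startswith(base_path + "/")
```

With a check like this, the crawler could keep only links for which is_within_base(link, base_url) is true, where base_url comes from the new input instead of being derived from the page being scraped.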

KastanDay · 2023-10-11T04:00:06Z

Interesting, I'm not sure I follow.

I thought this error occurred when the BaseURL didn't have any links on the page. So the input page has 0 additional links. Is that right?
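
If that is the cause, a minimal sketch of a guarded breadth-first loop might look like the following. This is not the code in web_scrape.py: it assumes the queue is a dict of per-depth lists (suggested by self.queue[depth] in the traceback), and breadth_crawl and get_links are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Callable, Dict, Deque, Iterable, List


def breadth_crawl(start_url: str,
                  get_links: Callable[[str], Iterable[str]],
                  max_depth: int,
                  max_urls: int) -> List[str]:
    """Visit URLs level by level, stopping cleanly when a level has no URLs."""
    queue: Dict[int, Deque[str]] = {0: deque([start_url])}
    seen = {start_url}
    visited: List[str] = []
    depth = 0
    while depth <= max_depth and len(visited) < max_urls:
        level = queue.get(depth)
        if not level:
            # Nothing queued at this depth (e.g. the start page had 0 links):
            # stop instead of popping from an empty list.
            break
        url = level.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.setdefault(depth + 1, deque()).append(link)
        if not level:
            # Current level is exhausted, move on to the next one.
            depth += 1
    return visited
```

In this sketch a start page with no outgoing links just returns a list containing that one page, rather than raising IndexError.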

KastanDay assigned jkmin3 Oct 9, 2023

KastanDay added the bug label Oct 9, 2023
