Heads up, a new bug:
File "/app/ai_ta_backend/web_scrape.py", line 450, in breadth_crawler
url = self.queue[depth].pop(0)
IndexError: pop from empty list
Full error:
2023-10-09 22:29:57,249:ERROR - Exception on /web-scrape [GET]
Traceback (most recent call last):
File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/venv/lib/python3.8/site-packages/flask_cors/extension.py", line 176, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/venv/lib/python3.8/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/app/ai_ta_backend/main.py", line 349, in scrape
success_fail_dict = scraper.main_crawler(url, course_name, max_urls, max_depth, timeout, stay_on_baseurl, depth_or_breadth)
File "/app/ai_ta_backend/web_scrape.py", line 532, in main_crawler
self.breadth_crawler(url=url, course_name=course_name, timeout=timeout, base_url_on=base_url_str, max_depth=max_depth)
File "/app/ai_ta_backend/web_scrape.py", line 450, in breadth_crawler
url = self.queue[depth].pop(0)
IndexError: pop from empty list
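For reference, the traceback shows `list.pop(0)` being called on an empty list. Here is a minimal sketch of one way to guard against that. It assumes `self.queue` maps each depth to a list of pending URLs; the real structure in `web_scrape.py` is not shown in this thread. `BreadthQueue`, `next_url` and `crawl_order` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class BreadthQueue:
    """Hypothetical stand-in for the crawler's per-depth URL queue."""

    def __init__(self) -> None:
        # Assumption: one list of pending URLs per crawl depth.
        self.queue: Dict[int, List[str]] = defaultdict(list)

    def add(self, depth: int, url: str) -> None:
        self.queue[depth].append(url)

    def next_url(self, depth: int) -> Optional[str]:
        # list.pop(0) raises IndexError on an empty list,
        # so check for emptiness before popping.
        if not self.queue[depth]:
            return None
        return self.queue[depth].pop(0)


def crawl_order(q: BreadthQueue, max_depth: int) -> List[str]:
    """Drain the queue level by level, stopping a level once it is empty."""
    visited: List[str] = []
    for depth in range(max_depth + 1):
        while True:
            url = q.next_url(depth)
            if url is None:
                break  # nothing left at this depth, move on
            visited.append(url)
    return visited
```

Returning `None` instead of raising lets the caller treat "no more URLs at this depth" as a normal end condition rather than an error surfaced to the `/web-scrape` endpoint.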
Ahh I see. I have a catch for this error now, but should we maybe add a base URL input for cases like this? For example, for this site someone might want to enter https://ncsa as the base URL.
I got it by scraping this: https://ncsa-delta-doc.readthedocs-hosted.com/en/latest/index.html
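A rough sketch of what a base URL input could look like, assuming it is treated as a plain string prefix (as in the https://ncsa example above). `is_within_base` and its `base_url` parameter are hypothetical and not part of the existing scraper; the fallback of comparing hostnames is also an assumption about what "stay on base URL" means today.

```python
from typing import Optional
from urllib.parse import urlparse


def is_within_base(url: str, start_url: str, base_url: Optional[str] = None) -> bool:
    """Decide whether `url` counts as being on the same site.

    If the caller supplies `base_url`, treat it as a string prefix.
    Otherwise fall back to comparing hostnames with the start URL.
    """
    if base_url:
        return url.startswith(base_url)
    return urlparse(url).netloc == urlparse(start_url).netloc


start = "https://ncsa-delta-doc.readthedocs-hosted.com/en/latest/index.html"
page = "https://ncsa-delta-doc.readthedocs-hosted.com/en/latest/other.html"

# With an explicit prefix, any URL starting with it is kept.
keep = is_within_base(page, start, base_url="https://ncsa")
```

With an explicit prefix the user controls how wide the crawl is, which may help avoid ending up with an empty queue when the automatically derived base URL matches nothing.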