grab-site does not seem to archive the URLs listed in the srcset attributes inside <picture> elements on a Substack blog that I tried the tool on. I believe this may be an issue in wpull.
Step-by-step reproduction instructions
First I run:
I include these other two URLs so that their domains are not treated as offsite.
The contents of the ignores file are:
Then I open the archive using ReplayWeb.page-2.2.4.AppImage, and navigate to the page:
https://promptingweekly.substack.com/p/prompting-principle-if-youre-fighting
You can download the WARC here: https://drive.google.com/file/d/1fJuWwgSTVfh9IdD47RC2lw67tWSryG4S/view?usp=sharing
Appearance of replayed page
Several images on the page display immediately when opening the live site. However, after archiving the page with grab-site and replaying it with ReplayWeb.page, the images do not load; they appear as broken images or blank spaces.
Screenshots (two pairs): archived page vs. live site.
The same issues are observed with pywb.
In addition, some scripts don't work properly. When navigating to the previous or next blog post, ReplayWeb.page first displays a page saying "Post not found". Refreshing the page makes it load properly (but still with the missing images).
I believe that both the missing images and the script errors are caused by files missing from the crawl.
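One way to check this is to list every URL that the page's srcset attributes reference and compare that list against the URLs recorded in the WARC. The following is a minimal sketch using only the Python standard library; it is not how grab-site or wpull extract links. The names parse_srcset, SrcsetCollector and srcset_urls are hypothetical, and the descriptor handling is simplified compared with the HTML specification. It keeps commas that sit inside a URL, since image CDN URLs can contain them and splitting the attribute on every comma would produce broken URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def parse_srcset(value):
    """Return the candidate URLs in a srcset value (simplified parser)."""
    urls = []
    pos, n = 0, len(value)
    while pos < n:
        # Skip whitespace and commas that separate candidates.
        while pos < n and (value[pos].isspace() or value[pos] == ","):
            pos += 1
        if pos >= n:
            break
        # A URL is a run of non-whitespace characters; commas inside it are kept.
        start = pos
        while pos < n and not value[pos].isspace():
            pos += 1
        url = value[start:pos]
        if url.endswith(","):
            # No descriptor: the trailing comma ends this candidate.
            url = url.rstrip(",")
        else:
            # Skip the descriptor (e.g. "424w" or "2x") up to the next comma.
            while pos < n and value[pos] != ",":
                pos += 1
        if url:
            urls.append(url)
    return urls


class SrcsetCollector(HTMLParser):
    """Collect absolute srcset URLs from <img> and <source> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "source"):
            return
        for name, value in attrs:
            if name == "srcset" and value:
                for url in parse_srcset(value):
                    self.urls.append(urljoin(self.base_url, url))


def srcset_urls(html, base_url):
    """Return all srcset URLs in html, resolved against base_url."""
    collector = SrcsetCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Running srcset_urls on the saved HTML of the post, with the post URL as base_url, gives the set of image URLs a crawler would need to fetch. Any of them that are absent from the WARC would be consistent with the broken images seen during replay.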
Additional details
I run Ubuntu 20.04 LTS.