‼ web.archive.org mirrors do not work with new media url substitution #47

Open
UPLYNXED opened this issue Apr 17, 2024 · 0 comments
Labels
  • bug: Something isn't working
  • help wanted: Extra attention is needed
  • HIGH PRIORITY: PRIORITY ISSUE - breaking
  • refactor: Feature requires a refactor/rework

Comments

@UPLYNXED
Owner

The Wayback Machine (WBM) rewrites all media URLs, even those inside JavaScript and JSON entries.
This means every media URL is rewritten to its web.archive.org archived counterpart.

However, when we retrieve a URL stored in an attribute in the HTML, the WBM rewrites that URL as well: at some point between the value in the HTML and the function in JS, the web.archive.org prefix is removed from it.

This is an issue because we store media file replacements with the keys set to the (cleaned-up) original media file URL.
Because the WBM rewrites the URLs, all keys (and all property values) carry the prefix, but the value we retrieved does not, so we cannot use this value to retrieve its associated media_replacements object. Rewriting the URL to include the prefix again might not be an option either, as the timestamp of when the URL was archived is part of that prefix, and there is probably no way to tell what it might be.
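
To make the mismatch concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the media URL, the timestamp, the `im_` modifier, the shape of the stored replacement, and the helper `stripWbmPrefix()`, which only imitates the prefix going missing and is not part of the project.

```javascript
// Assumed shape of an archived URL:
//   https://web.archive.org/web/<timestamp><optional modifier>/<original URL>
const WBM_PREFIX = /^https?:\/\/web\.archive\.org\/web\/\d{1,14}(?:[a-z]{2}_)?\//i;

// Hypothetical helper: imitates the prefix being removed from the value.
function stripWbmPrefix(url) {
  return url.replace(WBM_PREFIX, "");
}

// Made-up archived media URL (timestamp and filename are placeholders).
const archived =
  "https://web.archive.org/web/20240417000000im_/https://pbs.twimg.com/media/example.jpg";

// On an archived page the key is stored with the prefix...
const media_replacements = {
  [archived]: { url: "media/example.jpg" }, // assumed value shape
};

// ...but the value read back from the HTML attribute no longer has it.
const fromAttribute = stripWbmPrefix(archived);

console.log(fromAttribute);                     // https://pbs.twimg.com/media/example.jpg
console.log(media_replacements[fromAttribute]); // undefined
```

The lookup fails because the two strings differ only by the prefix, and the timestamp inside that prefix cannot be reconstructed from the stripped value.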

Potential fixes:

  • Figure out how to get the full value from the HTML attribute without the WBM interfering
  • Figure out how to rewrite the URL so that it includes the web.archive.org prefix again
  • Rewrite the media_replacements object and related methods to avoid using a full URL as the index, perhaps using just the filename and filetype. This might have to account for conflicts between matching filenames/filetypes, but such conflicts are unlikely for media pulled from Twitter itself.
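
A rough sketch of the third option, keying on the filename instead of the full URL. The names `mediaKey()`, `storeReplacement()`, `lookupReplacement()` and `replacementsByFile` are hypothetical and not taken from the codebase, and the URLs are made-up examples. The idea is that the last path segment is the same with or without the web.archive.org prefix.

```javascript
// Hypothetical helper: use the last path segment (e.g. "example.jpg") as the key.
function mediaKey(url) {
  // Drop any query string or fragment, then keep what follows the last "/".
  const path = String(url).split(/[?#]/)[0];
  return path.slice(path.lastIndexOf("/") + 1);
}

// Assumed storage: a Map from filename to replacement data.
const replacementsByFile = new Map();

function storeReplacement(url, replacement) {
  replacementsByFile.set(mediaKey(url), replacement);
}

function lookupReplacement(url) {
  return replacementsByFile.get(mediaKey(url));
}

// Stored under the archived (prefixed) URL...
storeReplacement(
  "https://web.archive.org/web/20240417000000im_/https://pbs.twimg.com/media/example.jpg",
  { url: "media/example.jpg" } // assumed value shape
);

// ...and still found with the stripped URL.
const found = lookupReplacement("https://pbs.twimg.com/media/example.jpg");
console.log(found.url); // media/example.jpg
```

Two files that share a filename would overwrite each other here, so a real version would need some way to detect or resolve such conflicts.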
UPLYNXED added the bug, help wanted, refactor, and HIGH PRIORITY labels on Apr 17, 2024
UPLYNXED added this to the Minimum Viable Feature Set milestone on Apr 17, 2024
UPLYNXED changed the title from "web.archive.org mirrors do not work with new media url substitution" to "‼ web.archive.org mirrors do not work with new media url substitution" on Apr 17, 2024
UPLYNXED pushed a commit that referenced this issue on Apr 18, 2024
- Rewrote substituteMediaUrl() and split the URL parsing into a new function, parseMediaUrl().
- The media_replacements object now uses filenames as keys instead of URLs. [see #47, yet to be tested on WBM]
- Added a function stub, getMediaReplacement(), to get the media_replacements object. It will use parseMediaUrl() and replace the old media_replacements[url] calls.
- Reworked parts of the formatPicture() function and the useravatar_img template tag to better handle avatar fallbacks.
Projects
Status: 📑 To-do