Parse links out of HTML in a morph.io scraper and do some archiving actions on each of them. This is kind of a meta-archiver really.

austccr/link_and_attachment_archiver


This script finds links in HTML text and saves a backup copy of them.
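As a rough illustration of what "finding links in HTML text" can look like, here is a minimal sketch using only the Ruby standard library. The helper name `extract_links` is hypothetical and is not taken from scraper.rb. The regular expression is a deliberate simplification: it only picks up quoted `href` values with absolute http(s) URLs, and a real HTML parser would be more robust.

```ruby
# Hypothetical helper, for illustration only (not the code in scraper.rb).
# Returns the unique absolute http(s) URLs found in quoted href attributes.
def extract_links(html)
  html.scan(/href\s*=\s*["']([^"']+)["']/i) # each match is a one-element array
      .flatten
      .select { |href| href.start_with?('http://', 'https://') }
      .uniq
end
```

Relative links such as `/about` are skipped in this sketch because there is no base URL to resolve them against.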

Adding your morph.io API key securely

You need a morph.io API key to run this script, because it requests records from the morph.io API. Find your key at https://morph.io/documentation/api.

Set your morph.io API key in the MORPH_API_KEY environment variable. The script reads it as:

ENV['MORPH_API_KEY']
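A minimal sketch of how the key might be read defensively and used to build a request URL. The helper names `morph_api_key` and `morph_data_url` are hypothetical, and the endpoint shape (`https://api.morph.io/<scraper>/data.json` with `key` and `query` parameters) is assumed from the morph.io API documentation page rather than taken from scraper.rb. The scraper name and SQL query are placeholders.

```ruby
require 'uri'

# Hypothetical helper: fail early with a clear message if the key is missing.
# `env` defaults to ENV but can be any Hash-like object, which helps testing.
def morph_api_key(env = ENV)
  key = env['MORPH_API_KEY'].to_s.strip
  raise KeyError, 'MORPH_API_KEY is not set; see the setup steps in this README' if key.empty?
  key
end

# Hypothetical helper: build a data request URL for a scraper.
# Assumes the endpoint shape described on the morph.io API documentation page.
def morph_data_url(scraper, query, key)
  params = URI.encode_www_form(key: key, query: query)
  "https://api.morph.io/#{scraper}/data.json?#{params}"
end

# Example (placeholder scraper name and query):
#   morph_data_url('some_owner/some_scraper', 'select * from data limit 10', morph_api_key)
```

Raising on a missing key, rather than sending a request without one, makes a misconfigured environment obvious at start-up.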

For morph.io runs

On your morph.io scraper's settings page, add the environment variable.

See the morph.io secret values documentation.

On your local machine

We use dotenv to add environment variables from a .env file locally.

Create the .env file by copying .env.example:

cp .env.example .env

Then replace the example string with your API key in your new .env file.

This file is listed in .gitignore, so it won't be checked into your git repository. It stays on your local machine only, secret from the web.

Now, to run the script with your secret key, execute it with dotenv:

bundle exec dotenv ruby scraper.rb

TODOs

  • point this at more sources, like the main lobbywatch archive for example
  • handle updates
    • if we don't have it, archive it
    • if we already have the file for this source_url, don't update
    • if we have it and there are errors, try again
  • investigate 302 records
  • if we get a missing-certificate HTTPS error from archive.org, try the http version of the link
  • write a proper description in the readme
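
The "handle updates" rules and the http fallback in the list above could be sketched roughly as follows. None of this exists in the repository yet. The record shape (a Hash with an `'errors'` field) and the helper names `archive_action` and `http_fallback` are assumptions made purely for illustration.

```ruby
# Hypothetical sketch of the planned update rules (not implemented in scraper.rb).
# `existing_record` is the stored row for a source_url, or nil if there is none.
# The 'errors' field name is an assumption about how failures would be recorded.
def archive_action(existing_record)
  return :archive if existing_record.nil?        # we don't have it: archive it

  errors = existing_record['errors'].to_s.strip
  return :retry unless errors.empty?             # we have it, but with errors: try again

  :skip                                          # we already have the file: don't update
end

# Hypothetical helper for the certificate TODO: swap an https URL for its
# http equivalent. Non-https URLs are returned unchanged.
def http_fallback(url)
  url.sub(/\Ahttps:/i, 'http:')
end
```

Checking for errors before skipping means a failed earlier attempt is not mistaken for a successful archive.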
