A command-line tool to fetch Blacklight scans for a list of URLs. It queries the open-source Blacklight Collector directly and runs entirely locally.
```sh
nvm use
npm install
./blacklight-query urls.txt
```

where `urls.txt` has newline-separated absolute URLs to scan.
Write all URLs you wish to scan as absolute URLs (including protocol, domain, and path). Separate each URL with a newline.
```
https://www.themarkup.org
https://www.calmatters.org
```
You can also pipe your list of URLs.
```sh
echo "https://themarkup.org/" | ./blacklight-query
./blacklight-query < urls.txt
```
All of the `blacklight-collector` options can be specified using this tool by editing the `config` object in `main.ts`.
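As an illustration, an edited `config` object might look like the sketch below. The option names follow the collector's documented options, but the values are placeholders, not the actual contents of `main.ts`:

```typescript
// Sketch of a collector config; option names come from
// blacklight-collector's documentation, values are placeholders.
const config = {
  headless: true,                   // run the browser without a visible window
  outDir: "./outputs/example.com",  // where results for this scan are written
  numPages: 2,                      // also scan two randomly chosen subpages
};
```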
Out of the box, this tool sets the following options:
- `headless: true`: sets the collector to use a headless, behind-the-scenes browser.
- `outDir: ./outputs/[URL]`: specifies which directory the collector should store its results in, making use of the URL being scanned.
- `numPages: 0`: tells the collector not to scan any additional pages. Setting this to `1`, `2`, or `3` scans that number of randomly chosen pages that are accessible from the homepage.
Some other options you may find useful are:
- `emulateDevice`: specifies which device the collector should scan as.
- `headers`: allows you to set custom headers on the headless browser.
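As a hedged sketch, both options could be set together in the `config` object. This assumes `emulateDevice` accepts a Puppeteer device descriptor and `headers` a plain key-value object; check the collector's README for the exact shapes:

```typescript
// Hypothetical config fragment. The device name and header value are
// placeholders; KnownDevices is Puppeteer's built-in device catalog.
import { KnownDevices } from "puppeteer";

const config = {
  emulateDevice: KnownDevices["iPhone 13"],  // scan as a mobile device
  headers: { "Accept-Language": "en-US" },   // sent with every request
};
```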
Read the `blacklight-collector` README for a full list of options and their defaults.
All scans will be saved in the `outputs` folder, in subdirectories named for the hostname of the URL being scanned.
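Because folders are named for the hostname only, a scan of a deep URL lands in a folder named for its host. The mapping can be sketched in shell (this only illustrates hostname extraction; the tool's exact naming logic may differ):

```sh
# Extract the hostname portion of a URL, the way output folders are named.
url="https://www.themarkup.org/series/blacklight"
hostname=$(echo "$url" | sed -E 's#^[a-z]+://([^/]+).*#\1#')
echo "outputs/$hostname"   # → outputs/www.themarkup.org
```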
Be aware that the Collector is fairly resource-heavy and may slow down your computer. We recommend scanning smaller lists if your hardware becomes overtaxed.
Run the test suite with:

```sh
npm run test
```