MWoffliner

MWoffliner is a tool for making a local offline HTML snapshot of any online MediaWiki instance. It goes through all online articles (or a selection, if specified) and creates the corresponding ZIM file. It has mainly been tested against Wikimedia projects like Wikipedia and Wiktionary, but it should also work with any recent MediaWiki.

Read CONTRIBUTING.md to learn more about MWoffliner development.

Features

  • Scrape with or without image thumbnails
  • Scrape with or without audio/video multimedia content
  • S3 cache (optional)
  • Image size optimiser / Webp converter
  • Scrape all articles in the selected namespaces, or only those in a title list
  • Specify additional/non-main namespaces to scrape

Run mwoffliner --help to get all the possible options.

Prerequisites

  • *NIX Operating System (GNU/Linux, macOS, ...)
  • Redis
  • NodeJS version 16 or greater
  • Libzim (On GNU/Linux & macOS we automatically download it)
  • Various build tools which are probably already installed on your machine (packages libjpeg-dev, libglu1, autoconf, automake, gcc on Debian/Ubuntu)

... and an online MediaWiki with its API available.
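
The Node.js requirement above can be checked from a script before installing. The following is a minimal sketch, not part of MWoffliner; the helper name `isSupportedNode` is hypothetical, and the minimum version is simply the one listed in the prerequisites.

```javascript
// Minimum major version, taken from the prerequisites list above.
const MIN_NODE_MAJOR = 16;

// Hypothetical helper: returns true when a version string such as
// "18.17.1" or "v16.0.0" has a major version of at least `minMajor`.
function isSupportedNode(version, minMajor = MIN_NODE_MAJOR) {
  const major = Number.parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running Node.js version without the "v" prefix.
if (!isSupportedNode(process.versions.node)) {
  console.warn(`Node.js ${process.versions.node} is older than version ${MIN_NODE_MAJOR}.`);
}
```

This only covers the Node.js version; Redis and the build tools still have to be checked separately.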

Usage

To install MWoffliner globally:

npm i -g mwoffliner

You might need to run this command with sudo, depending on how your npm is configured.

npm permission checking can be a bit annoying for a newcomer. Please read the documentation carefully if you hit problems: https://docs.npmjs.com/cli/v7/using-npm/scripts#user

Then to run it:

mwoffliner --help

To install and run it locally:

npm i
npm run mwoffliner -- --help

To use MWoffliner with an S3 cache, you should provide an S3 URL like this:

--optimisationCacheUrl="https://wasabisys.com/?bucketName=my-bucket&keyId=my-key-id&secretAccessKey=my-sac"
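
The URL above carries the bucket name and credentials as query parameters. If you assemble it in a script, the standard `URL` class can do so without manual string concatenation. This is a sketch only: `buildOptimisationCacheUrl` is a hypothetical helper, not part of MWoffliner, and it assumes that MWoffliner accepts standard percent-encoding for values containing special characters.

```javascript
// Hypothetical helper (not part of MWoffliner): assembles the cache URL
// from the endpoint and the three query parameters shown above.
function buildOptimisationCacheUrl(endpoint, { bucketName, keyId, secretAccessKey }) {
  const url = new URL(endpoint);
  url.searchParams.set('bucketName', bucketName);
  url.searchParams.set('keyId', keyId);
  url.searchParams.set('secretAccessKey', secretAccessKey);
  return url.toString();
}

// Placeholder values copied from the example above.
const cacheUrl = buildOptimisationCacheUrl('https://wasabisys.com/', {
  bucketName: 'my-bucket',
  keyId: 'my-key-id',
  secretAccessKey: 'my-sac',
});

console.log(`--optimisationCacheUrl="${cacheUrl}"`);
```

With these placeholder values the result is identical to the URL shown above. Real secrets are better read from the environment than hard-coded in a script.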

API

MWoffliner also provides an API and can therefore be used as a Node.js library. Here is a stub example:

const mwoffliner = require('mwoffliner');
const parameters = {
    mwUrl: "https://es.wikipedia.org",
    adminEmail: "foo@bar.net",
    verbose: true,
    format: "nopic",
    articleList: "./articleList"
};
mwoffliner.execute(parameters); // returns a Promise
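
Because `execute` returns a Promise, a caller will usually want to wait for it and handle a rejection. The wrapper below is one possible sketch: `runScrape` is a hypothetical name, the function to call is passed in as an argument so the sketch does not depend on how the package is loaded, and nothing is assumed about what the Promise resolves to.

```javascript
// Hypothetical wrapper: `execute` is expected to be mwoffliner.execute,
// supplied by the caller. Resolves to an object describing the outcome
// instead of letting a rejection go unhandled.
async function runScrape(execute, parameters) {
  try {
    const result = await execute(parameters);
    return { ok: true, result };
  } catch (error) {
    console.error('Scrape failed:', error);
    return { ok: false, error };
  }
}

// Usage, assuming the package is installed and `parameters` is defined as above:
// const mwoffliner = require('mwoffliner');
// runScrape(mwoffliner.execute, parameters).then(({ ok }) => {
//   process.exitCode = ok ? 0 : 1;
// });
```

Setting `process.exitCode` rather than calling `process.exit()` lets pending output flush before the process ends.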

Background

Complementary information about MWoffliner:

  • MediaWiki software is used by thousands of wikis, the most famous ones being the Wikimedia ones, including Wikipedia.
  • MediaWiki is a PHP wiki runtime engine.
  • Wikitext is the name of the markup language that MediaWiki uses.
  • MediaWiki includes a parser that converts Wikitext into HTML, and this parser creates the HTML pages displayed in your browser.

GNU/Linux - Debian based distributions

Install NodeJS: Read https://nodejs.org/en/download/current/

Install Redis:

sudo apt-get install redis-server

Troubleshooting

Older GNU/Linux distributions and/or versions of Node.js might ship with a deprecated version of npm. Older versions of npm have incompatibilities with certain versions of Node.js and might simply fail to install the mwoffliner package.

We recommend using a recent version of npm. Recent versions work fine even with the older Node.js 10. First install the packaged version of npm, then use it to install a newer version, like this:

sudo npm install --unsafe-perm -g npm

Don't forget to remove the packaged version of npm afterward.

License

GPLv3 or later, see LICENSE for more details.