
How to add a new wiki

Abel Serrano Juste edited this page Dec 3, 2018 · 8 revisions

Here are the steps for adding a new wiki to WikiChron (note that, by default, WikiChron already includes some wikis for testing and example purposes):

Get the dump

First, once you know which wiki you are going to analyze, find out where it is hosted. It can be a Wikia wiki, a Wikimedia project wiki, or another kind of wiki (like a self-hosted wiki). WikiChron can analyze any of them as long as the input data is in the right format.

The procedure to download the XML dump depends on the source, so first check the "XML Dumps" section of the README at https://github.com/Grasia/WikiChron#xml-dumps, which explains how to get the XML dump of your wiki.

Remember to merge all the parts of the dump into a single dump, and make sure it contains the full history of every page of the wiki you want to analyze, not just the current-revision-only version of the dump.
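A quick way to tell a full-history dump from a current-only one is to count the `<revision>` elements inside each `<page>`: a current-only dump has exactly one revision per page. The sketch below does this with the standard library's streaming XML parser, so it also works on large dumps. It is a heuristic of our own, not part of WikiChron, and the function names are illustrative; a tiny wiki where every page really has a single revision would also be reported as "current-only".

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to MediaWiki tags."""
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(dump_path):
    """Return a list with the number of <revision> elements in each <page>."""
    counts = []
    current = 0
    for event, elem in ET.iterparse(dump_path, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start" and name == "page":
            current = 0  # a new page starts: reset its revision counter
        elif event == "end" and name == "revision":
            current += 1
        elif event == "end" and name == "page":
            counts.append(current)
            elem.clear()  # drop the page's contents to keep memory usage low
    return counts


def looks_like_full_history(dump_path):
    """Heuristic: a current-only dump never has more than one revision per page."""
    return any(count > 1 for count in revisions_per_page(dump_path))
```

For example, `looks_like_full_history("data/<name_of_your.xml>")` should return `True` for a dump that is suitable for WikiChron.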

Process the dump

Once you have your XML dump, you need to process it to get the corresponding .csv file. To do so, run the dump_parser script. It is listed as a requirement of WikiChron, but you can also install it standalone with pip install wiki-dump-parser. The script processes any MediaWiki dump and outputs a pre-processed, simplified csv file with all the information that WikiChron needs to draw its plots. Run the script with:

python3 -m dump_parser data/<name_of_your.xml>

This will create the corresponding .csv file in your local data/ directory. If you have more than one XML file, run the script as follows:

python3 -m dump_parser data/*.xml
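The `data/*.xml` form relies on the shell expanding the wildcard, which Windows `cmd.exe` does not do. A minimal Python sketch that does the expansion itself and then calls the same documented command (`python -m dump_parser <files>`) is shown below. The helper names are hypothetical; `sys.executable` is used so the module is run with the same interpreter where wiki-dump-parser is installed.

```python
import subprocess
import sys
from pathlib import Path


def dump_parser_command(data_dir="data"):
    """Build `python -m dump_parser <dump1.xml> <dump2.xml> ...` for data_dir.

    Returns None when data_dir contains no .xml files.
    """
    xml_files = sorted(str(p) for p in Path(data_dir).glob("*.xml"))
    if not xml_files:
        return None
    return [sys.executable, "-m", "dump_parser", *xml_files]


def parse_all_dumps(data_dir="data"):
    """Run dump_parser once over every XML dump found in data_dir."""
    cmd = dump_parser_command(data_dir)
    if cmd is None:
        raise FileNotFoundError(f"no .xml dumps found in {data_dir}/")
    # check=True raises CalledProcessError if dump_parser exits with an error
    subprocess.run(cmd, check=True)
```

Calling `parse_all_dumps()` from the WikiChron root directory should then be equivalent to running `python3 -m dump_parser data/*.xml` in a Unix shell.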

NOTE: all this information can be found in the "Process the dump" subsection of the "XML Dumps" section of the README.

Modify the wikis.json file

As stated in the "provide some metadata of the wiki" section of the README, you need to provide some metadata about your wiki in the wikis.json file, such as the number of pages, the number of users, the user ids of the bots, etc.

You can edit this file by hand and fill in the corresponding data or, if you are using Wikia wikis or similar compatible wikis, you can use the generate_wikis_json.py script written for this purpose.
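If you edit wikis.json yourself, a small script helps avoid malformed JSON and accidental duplicates. The sketch below assumes that wikis.json is a JSON list with one object per wiki and that each object has a `"data"` key naming its csv file. These assumptions, and every key in `EXAMPLE_ENTRY`, are hypothetical placeholders: copy the real field names from the wikis.json shipped with WikiChron before using it.

```python
import json
from pathlib import Path

# Hypothetical entry: check every key against the wikis.json bundled with WikiChron.
EXAMPLE_ENTRY = {
    "name": "My Wiki",
    "url": "https://mywiki.example.org",
    "data": "mywiki.csv",      # csv file produced by dump_parser in data/
    "number_of_pages": 1234,
    "number_of_users": 56,
    "bots": [7, 89],           # user ids of the bot accounts
}


def add_wiki_entry(json_path, entry):
    """Append `entry` to the list stored in json_path.

    Existing entries are never modified; if a wiki with the same "data"
    file is already listed, nothing is written and False is returned.
    """
    path = Path(json_path)
    wikis = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    if any(wiki.get("data") == entry["data"] for wiki in wikis):
        return False
    wikis.append(entry)
    path.write_text(json.dumps(wikis, indent=4, ensure_ascii=False), encoding="utf-8")
    return True
```

Keeping existing entries untouched mirrors the behaviour described for generate_wikis_json.py, so hand-added and script-generated wikis can coexist in the same file.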

This script takes a file called wikis.csv as input, which lists the URLs of the wikis and the filenames of the csv files you want to add to WikiChron; it then looks up the required metadata and updates wikis.json accordingly. The wikis.csv file shipped with WikiChron, which covers the wikis provided by default, serves as an example. Once you have set up your wikis.csv file (you can simply append your wikis to the provided one, since the script won't overwrite the previously generated wikis.json data), just run:

python3 generate_wikis_json.py
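For wikis the script does not support, most of this metadata can be read from the standard MediaWiki Action API: `meta=siteinfo&siprop=statistics` returns page, edit and user counts, and `list=allusers&augroup=bot` lists the accounts in the bot group with their user ids. The sketch below is not necessarily how generate_wikis_json.py works, and the output keys (`number_of_pages`, `number_of_users`) are hypothetical. It assumes the API lives at `/api.php` (Wikia/Fandom layout; Wikimedia wikis use `/w/api.php`), and it reads only the first batch of bots (up to 500 with `aulimit=max`), ignoring API continuation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def api_url(wiki_url, api_path="/api.php", **params):
    """Build a MediaWiki Action API URL with JSON output."""
    params.setdefault("format", "json")
    return wiki_url.rstrip("/") + api_path + "?" + urlencode(params)


def statistics_url(wiki_url, api_path="/api.php"):
    # siteinfo statistics: pages, articles, edits, users, activeusers, ...
    return api_url(wiki_url, api_path, action="query", meta="siteinfo", siprop="statistics")


def bots_url(wiki_url, api_path="/api.php"):
    # accounts in the "bot" user group (first batch only, no continuation)
    return api_url(wiki_url, api_path, action="query", list="allusers", augroup="bot", aulimit="max")


def fetch_json(url, user_agent="WikiChron-metadata-sketch/0.1"):
    """GET a URL and decode the JSON body (the user agent string is a placeholder)."""
    with urlopen(Request(url, headers={"User-Agent": user_agent})) as resp:
        return json.load(resp)


def parse_statistics(response):
    stats = response["query"]["statistics"]
    return {"number_of_pages": stats["pages"], "number_of_users": stats["users"]}


def parse_bot_ids(response):
    return [user["userid"] for user in response["query"]["allusers"]]
```

For a hypothetical `base = "https://mywiki.fandom.com"`, `parse_statistics(fetch_json(statistics_url(base)))` and `parse_bot_ids(fetch_json(bots_url(base)))` give values you can copy into wikis.json. Note that `users` counts all registered accounts, which may differ from the number of users who actually appear in the dump.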

Launch WikiChron and you should now see your new wikis in the list. Note that you may need to restart WikiChron if it was already running when you added the new wiki.