This repository contains data for a post on WMCZ's blog about Czech Wikipedia during the pandemic.
This repository uses only public data published by the Wikimedia Foundation; however, the data were processed on WMF's Hadoop cluster via Spark queries.
Data about page and project views can be downloaded from Wikimedia Dumps as the pageviews
dataset. In the Hadoop cluster, the data are available in these two tables:

- wmf.pageview_hourly: per-page views, hourly granularity (docs)
- wmf.projectview_hourly: per-project views, hourly granularity (docs)
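As an illustration, daily view totals for Czech Wikipedia could be pulled from the project-level table with a Spark SQL query along these lines. This is a sketch, not a query from the repository: the column names follow the table's public documentation, but the filter values (project string, year, agent type) are assumptions and should be checked against the docs before running.

```sql
-- Sketch: daily view totals for Czech Wikipedia from the project-level table.
-- Column names follow the public docs; partition values are example assumptions.
SELECT year, month, day, SUM(view_count) AS views
FROM wmf.projectview_hourly
WHERE project = 'cs.wikipedia'
  AND agent_type = 'user'   -- exclude spiders and other automated agents
  AND year = 2020
GROUP BY year, month, day
ORDER BY year, month, day;
```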
Data about edits can be downloaded from Wikimedia Dumps as the mediawiki_history
dataset. In the Hadoop cluster, the data are available as wmf.mediawiki_history
(docs).
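Daily edit counts could be derived from the edits table in a similar way. Again a hedged sketch: the field names (snapshot, wiki_db, event_entity, event_type, event_timestamp) follow the public mediawiki_history schema, but the snapshot value is an example and must match a snapshot that actually exists in the cluster.

```sql
-- Sketch: daily edit counts on Czech Wikipedia.
-- Field names follow the public mediawiki_history schema;
-- the snapshot partition value is an example assumption.
SELECT substr(event_timestamp, 1, 10) AS day,
       COUNT(*) AS edits
FROM wmf.mediawiki_history
WHERE snapshot = '2021-03'          -- monthly snapshot partition
  AND wiki_db = 'cswiki'
  AND event_entity = 'revision'     -- revisions = edits
  AND event_type = 'create'
GROUP BY substr(event_timestamp, 1, 10)
ORDER BY day;
```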