Skip to content

raffiot/raffiot.github.io

Repository files navigation

Wikipedia Activity Analysis:
This project is a tool to observe Wikipedia articles activity.
The algorithmic part is in python and the visualization part is in javascript.
The result is available at https://raffiot.github.io/

Algorithmic part:
The jupyter notebook would work in a folder with the following setup


|
|--diseases_deep_graph.ipynb
|--diseases_graph.ipynb
|--wikispedia_graph.ipynb
|--XX_century_graph.ipynb
|
+--Data
|+--diseases
|||--diseases_edges.csv
|||--diseases_edges_big.csv
||
|+--normal_pages
|||--normal_pages.pkle
||
|+--wikispeedia_paths-and-graph
|||--articles.tsv
|||--links.tsv
|||--categories.tsv
||
|+--XX_century
|||--edges_20c.csv

The format for the different edges file (diseases_edges.csv, diseases_edges_big, edges_20c) is "p1.id" "rel" "p2.id"
with p1.id is link source page id, rel is the relation (LINKS_TO or BELONGS_TO) and p2.id is link target page id.

The format for normal_pages.pkle is index=id, columns=[article,isRedirect,isNew]. In .ipynb files there is also the code to import it from csv files.

The format for articles.tsv is "article_name"
The format for links.tsv is "linkSource" "linkTarget"
The format for categories is "article_name" "category"

The code takes about ten minutes to run depending on your internet connection

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published