Refurbish Re-CitationBot #127

@Daniel-Mietchen

Re-CitationBot currently does three main things:

  • uploading full-texts to Wikisource (cf. dedicated category)
    • if that text includes formulas or tables supplied as images, they go to Wikisource as well (since Commons won't accept them)
  • uploading images to Wikimedia Commons (cf. dedicated category)

Over time, it should also take on some or all of the following functions:

  • uploading the metadata of Wikisource imports to Wikidata
  • updating the Wikipedia page that cited the original DOI that triggered the workflow, so that this page then also links to the relevant Wikisource, Wikimedia Commons and Wikidata entries.

This raises a number of issues, which we will explore in more detail later. Some of the more urgent ones:

  • UI: since our workflow (Workflow #113) is aimed at automating things to the extent possible, we do not envision a shiny UI. It basically just has to work for us, and especially me, so that we can monitor what happens at every step and test/debug each step individually.
  • some of the full texts are not actually complete: parts are missing or misrepresented, which is why we are still working on the JATS-to-MediaWiki converter
  • Merger with Open Access Media Importer: that bot has been active since mid-2012, its code is in a neighbouring repo here under wpoa, and it forms the basis of much of the current code of the Recitation-bot. It currently runs off a server at a German university but should eventually be moved to Wikimedia Labs, where Recitation-bot already is. But since its scope (audio and video files from open access scholarly articles available in JATS) is just a subset of the scope of Recitation-bot on Commons, it makes sense to merge the two.
  • Categorization: while uploading stuff to Wikimedia Commons, it is important to set useful categories, so that people (and, increasingly, tools) can find the files there. We are doing this by making use of JATS tags for keywords and subject matter, but this is still error-prone, since publishers (who provide the JATS) use these tags inconsistently. That's where @difranco's interest in topic modelling might come in handy. Categorization is less important on Wikisource. On Wikidata, it would be nice to enrich the basic article metadata with statements about the main subject (P921), as in this example.
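
To make the categorization point a bit more concrete, here is a minimal sketch of pulling candidate topics out of the JATS keyword and subject tags. It is not the bot's actual code: the function name `candidate_topics`, the `GENERIC_SUBJECTS` stop list and the sample XML are all assumptions for illustration. It only relies on the standard JATS elements `<kwd>` and `<subject>` and on Python's built-in `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS fragment for illustration only.
SAMPLE_JATS = """\
<article>
  <front>
    <article-meta>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
        <subj-group subj-group-type="Discipline">
          <subject>Ecology</subject>
        </subj-group>
      </article-categories>
      <kwd-group>
        <kwd>coral reefs</kwd>
        <kwd>Coral Reefs </kwd>
        <kwd>ocean acidification</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
"""

# Assumed stop list: subject values that describe the article type, not its topic.
GENERIC_SUBJECTS = {"research article", "article", "review"}


def candidate_topics(jats_xml):
    """Return de-duplicated keyword and subject strings from a JATS document."""
    root = ET.fromstring(jats_xml)
    seen = set()
    topics = []
    # <kwd> holds keywords; <subject> holds publisher-assigned subject headings.
    for tag in ("kwd", "subject"):
        for element in root.iter(tag):
            # itertext() also picks up text nested in inline markup such as <italic>.
            text = " ".join("".join(element.itertext()).split())
            key = text.lower()
            # Skip empty values, article-type labels and case-insensitive repeats.
            if not text or key in GENERIC_SUBJECTS or key in seen:
                continue
            seen.add(key)
            topics.append(text)
    return topics


if __name__ == "__main__":
    print(candidate_topics(SAMPLE_JATS))
```

The result is only a list of candidate strings. Mapping them to existing Commons categories (or to Wikidata items for P921) is the error-prone part described above and is deliberately left out of this sketch.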

Pinging #118.
