A command-line application and Python library for obtaining the metadata and full-text of published journal articles for text data mining (TDM) purposes.

unimelbmdap/doiget-tdm

doiget-tdm

doiget-tdm is a command-line application and Python library for obtaining the metadata and full-text of published journal articles.

Warning

This package is primarily intended for text data mining projects in which the user has subscriptions to the full-text content and has arranged data exchange agreements. Acquisition from most publishers will not work without configuration; see Available publishers.

Features

  • Acquire full-text of published articles, with built-in support for multiple publishers and their acquisition methods (e.g., network or local files).
  • Currently supported publishers (given appropriate access and configuration):
    • American Medical Association (AMA)
    • American Psychological Association (APA)
    • Elsevier
    • Frontiers
    • IOP
    • PeerJ
    • PLoS
    • PNAS
    • Royal Society
    • Sage
    • Springer-Nature
    • Taylor & Francis
    • Wiley
  • Customise acquisition and add additional publishers.
  • Retrieve article metadata from Crossref, optionally using a local Lightning Memory-Mapped Database (LMDB) key:value store (DOI:metadata) built from a Crossref public data export via crossref-lmdb.
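
To give a feel for what the Crossref metadata for a DOI looks like, here is a minimal sketch that queries the public Crossref REST API directly using only the Python standard library. It does not use doiget-tdm's own Python interface, which is covered in the documentation; the function names, the `User-Agent` string, and the timeout are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for a single work.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref works URL for a DOI (hypothetical helper)."""
    # Percent-encode unusual characters; "/" is left as-is.
    return CROSSREF_WORKS_URL + urllib.parse.quote(doi, safe="/")


def fetch_metadata(doi: str, timeout: float = 30.0) -> dict:
    """Fetch the Crossref metadata record for a DOI (requires network access)."""
    request = urllib.request.Request(
        crossref_url(doi),
        # Illustrative user agent; replace with something identifying your project.
        headers={"User-Agent": "doi-metadata-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Crossref wraps the record in a "message" field.
    return payload["message"]


def first_title(message: dict):
    """Return the first title in a Crossref record, or None if absent."""
    # Crossref stores titles as a list of strings.
    titles = message.get("title") or []
    return titles[0] if titles else None


# Example (not run on import, since it needs network access):
#   record = fetch_metadata("10.1371/journal.pbio.1002611")
#   print(first_title(record))
```

For large projects, repeated network lookups like this are slow, which is the motivation for the optional local database mentioned above.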

Installation

The package can be installed using pip:

pip install doiget-tdm

Quickstart

Show the default configuration settings:

doiget-tdm show-config

Download the full-text (XML) of the journal article with DOI 10.1371/journal.pbio.1002611 to the default directory:

doiget-tdm acquire '10.1371/journal.pbio.1002611'
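
When there are several DOIs to acquire, the command above can be driven from a script. The following is a minimal sketch that assumes only the `doiget-tdm acquire <DOI>` form shown above, invoked once per DOI; the helper names and the `dry_run` option are hypothetical, and the exit code is recorded as-is without assuming how doiget-tdm reports failures.

```python
import shutil
import subprocess

EXECUTABLE = "doiget-tdm"


def build_acquire_command(doi: str) -> list:
    """Return the argument list for acquiring one DOI (hypothetical helper)."""
    # Passing a list avoids shell quoting issues with characters in DOIs.
    return [EXECUTABLE, "acquire", doi]


def acquire_all(dois, dry_run: bool = True) -> dict:
    """Run one acquire command per DOI and map each DOI to its exit code.

    With dry_run=True nothing is executed and every value is None.
    """
    if not dry_run and shutil.which(EXECUTABLE) is None:
        raise RuntimeError(f"{EXECUTABLE} was not found on PATH")

    results = {}
    for doi in dois:
        command = build_acquire_command(doi)
        if dry_run:
            results[doi] = None
            continue
        completed = subprocess.run(command, check=False)
        results[doi] = completed.returncode
    return results
```

Calling `acquire_all(["10.1371/journal.pbio.1002611"], dry_run=False)` would run the same command as the quickstart example.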

Next, read the Workflow document to see how to use the package in a text data mining project, and the Concepts document to learn more about the approach taken by doiget-tdm.

Documentation

See the documentation for detailed information about how to use doiget-tdm.
