jkitchin/literature-alerts

Literature alerts with OpenAlex

This is a project that uses https://openalex.org to create literature alerts. It creates an RSS feed and a results file with recently added entries. It also creates a GitHub issue assigned to me, which notifies me when new entries are added.

You specify queries in this yaml file: ./queries.yml

There is a Python package in ./src/litalerts/ that provides a CLI called litalerts. That script is run on a schedule set by ./.github/workflows/scheduled.yml.

The results are written to the ./org directory as org-files, and RSS feeds are generated in ./rss.

Here are some examples of what you might do with this:

  1. Replace Google/Scopus/Pubmed alerts (see https://github.com/jkitchin/literature-alerts/blob/main/org/CO2RR.org)
  2. Replace the Web Of Science report on citations to your work (see https://github.com/jkitchin/literature-alerts/blob/main/org/New-citations-for-John-Kitchin.org)
  3. Find out when a group of people publish something new (e.g. the CMU Chemical Engineering Department https://github.com/jkitchin/literature-alerts/blob/main/org/CMU-Chemical-Engineering.org)
  4. Get all new articles from a specific journal (see https://github.com/jkitchin/literature-alerts/blob/main/org/ACS-Catalysis.org)

How do you use this? I have not settled on the best way to use it myself yet. Here are some ways I think you could do it.

  1. You could subscribe to (watch) the repo and get notified of updates.
  2. In your browser, open one of the org-files, e.g. https://github.com/jkitchin/literature-alerts/blob/main/org/water-splitting.org, and see if you want to do anything with the results. They are replaced every time the script runs.
  3. Subscribe to the RSS feed and consume it as you see fit.
  4. Clone the repo and open ./org/water-splitting.org in Emacs. Interact with it as you see fit, e.g. refile entries. It might be tricky to add notes and keep it running; there might be some git-fu, e.g. branching, that makes it practical. I am still working out these kinds of details.

In Emacs you can set up elfeed with these RSS feeds like this:

(require 'elfeed)
(setq elfeed-feeds '("https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/water-splitting.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/CO2RR.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/authors.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/high-entropy-oxides.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/liquid-metal.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/ACS-Catalysis.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/CMU-Chemical-Engineering.xml"
		     "https://raw.githubusercontent.com/jkitchin/literature-alerts/main/rss/New-citations-for-John-Kitchin.xml"))
(elfeed-update)
(elfeed)

Or go to a site like https://rssviewer.app/, paste in one of those URLs, and click "View Feed".
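You can also consume a feed programmatically. Here is a minimal sketch using only the Python standard library; the field names follow the standard RSS 2.0 layout, and the helper names are my own, not part of litalerts:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def fetch(url):
    """Download raw feed XML from a URL."""
    with urlopen(url) as resp:
        return resp.read()

def feed_items(xml_text):
    """Parse RSS 2.0 text and return (title, link) pairs for each <item>."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

# e.g. feed_items(fetch("https://raw.githubusercontent.com/jkitchin/"
#                       "literature-alerts/main/rss/water-splitting.xml"))
```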

This is still a work in progress. Something not working? Feature requests? Post an issue at https://github.com/jkitchin/literature-alerts/issues.

How does it work?

I use GitHub Actions to run litalerts on a schedule. The script iterates through ./queries.yml to construct URLs that query https://openalex.org. I use from_created_date in the filter, which requires an OpenAlex premium API key; see https://openalex.org/pricing. OpenAlex gave me a premium API key for academic research. Thanks for that!
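To illustrate, a URL of this kind can be built as follows. This is a sketch based on the public OpenAlex API conventions (the `filter`, `per-page`, and `api_key` parameters), not the actual litalerts code, and the search term is made up:

```python
from urllib.parse import urlencode

def openalex_url(search, from_created_date, api_key, per_page=200):
    """Build a works query restricted to records created on or after a date.
    The from_created_date filter requires an OpenAlex premium API key."""
    params = {
        "search": search,
        "filter": f"from_created_date:{from_created_date}",
        "per-page": per_page,
        "api_key": api_key,
    }
    return "https://api.openalex.org/works?" + urlencode(params)

url = openalex_url("water splitting", "2024-01-01", "MY_KEY")
```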

The API key is stored as a GitHub secret, so it is accessible to the Action script ./.github/workflows/scheduled.yml but remains secure. This usually works, although scheduled workflows are apparently not always run on time (https://upptime.js.org/blog/2021/01/22/github-actions-schedule-not-working/). Whether that is a problem remains to be seen. You can manually trigger the workflow at https://github.com/jkitchin/literature-alerts/actions/workflows/scheduled.yml.

The script generates some files, which I commit to the repository so they are easy to access. I might consider an alternative approach based on https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts, or maybe putting them on another branch.
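For reference, a scheduled workflow that runs a script and commits its output typically looks something like this. This is a generic sketch, not the contents of ./.github/workflows/scheduled.yml; the cron expression, install command, and commit message are placeholders:

```yaml
name: litalerts
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC; may start late
  workflow_dispatch: {}    # allow manual runs from the Actions tab
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install . && litalerts
        env:
          OPENALEX_API_KEY: ${{ secrets.OPENALEX_API_KEY }}
      - run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add org rss && git commit -m "update alerts" || true
          git push
```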

Want to do it yourself?

You can use this repo as a template: https://github.com/new?template_name=literature-alerts&template_owner=jkitchin

You will want to modify ./queries.yml with your own queries.

You will also need an OpenAlex premium API key; see https://openalex.org/pricing. Then set up a repository secret named OPENALEX_API_KEY containing the key they give you.

In your repo, go to the Actions settings (e.g. https://github.com/jkitchin/literature-alerts/settings/actions) and give Actions "Read and write permissions" under "Workflow permissions".

Wishlist

  • Figure out how to assign issues to the specific users indicated in the queries.yml file. Maybe generate an actions.sh file and execute it later.
  • Add delivery methods (email, RSS, org, etc.) to the yml.
  • Consider pull requests so other people can add their own queries. Would some constraints be needed?

Generating filters

Suppose you want new citations to your papers. I think there is a limit of 50 items in a filter, and my OpenAlex record lists ~195 works, so I find it convenient to generate the filter strings. Here I retrieve my results, get the id for each one, and then generate the filter queries in groups of 25. You can then paste the output into the queries.yml file.

Whenever you have new papers that OpenAlex knows about, just rerun this to generate a new set of queries.

(let* ((entity-id "https://openalex.org/A5003442464")
       (data (oa--author entity-id))
       (works-url (plist-get data :works_api_url))
       (works-data (request-response-data
		    (request works-url
		      :sync t
		      :parser 'oa--response-parser)))
       (meta (plist-get works-data :meta)) 
       (per-page (plist-get meta :per_page))
       (count (plist-get meta :count))
       (pages (/ count per-page))
       (entries '())
       purl)
  ;; if there is a remainder we need to get the rest
  (when (> (mod count per-page) 0) (cl-incf pages))
  
  ;; Now we have to loop through the pages
  (cl-loop for i from 1 to pages
	   do
	   (setq purl (concat works-url (format "&page=%s" i))
		 works-data (request-response-data
			     (request purl
			       :sync t
			       :parser 'oa--response-parser))
		 entries (append entries (plist-get works-data :results))))
  (string-join
   (cl-loop for group in
	    (seq-partition (cl-loop for entry in entries collect (plist-get entry :id)) 25)
	    collect
	    (concat "     - cites:" (string-join group "|")))
   "\n"))
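The same partitioning step is easy to reproduce outside Emacs if you already have a list of OpenAlex work ids. A minimal Python sketch (the function name and the example ids are made up):

```python
def cites_filters(work_ids, group_size=25):
    """Partition work ids into groups and format one cites: filter line per
    group, ready to paste into queries.yml."""
    groups = [work_ids[i:i + group_size]
              for i in range(0, len(work_ids), group_size)]
    return "\n".join("     - cites:" + "|".join(g) for g in groups)

ids = [f"https://openalex.org/W{n}" for n in range(1, 31)]
print(cites_filters(ids))  # two lines: 25 ids, then the remaining 5
```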

Another way is to generate filters from org-ref citations. Say you want papers that cite or are related to these:

cite:&ardagh-2019-princ-dynam;&ardagh-2019-catal-reson-theor;&ardagh-2020-catal-reson-theor;&gopeesingh-2020-reson-promot;&shetty-2020-elect-field

First highlight the region, then run the function below. That will copy the necessary ids to the clipboard, and then you can paste them somewhere.

(defun oa-generate-cites-filter (r1 r2)
  (interactive "r")
  (save-restriction
    (narrow-to-region r1 r2)
    (let* ((links (org-ref-get-cite-links))
	   path
	   data
	   references
	   entry
	   (dois '())
	   (oa-ids '())
	   s)
      ;; Collect the DOI of every reference in every cite link.
      (cl-loop for link in links do
	       (setq path (org-element-property :path link)
		     data (org-ref-parse-cite-path path)
		     references (plist-get data :references))
	       (cl-loop for reference in references do
			(setq entry (bibtex-completion-get-entry (plist-get reference :key)))
			(cl-pushnew (concat "https://doi.org/" (cdr (assoc "doi" entry)))
				    dois :test #'string=)))
      ;; Look up the OpenAlex id for each DOI.
      (cl-loop for doi in dois do
	       (let* ((url (concat "https://api.openalex.org/works/" doi))
		      (data (request-response-data
			     (request url
			       :sync t
			       :parser 'oa--response-parser))))
		 (cl-pushnew (plist-get data :id) oa-ids :test #'string=)))
      (setq s (kill-new (string-join oa-ids "|")))
      (message "%s" s))))
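If you are not an Emacs user, the same lookup works from Python: OpenAlex resolves a work by DOI at https://api.openalex.org/works/https://doi.org/<doi>, the same endpoint the elisp above uses. A sketch with error handling omitted; the function names are my own:

```python
import json
from urllib.request import urlopen

def join_ids(ids):
    """De-duplicate while preserving order, then join with | for a filter."""
    seen = []
    for i in ids:
        if i not in seen:
            seen.append(i)
    return "|".join(seen)

def openalex_ids(dois):
    """Resolve DOIs to OpenAlex work ids, joined into one filter value."""
    ids = []
    for doi in dois:
        url = "https://api.openalex.org/works/https://doi.org/" + doi
        with urlopen(url) as resp:
            ids.append(json.load(resp)["id"])
    return join_ids(ids)
```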

OpenAlex integration with Zotero

You can integrate this with Zotero. A proof of concept script is located at ./src/litalerts/zotero.py and a corresponding yaml file at ./cmu.yml. The only difference in this yaml file is the inclusion of a Zotero id for the user/group to act as, and a tag to add to the created Zotero items.

You have to go to https://www.zotero.org/settings/keys and create an API key, then save that key as a GitHub secret called `ZOTERO_API_KEY` for Actions. If you run it locally, you need that environment variable defined.

The package installs a new CLI called lazotero that you run like this:

lazotero -f cmu.yml -s 1

I don’t love the way it works. For example, I could not figure out how to tell whether an entry with a given DOI/url already exists, so at the moment the script only checks for an exact title match to avoid adding duplicates. That may cause it to miss entries that share a title with an existing one.
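The exact-title check could be made a little more forgiving by normalizing titles before comparing. This is a sketch of the idea, not the actual zotero.py code; the function names are hypothetical:

```python
import re

def norm_title(title):
    """Lowercase, strip punctuation, and collapse whitespace so
    near-identical titles compare equal."""
    no_punct = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", no_punct).strip()

def already_present(title, existing_titles):
    """True if a normalized match for title is already in the library."""
    target = norm_title(title)
    return any(norm_title(t) == target for t in existing_titles)
```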

See https://www.zotero.org/groups/5480843/openalex-cmu-cheme-faculty for a web-based version of the group. We only keep journal-articles right now; datasets, dissertations, proceedings, proceedings-articles, reports, and posted-content are skipped.

Articles are tagged “unread” when added. You can use that tag to find newly added articles, as long as you remove it when you review them. I don’t know how well that works in practice with a group library, though.
