jacobvsdanniel/pubmedkb_web

pubmedKB web

End-to-end relation extraction for biomedical literature: full datasets, python API, and web GUI

The core annotators, pretrained models, and training data can be downloaded at pubmedKB core

Authors:

  • Li Peng-Hsuan (李朋軒) @ ailabs.tw (jacobvsdanniel [at] gmail.com)
  • Sun Yih-Yun (孫懿筠) @ ailabs.tw (jessie.yy.sun [at] gmail.com)
  • Eunice You-Chi Liu (劉又綺) @ ailabs.tw (eunicecollege2019 [at] gmail.com)

Introduction

This repo hosts the full datasets, Python API, and web GUI for pubmedKB, a knowledge base created from annotations of PubMed. For the core annotators behind the knowledge base, see pubmedkb_core. For details, see our paper:

Peng-Hsuan Li, Ting-Fu Chen, Jheng-Ying Yu, Shang-Hung Shih, Chan-Hung Su, Yin-Hung Lin, Huai-Kuang Tsai, Hsueh-Fen Juan, Chien-Yu Chen and Jia-Hsin Huang, pubmedKB: an interactive web server to explore biomedical entity relations from biomedical literature, Nucleic Acids Research, 2022, https://doi.org/10.1093/nar/gkac310

Functions

  • NEN
    • Looks up names similar to the query input
    • Also returns IDs and aliases for each name
  • REL
    • Looks up relations and evidence sentences for an entity or an entity pair
    • An entity can be specified by name, by ID, or by both

Dependencies

  • OS-independent
  • python3
  • Flask (a python package)

Datasets

| Support data | Content | Disk size (zip size) |
|---|---|---|
| gene | id, name correspondence | 226 MB (52 MB) |
| variant | id, name, gene correspondence | 133 MB (28 MB) |
| meta | title, author, year, journal, citation, IF | 8.6 GB (1.6 GB) |
| paper | title, abstract, entity | 99 GB (13 GB) |

| Relation data | Relations | # papers with relations* | Section | Disk size (zip size) | Memory usage |
|---|---|---|---|---|---|
| Full KB | odds ratio, causal, open relations, etc. | 8.5 M | abstract | 12 GB (2.2 GB) | 15 GB |
| Partial KB | odds ratio, causal, open relations, etc. | 0.3 M | abstract | 487 MB (88 MB) | 10 GB |

*We processed all 35M PubMed citations dumped on 2023/02/17.
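The disk sizes above add up quickly, so it can help to check free space before unpacking. The following is a minimal sketch, not part of this repo: it sums the unzipped sizes listed in the tables and compares them with the free space at a target path. The function names are illustrative, 1 GB is assumed to mean 1000 MB, and the extra space needed to hold the zip files during extraction is not counted.

```python
import shutil

# Unzipped disk sizes in MB, copied from the tables above.
# Assumption: 1 GB is treated as 1000 MB; the README does not say which unit it uses.
SUPPORT_MB = {"gene": 226, "variant": 133, "meta": 8_600, "paper": 99_000}
KB_MB = {"full": 12_000, "partial": 487}


def required_mb(kb: str) -> int:
    """Total unzipped size of all support data plus the chosen KB ("full" or "partial")."""
    return sum(SUPPORT_MB.values()) + KB_MB[kb]


def has_room(path: str, kb: str) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_mb = shutil.disk_usage(path).free // 1_000_000
    return free_mb >= required_mb(kb)


if __name__ == "__main__":
    for kb in KB_MB:
        verdict = "fits" if has_room(".", kb) else "does not fit"
        print(f"{kb} KB: {required_mb(kb)} MB needed, {verdict} in the current directory")
```

By this count, the support data plus the Full KB need roughly 120 GB of disk, and the support data plus the Partial KB roughly 108 GB.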

Deprecated Datasets

To use these datasets, check out the old version of pubmedkb_web:

git checkout 2e79a4bbf4258c88dda1ddc7f4e4f3ee37443896

In the tables below, O means yes and X means no.

| Full dataset | Zip size | # papers | Section | Memory-efficient | Open access |
|---|---|---|---|---|---|
| pubmedKB-BERN-disk | 1.6 GB | 4.3 M | abstract | O | O |
| pubmedKB-PTC-memory | 3.1 GB | 10.8 M | abstract | X | X |
| pubmedKB-PTC-disk | 3.4 GB | 10.8 M | abstract | O | X |
| pubmedKB-PTC-FT-disk | 3.7 GB | 1.7 M | full text | O | X |

| Partial dataset | Zip size | # papers | Section | Memory-efficient | Open access |
|---|---|---|---|---|---|
| pubmedKB-BERN-disk-small | 336 MB | 884 K | abstract | O | O |
| pubmedKB-PTC-disk-small | 605 MB | 2.0 M | abstract | O | O |
| pubmedKB-PTC-FT-disk-small | 781 MB | 336 K | full text | O | O |

GUI/API server

python server.py \
--gene_dir [gene_directory] \
--variant_dir [variant_directory] \
--meta_dir [meta_directory] \
--paper_dir [paper_directory] \
--kb_dir [KB_directory] \
--kb_type relation \
--port 8000
  • Supports both HTTP GET and POST
  • Displays results on an HTML webpage or returns them as JSON

GUI/API client

  • Open a browser and connect to [server_ip]:[server_port]
  • Also check out client.py
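
For scripted access over HTTP GET, a request might be composed along the following lines. This is only a sketch using the Python standard library: the route name and the `query` parameter are hypothetical placeholders, not the server's documented API, and the helper names are illustrative. The actual routes and parameters are the ones used in client.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(host: str, port: int, route: str, params: dict) -> str:
    """Compose a GET URL. The route and params are placeholders (assumptions),
    not the routes actually exposed by server.py."""
    query = urlencode(params)
    base = f"http://{host}:{port}/{route.lstrip('/')}"
    return f"{base}?{query}" if query else base


def get_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Port 8000 matches the server command above; the rest is hypothetical.
    url = build_url("localhost", 8000, "/hypothetical_route", {"query": "BRCA1"})
    print(url)
    # With a running server and a real route: result = get_json(url)
```

Keeping URL construction separate from the network call makes it easy to swap in the real route and parameter names once they are confirmed.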
