Skip to content

Latest commit

 

History

History
40 lines (29 loc) · 1.65 KB

README.md

File metadata and controls

40 lines (29 loc) · 1.65 KB

Disease-NER

Given a medical diagnosis, identifying medical conditions within the text and mapping them to standardized medical encodings.

Data

The data directory contains:

  • The disease mentions from the text files stored in entities.tsv.
  • Text files containing the medical textual data in the text directory.

The data is taken from the English version of multilingual resources of the DisTEMIST 2022 task: https://zenodo.org/record/6532684

Pre-processing

The pre-processing stage involves:

  • Splitting medical text in each file into sentences.

  • Tokenizing the sentences into words/tokens.

  • Calculating IOB tags for the tokens for named entity recognition (NER) task.

  • Code: Pre-processing.ipynb

NER Task

Entity Linking Task