
Issues with AbbreviationFinderStep deleting entities #21

@simoond

Description


I noticed that kazu changed the input text. How does this happen, and how can I prevent it?

To reproduce: I passed in a text block containing the sentence "The CFIm25 deletion leads to 3’ UTR shortening". The third character of the gene name is an uppercase I, not a lowercase l.

When I run this through kazu, I get the output below. It correctly recognizes CFIm25 as a gene, but it can't link the entity because it changed the uppercase I in the input text to a lowercase l:

CFlm25:gene:TransformersModelForTokenClassificationNerStep:959:965

Because the input text was changed, the entity's mappings are null.
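One way to confirm that the reported match text itself was altered (rather than the offsets being wrong) is to compare the match against the slice of the original text at the reported start and end offsets. The sketch below is plain Python and does not use kazu's API. It assumes the output line has the layout `match:entity_class:namespace:start:end`, inferred from the line above, and `ReportedEntity`, `parse_entity_line` and `find_mismatches` are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ReportedEntity:
    """One output line split into fields (layout assumed, not kazu's own type)."""
    match: str
    entity_class: str
    namespace: str
    start: int
    end: int


def parse_entity_line(line: str) -> ReportedEntity:
    # Assumed layout: match:entity_class:namespace:start:end.
    # rsplit from the right so any colon inside the match text stays intact.
    match, entity_class, namespace, start, end = line.rsplit(":", 4)
    return ReportedEntity(match, entity_class, namespace, int(start), int(end))


def find_mismatches(text: str, entity: ReportedEntity) -> list[tuple[int, str, str]]:
    """Return (offset, source_char, reported_char) for each character that differs."""
    source = text[entity.start:entity.end]
    if len(source) != len(entity.match):
        raise ValueError(
            f"span length {len(source)} != match length {len(entity.match)}"
        )
    return [
        (entity.start + i, s, m)
        for i, (s, m) in enumerate(zip(source, entity.match))
        if s != m
    ]


if __name__ == "__main__":
    # Shortened input, so the offsets are 4:10 here instead of 959:965.
    text = "The CFIm25 deletion leads to 3’ UTR shortening"
    entity = parse_entity_line(
        "CFlm25:gene:TransformersModelForTokenClassificationNerStep:4:10"
    )
    print(find_mismatches(text, entity))  # [(6, 'I', 'l')]
```

If the list is empty, the match text agrees with the input and the problem lies elsewhere. A single entry like `(6, 'I', 'l')` shows that the span is correct but one character in the match was rewritten, which is the behaviour described in this issue.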

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests