
[deprecated - too many problems w dataset] Kylel/semeval2017 #16

Open · kyleclo wants to merge 24 commits into master

Conversation

@kyleclo kyleclo commented Feb 18, 2019

just the NER portion

@armancohan armancohan left a comment


A few comments. Otherwise looks good.

spacy_text = nlp(text)

# split sentences & tokenize
sent_token_spans = TokenSpan.find_sent_token_spans(
    text=text,
    sent_tokens=[[token.text
                  for sent in spacy_text.sents
                  for token in sent
                  if token.text.strip() != '']])

BERT will probably complain about long sequences (entire abstract). So we need to do this at sentence level.
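
For illustration, a sentence-level version could look roughly like this (a sketch against the same spaCy Doc and the TokenSpan.find_sent_token_spans signature shown in this PR, not the actual change):

# Sketch: build one token list per spaCy sentence instead of one flattened
# list for the whole abstract, so each sequence stays within BERT's length limit.
sent_tokens = [
    [token.text for token in sent if token.text.strip() != '']
    for sent in spacy_text.sents
]
sent_token_spans = TokenSpan.find_sent_token_spans(text=text, sent_tokens=sent_tokens)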

return spans

@classmethod
def find_sent_token_spans(cls, text: str, sent_tokens: List[List[str]]) -> List[List['TokenSpan']]:

Seems like a duplicate of the first function.

temp = []
index_sent += 1

# append remaining mentions

Add a comment noting that this handles the last sentence only.

assert len(sent_mention_spans) == len(sent_token_spans)
return sent_mention_spans

#

Consider removing unused code.






You can comment these out as they are not related to entities.

@arthurbra

I have trained the model with the semeval2017 data and evaluated it afterwards with the semeval2017 evaluation script. I'm confused about why I get such different results:
"best_validation_f1-measure-overall": 0.5288376220052742
"test_f1-measure-overall": 0.4320540671010848,

Results reported by the semeval2017 evaluation script on the test set:

              precision  recall  f1-score  support
Material           0.44    0.43      0.44      904
Process            0.41    0.36      0.38      954
Task               0.17    0.15      0.16      193
avg / total        0.40    0.37      0.39     2051

So F1 is 0.39.

Do you perhaps have an explanation for this difference?

@kyleclo kyleclo (Collaborator, Author) commented Apr 4, 2019

hey @arthurbra thanks for your interest! i've been looking into this difference as well, and it looks like the task definitions are different between what we've implemented here & the original semeval2017 task.

specifically, the 3 tasks in semeval2017 are (1) entity identification, (2) entity type classification, and (3) relation extraction. what we've implemented here combines (1) and (2) into a single task (we're doing both extraction & tagging w/ the type at the same time). this affects how f1 scoring is performed.
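
as a rough illustration of why the numbers diverge (hypothetical helper, not the actual scoring code): span-level f1 changes depending on whether the predicted type also has to match the gold type.

# Hypothetical illustration: gold/pred are sets of (start, end, type) spans.
# With typed=True a span only counts if its type also matches, which is the
# stricter setting and generally yields a lower f1.
def span_f1(gold, pred, typed=True):
    key = (lambda s: s) if typed else (lambda s: (s[0], s[1]))
    g = {key(s) for s in gold}
    p = {key(s) for s in pred}
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = {(0, 3, 'Process'), (7, 9, 'Task')}
pred = {(0, 3, 'Material'), (7, 9, 'Task')}
span_f1(gold, pred, typed=False)  # 1.0: boundaries only
span_f1(gold, pred, typed=True)   # 0.5: type must match as well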

instead of using the sequence tagging model we have here, my plan is to adapt an existing model made for the semeval2017 task, and substitute in the various bert variants to replace glove/w2v embeddings.

@arthurbra arthurbra commented Apr 6, 2019

Hey @kyleclo thanks for your reply and explanation. I am really impressed by your results and want to learn more.

That is right, you perform entity identification and classification in one task. In my understanding this is Subtask B in semeval 2017:
Subtask B: t_B = O, M, P, T for tokens being outside a keyphrase, or being part of a material, process or task.
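
For a made-up example sentence (not taken from the dataset), those token-level labels would look something like this:

# Made-up example of Subtask B token labels (O = outside, M = Material,
# P = Process, T = Task); not an actual annotation from semeval2017.
tokens = ["We", "train", "a", "convolutional", "neural", "network",
          "on", "the", "MNIST", "corpus", "for", "digit", "classification"]
labels = ["O", "O", "O", "P", "P", "P",
          "O", "O", "M", "M", "O", "T", "T"]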

In calculateMeasures of the semeval2017 evaluation script you can pass the parameter remove_anno="rel", which should ignore relations during evaluation (as far as I understand the code). I already used this parameter in the evaluation of my previous post, so I assume there must be another explanation.
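
For reference, my evaluation call looked roughly like this (the folder paths are placeholders for my local layout, and the argument names other than remove_anno are from memory of the script, so they may differ slightly):

# Rough sketch of calling the semeval2017 (ScienceIE) eval.py script;
# the gold/prediction folders below are placeholders.
from eval import calculateMeasures

calculateMeasures(folder_gold="data/test/",
                  folder_pred="predictions/test/",
                  remove_anno="rel")  # ignore relation annotations during scoring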

It would be great if you could apply the code of the semeval 2017 winner with BERT. Unfortunately I was not able to find it (the AI2 system of Ammar et al.).

@kyleclo kyleclo changed the title Kylel/semeval2017 [deprecated - too many problems w dataset] Kylel/semeval2017 Jun 28, 2019
@arthurbra

Hi Kyle,

I think I have found the problem: in the prediction script model.eval() is not called, so that dropout is active during prediction.
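
A minimal sketch of the fix, assuming a standard PyTorch prediction loop (model and batch here are placeholders for whatever the prediction script uses):

import torch

# Switch off dropout (and use running batch-norm statistics) before predicting.
model.eval()
with torch.no_grad():  # gradients are not needed at prediction time
    predictions = model(batch)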

Regards
Arthur
