discrepant result due to sentence segmentation #8

Open
forrestbao opened this issue Dec 4, 2022 · 2 comments
@forrestbao
Contributor

I am comparing the results already in the repo with the results at /home/turx/EvalBase/results. Some methods have different results in these two locations.

For example, this is a segment from realsumm_abs_summary.txt from the repo, under pearsonr:

https://github.com/SigmaWe/DocAsRef_0/blob/de4de4b4275e661621bebf3b2f92d8676e2f81c2/results/realsumm_abs_summary.txt#L8-L13

But this is the corresponding segment from /home/turx/EvalBase/results, also under pearsonr:

trad      bertscore-sentence       P                       0.087
                                   R                       0.308
                                   F                       0.233
new       bertscore-sentence       P                      -0.067
                                   R                       0.326
                                   F                       0.222

I am really worried about the code accuracy. Please find out what has changed that caused this discrepancy. We have had incidents like this before. I do not want to have it again. I do not want to publish a paper based on wrong results.

@TURX
Collaborator

TURX commented Dec 4, 2022

Last time, we segmented sentences by splitting on "." (i.e., doc.split(".")); now we are using spaCy segmentation for bertscore-sentence.
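To illustrate why the two approaches can produce different sentence lists, and therefore different sentence-level scores, here is a minimal sketch of what `doc.split(".")` does on text containing abbreviations and decimals. The sample string is made up for illustration and is not taken from the test data.

```python
# Hypothetical sample text; NOT from the actual test data.
doc = "The U.S. economy grew 3.5 percent. Analysts were surprised."

# Naive segmentation: split on every period.
pieces = doc.split(".")

# The periods inside "U.S." and "3.5" are treated as sentence boundaries,
# the delimiter itself is dropped, and the final period leaves a trailing
# empty string in the result.
for i, piece in enumerate(pieces):
    print(i, repr(piece))
```

A sentence-aware segmenter would be expected to return two sentences here, so any metric that aligns or averages over sentences sees a different set of units under each approach.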

@forrestbao
Contributor Author

Please check whether spaCy's sentence segmentation output makes sense on the test data. The test data frequently contains lexical noise. Paste some examples here, with inputs and outputs from both `doc.split(".")` and spaCy, so we can investigate.
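One possible way to produce such a side-by-side comparison is sketched below. The helper names (`split_on_period`, `spacy_sentences`, `compare`, `report`) are hypothetical and not from the repo. The spaCy part assumes spaCy v3 and uses a blank English pipeline with the rule-based `sentencizer`; the project may load a trained pipeline instead, which can segment differently.

```python
from typing import Callable, Dict, List


def split_on_period(text: str) -> List[str]:
    """Old approach from this thread: plain doc.split(".")."""
    return text.split(".")


def spacy_sentences(text: str) -> List[str]:
    """Segment with spaCy's rule-based sentencizer.

    Assumption: a blank English pipeline; the repo may use a trained model.
    """
    import spacy  # imported lazily so the rest of the sketch runs without spaCy

    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")
    return [sent.text for sent in nlp(text).sents]


def compare(text: str, segmenters: Dict[str, Callable[[str], List[str]]]) -> Dict[str, List[str]]:
    """Run every segmenter on the same input and collect the outputs."""
    return {name: segment(text) for name, segment in segmenters.items()}


def report(text: str, segmenters: Dict[str, Callable[[str], List[str]]]) -> str:
    """Format the input and each segmenter's output for pasting into an issue."""
    lines = [f"INPUT: {text!r}"]
    for name, sentences in compare(text, segmenters).items():
        lines.append(f"{name}: {len(sentences)} segments")
        lines.extend(f"    {s!r}" for s in sentences)
    return "\n".join(lines)
```

For each test document of interest, something like `print(report(doc, {"split": split_on_period, "spacy": spacy_sentences}))` would show the input followed by both segmentations, which makes mismatched boundaries easy to spot.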

@forrestbao changed the title from "discrepant result" to "discrepant result due to sentence segmentation" on Jan 1, 2023