Describe the bug
As already noted in the issues of the original Smatch repository, the Smatch scorer can return wrong and unverifiable results. This is also the case for HanLP's implementation.
Code to reproduce the issue
s = """(r / result-01 :ARG1 (c / compete-01 :ARG0 (w / woman) :mod (p / preliminary) :time (t / today) :mod (p2 / polo :mod (w2 / water))) :ARG2 (a / and :op1 (d / defeat-01 :ARG0 (t2 / team :mod (c2 / country :wiki + :name (n / name :op1 "Hungary"))) :ARG1 (t3 / team :mod (c3 / country :wiki + :name (n2 / name :op1 "Canada"))) :quant (s / score-entity :op1 13 :op2 7)) :op2 (d2 / defeat-01 :ARG0 (t4 / team :mod (c4 / country :wiki + :name (n3 / name :op1 "France"))) :ARG1 (t5 / team :mod (c5 / country :wiki + :name (n4 / name :op1 "Brazil"))) :quant (s2 / score-entity :op1 10 :op2 9)) :op3 (d3 / defeat-01 :ARG0 (t6 / team :mod (c6 / country :wiki + :name (n5 / name :op1 "Australia"))) :ARG1 (t7 / team :mod (c7 / country :wiki + :name (n6 / name :op1 "Germany"))) :quant (s3 / score-entity :op1 10 :op2 8)) :op4 (d4 / defeat-01 :ARG0 (t8 / team :mod (c8 / country :wiki + :name (n7 / name :op1 "Russia"))) :ARG1 (t9 / team :mod (c9 / country :wiki + :name (n8 / name :op1 "Netherlands"))) :quant (s4 / score-entity :op1 7 :op2 6)) :op5 (d5 / defeat-01 :ARG0 (t10 / team :mod (c10 / country :wiki + :name (n9 / name :op1 "United" :op2 "States"))) :ARG1 (t11 / team :mod (c11 / country :wiki + :name (n10 / name :op1 "Kazakhstan"))) :quant (s5 / score-entity :op1 10 :op2 5)) :op6 (d6 / defeat-01 :ARG0 (t12 / team :mod (c12 / country :wiki + :name (n11 / name :op1 "Italy"))) :ARG1 (t13 / team :mod (c13 / country :wiki + :name (n12 / name :op1 "New" :op2 "Zealand"))) :quant (s6 / score-entity :op1 12 :op2 2))))"""

if __name__ == "__main__":
    from hanlp.metrics.amr import smatch_eval

    path = "amr.tmp"
    with open(path, "w") as f:
        f.write(s)
    for _ in range(5):
        smatch_score = smatch_eval("amr.tmp", "amr.tmp")
        print(smatch_score)
Describe the current behavior
Wrong Smatch scores that also change randomly from run to run.
Expected behavior
A deterministic Smatch score of 100 on every run, since the file is scored against itself.
System information
OS: Linux Ubuntu 16.04
Python version: 3.8
HanLP version: current
Other info / logs
Not necessary. The problem is simply that using a hill-climber for graph matching is unsafe and opaque, and it provides no upper bound on the solution. This gets worse as graphs get larger, but it can also occur on smaller graphs. A more detailed empirical study of the problem can be found here.
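To illustrate why a hill-climber only yields a lower bound, here is a minimal, self-contained sketch; it is not HanLP's or Smatch's code. It assumes a hypothetical three-node toy graph compared with itself, and it restricts the search to swap moves from one fixed starting mapping. The real Smatch scorer also uses smarter initialisation and random restarts, which make getting stuck less likely but do not rule it out. The names `matched`, `hill_climb`, `exact` and `smatch_f1` are illustrative only.

```python
import itertools

# Hypothetical toy graph (not from the issue): three "team" nodes where
# v0 defeats v1 and v1 defeats v2. Triples are (relation, source, target);
# a target that is not a variable is a constant and maps to itself
# (this assumes constants never share a name with a variable).
GRAPH = [
    ("instance", "v0", "team"),
    ("instance", "v1", "team"),
    ("instance", "v2", "team"),
    ("defeat", "v0", "v1"),
    ("defeat", "v1", "v2"),
]
VARS = ["v0", "v1", "v2"]


def matched(triples_a, triples_b, mapping):
    """Count triples of A that occur in B after renaming A's variables."""
    set_b = set(triples_b)
    return sum(
        (rel, mapping[src], mapping.get(tgt, tgt)) in set_b
        for rel, src, tgt in triples_a
    )


def smatch_f1(match, n_a, n_b):
    """Smatch F1 in percent: 2PR/(P+R) with P = match/n_a, R = match/n_b."""
    return 200 * match / (n_a + n_b)


def hill_climb(triples_a, triples_b, vars_a, start):
    """Swap the images of two variables while that strictly increases the match."""
    mapping = dict(zip(vars_a, start))
    best = matched(triples_a, triples_b, mapping)
    improved = True
    while improved:
        improved = False
        for x, y in itertools.combinations(vars_a, 2):
            mapping[x], mapping[y] = mapping[y], mapping[x]
            score = matched(triples_a, triples_b, mapping)
            if score > best:
                best, improved = score, True
                break  # keep this swap and rescan from the new mapping
            mapping[x], mapping[y] = mapping[y], mapping[x]  # undo the swap
    return best


def exact(triples_a, triples_b, vars_a, vars_b):
    """Try every one-to-one mapping (equal-sized variable sets); tiny graphs only."""
    return max(
        matched(triples_a, triples_b, dict(zip(vars_a, perm)))
        for perm in itertools.permutations(vars_b)
    )


if __name__ == "__main__":
    n = len(GRAPH)
    # Start from the rotation v0->v1, v1->v2, v2->v0 and compare GRAPH with itself.
    local = hill_climb(GRAPH, GRAPH, VARS, ["v1", "v2", "v0"])
    best = exact(GRAPH, GRAPH, VARS, VARS)
    print(f"hill-climber: {smatch_f1(local, n, n)}, exact: {smatch_f1(best, n, n)}")
```

From the rotated start, the mapping matches 4 of 5 triples and every single swap drops that to 3, so the search stops at an F1 of 80.0, although the identity mapping gives 100.0. Which local optimum is reached depends on the starting mapping, which is one way randomly initialised runs can report different scores for the same input.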
I've completed this form and searched the web for solutions.
Thank you @flipz357 for reporting this. The randomness of Smatch implementations has been documented on our forum for four years, and you have finally brought the community a solid solution. Your paper is quite dense, so I'll spend some time reading it and then integrate your implementation soon.
Thanks @hankcs, and apologies for the density of the paper; there are a few issues with the current state of AMR evaluation. But I think using a hill-climber for evaluation may well be the biggest current issue: any score from a hill-climber is only a lower bound and thus not verifiable (there is no upper bound), so we can never know whether an output of the hill-climber is wrong or correct (except, of course, when it returns 100, since then trivially upper bound = lower bound).
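One way to make a score checkable is to pair the hill-climber's lower bound with an upper bound. The sketch below uses a simple counting relaxation chosen for illustration; it is not presented as the method from the linked study, and it is not a HanLP API. Under any one-to-one variable mapping, distinct triples of A land on distinct triples of B with the same relation and constant, so each such signature can contribute at most min(count in A, count in B) matches. If the lower bound reaches this upper bound, the score is certified; otherwise the true score lies somewhere in between. The function names and the toy graph are hypothetical.

```python
from collections import Counter


def signature(triple, variables):
    """Relation label plus target, with any variable target replaced by '<var>'."""
    rel, _src, tgt = triple
    return (rel, "<var>" if tgt in variables else tgt)


def upper_bound(triples_a, vars_a, triples_b, vars_b):
    """Upper bound on matched triples over all one-to-one variable mappings.

    Each signature contributes at most min(count_A, count_B) matches. The bound
    ignores whether a single mapping can realise all of them at once, so it may
    be loose, but it is never smaller than the true optimum.
    """
    count_a = Counter(signature(t, vars_a) for t in triples_a)
    count_b = Counter(signature(t, vars_b) for t in triples_b)
    return sum(min(n, count_b[sig]) for sig, n in count_a.items())


def smatch_f1(match, n_a, n_b):
    """Smatch F1 in percent for a given number of matched triples."""
    return 200 * match / (n_a + n_b)


def report(lower, upper, n_a, n_b):
    """Turn a lower bound (from any search) and an upper bound into an F1 interval."""
    lo, hi = smatch_f1(lower, n_a, n_b), smatch_f1(upper, n_a, n_b)
    status = "certified" if lower == upper else "not certified"
    return f"F1 in [{lo:.1f}, {hi:.1f}] ({status})"


if __name__ == "__main__":
    # Hypothetical toy graph: three "team" nodes, v0 defeats v1, v1 defeats v2.
    graph = [("instance", v, "team") for v in ("v0", "v1", "v2")]
    graph += [("defeat", "v0", "v1"), ("defeat", "v1", "v2")]
    variables = {"v0", "v1", "v2"}
    n = len(graph)
    # Self-comparison: suppose a hill-climber reported 4 matched triples.
    print(report(4, upper_bound(graph, variables, graph, variables), n, n))
    # Against a copy without ("defeat", "v1", "v2"); the identity mapping matches 4.
    smaller = graph[:-1]
    print(report(4, upper_bound(graph, variables, smaller, variables), n, len(smaller)))
```

For the self-comparison the bound is the trivial 5 of 5 triples, so a result of 4 stays "F1 in [80.0, 100.0] (not certified)". Against the copy with one `defeat` edge removed, the bound drops to 4 matches and the identity mapping reaches it, so "F1 in [88.9, 88.9] (certified)". On larger graphs a relaxation like this can be loose, so a gap only means the result is unverified, not that it is wrong.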