Smatch provides wrong and random scores #1870

flipz357 · 2024-01-21T20:25:39Z

Describe the bug
As also noted in the original Smatch repo issues, the Smatch score gives wrong and unverifiable results. This is also the case for HanLP.

Code to reproduce the issue

s = """(r / result-01
   :ARG1 (c / compete-01
            :ARG0 (w / woman)
            :mod (p / preliminary)
            :time (t / today)
            :mod (p2 / polo
                     :mod (w2 / water)))
   :ARG2 (a / and
            :op1 (d / defeat-01
                    :ARG0 (t2 / team
                              :mod (c2 / country
                                       :wiki +
                                       :name (n / name
                                                :op1 "Hungary")))
                    :ARG1 (t3 / team
                              :mod (c3 / country
                                       :wiki +
                                       :name (n2 / name
                                                 :op1 "Canada")))
                    :quant (s / score-entity
                              :op1 13
                              :op2 7))
            :op2 (d2 / defeat-01
                     :ARG0 (t4 / team
                               :mod (c4 / country
                                        :wiki +
                                        :name (n3 / name
                                                  :op1 "France")))
                     :ARG1 (t5 / team
                               :mod (c5 / country
                                        :wiki +
                                        :name (n4 / name
                                                  :op1 "Brazil")))
                     :quant (s2 / score-entity
                                :op1 10
                                :op2 9))
            :op3 (d3 / defeat-01
                     :ARG0 (t6 / team
                               :mod (c6 / country
                                        :wiki +
                                        :name (n5 / name
                                                  :op1 "Australia")))
                     :ARG1 (t7 / team
                               :mod (c7 / country
                                        :wiki +
                                        :name (n6 / name
                                                  :op1 "Germany")))
                     :quant (s3 / score-entity
                                :op1 10
                                :op2 8))
            :op4 (d4 / defeat-01
                     :ARG0 (t8 / team
                               :mod (c8 / country
                                        :wiki +
                                        :name (n7 / name
                                                  :op1 "Russia")))
                     :ARG1 (t9 / team
                               :mod (c9 / country
                                        :wiki +
                                        :name (n8 / name
                                                  :op1 "Netherlands")))
                     :quant (s4 / score-entity
                                :op1 7
                                :op2 6))
            :op5 (d5 / defeat-01
                     :ARG0 (t10 / team
                                :mod (c10 / country
                                          :wiki +
                                          :name (n9 / name
                                                    :op1 "United"
                                                    :op2 "States")))
                     :ARG1 (t11 / team
                                :mod (c11 / country
                                          :wiki +
                                          :name (n10 / name
                                                     :op1 "Kazakhstan")))
                     :quant (s5 / score-entity
                                :op1 10
                                :op2 5))
            :op6 (d6 / defeat-01
                     :ARG0 (t12 / team
                                :mod (c12 / country
                                          :wiki +
                                          :name (n11 / name
                                                     :op1 "Italy")))
                     :ARG1 (t13 / team
                                :mod (c13 / country
                                          :wiki +
                                          :name (n12 / name
                                                     :op1 "New"
                                                     :op2 "Zealand")))
                     :quant (s6 / score-entity
                                :op1 12
                                :op2 2))))
"""

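if __name__ == "__main__":
    from hanlp.metrics.amr import smatch_eval

    path = "amr.tmp"
    # Write the graph to a file so it can serve as both prediction and gold.
    with open(path, "w") as f:
        f.write(s)
    # Comparing a file against itself should yield a perfect score on every run.
    for _ in range(5):
        smatch_score = smatch_eval(path, path)
        print(smatch_score)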

Describe the current behavior
The Smatch scores are wrong and change from run to run, even though the graph is compared with itself.

Expected behavior
A deterministic Smatch score of 100

System information

Linux Ubuntu 16.04
Python version: 3.8
HanLP version: current

Other info / logs
Not necessary. The problem is simply that using a hill-climber for graph matching is unsafe and opaque, and provides no upper bound on the solution. This gets worse as graphs get larger, but it can also occur on smaller graphs. A more detailed empirical study of the problem can be found here.
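To illustrate the point about hill-climbing, here is a minimal sketch on a toy problem. It is not the Smatch or HanLP implementation: the graph `EDGES`, the swap-based search, and all function names are assumptions made up for this example, and real Smatch also scores instance and attribute triples and uses random restarts. The sketch compares a small graph with itself, so the true optimum is a full match; each hill-climbing run returns some score at or below that optimum, and the result may differ between seeds.

```python
import itertools
import random

# Toy graph (hypothetical data): labelled edges between variables 0..N-1.
EDGES = {
    (0, "ARG0", 1), (0, "ARG1", 2),
    (1, "mod", 3), (2, "mod", 4),
    (3, "op1", 5), (4, "op1", 5),
}
N = 6


def matched(mapping, src=EDGES, tgt=EDGES):
    """Count edges of src that land on an edge of tgt under the variable mapping."""
    return sum((mapping[a], rel, mapping[b]) in tgt for a, rel, b in src)


def hill_climb(rng):
    """Greedy swap-based search from a random start; stops at a local optimum."""
    mapping = list(range(N))
    rng.shuffle(mapping)
    best = matched(mapping)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(N), 2):
            mapping[i], mapping[j] = mapping[j], mapping[i]
            score = matched(mapping)
            if score > best:
                best, improved = score, True
            else:
                # Undo a swap that did not improve the score.
                mapping[i], mapping[j] = mapping[j], mapping[i]
    return best


def exhaustive():
    """Try every bijection; only feasible for tiny graphs, but provably optimal."""
    return max(matched(p) for p in itertools.permutations(range(N)))


optimum = exhaustive()
scores = [hill_climb(random.Random(seed)) for seed in range(20)]
print("optimum:", optimum)
print("distinct hill-climber scores:", sorted(set(scores)))
```

Without the exhaustive search there would be no way to tell from a single hill-climber score whether it reached the optimum, which is exactly the verifiability problem described above.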

I've completed this form and searched the web for solutions.

hankcs · 2024-01-22T00:35:55Z

Thank you @flipz357 for reporting this. The randomness of Smatch implementations has been documented on our forum for 4 years, and you have finally brought the community a solid solution. Your paper is quite dense, so I'll spend some time reading it and then integrate your implementation soon.

flipz357 · 2024-01-22T19:18:49Z

Thanks @hankcs, and apologies for any density in the paper; there are a few issues with the current state of AMR evaluation. But I think using a hill-climber for evaluation is clearly the biggest current issue: any score from a hill-climber is only a lower bound and thus not verifiable (there is no upper bound), so we can never know whether an output of the hill-climber is wrong or correct (except, of course, if it returns 100, since then trivially upper bound = lower bound).
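The lower-bound argument can be written out as a short derivation, assuming the usual definition of Smatch as an F1 score over matched triples. The notation is introduced here for illustration only: $T_p$ and $T_g$ are the triple sets of the two graphs, $M(\pi)$ is the number of triples matched under a variable mapping $\pi$, and $\pi_{\mathrm{hc}}$ is whatever mapping the hill-climber returns.

```latex
% Notation assumed for this sketch, not taken from the Smatch code base.
\begin{align*}
F_1(\pi) &= \frac{2\,M(\pi)}{|T_p| + |T_g|}, \\
M(\pi_{\mathrm{hc}}) &\le M^{*} = \max_{\pi} M(\pi) \le \min(|T_p|, |T_g|), \\
F_1(\pi_{\mathrm{hc}}) &\le F_1^{*} \le \frac{2\min(|T_p|, |T_g|)}{|T_p| + |T_g|} \le 1 .
\end{align*}
```

The hill-climber only ever supplies the left-hand side. If it reports 1 (that is, 100), every inequality in the last line must hold with equality, so the score is certified; for any smaller value the gap to $F_1^{*}$ is unknown.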

flipz357 added the bug label Jan 21, 2024

flipz357 assigned hankcs Jan 21, 2024
