
Adding MoverScore #11

Open
forrestbao opened this issue Dec 4, 2022 · 5 comments
Comments

@forrestbao (Contributor)

Like BERTScore and BLEURT, MoverScore is another modern transformer-based, reference-based summarization metric.

However, we did not include it in our pilot study. Now may be a good time to add it.

Unfortunately, HF's evaluate library does not include it. But the original author seems to have provided a good package: https://pypi.org/project/moverscore/ And the GitHub source is here: https://github.com/AIPHES/emnlp19-moverscore

Let's add it. Note: to keep the comparison fair (#10), let's use a RoBERTa-large-based model. To select a model, see here. The model name is the same as the model name on HuggingFace, so we can simply use RoBERTa-large (the generally pretrained one).
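For reference, a minimal sketch of selecting the backbone. This assumes the `MOVERSCORE_MODEL` environment variable and the `word_mover_score` call shown in the repo README; treat both as assumptions to verify against the source:

```python
import os

# moverscore_v2 appears to read this env var at import time to pick the
# backbone (per the repo README); "roberta-large" is the HuggingFace name.
os.environ["MOVERSCORE_MODEL"] = "roberta-large"

# With `pip install moverscore` in place, scoring would then look like:
#   from moverscore_v2 import get_idf_dict, word_mover_score
#   idf_refs = get_idf_dict(references)   # references: list[str]
#   idf_hyps = get_idf_dict(hypotheses)   # hypotheses: list[str]
#   scores = word_mover_score(references, hypotheses, idf_refs, idf_hyps,
#                             stop_words=[], n_gram=1, remove_subwords=True)
```

The env var must be set before the `moverscore_v2` import, since the model is loaded at import time.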

@TURX (Collaborator)

TURX commented Dec 9, 2022

Is this to be included in evalbase or DocAsRef?

@forrestbao (Contributor, Author)

I think DocAsRef.
How did you run your experiments? I assume you define your metrics and then import EvalBase's top-level functions to evaluate, right?

@forrestbao (Contributor, Author)

I think MoverScore should be added to DocAsRef. Any metric developed or benchmarked by us for the ACL 2023 submission should go into DocAsRef.

To evaluate, just go to EvalBase/env.py, import the metrics from the DocAsRef folder, and add them to the metrics dictionary. Then run EvalBase's experiment files, i.e., {summeval, realsumm, newsroom}.py.
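The wiring could look like the sketch below. The dictionary name `metrics`, the key, and the function name are illustrative only, not the actual identifiers in EvalBase/env.py:

```python
# Hypothetical sketch of registering a new metric in an EvalBase-style
# metrics dictionary; all names here are illustrative, not EvalBase's own.
def moverscore_roberta_large(references, hypotheses):
    # would call moverscore's word_mover_score(...) here
    raise NotImplementedError

metrics = {}  # in EvalBase/env.py this dict already holds the pilot-study metrics
metrics["moverscore-roberta-large"] = moverscore_roberta_large
```

The experiment scripts then only need to iterate over the dictionary, so a new metric requires no other changes.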

@TURX (Collaborator)

TURX commented Dec 10, 2022

moverscore_v2 runs on the CPU. moverscore runs on the GPU but does not allow changing the model.

Refs:
https://github.com/AIPHES/emnlp19-moverscore/blob/master/moverscore.py
https://github.com/AIPHES/emnlp19-moverscore/blob/master/moverscore_v2.py

@forrestbao (Contributor, Author)

forrestbao commented Dec 10, 2022

That's because moverscore_v2 does not move variables to the GPU.
Compare the lines in moverscore that have device=device with the corresponding lines in moverscore_v2, which lack this kwarg.
Or search for cuda:0 or .to(device) in the moverscore code.
For example, these lines in moverscore:

    padded, lens, mask = padding(arr, pad_token, dtype=torch.long)
    padded_idf, _, _ = padding(idf_weights, pad_token, dtype=torch.float)


    padded = padded.to(device=device)
    mask = mask.to(device=device)
    lens = lens.to(device=device)
    return padded, padded_idf, lens, mask, tokens

vs. the corresponding lines in moverscore_v2:

    padded, lens, mask = padding(arr, pad_token, dtype=torch.long)
    padded_idf, _, _ = padding(idf_weights, pad_token, dtype=torch.float)

    return padded, padded_idf, lens, mask, tokens

Maybe you can send them a PR.

TURX added a commit that referenced this issue Dec 11, 2022
minor: classical metrics, moverscore topk
fix: #6, #7, #11
rm: obsolete achived_experiments
tofix: moverscore truncation & corr, pipeline refactor to automodel
@TURX TURX assigned forrestbao and unassigned TURX Jan 10, 2023