
Evaluation of bucc2018 #60

Open
chenQ1114 opened this issue Mar 5, 2021 · 0 comments

chenQ1114 commented Mar 5, 2021

Hi,

I notice that the bucc2018 evaluation needs a threshold before scoring the dev or test sets, and that threshold is determined from the gold file. We can compute the threshold for the dev set, but not for the test set, since I cannot find the bucc2018 test gold file. So all I can do is generate the prediction file without any score filtering and submit it to the leaderboard. Is that right? I suspect this affects the reported bucc2018 performance.

According to the code in third_party/utils_retrieve.py, the threshold has to be determined before generating the prediction file:

```python
def bucc_eval(candidates_file, gold_file, src_file, trg_file, src_id_file, trg_id_file,
              predict_file, mode, threshold=None, encoding='utf-8'):
  candidate2score = read_candidate2score(candidates_file, src_file, trg_file,
                                         src_id_file, trg_id_file, encoding)
  threshold = bucc_optimize(candidate2score, gold)
  bitexts = bucc_extract(candidate2score, threshold, predict_file)
```
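
Is the intended approach something like the following, i.e. tune the cutoff on the dev gold pairs and then reuse it when extracting the test predictions? This is just my own self-contained sketch of the idea, not the repo's helpers, and the sweep over sorted scores below is only my guess at what bucc_optimize does:

```python
def optimize_threshold(candidate2score, gold_pairs):
    """Return the score cutoff that maximizes F1 on a labeled (dev) split.

    candidate2score: dict mapping a candidate pair id (e.g. "src-id\ttrg-id") to a score.
    gold_pairs: iterable of the true pair ids for the same split (assumed non-empty).
    """
    gold = set(gold_pairs)
    # Sweep candidates from highest to lowest score; each prefix corresponds to
    # keeping every pair whose score is at least the current one.
    items = sorted(candidate2score.items(), key=lambda kv: -kv[1])
    best_f1, best_threshold = 0.0, float("inf")
    true_pos = 0
    for rank, (pair, score) in enumerate(items, start=1):
        if pair in gold:
            true_pos += 1
        precision = true_pos / rank
        recall = true_pos / len(gold)
        if precision + recall > 0:
            f1 = 2 * precision * recall / (precision + recall)
            if f1 > best_f1:
                best_f1, best_threshold = f1, score
    return best_threshold


def extract_pairs(candidate2score, threshold):
    """Keep every candidate pair whose score clears the (dev-tuned) cutoff."""
    return [pair for pair, score in candidate2score.items() if score >= threshold]


# Usage: tune on dev (gold available), then apply the same cutoff to test (no gold released).
# dev_threshold = optimize_threshold(dev_candidate2score, dev_gold_pairs)
# test_pairs = extract_pairs(test_candidate2score, dev_threshold)
```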
