
scipy.stats throws error when there's no reference summary (factCC and qags dataset) #5

Open
lihebi opened this issue Jan 13, 2023 · 1 comment

Comments

@lihebi
Collaborator

lihebi commented Jan 13, 2023

This scipy.stats call:

EvalBase/eval_utils.py

Lines 70 to 84 in cef8dbf

def batched_corr(corr_df, human_scores, batch_result_df, corr_metrics, batchID):
    """Compute the correlations between human scores and automated metric scores
    on a batch of samples, each of which is a pair of (doc, sys summ) or
    (ref summ, sys summ). Iteratively add rows to corr_df.
    """
    for corr_metric in corr_metrics:
        for aspect_name, human_score in human_scores.items():
            for (approach, model, score_name) in batch_result_df.columns:
                metric_score = batch_result_df[(approach, model, score_name)]
                cc = eval(f"scipy.stats.{corr_metric}")(human_score, metric_score)[0]
                corr_df.loc[
                    (corr_metric, aspect_name, approach, model, score_name),  # row
                    batchID
                ] = cc
    return corr_df

throws the following error:

File ~/git/evalbase/eval_utils.py:85, in batched_corr(corr_df, human_scores, batch_result_df, corr_metrics, batchID)
     83             else:
     84                 pass
---> 85             cc = eval(f"scipy.stats.{corr_metric}")(human_score, metric_score)[0]
     86             corr_df.loc[
     87                 (corr_metric, aspect_name, approach, model, score_name),  # row
     88                 batchID
     89             ] = cc
     90 return corr_df

File ~/.local/lib/python3.10/site-packages/scipy/stats/_stats_py.py:4411, in pearsonr(x, y, alternative)
   4408     raise ValueError('x and y must have the same length.')
   4410 if n < 2:
-> 4411     raise ValueError('x and y must have length at least 2.')
   4413 x = np.asarray(x)
   4414 y = np.asarray(y)

ValueError: x and y must have length at least 2.

when

human_score.shape = (1,)
metric_score.shape = (1,)

This issue appears in factCC and qags (which have no reference summaries), but not in frank (which has them). I am not sure whether it is caused by the lack of reference summaries.
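For context, a minimal sketch of a guard that would sidestep the crash: `scipy.stats.pearsonr` (and the other correlation functions) require at least 2 points, so a batch of size 1 can be skipped or reported as NaN. The helper name `safe_corr` is hypothetical, and swapping `eval` for `getattr` is my suggestion, not the repo's current code:

```python
import math
import scipy.stats

def safe_corr(corr_metric, human_score, metric_score):
    """Return the correlation coefficient, or NaN when the batch is too small.

    Hypothetical helper, not part of EvalBase; corr_metric is assumed to name
    a scipy.stats function such as "pearsonr" or "spearmanr".
    """
    # pearsonr raises "x and y must have length at least 2." for n < 2,
    # which is exactly the failure seen on factCC and qags batches.
    if len(human_score) < 2:
        return math.nan
    # getattr is a safer lookup than eval(f"scipy.stats.{corr_metric}")
    corr_func = getattr(scipy.stats, corr_metric)
    return corr_func(human_score, metric_score)[0]
```

With this guard, a length-1 batch yields NaN (which pandas will simply leave as a missing cell in `corr_df`) instead of raising `ValueError`.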

@forrestbao
Contributor

forrestbao commented Jan 13, 2023

> This issue appears in factCC and qags (which have no reference summary), but not in frank (which has reference summary). Not sure if it is caused by the lack of reference summary.

If the cause is the lack of reference summaries, the error is surfacing later than it should. Can you try using a placeholder, e.g., "I am a happy reference summary", as the reference summary for FactCC and QAGS? The placeholder will not be used by consistency metrics anyway.

EvalBase currently does NOT support FactCC, Frank, and QAGS. So, can you also make a PR to push your code?
