
mixtral: accuracy check output contains np.float64(...) which doesnt suit metric regex #1763

Closed
viraatc opened this issue Jul 2, 2024 · 0 comments · Fixed by #1764
viraatc commented Jul 2, 2024

The output of mixtral-8x7b/evaluate-accuracy.py:

xx.xx% pass@1
{'typescript': x, 'ruby': x, 'python': x, 'javascript': x, 'php': x, 'cpp': x}
{'typescript': x, 'ruby': x, 'python': x, 'javascript': x, 'php': x, 'cpp': x}
Results
{
    'rouge1': np.float64(x),
    'rouge2': np.float64(x),
    'rougeL': np.float64(x),
    'rougeLsum': np.float64(x),
    'gsm8k': x,
    'mbxp': x,
    'gen_len': np.int64(x),
    'gen_num': x,
    'gen_tok_len': x,
    'tokens_per_sample': x
}

The output contains np.float64(...) text wrapped around the floating-point value for these fields:

  • rouge1, rouge2, rougeL, rougeLsum

It also contains np.int64(...) text wrapped around the integer value for this field:

  • gen_len

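This breaks the metric regexes we have defined, because [\d.]+ expects a bare number right after the key and cannot match the leading "np.float64(" or "np.int64(":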

"ROUGE1": r".*'rouge1':\s([\d.]+).*",
"ROUGE2": r".*'rouge2':\s([\d.]+).*",
"ROUGEL": r".*'rougeL':\s([\d.]+).*",
"ROUGELSUM": r".*'rougeLsum':\s([\d.]+).*",
"GEN_LEN": r".*'gen_len':\s([\d.]+).*",
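
To make the mismatch concrete, here is a minimal sketch that runs two of the patterns above against a plain line and a wrapped line. The numeric values are made up for illustration (the real output is redacted as x above), and the use of re.match in the `extract` helper is an assumption about how the checker applies these patterns, not something taken from its code.

```python
import re

# Patterns copied from the issue text.
ROUGE1 = r".*'rouge1':\s([\d.]+).*"
GEN_LEN = r".*'gen_len':\s([\d.]+).*"

# Hypothetical output lines; the numbers are placeholders, not real results.
plain_rouge = "    'rouge1': 45.1234,"
wrapped_rouge = "    'rouge1': np.float64(45.1234),"
plain_len = "    'gen_len': 1234,"
wrapped_len = "    'gen_len': np.int64(1234),"


def extract(pattern, line):
    """Return the captured number as a string, or None if the line does not match."""
    m = re.match(pattern, line)
    return m.group(1) if m else None


print(extract(ROUGE1, plain_rouge))    # bare number is captured
print(extract(ROUGE1, wrapped_rouge))  # None: 'n' of 'np.float64(' is not in [\d.]
print(extract(GEN_LEN, plain_len))     # bare number is captured
print(extract(GEN_LEN, wrapped_len))   # None, for the same reason
```

The capture group has to start immediately after the whitespace that follows the key, so any type prefix in front of the number prevents a match.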

I'm not entirely sure why this behaves differently from our llama script:

result = {k: round(np.mean(v) * 100, 4) for k, v in result.items()}

Verified fix:

  1. Wrap this expression in the built-in float, i.e. float(round(...)):
    result = {k: round(np.mean(v) * 100, 4) for k, v in result.items()}
  2. Wrap this expression in the built-in int, i.e. int(np.sum(...)):
    'gen_len': np.sum(prediction_lens),
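
The two changes above can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `raw_scores` and the contents of `prediction_lens` are hypothetical stand-ins for what evaluate-accuracy.py computes. The type-qualified output is most likely tied to the NumPy 2.x scalar repr, which prints np.float64(...) where NumPy 1.x printed a bare number, but the issue does not confirm which NumPy version was in use.

```python
import numpy as np

# Hypothetical per-metric scores and prediction lengths (placeholders only).
raw_scores = {"rouge1": [0.45, 0.47], "rouge2": [0.22, 0.24]}
prediction_lens = [120, 135, 98]

# Fix 1: convert the rounded NumPy scalar to a built-in float.
result = {k: float(round(np.mean(v) * 100, 4)) for k, v in raw_scores.items()}

# Fix 2: convert the NumPy integer sum to a built-in int.
gen_len = int(np.sum(prediction_lens))

summary = {**result, "gen_len": gen_len}

# Built-in types print as bare numbers regardless of the NumPy version.
print(summary)
print({k: type(v).__name__ for k, v in summary.items()})
```

Converting at the point where the dictionary is built keeps the printed output independent of NumPy's repr, so the existing regexes do not need to change.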