Composite evaluators and adding more e2e tests #2888

Merged
merged 11 commits into from
Apr 22, 2024

@ninghu ninghu commented Apr 18, 2024

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

All Promptflow Contribution checklist:

  • The pull request does not introduce [breaking changes].
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.
  • Create an issue and link to the pull request to get a dedicated review from the promptflow team. Learn more: suggested workflow.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

github-actions bot commented Apr 18, 2024

promptflow-evals test result

 12 files  ±  0  12 suites  ±0   37m 10s ⏱️ + 35m 27s
 12 tests  -  18   8 ✅  -  22   4 💤 + 4  0 ❌ ±0 
144 runs   - 216  96 ✅  - 264  48 💤 +48  0 ❌ ±0 

Results for commit b541d80. ± Comparison against base commit 5a4eb58.

This pull request removes 30 and adds 12 tests. Note that renamed tests count towards both.
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_invalid_citations
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_missing_role
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_normal
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_question_answer_not_paired
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_per_turn_results_aggregation
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_evaluators_not_a_dict
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_invalid_data
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_invalid_jsonl_data
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_missing_data
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_missing_required_inputs
…
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_chat[False-False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_chat[False-True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_chat[True-False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_chat[True-True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_content_safety[False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_content_safety[True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa[False]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_composite_evaluator_qa[True]
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_individual_evaluator_prompt_based
tests.evals.e2etests.test_builtin_evaluators.TestBuiltInEvaluators ‑ test_individual_evaluator_service_based
…

♻️ This comment has been updated with latest results.

github-actions bot commented Apr 18, 2024

SDK CLI Test Result users/ninhu/composite_evaluators

    4 files      4 suites   1h 4m 27s ⏱️
  690 tests   663 ✅  27 💤 0 ❌
2 760 runs  2 652 ✅ 108 💤 0 ❌

Results for commit b541d80.

♻️ This comment has been updated with latest results.

@ninghu ninghu marked this pull request as ready for review April 19, 2024 00:08
@ninghu ninghu requested review from a team as code owners April 19, 2024 00:08
@ninghu ninghu merged commit 3f67e0d into main Apr 22, 2024
45 checks passed
@ninghu ninghu deleted the users/ninhu/composite_evaluators branch April 22, 2024 15:13
4 participants