Log performance metrics to Application Insights (#3493)
# Description

This PR adds metrics logging to Application Insights, using the telemetry
functionality from promptflow-devkit (a minimal sketch of the logging call is shown after this list).
1. Added a live test that checks the install time and test run times against a real AI workspace.
2. A couple of tests were re-recorded.
3. Logged run times for all end-to-end tests.
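
For context, a minimal sketch of how the new `scripts/code_qa/report_to_app_insights.py` script emits a metric through promptflow-devkit telemetry; the activity name, value, and event name below are illustrative only, not real CI data:

```python
# Minimal sketch, mirroring scripts/code_qa/report_to_app_insights.py in this PR.
# The activity name, value, and event name are illustrative, not real CI data.
from promptflow._sdk._configuration import Configuration
from promptflow._sdk._telemetry.telemetry import get_telemetry_logger

# Opt in to telemetry collection so the logger actually sends data.
config = Configuration.get_instance()
config.set_config(Configuration.COLLECT_TELEMETRY, True)

logger = get_telemetry_logger()
logger.info(
    "run_all_e2e_tests",  # GitHub Action/step name, used as the event name
    extra={
        "custom_dimensions": {
            "activity_name": "install_time_s",
            "activity_type": "ci_cd_analytics",
            "value": 42.0,
        }
    },
)
```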


# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [x] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [x] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [x] Title of the pull request is clear and informative.
- [x] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [x] Pull request includes test coverage for the included changes.
nick863 authored Jul 11, 2024
1 parent b982cf0 commit 5a3396f
Showing 7 changed files with 7,347 additions and 692 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/promptflow-evals-e2e-test.yml
@@ -83,7 +83,10 @@ jobs:
creds: ${{ secrets.PF_EVALS_SP_CREDENTIALS }}
enable-AzPSSession: true
- name: run e2e tests
run: poetry run pytest -m e2etest --cov=promptflow --cov-config=pyproject.toml --cov-report=term --cov-report=html --cov-report=xml
id: run_all_e2e_tests
run: |
poetry run pytest -m e2etest --cov=promptflow --cov-config=pyproject.toml --cov-report=term --cov-report=html --cov-report=xml
poetry run python ../../scripts/code_qa/report_to_app_insights.py --activity all_e2e_tests_run_times --junit-xml test-results.xml --git-hub-action-run-id ${{ github.run_id }} --git-hub-workflow ${{ github.workflow }} --git-hub-action ${{ github.action }} --git-branch ${{ github.ref }}
working-directory: ${{ env.WORKING_DIRECTORY }}
- name: upload coverage report
uses: actions/upload-artifact@v4
91 changes: 91 additions & 0 deletions .github/workflows/promptflow-evals-regression-test.yml
@@ -0,0 +1,91 @@
name: promptflow-evals-regression-test

on:
schedule:
- cron: "40 10 * * *" # 2:40 PST every day
pull_request:
paths:
- src/promptflow-evals/**
- .github/workflows/promptflow-evals-regression-test.yml
workflow_dispatch:

env:
IS_IN_CI_PIPELINE: "true"
WORKING_DIRECTORY: ${{ github.workspace }}/src/promptflow-evals
PROMPT_FLOW_TEST_MODE: "live"

jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: snok/install-poetry@v1
- name: build
run: poetry build
working-directory: ${{ env.WORKING_DIRECTORY }}
- uses: actions/upload-artifact@v4
with:
name: promptflow-evals
path: ${{ env.WORKING_DIRECTORY }}/dist/promptflow_evals-*.whl

test:
needs: build
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macos-13]
# TODO: Encountered a hash mismatch for the ubuntu-latest and 3.9 combination while installing the promptflow-evals package
# https://github.com/microsoft/promptflow/actions/runs/9009397933/job/24753518853?pr=3158
# Add 3.9 back after we figure out the issue
python-version: ['3.8', '3.10', '3.11']
fail-fast: false
# snok/install-poetry needs this to support Windows
defaults:
run:
shell: bash
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- uses: snok/install-poetry@v1
- uses: actions/download-artifact@v4
with:
name: promptflow-evals
path: ${{ env.WORKING_DIRECTORY }}
- name: install test dependency group
run: poetry install --only test
working-directory: ${{ env.WORKING_DIRECTORY }}
- name: install promptflow-evals from wheel
id: install_promptflow
run: |
# Estimate the installation time.
export start_tm=`date +%s`
poetry run pip install -e ../promptflow
poetry run pip install -e ../promptflow-core
poetry run pip install -e ../promptflow-devkit
poetry run pip install -e ../promptflow-tracing
poetry run pip install -e ../promptflow-tools
poetry run pip install -e ../promptflow-azure
poetry run pip install --pre $(python -c "import glob; print(glob.glob('promptflow_evals-*.whl')[0])")
export install_time=$((`date +%s` - ${start_tm}))
poetry run python ../../scripts/code_qa/report_to_app_insights.py --activity install_time_s --value $install_time --git-hub-action-run-id ${{ github.run_id }} --git-hub-workflow ${{ github.workflow }} --git-hub-action ${{ github.action }} --git-branch ${{ github.ref }}
test ${install_time} -le $TIME_LIMIT || echo "::warning file=pyproject.toml,line=40,col=0::The installation took ${install_time} seconds, the limit is ${TIME_LIMIT}."
working-directory: ${{ env.WORKING_DIRECTORY }}
- name: install recording
run: poetry run pip install -e ../promptflow-recording
working-directory: ${{ env.WORKING_DIRECTORY }}
- name: generate end-to-end test config from secret
run: echo '${{ secrets.PF_EVALS_E2E_TEST_CONFIG }}' >> connections.json
working-directory: ${{ env.WORKING_DIRECTORY }}
- uses: azure/login@v2
with:
creds: ${{ secrets.PF_EVALS_SP_CREDENTIALS }}
enable-AzPSSession: true
- name: run performance tests
id: performance_tests
run: |
# Measure the run time for the evaluator live tests.
poetry run pytest -m performance_test --junit-xml=test-results.xml
poetry run python ../../scripts/code_qa/report_to_app_insights.py --activity evaluator_live_tests_run_time_s --junit-xml test-results.xml --git-hub-action-run-id ${{ github.run_id }} --git-hub-workflow ${{ github.workflow }} --git-hub-action ${{ github.action }} --git-branch ${{ github.ref }}
working-directory: ${{ env.WORKING_DIRECTORY }}
4 changes: 3 additions & 1 deletion .github/workflows/promptflow-evals-unit-test.yml
@@ -72,7 +72,9 @@ jobs:
run: poetry run pip install -e ../promptflow-recording
working-directory: ${{ env.WORKING_DIRECTORY }}
- name: run unit tests
run: poetry run pytest -m unittest --cov=promptflow --cov-config=pyproject.toml --cov-report=term --cov-report=html --cov-report=xml --cov-fail-under=63
id: run_unit_tests
run: |
poetry run pytest -m unittest --cov=promptflow --cov-config=pyproject.toml --cov-report=term --cov-report=html --cov-report=xml --cov-fail-under=63
working-directory: ${{ env.WORKING_DIRECTORY }}
- name: upload coverage report
uses: actions/upload-artifact@v4
99 changes: 99 additions & 0 deletions scripts/code_qa/report_to_app_insights.py
@@ -0,0 +1,99 @@
from typing import Dict, Optional, Union

import argparse
import platform

from promptflow._sdk._configuration import Configuration
from promptflow._sdk._telemetry.telemetry import get_telemetry_logger
from xml.dom import minidom


def parse_junit_xml(fle: str) -> Dict[str, Dict[str, Union[float, str]]]:
"""
Parse a test results file in JUnit XML format.
:param fle: The path to a file in JUnit XML format.
:type fle: str
:return: The dictionary with tests, their run times and pass/fail status.
"""
test_results = {}
dom = minidom.parse(fle)
# Take node list Document/testsuites/testsuite/
for test in dom.firstChild.firstChild.childNodes:
test_name = f"{test.attributes['classname'].value}::{test.attributes['name'].value}"
test_results[test_name] = {'fail_message': '', 'time': float(test.attributes['time'].value)}

for child in test.childNodes:
if child.nodeName == 'failure':
test_results[test_name]['fail_message'] = child.attributes["message"].value
return test_results


def main(activity_name: str,
value: float,
run_id: str,
workflow: str,
action: str,
branch: str,
junit_file: Optional[str]) -> None:
"""
Log the CI-CD event.
:param activity_name: The name of an activity to be logged, for example, installation time.
:type activity_name: str
:param value: The value to log for the activity.
:type value: float
:param run_id: The CI-CD run id.
:type run_id: str
:param workflow: The name of a workflow or path to a workflow file.
:type workflow: str
:param action: The name of the running action or step.
:type action: str
:param branch: The branch from which the CI-CD was triggered.
:type branch: str
:param junit_file: The path to the JUnit XML file with test results.
:type junit_file: Optional[str]
"""
# Enable telemetry
config = Configuration.get_instance()
config.set_config(Configuration.COLLECT_TELEMETRY, True)
logger = get_telemetry_logger()
activity_info = {
"activity_name": activity_name,
"activity_type": "ci_cd_analytics",
"OS": platform.system(),
"OS_release": platform.release(),
"branch": branch,
"git_hub_action_run_id": run_id,
"git_hub_workflow": workflow
}
if junit_file:
junit_dict = parse_junit_xml(junit_file)
for k, v in junit_dict.items():
activity_info[k] = -1 if v["fail_message"] else v['time']
else:
activity_info["value"] = value

# Write the information to Application Insights.
logger.info(action, extra={"custom_dimensions": activity_info})


if __name__ == '__main__':
parser = argparse.ArgumentParser(
description="Log the value to Application Insights along with platform characteristics and run ID.")
parser.add_argument('--activity', help='The activity to be logged.',
required=True)
parser.add_argument('--value', type=float, help='The value for activity.',
required=False, default=-1)
parser.add_argument('--junit-xml', help='The path to junit-xml file.',
dest="junit_xml", required=False, default=None)
parser.add_argument('--git-hub-action-run-id', dest='run_id',
help='The run ID of GitHub action run.', required=True)
parser.add_argument('--git-hub-workflow', dest='workflow',
help='The name of a workflow or a path to workflow file.', required=True)
parser.add_argument('--git-hub-action', dest='action',
help='The GitHub action or step.', required=True)
parser.add_argument('--git-branch', dest='branch',
help='The GitHub branch.', required=True)
args = parser.parse_args()
main(args.activity, args.value, args.run_id, args.workflow, args.action, args.branch, args.junit_xml)
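
For reference, a hedged sketch of the JUnit XML shape that `parse_junit_xml` assumes (testsuites > testsuite > testcase, with no whitespace-only text nodes between elements, which matches pytest's `--junit-xml` output); the test names, timings, and temporary file are made up for illustration:

```python
# Illustrative only: a tiny JUnit XML document of the shape parse_junit_xml expects,
# written without whitespace between elements, as pytest's --junit-xml produces.
# Assumes the script is importable and promptflow-devkit is installed.
import tempfile

from report_to_app_insights import parse_junit_xml

sample = (
    '<testsuites><testsuite name="pytest" tests="2">'
    '<testcase classname="tests.test_example" name="test_passes" time="0.12"/>'
    '<testcase classname="tests.test_example" name="test_fails" time="1.50">'
    '<failure message="AssertionError: expected 1, got 2">traceback...</failure>'
    '</testcase>'
    '</testsuite></testsuites>'
)

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(sample)

results = parse_junit_xml(handle.name)
# Expected result (with the per-test failure message recorded):
# {'tests.test_example::test_passes': {'fail_message': '', 'time': 0.12},
#  'tests.test_example::test_fails': {'fail_message': 'AssertionError: expected 1, got 2', 'time': 1.5}}
print(results)
```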
@@ -124,7 +124,7 @@ def test_log_artifact(self, setup_data, caplog, tmp_path):
ev_run.log_artifact(tmp_path)
self._assert_no_errors_for_module(caplog.records, EvalRun.__module__)

@pytest.mark.skip(reason="Test runs individually but not when run with entire suite.")
@pytest.mark.performance_test
@pytest.mark.usefixtures("vcr_recording")
def test_e2e_run_target_fn(self, caplog, project_scope, questions_answers_file, monkeypatch):
"""Test evaluation run logging."""
@@ -161,7 +161,7 @@ def test_e2e_run_target_fn(self, caplog, project_scope, questions_answers_file,
)
self._assert_no_errors_for_module(caplog.records, (ev_utils.__name__, EvalRun.__module__))

@pytest.mark.skip(reason="Test runs individually but not when run with entire suite.")
@pytest.mark.performance_test
@pytest.mark.usefixtures("vcr_recording")
def test_e2e_run(self, caplog, project_scope, questions_answers_file, monkeypatch):
"""Test evaluation run logging."""