Commit 46a40a7
Add reference to LLM evaluation page in Prompt Engineering UI docs (mlflow#10279)

Signed-off-by: Daniel Lok <[email protected]>
daniellok-db committed Nov 15, 2023
1 parent daafa3a commit 46a40a7
Showing 3 changed files with 65 additions and 4 deletions.
Binary file added docs/source/_static/images/evaluate_metrics.png
6 changes: 6 additions & 0 deletions docs/source/llms/llm-evaluate/index.rst
@@ -125,6 +125,8 @@ There are two ways to select metrics to evaluate your model:
* Use **default** metrics for pre-defined model types.
* Use a **custom** list of metrics.
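
For instance, a minimal sketch of what the two selection modes look like with :py:func:`mlflow.evaluate()`
(``model_uri`` and ``eval_data`` are hypothetical placeholders, and the custom-list call assumes MLflow 2.8+):

.. code-block:: python

    import mlflow

    # Default metrics: the model type determines which metrics are computed.
    mlflow.evaluate(
        model=model_uri,
        data=eval_data,
        targets="ground_truth",
        model_type="question-answering",
    )

    # Custom list: omit model_type and pass the desired metrics explicitly.
    mlflow.evaluate(
        model=model_uri,
        data=eval_data,
        targets="ground_truth",
        extra_metrics=[mlflow.metrics.exact_match()],
    )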

.. _llm-eval-default-metrics:

Use Default Metrics for Pre-defined Model Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -173,6 +175,8 @@ The supported LLM model types and associated metrics are listed below:
:sup:`3` Requires package `evaluate <https://pypi.org/project/evaluate>`_, `nltk <https://pypi.org/project/nltk>`_, and
`rouge-score <https://pypi.org/project/rouge-score>`_

.. _llm-eval-custom-metrics:

Use a Custom List of Metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -441,6 +445,8 @@ up OpenAI authentication to run the code below.
model_type="question-answering",
)
.. _llm-eval-static-dataset:

Evaluating with a Static Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

63 changes: 59 additions & 4 deletions docs/source/llms/prompt-engineering/index.rst
@@ -248,11 +248,11 @@ as follows:
.. _quickstart-score:

Step 12: Score or deploy the best configuration programmatically
Step 12: Generate predictions programmatically
----------------------------------------------------------------
Once you have found a configuration of LLM, prompt template, and parameters that performs well, you
can score the corresponding MLflow Model in a Python environment of your choosing, or you can
:ref:`deploy it for real-time serving <deploy-prompt-serving>`.
can generate predictions using the corresponding MLflow Model in a Python environment of your choosing,
or you can :ref:`deploy it for real-time serving <deploy-prompt-serving>`.

1. To load the MLflow Model in a notebook for batch inference, click on the Run's name to open the
**Run Page** and select the *model* directory in the **Artifact Viewer**. Then, copy the first
@@ -272,7 +272,7 @@ can score the corresponding MLflow Model in a Python environment of your choosing
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
2. Then, to score the model, call the :py:func:`predict() <mlflow.pyfunc.PyFuncModel.predict>` method
2. Then, to generate predictions, call the :py:func:`predict() <mlflow.pyfunc.PyFuncModel.predict>` method
and pass in a dictionary of input variables. For example:

.. code-block:: python
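
    # The original snippet is collapsed in this diff view; as a minimal sketch,
    # pass the prompt template's input variables as a dictionary (the "question"
    # key here is just an illustrative template variable):
    question = "What is MLflow Tracking?"
    loaded_model.predict({"question": question})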
@@ -293,6 +293,61 @@ can score the corresponding MLflow Model in a Python environment of your choosing
For more information about deployment for real-time serving with MLflow,
see the :ref:`instructions below <deploy-prompt-serving>`.

Step 13: Perform metric-based evaluation of your model's outputs
----------------------------------------------------------------
If you'd like to assess your model's performance on specific metrics, MLflow provides the :py:func:`mlflow.evaluate()`
API. Let's evaluate our model on some :ref:`pre-defined metrics <llm-eval-default-metrics>`
for text summarization:

.. code-block:: python

    import mlflow
    import pandas as pd

    logged_model = "runs:/840a5c43f3fb46f2a2059b761557c1d0/model"

    article_text = """
    An MLflow Project is a format for packaging data science code in a reusable and reproducible way.
    The MLflow Projects component includes an API and command-line tools for running projects, which
    also integrate with the Tracking component to automatically record the parameters and git commit
    of your source code for reproducibility.
    This article describes the format of an MLflow Project and how to run an MLflow project remotely
    using the MLflow CLI, which makes it easy to vertically scale your data science code.
    """
    question = "What is an MLflow project?"

    data = pd.DataFrame(
        {
            "article": [article_text],
            "question": [question],
            "ground_truth": [
                article_text
            ],  # used for certain evaluation metrics, such as ROUGE score
        }
    )

    with mlflow.start_run():
        results = mlflow.evaluate(
            model=logged_model,
            data=data,
            targets="ground_truth",
            model_type="text-summarization",
        )

    eval_table = results.tables["eval_results_table"]
    print(f"See evaluation table below: \n{eval_table}")
The evaluation results can also be viewed in the MLflow Evaluation UI:

.. figure:: ../../_static/images/evaluate_metrics.png
:scale: 40%
:align: center

The :py:func:`mlflow.evaluate()` API also supports :ref:`custom metrics <llm-eval-custom-metrics>`,
:ref:`static dataset evaluation <llm-eval-static-dataset>`, and much more. For a
more in-depth guide, see :ref:`llm-eval`.
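
For example, a static dataset that already contains saved model outputs can be scored without
re-querying the model. A minimal sketch, assuming MLflow 2.8+ where the ``predictions`` argument
can name a column of pre-computed outputs (all column names and values here are illustrative):

.. code-block:: python

    import mlflow
    import pandas as pd

    static_data = pd.DataFrame(
        {
            "inputs": ["What is an MLflow Project?"],
            "ground_truth": ["A format for packaging data science code reproducibly."],
            "outputs": ["An MLflow Project packages data science code in a reusable way."],
        }
    )

    with mlflow.start_run():
        results = mlflow.evaluate(
            data=static_data,
            targets="ground_truth",
            predictions="outputs",
            model_type="text-summarization",
        )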

.. _deploy-prompt-serving:

Deployment for real-time serving
