185 changes: 185 additions & 0 deletions evals/evaluation/lm_evaluation_harness/model_card/README.md
@@ -0,0 +1,185 @@
# Model Card Generator

Model Card Generator allows users to create interactive HTML and static Markdown reports containing model performance and fairness metrics.

**Model Card Sections**

<table class="tg">
<thead>
<tr>
<th class="tg-0pky">Section<br></th>
<th class="tg-0pky">Subsection</th>
<th class="tg-73oq">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-0pky" rowspan="9">Model Details</td>
<td class="tg-0pky">Overview</td>
<td class="tg-0pky">A brief, one-line description of the model card.</td>
</tr>
<tr>
<td class="tg-0pky">Documentation</td>
<td class="tg-0pky">A thorough description of the model and its usage.</td>
</tr>
<tr>
<td class="tg-0pky">Owners</td>
<td class="tg-0pky">The individuals or teams who own the model.</td>
</tr>
<tr>
<td class="tg-0pky">Version</td>
<td class="tg-0pky">The version of the schema.</td>
</tr>
<tr>
<td class="tg-0pky">Licenses</td>
<td class="tg-0pky">The model's license for use.</td>
</tr>
<tr>
<td class="tg-0pky">References</td>
<td class="tg-0pky">Links providing more information about the model.</td>
</tr>
<tr>
<td class="tg-0pky">Citations</td>
<td class="tg-0pky">How to reference this model card.</td>
</tr>
<tr>
<td class="tg-0pky">Path</td>
<td class="tg-0pky">The path where the model is stored.</td>
</tr>
<tr>
<td class="tg-0pky">Graphics</td>
<td class="tg-0pky">Collection of overview graphics.</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="6">Model Parameters</td>
<td class="tg-0pky">Model Architecture</td>
<td class="tg-0pky">The architecture of the model.</td>
</tr>
<tr>
<td class="tg-0pky">Data</td>
<td class="tg-0pky">The datasets used to train and evaluate the model.</td>
</tr>
<tr>
<td class="tg-0pky">Input Format</td>
<td class="tg-0pky">The data format for inputs to the model.</td>
</tr>
<tr>
<td class="tg-0pky">Input Format Map</td>
<td class="tg-0pky">The data format for inputs to the model, in key-value format.</td>
</tr>
<tr>
<td class="tg-0pky">Output Format</td>
<td class="tg-0pky">The data format for outputs from the model.</td>
</tr>
<tr>
<td class="tg-0pky">Output Format Map</td>
<td class="tg-0pky">The data format for outputs from the model, in key-value format.</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="2">Quantitative Analysis</td>
<td class="tg-0pky">Performance Metrics</td>
<td class="tg-0pky">The model performance metrics being reported.</td>
</tr>
<tr>
<td class="tg-0pky">Graphics</td>
<td class="tg-0pky">Collection of performance graphics.</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="5">Considerations</td>
<td class="tg-0pky">Users</td>
<td class="tg-0pky">Who are the intended users of the model?</td>
</tr>
<tr>
<td class="tg-0pky">Use Cases</td>
<td class="tg-0pky">What are the intended use cases of the model?</td>
</tr>
<tr>
<td class="tg-0pky">Limitations</td>
<td class="tg-0pky">What are the known technical limitations of the model? E.g. What kind(s) of data should the model be expected not to perform well on? What are the factors that might degrade model performance?</td>
</tr>
<tr>
<td class="tg-0pky">Tradeoffs</td>
<td class="tg-0pky">What are the known tradeoffs in accuracy/performance of the model?</td>
</tr>
<tr>
<td class="tg-0pky">Ethical Considerations</td>
<td class="tg-0pky">What are the ethical (or environmental) risks involved in the application of this model?</td>
</tr>
</tbody>
</table>

## Steps to generate a Model Card

**Step 1**: Clone the GitHub repository.

```shell
git clone https://github.com/opea-project/GenAIEval.git
```

**Step 2**: Navigate to the `model_card` directory.

```shell
cd GenAIEval/evals/evaluation/lm_evaluation_harness/model_card/
```

**Step 3**: Create and activate a virtual environment, e.g. with virtualenv:

```shell
python3 -m virtualenv mg_venv
source mg_venv/bin/activate
```

**Step 4**: Install the required dependencies using `pip`.

```shell
pip install -r requirements.txt
```

**Step 5**: Prepare the input Model Card metadata JSON

Draft your Model Card metadata according to the [JSON schema](https://github.com/intel/intel-xai-tools/blob/main/model_card_gen/intel_ai_safety/model_card_gen/schema/v0.0.1/model_card.schema.json) and save it as a `.json` file. The table above lists the sections and fields you can include. Any field that complies with the schema is allowed, but the required field 'model name' must be present.
For guidance, see the [example Model Card JSONs](https://github.com/intel/intel-xai-tools/tree/main/model_card_gen/intel_ai_safety/model_card_gen/docs/examples/json). Pass the path to the metadata file with the `--input_mc_metadata_json` argument.
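
As a rough illustration, the metadata file can be drafted from Python and written out with the standard `json` module. The key names below (`model_details`, `name`, `overview`) are assumptions inferred from the section table above, not copied from the schema, and `write_metadata` is a hypothetical helper; verify the layout against the linked JSON schema and examples before use.

```python
import json
import os
import tempfile


def write_metadata(path, name, overview=""):
    """Write a minimal model card metadata file and return the dict written."""
    # Assumed layout: these key names are illustrative only and must be
    # checked against the model card JSON schema.
    metadata = {"model_details": {"name": name, "overview": overview}}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    return metadata


# Write the file to a temporary directory and read it back.
metadata_path = os.path.join(tempfile.mkdtemp(), "model_card_metadata.json")
written = write_metadata(metadata_path, "demo-model", "A short description.")
with open(metadata_path, encoding="utf-8") as f:
    print(json.load(f) == written)
```

The resulting path is what `INPUT_MC_METADATA_JSON_PATH` would point to in the command shown in this step.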

Optionally, choose the template used to render the model card by setting `MODEL_CARD_TEMPLATE` to "html" for an interactive HTML model card or "md" for a static Markdown version. The default is "html".
Set `OUTPUT_DIRECTORY` to the directory where the generated model card and related files should be saved. It is passed to the `--output_dir` argument, which defaults to the current working directory when omitted.

```shell
INPUT_MC_METADATA_JSON_PATH=/path/to/model_card_metadata.json
MODEL_CARD_TEMPLATE="html"
OUTPUT_DIRECTORY=/path/to/output

python examples/main.py --input_mc_metadata_json ${INPUT_MC_METADATA_JSON_PATH} --mc_template_type ${MODEL_CARD_TEMPLATE} --output_dir ${OUTPUT_DIRECTORY}
```

**Step 6 (Optional)**: Generate Performance Metrics

Draft a Metrics by Threshold CSV file from your generated metric results. Example metric files are available in the [examples directory](https://github.com/intel/intel-xai-tools/tree/main/model_card_gen/intel_ai_safety/model_card_gen/docs/examples/csv).
For a step-by-step guide on creating these files, see this [notebook](https://github.com/intel/intel-xai-tools/blob/main/notebooks/model_card_gen/hugging_face_model_card/hugging-face-model-card.ipynb). The "Metrics by Threshold" section of the Model Card lets you visually analyze how metric values vary with the probability threshold.
Provide the path to the Metrics by Threshold CSV file with the `--metrics_by_threshold` argument.
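
To make the idea of threshold-dependent metrics concrete, here is a minimal sketch that computes precision, recall, and F1 for a binary classifier at several probability thresholds and serializes the rows as CSV. The column names (`threshold`, `precision`, `recall`, `f1`) and both helper functions are hypothetical; the layout the generator actually expects should be taken from the linked example CSVs.

```python
import csv
import io


def metrics_by_threshold(y_true, y_prob, thresholds):
    """Return one row of precision/recall/F1 per probability threshold."""
    rows = []
    for t in thresholds:
        # Predict the positive class when the probability reaches the threshold.
        y_pred = [1 if p >= t else 0 for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"threshold": t, "precision": precision, "recall": recall, "f1": f1})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text (column names are illustrative)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2]
rows = metrics_by_threshold(y_true, y_prob, [0.25, 0.5, 0.75])
print(rows_to_csv(rows))
```

Raising the threshold in this toy data trades recall for precision, which is the kind of relationship the "Metrics by Threshold" plot is meant to expose.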


Draft a Metrics by Group CSV file from your generated metric results. Example metric files are available in the [examples directory](https://github.com/intel/intel-xai-tools/tree/main/model_card_gen/intel_ai_safety/model_card_gen/docs/examples/csv).
For a step-by-step guide on creating these files, see this [notebook](https://github.com/intel/intel-xai-tools/blob/main/notebooks/model_card_gen/hugging_face_model_card/hugging-face-model-card.ipynb). The "Metrics by Group" section of the Model Card organizes and displays the model's performance metrics by distinct groups or subcategories within the data. Provide the path to the Metrics by Group CSV file with the `--metrics_by_group` argument.
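
As a small sketch of what per-group metrics look like, the snippet below computes accuracy separately for each group label and writes the result as CSV text. The column names (`group`, `accuracy`) and the helper `accuracy_by_group` are hypothetical; consult the linked example CSVs for the layout the generator actually expects.

```python
import csv
import io
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return one {"group", "accuracy"} row per distinct group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, y, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(y == p)
    return [{"group": g, "accuracy": correct[g] / total[g]} for g in sorted(total)]


groups = ["a", "a", "b", "b", "b"]
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 1, 0]
group_rows = accuracy_by_group(groups, y_true, y_pred)

# Serialize to CSV text; the column names are illustrative assumptions.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["group", "accuracy"])
writer.writeheader()
writer.writerows(group_rows)
print(buf.getvalue())
```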

```shell
INPUT_MC_METADATA_JSON_PATH=/path/to/model_card_metadata.json
MODEL_CARD_TEMPLATE="html"
OUTPUT_DIRECTORY=/path/to/output
METRICS_BY_THRESHOLD=/path/to/metrics_by_threshold.csv
METRICS_BY_GROUP=/path/to/metrics_by_group.csv

python examples/main.py --input_mc_metadata_json ${INPUT_MC_METADATA_JSON_PATH} --mc_template_type ${MODEL_CARD_TEMPLATE} --output_dir ${OUTPUT_DIRECTORY} --metrics_by_threshold ${METRICS_BY_THRESHOLD} --metrics_by_group ${METRICS_BY_GROUP}
```
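
If you prefer to drive the script from Python, for example inside a larger evaluation pipeline, the same command line can be assembled programmatically. This is only a sketch: `build_command` is a hypothetical helper, and it assumes the script is invoked as `examples/main.py` from the `model_card` directory, as in the shell example above.

```python
import shlex
import sys


def build_command(metadata_json, template="html", output_dir=None,
                  metrics_by_threshold=None, metrics_by_group=None):
    """Assemble the argument list for examples/main.py, skipping unset options."""
    cmd = [
        sys.executable, "examples/main.py",
        "--input_mc_metadata_json", metadata_json,
        "--mc_template_type", template,
    ]
    optional = {
        "--output_dir": output_dir,
        "--metrics_by_threshold": metrics_by_threshold,
        "--metrics_by_group": metrics_by_group,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, value]
    return cmd


cmd = build_command(
    "/path/to/model_card_metadata.json",
    metrics_by_threshold="/path/to/metrics_by_threshold.csv",
    metrics_by_group="/path/to/metrics_by_group.csv",
)
# Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.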

**Step 7 (Optional)**: Generate Metrics by Threshold for `lm_evaluation_harness`

You can also generate a Metrics by Threshold CSV for some `lm_evaluation_harness` tasks by setting `METRICS_RESULTS_PATH` to the path of the metric results JSONL file. It is passed to the `--metric_results_path` argument.

```shell
INPUT_MC_METADATA_JSON_PATH=/path/to/model_card_metadata.json
MODEL_CARD_TEMPLATE="html"
OUTPUT_DIRECTORY=/path/to/output
METRICS_RESULTS_PATH=/path/to/metrics_results.jsonl

python ./examples/main.py --input_mc_metadata_json ${INPUT_MC_METADATA_JSON_PATH} --mc_template_type ${MODEL_CARD_TEMPLATE} --output_dir ${OUTPUT_DIRECTORY} --metric_results_path ${METRICS_RESULTS_PATH}
```
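
To give a feel for how per-sample results can be turned into class probabilities, the sketch below reads JSONL records and applies a softmax to the per-option log-likelihoods. It is not the `generate_pred_prob` implementation from this change, which lives in `utils.py`. The record layout is an assumption: each line is taken to hold a `target` index and a `filtered_resps` list with one `[log-likelihood, is_greedy]` pair per answer option. The helper names are hypothetical.

```python
import json
import math


def softmax(log_likelihoods):
    """Convert log-likelihoods to probabilities in a numerically stable way."""
    m = max(log_likelihoods)
    exps = [math.exp(x - m) for x in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def load_pred_prob(jsonl_lines):
    """Return (probabilities, labels) from an iterable of JSONL lines."""
    probs, labels = [], []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed layout: one [log-likelihood, is_greedy] pair per answer option.
        lls = [float(resp[0]) for resp in record["filtered_resps"]]
        probs.append(softmax(lls))
        labels.append(int(record["target"]))
    return probs, labels


sample = ['{"target": 1, "filtered_resps": [["-2.0", "False"], ["-1.0", "True"]]}']
probs, labels = load_pred_prob(sample)
print(probs, labels)
```

Probabilities of this kind are what a threshold sweep operates on, which may explain why only some tasks are supported: tasks without per-option scores have nothing to threshold.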
45 changes: 45 additions & 0 deletions evals/evaluation/lm_evaluation_harness/model_card/arguments.py
@@ -0,0 +1,45 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


import argparse


def parse_arguments():
    parser = argparse.ArgumentParser(description="Generate a model card with optional metrics processing.")
    parser.add_argument(
        "--input_mc_metadata_json",
        type=str,
        required=True,
        help="Path to the JSON file containing input model card metadata.",
    )
    parser.add_argument(
        "--metrics_by_threshold",
        type=str,
        default=None,
        help="Metrics by threshold dataframe or the path to the metrics by threshold CSV file.",
    )
    parser.add_argument(
        "--metrics_by_group",
        type=str,
        default=None,
        help="Metrics by group dataframe or Path to the metrics by group CSV file.",
    )
    parser.add_argument(
        "--metric_results_path",
        type=str,
        default=None,
        help="Path to the metric results JSONL file for which metrics by threshold dataframe needs to be generated.",
    )
    parser.add_argument(
        "--mc_template_type",
        type=str,
        default="html",
        help="Template to use for rendering the model card. html for an interactive HTML model card or md for a static Markdown version. Defaults to html",
    )
    parser.add_argument(
        "--output_dir", type=str, default=None, help="Directory to save the generated model card and related files."
    )
    args = parser.parse_args()

    return args
@@ -0,0 +1,48 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


import os

from evals.evaluation.lm_evaluation_harness.model_card.arguments import parse_arguments
from evals.evaluation.lm_evaluation_harness.model_card.generate_model_card import generate_model_card
from evals.evaluation.lm_evaluation_harness.model_card.utils import generate_metrics_by_threshold, generate_pred_prob


def main():
    args = parse_arguments()
    metric_results_path = args.metric_results_path
    output_dir = args.output_dir
    metrics_by_threshold = args.metrics_by_threshold
    # Generate the metrics by threshold for the metric results if provided by the user

    if metric_results_path:
        if not os.path.exists(args.metric_results_path):
            raise FileNotFoundError(
                f"The file at {metric_results_path} does not exist. Please provide a valid file path."
            )

        try:
            y_pred_prob, labels, num_options, class_label_index_map = generate_pred_prob(metric_results_path)
            metrics_by_threshold = generate_metrics_by_threshold(
                y_pred_prob, labels, num_options, class_label_index_map, output_dir
            )
        except OSError as e:
            print(f"Error: {e}")
        except Exception:
            print("Task is currently not supported for metrics by threshold generation.")
            return

    # Generate the model card
    model_card = generate_model_card(
        args.input_mc_metadata_json,
        metrics_by_threshold,
        args.metrics_by_group,
        mc_template_type=args.mc_template_type,
        output_dir=output_dir,
    )
    return model_card


if __name__ == "__main__":
    main()
@@ -0,0 +1,69 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


import json
import os

from intel_ai_safety.model_card_gen.model_card_gen import ModelCardGen
from intel_ai_safety.model_card_gen.validation import validate_json_schema
from jsonschema import ValidationError


def generate_model_card(
    input_mc_metadata_json_path,
    metric_by_threshold=None,
    metric_by_group=None,
    mc_template_type="html",
    output_dir=None,
):
    """Generates an HTML or Markdown representation of a model card.

    Parameters:
        input_mc_metadata_json_path (str, required): Path to the JSON file containing the model's metadata and other details.
        metric_by_threshold (str, optional): The file path to a CSV containing metrics by threshold data.
        metric_by_group (str, optional): The file path to a CSV containing metrics by group data.
        mc_template_type (str, optional): Template to use for rendering the model card. Options include "html" for an interactive HTML model card or "md" for a static Markdown version. Defaults to "html".
        output_dir (str, optional): The directory where the model card file will be saved. Defaults to the current directory.

    Returns:
        str: The HTML or Markdown representation of the model card.
    """
    if output_dir is None:
        output_dir = os.getcwd()

    # Load the metadata file, distinguishing a missing file from invalid JSON.
    if os.path.exists(input_mc_metadata_json_path) and os.path.isfile(input_mc_metadata_json_path):
        try:
            with open(input_mc_metadata_json_path, "r") as file:
                model_card_json = json.load(file)

        except json.JSONDecodeError as e:
            raise ValueError("The file content is not valid JSON.") from e
    else:
        raise FileNotFoundError(f"The JSON file at {input_mc_metadata_json_path} does not exist.")

    try:
        validate_json_schema(model_card_json)

    except ValidationError as e:
        # Chain the original error so the failing schema path is not lost.
        raise ValidationError(
            "Warning: The schema version of the uploaded JSON does not correspond to a model card schema version or "
            "the uploaded JSON does not follow the model card schema."
        ) from e

    model_card = ModelCardGen.generate(
        model_card_json,
        metrics_by_threshold=metric_by_threshold,
        metrics_by_group=metric_by_group,
        template_type=mc_template_type,
    )

    model_card_name = f"Model Card.{mc_template_type}"

    full_path = os.path.join(output_dir, model_card_name)
    model_card.export_model_card(full_path)

    if mc_template_type == "html":
        return model_card._repr_html_()
    else:
        return model_card._repr_md_()
@@ -0,0 +1,8 @@
intel-ai-safety-model-card-gen@git+https://github.com/intel/intel-xai-tools.git#subdirectory=model_card_gen
kaleido
lm-eval==0.4.3
lxml
numpy
pandas
plotly
scikit-learn