[Fix] Change the model zip file name from the Hugging Face org ID to a custom prefix when upload_prefix is provided. (#413)

* [Feature] Add a workflow parameter with which the model uploader can specify a custom prefix.

Signed-off-by: conggguan <[email protected]>

* [Fix] Change the model zip file name from the Hugging Face org ID to a custom prefix when upload_prefix is provided.

Signed-off-by: conggguan <[email protected]>

* [Fix] Revert the redundant history.

Signed-off-by: conggguan <[email protected]>

* [Add] Add a changelog item.

Signed-off-by: conggguan <[email protected]>

---------

Signed-off-by: conggguan <[email protected]>
conggguan authored Aug 10, 2024
1 parent f41c2ef commit 7002c56
Showing 9 changed files with 46 additions and 41 deletions.
3 changes: 2 additions & 1 deletion .ci/run-repository.sh
@@ -72,6 +72,7 @@ elif [[ "$TASK_TYPE" == "SentenceTransformerTrace" || "$TASK_TYPE" == "SparseTra
echo -e "\033[34;1mINFO:\033[0m TRACING_FORMAT: ${TRACING_FORMAT}\033[0m"
echo -e "\033[34;1mINFO:\033[0m EMBEDDING_DIMENSION: ${EMBEDDING_DIMENSION:-N/A}\033[0m"
echo -e "\033[34;1mINFO:\033[0m POOLING_MODE: ${POOLING_MODE:-N/A}\033[0m"
+echo -e "\033[34;1mINFO:\033[0m UPLOAD_PREFIX: ${UPLOAD_PREFIX:-N/A}\033[0m"
echo -e "\033[34;1mINFO:\033[0m MODEL_DESCRIPTION: ${MODEL_DESCRIPTION:-N/A}\033[0m"

if [[ "$TASK_TYPE" == "SentenceTransformerTrace" ]]; then
@@ -95,7 +96,7 @@ elif [[ "$TASK_TYPE" == "SentenceTransformerTrace" || "$TASK_TYPE" == "SparseTra
--env "TEST_TYPE=server" \
--name opensearch-py-ml-trace-runner \
opensearch-project/opensearch-py-ml \
-nox -s "${NOX_TRACE_TYPE}-${PYTHON_VERSION}" -- ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} ${EXTRA_ARGS} -md ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}
+nox -s "${NOX_TRACE_TYPE}-${PYTHON_VERSION}" -- ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} ${EXTRA_ARGS} -up ${UPLOAD_PREFIX} -md ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}

# To upload a model, we need the model artifact, description, license files into local path
# trace_output should include description and license file.
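In the updated `nox` invocation, `${UPLOAD_PREFIX}` is unquoted, so when no prefix is set the variable expands to no word at all and `-up` is followed directly by `-md`. The Python sketch below approximates that bash behavior with `shlex.split`; it is an illustration only, `build_trace_argv` is a hypothetical helper, and it leaves out `${EXTRA_ARGS}` and the `nox` session name.

```python
import shlex


def build_trace_argv(
    model_id: str,
    model_version: str,
    tracing_format: str,
    upload_prefix: str = "",
    model_description: str = "",
) -> list[str]:
    """Approximate the argv that run-repository.sh passes after `nox -s ... --`.

    Hypothetical helper: bash word splitting is imitated with shlex.split,
    and ${EXTRA_ARGS} is left out.
    """
    # ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}: one quoted word if set, else nothing.
    description = shlex.quote(model_description) if model_description else ""
    # ${UPLOAD_PREFIX} is unquoted, so an empty value leaves no word behind.
    line = (
        f"{model_id} {model_version} {tracing_format} "
        f"-up {upload_prefix} -md {description}"
    )
    return shlex.split(line)


# No prefix: "-up" is followed directly by "-md".
print(build_trace_argv("sentence-transformers/all-MiniLM-L6-v2", "1.0.0", "BOTH"))
# Custom prefix plus a description that contains spaces.
print(
    build_trace_argv(
        "sentence-transformers/all-MiniLM-L6-v2",
        "1.0.0",
        "BOTH",
        upload_prefix="my-team",
        model_description="A small model",
    )
)
```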
3 changes: 2 additions & 1 deletion .github/workflows/model_uploader.yml
@@ -206,7 +206,8 @@ jobs:
echo "MODEL_VERSION=${{ github.event.inputs.model_version }}" >> $GITHUB_ENV
echo "TRACING_FORMAT=${{ github.event.inputs.tracing_format }}" >> $GITHUB_ENV
echo "EMBEDDING_DIMENSION=${{ github.event.inputs.embedding_dimension }}" >> $GITHUB_ENV
-echo "POOLING_MODE=${{ github.event.inputs.pooling_mode }}" >> $GITHUB_ENV
+echo "POOLING_MODE=${{ github.event.inputs.pooling_mode }}" >> $GITHUB_ENV
+echo "UPLOAD_PREFIX=${{ github.event.inputs.upload_prefix }}" >> $GITHUB_ENV
echo "MODEL_DESCRIPTION=${{ github.event.inputs.model_description }}" >> $GITHUB_ENV
- name: Autotracing ${{ matrix.cluster }} secured=${{ matrix.secured }} version=${{matrix.entry.opensearch_version}}
run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }} ${{github.event.inputs.model_type}}Trace"
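Each `echo "NAME=value" >> $GITHUB_ENV` line appends an entry to the environment file that GitHub Actions provides, and subsequent steps in the same job receive it as an environment variable. A blank `upload_prefix` input therefore arrives as `UPLOAD_PREFIX=`, an empty string rather than an absent variable. The sketch below imitates that append-and-read cycle in Python; `append_github_env` and `read_github_env` are hypothetical helpers, and multi-line values (which need GitHub's delimiter syntax) are deliberately rejected.

```python
import os
import tempfile


def append_github_env(env_file: str, name: str, value: str) -> None:
    """Append NAME=value to a GITHUB_ENV-style file (hypothetical helper)."""
    # Single-line form only; multi-line values need the NAME<<DELIMITER syntax.
    if "\n" in value:
        raise ValueError("multi-line values are not supported by this sketch")
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def read_github_env(env_file: str) -> dict[str, str]:
    """Parse NAME=value lines back into a dict, roughly as a later step sees them."""
    env: dict[str, str] = {}
    with open(env_file, encoding="utf-8") as fh:
        for line in fh:
            # Split on the first "=" only, so values may themselves contain "=".
            name, _, value = line.rstrip("\n").partition("=")
            env[name] = value
    return env


# Outside of Actions, a temporary file stands in for $GITHUB_ENV.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, "github_env")
    append_github_env(env_path, "MODEL_VERSION", "1.0.0")
    append_github_env(env_path, "UPLOAD_PREFIX", "")  # blank workflow input
    print(read_github_env(env_path))
```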
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -46,6 +46,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Update the listing file with three v2 sparse models - by @dhrubo-os ([#412](https://github.com/opensearch-project/opensearch-py-ml/pull/412))

### Fixed
+- Fix the wrong final zip file name in the model_uploader workflow; the zip file is now also named using the upload_prefix. ([#413](https://github.com/opensearch-project/opensearch-py-ml/pull/413/files))
- Fix the wrong input parameter for model_uploader's base_download_path in the Jenkins trigger. ([#402](https://github.com/opensearch-project/opensearch-py-ml/pull/402))
- Enable make_model_config_json to add model description to model config file by @thanawan-atc in ([#203](https://github.com/opensearch-project/opensearch-py-ml/pull/203))
- Correct demo_ml_commons_integration.ipynb by @thanawan-atc in ([#208](https://github.com/opensearch-project/opensearch-py-ml/pull/208))
4 changes: 2 additions & 2 deletions opensearch_py_ml/ml_models/sparse_encoding_model.py
@@ -81,8 +81,8 @@ def save_as_pt(
add_apache_license: bool = True,
) -> str:
"""
-Download sentence transformer model directly from huggingface, convert model to torch script format,
-zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster
+Download sparse encoding model directly from huggingface, convert model to torch script format,
+zip the model file and its tokenizer.json file to prepare to upload to the OpenSearch cluster
:param sentences:
Required, for example sentences = ['today is sunny']
7 changes: 6 additions & 1 deletion utils/model_uploader/autotracing_utils.py
@@ -235,6 +235,7 @@ def prepare_files_for_uploading(
model_format: str,
src_model_path: str,
src_model_config_path: str,
+upload_prefix: str = None,
) -> tuple[str, str]:
"""
Prepare files for uploading by storing them in UPLOAD_FOLDER_PATH
@@ -253,7 +254,11 @@ def prepare_files_for_uploading(
(path to model config json file) in the UPLOAD_FOLDER_PATH
:rtype: Tuple[str, str]
"""
-model_type, model_name = model_id.split("/")
+model_type, model_name = (
+model_id.split("/")
+if upload_prefix is None
+else (upload_prefix, model_id.split("/")[-1])
+)
model_format = model_format.lower()
folder_to_delete = (
TORCHSCRIPT_FOLDER_PATH if model_format == "torch_script" else ONNX_FOLDER_PATH
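The new conditional in `prepare_files_for_uploading` keeps the Hugging Face organization as `model_type` by default and swaps in `upload_prefix` when one is given, while the model name after the last `/` is kept either way. The standalone sketch below isolates that expression so its behavior can be checked; `resolve_upload_names` is a hypothetical name, and the downstream code that turns these values into the zip file name is not part of this hunk.

```python
from typing import Optional


def resolve_upload_names(
    model_id: str, upload_prefix: Optional[str] = None
) -> tuple[str, str]:
    """Return (model_type, model_name) the way the new expression does (hypothetical helper)."""
    if upload_prefix is None:
        # "org/name" -> ("org", "name"); unpacking fails unless there is exactly one "/".
        model_type, model_name = model_id.split("/")
    else:
        # The custom prefix replaces the org; only the last path segment is kept.
        model_type, model_name = upload_prefix, model_id.split("/")[-1]
    return model_type, model_name


print(resolve_upload_names("sentence-transformers/all-MiniLM-L6-v2"))
print(resolve_upload_names("sentence-transformers/all-MiniLM-L6-v2", "my-team"))
```

Because the test is `is None`, an empty string is accepted as a prefix and yields an empty `model_type`. On the CI path this should not arise, since an empty unquoted `${UPLOAD_PREFIX}` disappears from the command line, but a direct caller could pass `""`; checking `if not upload_prefix` instead would be one way to treat both cases alike.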
11 changes: 11 additions & 0 deletions utils/model_uploader/model_autotracing.py
@@ -281,6 +281,7 @@ def main(
embedding_dimension: Optional[int] = None,
pooling_mode: Optional[str] = None,
model_description: Optional[str] = None,
+upload_prefix: Optional[str] = None,
) -> None:
"""
Perform model auto-tracing and prepare files for uploading to OpenSearch model hub
@@ -363,6 +364,7 @@ def main(
TORCH_SCRIPT_FORMAT,
torchscript_model_path,
torchscript_model_config_path,
+upload_prefix,
)

config_path_for_checking_description = torchscript_dst_model_config_path
@@ -425,6 +427,14 @@ def main(
choices=["BOTH", "TORCH_SCRIPT", "ONNX"],
help="Model format for auto-tracing",
)
+parser.add_argument(
+"-up",
+"--upload_prefix",
+type=str,
+nargs="?",
+default=None,
+help="Model customize path prefix for upload",
+)
parser.add_argument(
"-ed",
"--embedding_dimension",
@@ -462,4 +472,5 @@ def main(
args.embedding_dimension,
args.pooling_mode,
args.model_description,
+args.upload_prefix,
)
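Declaring `-up/--upload_prefix` with `nargs="?"` and `default=None` lets the trace scripts accept the bare `-up` that `run-repository.sh` produces when no prefix is set: when the flag is present without a value, argparse stores the argument's `const`, which is `None` unless specified. The reduced parser below keeps only the two options involved; the `-md` definition is an assumption for this sketch, since this file's diff does not show it.

```python
import argparse

parser = argparse.ArgumentParser()
# Same settings as the -up/--upload_prefix argument added in this commit.
parser.add_argument("-up", "--upload_prefix", type=str, nargs="?", default=None)
# Assumed definition of -md, for this sketch only.
parser.add_argument("-md", "--model_description", type=str, nargs="?", default=None)

for argv in ([], ["-up"], ["-up", "-md", "A small model"], ["-up", "my-team"]):
    args = parser.parse_args(argv)
    # Flag absent, or present without a value, both leave upload_prefix as None.
    print(argv, "->", args.upload_prefix, args.model_description)
```

With the default `nargs`, the `["-up", "-md", ...]` case would instead stop with an "expected one argument" error.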
25 changes: 22 additions & 3 deletions utils/model_uploader/sparse_model_autotracing.py
@@ -186,6 +186,7 @@ def main(
model_version: str,
tracing_format: str,
model_description: Optional[str] = None,
+upload_prefix: Optional[str] = None,
) -> None:
"""
Perform model auto-tracing and prepare files for uploading to OpenSearch model hub
@@ -235,7 +236,10 @@ def main(
torchscript_model_path,
torchscript_model_config_path,
) = trace_sparse_encoding_model(
-model_id, model_version, TORCH_SCRIPT_FORMAT, model_description=None
+model_id,
+model_version,
+TORCH_SCRIPT_FORMAT,
+model_description=model_description,
)

torchscript_encoding_datas = register_and_deploy_sparse_encoding_model(
@@ -262,6 +266,7 @@ def main(
TORCH_SCRIPT_FORMAT,
torchscript_model_path,
torchscript_model_config_path,
+upload_prefix,
)

config_path_for_checking_description = torchscript_dst_model_config_path
@@ -273,7 +278,7 @@ def main(
onnx_model_path,
onnx_model_config_path,
) = trace_sparse_encoding_model(
-model_id, model_version, ONNX_FORMAT, model_description=None
+model_id, model_version, ONNX_FORMAT, model_description=model_description
)

onnx_embedding_datas = register_and_deploy_sparse_encoding_model(
@@ -325,6 +330,14 @@ def main(
choices=["BOTH", "TORCH_SCRIPT", "ONNX"],
help="Model format for auto-tracing",
)
+parser.add_argument(
+"-up",
+"--upload_prefix",
+type=str,
+nargs="?",
+default=None,
+help="Model customize path prefix for upload",
+)
parser.add_argument(
"-md",
"--model_description",
@@ -336,4 +349,10 @@ def main(
)
args = parser.parse_args()

-main(args.model_id, args.model_version, args.tracing_format, args.model_description)
+main(
+args.model_id,
+args.model_version,
+args.tracing_format,
+args.model_description,
+args.upload_prefix,
+)
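After this change, `main()` in `sparse_model_autotracing.py` ends with two `Optional[str]` parameters, `model_description` and `upload_prefix`, and the entry point passes them positionally. Because both have the same type, accidentally swapping them would not raise. One possible alternative, sketched below with a stand-in `main` and a hypothetical positional-argument layout (the real definitions are only partly visible in the diff), is to unpack the parsed namespace by keyword so each value is bound by name.

```python
import argparse
from typing import Optional


def main(
    model_id: str,
    model_version: str,
    tracing_format: str,
    model_description: Optional[str] = None,
    upload_prefix: Optional[str] = None,
) -> dict[str, Optional[str]]:
    # Stand-in for the real main(): it only echoes what it received.
    return {
        "model_id": model_id,
        "model_version": model_version,
        "tracing_format": tracing_format,
        "model_description": model_description,
        "upload_prefix": upload_prefix,
    }


parser = argparse.ArgumentParser()
# Hypothetical positional layout; every dest must match a main() parameter name.
parser.add_argument("model_id", type=str)
parser.add_argument("model_version", type=str)
parser.add_argument("tracing_format", choices=["BOTH", "TORCH_SCRIPT", "ONNX"])
parser.add_argument("-up", "--upload_prefix", type=str, nargs="?", default=None)
parser.add_argument("-md", "--model_description", type=str, nargs="?", default=None)

args = parser.parse_args(["org/model", "1.0.0", "TORCH_SCRIPT", "-up", "my-team"])
# Keyword unpacking binds each value by name, so parameter order no longer matters.
print(main(**vars(args)))
```

The trade-off is that every parser `dest` must then match a parameter of `main()` exactly; adding an unrelated option would require filtering the namespace first.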
3 changes: 0 additions & 3 deletions utils/model_uploader/upload_history/MODEL_UPLOAD_HISTORY.md
@@ -21,6 +21,3 @@ The following table shows sentence transformer model upload history.
|2023-09-13 18:03:32|@dhrubo-os|`sentence-transformers/distiluse-base-multilingual-cased-v1`|1.0.1|TORCH_SCRIPT|N/A|N/A|6178024517|
|2023-10-18 18:06:15|@dhrubo-os|`sentence-transformers/paraphrase-mpnet-base-v2`|1.0.0|ONNX|N/A|N/A|6568285400|
|2023-10-18 18:06:15|@dhrubo-os|`sentence-transformers/paraphrase-mpnet-base-v2`|1.0.0|TORCH_SCRIPT|N/A|N/A|6568285400|
-|2024-08-07 18:01:26|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill`|1.0.0|TORCH_SCRIPT|N/A|N/A|10293890748|
-|2024-08-07 18:23:41|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini`|1.0.0|TORCH_SCRIPT|N/A|N/A|10294048787|
-|2024-08-08 09:40:44|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-v2-distill`|1.0.0|TORCH_SCRIPT|N/A|N/A|10295327692|
30 changes: 0 additions & 30 deletions utils/model_uploader/upload_history/supported_models.json
@@ -48,35 +48,5 @@
"Embedding Dimension": "N/A",
"Pooling Mode": "N/A",
"Workflow Run ID": "6568285400"
-},
-{
-"Model Uploader": "@dhrubo-os",
-"Upload Time": "2024-08-07 18:01:26",
-"Model ID": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill",
-"Model Version": "1.0.0",
-"Model Format": "TORCH_SCRIPT",
-"Embedding Dimension": "N/A",
-"Pooling Mode": "N/A",
-"Workflow Run ID": "10293890748"
-},
-{
-"Model Uploader": "@dhrubo-os",
-"Upload Time": "2024-08-07 18:23:41",
-"Model ID": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini",
-"Model Version": "1.0.0",
-"Model Format": "TORCH_SCRIPT",
-"Embedding Dimension": "N/A",
-"Pooling Mode": "N/A",
-"Workflow Run ID": "10294048787"
-},
-{
-"Model Uploader": "@dhrubo-os",
-"Upload Time": "2024-08-08 09:40:44",
-"Model ID": "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
-"Model Version": "1.0.0",
-"Model Format": "TORCH_SCRIPT",
-"Embedding Dimension": "N/A",
-"Pooling Mode": "N/A",
-"Workflow Run ID": "10295327692"
}
]
