Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Automating Model tracing and uploading #193

Closed
wants to merge 34 commits into from

Conversation

thanawan-atc
Copy link
Contributor

@thanawan-atc thanawan-atc commented Jul 17, 2023

Description

I implement a GitHub Action workflow that will automate model tracing and uploading process, with an initial focus on pre-trained sentence transformer models.

Issues Resolved

OpenSearch Team receives frequent requests from customers to incorporate additional models into our model hub. Moreover, the current manual process of tracing and uploading these models is laborious and time-consuming. As a result, there is a compelling need to automate this process to streamline model availability effectively.

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
@codecov
Copy link

codecov bot commented Jul 17, 2023

Codecov Report

Merging #193 (d2afc36) into main (7622af5) will not change coverage.
The diff coverage is n/a.

❗ Current head d2afc36 differs from pull request most recent head 4d37394. Consider uploading reports for the commit 4d37394 to get more accurate results

@@           Coverage Diff           @@
##             main     #193   +/-   ##
=======================================
  Coverage   91.06%   91.06%           
=======================================
  Files          37       37           
  Lines        4052     4052           
=======================================
  Hits         3690     3690           
  Misses        362      362           

thanawan-atc and others added 2 commits July 17, 2023 22:35
…ert-base-tas-b)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
* @thanawan-atc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? I might need to talk with release team to give you write access in the repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I just made this changes for testing on my own repo. I will revert it back before the PR is merged.

else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some comments for all these if else conditions

@@ -33,7 +33,7 @@ docker build \
echo -e "\033[1m>>>>> Run [opensearch-project/opensearch-py-ml container] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>\033[0m"


if [[ "$IS_DOC" == "false" ]]; then
if [[ "$TASK_TYPE" == "test" ]]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment.

required: true
type: choice
options:
- "BOTH"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this will work? Will this create two separate workflow? How will we notify release team? One notification or two?

id-token: write
contents: read
strategy:
fail-fast: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for? May be add a comment why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, after thinking about it, we can remove this. I initially followed the workflow for integration test.

Comment on lines +83 to +85
python -m pip install pandas~=${PANDAS_VERSION};
python utils/model_uploader/model_autotracing.py ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} -ed ${EMBEDDING_DIMENSION} -pm ${POOLING_MODE}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding as a nox session?

@@ -20,7 +20,7 @@ jobs:
- name: Checkout Repository
uses: actions/checkout@v2
- name: Integ ${{ matrix.cluster }} secured=${{ matrix.secured }} version=${{matrix.entry.opensearch_version}}
run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }} true"
run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }} doc"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to make sure you tried this command in your end and it successfully generated the doc files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works as expected.

@@ -18,7 +18,7 @@ jobs:
- name: Checkout
uses: actions/checkout@v2
- name: Integ ${{ matrix.cluster }} secured=${{ matrix.secured }} version=${{matrix.entry.opensearch_version}}
run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }}"
run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }} test"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

id: dryrun_model_uploading
run: |
aws s3 sync ./upload/ s3://opensearch-exp/ml-models/huggingface/sentence-transformers/ --dryrun
dryrun_output=$(aws s3 sync ./upload/ s3://opensearch-exp/ml-models/huggingface/sentence-transformers/ --dryrun)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to talk about the file path in our meeting.

assert False, f"Raised Exception in {model_format} model deployment: {e}"

# 3.) Check model status
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to check model task status also. IF model is deployed then we can start generating embedding.

thanawan-atc and others added 11 commits July 18, 2023 19:27
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
…ansformers/msmarco-distilbert-base-tas-b)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants