
Make make_model_config_json function more concise #191

Merged

Conversation

@thanawan-atc (Contributor) commented Jul 14, 2023

Description

Instead of scraping embedding_dimension, pooling_mode, and normalize_result from the config.json file, we can use SentenceTransformer attributes and built-in functions directly, as sketched below.
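
For context, a minimal sketch of the idea (not the exact PR code; the model id below is only an example):

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize

# Example model id; any sentence-transformers model follows the same module layout.
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")

# Built-in accessor instead of reading the dimension out of config.json.
embedding_dimension = model.get_sentence_embedding_dimension()

# The pooling module (index "1" in the standard layout) knows its own mode.
pooling_mode = model._modules["1"].get_pooling_mode_str()

# Normalization is enabled exactly when a Normalize module is attached (index "2").
normalize_result = len(model._modules) >= 3 and isinstance(model._modules["2"], Normalize)

print(embedding_dimension, pooling_mode, normalize_result)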

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
codecov bot commented Jul 14, 2023

Codecov Report

Merging #191 (ce0179e) into main (8784cc4) will increase coverage by 0.10%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##             main     #191      +/-   ##
==========================================
+ Coverage   90.96%   91.06%   +0.10%     
==========================================
  Files          37       37              
  Lines        4073     4052      -21     
==========================================
- Hits         3705     3690      -15     
+ Misses        368      362       -6     
Impacted Files Coverage Δ
...search_py_ml/ml_models/sentencetransformermodel.py 70.50% <88.88%> (-0.05%) ⬇️

if normalize_result is not None:
    model_config_content["model_config"]["normalize_result"] = normalize_result
else:
    normalize_result_json_file_path = os.path.join(folder_path, "2_Normalize")
    if os.path.exists(normalize_result_json_file_path):
if len(model._modules) >= 3 and isinstance(model._modules["2"], Normalize):
Collaborator

I think otherwise we can say normalize_result = False?

Contributor Author

Edited!
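
Presumably the edit turns this into an explicit True/False value along these lines (a sketch, not the exact committed code):

# Sketch: normalize_result defaults to False when no Normalize module is attached.
if normalize_result is not None:
    model_config_content["model_config"]["normalize_result"] = normalize_result
else:
    model_config_content["model_config"]["normalize_result"] = (
        len(model._modules) >= 3 and isinstance(model._modules["2"], Normalize)
    )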

    model_type = model._modules["0"].auto_model.__class__.__name__
    model_type = model_type.lower().rstrip("model")
except Exception as e:
    raise Exception(f"Raised exception while getting model_type: {e}")
Collaborator

Let's try to cover these exception branches. I think we can either mock the model or stub _modules as None so that execution goes into the except branch. Same for the other exceptions.

Contributor Author

I don't think we can do it that way, because the function loads a Hugging Face SentenceTransformer inside the function, so we cannot stub _modules as None in the test file.

Collaborator

I agree this will be a bit tricky to implement. We can skip this for now. Not urgent.
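
For reference, the kind of mocking the reviewer alludes to might look roughly like this, assuming the test can patch SentenceTransformer at the point where sentencetransformermodel.py imports it (the test name and the test_model fixture are hypothetical, and which except branch is hit depends on the order of lookups inside the function):

from unittest.mock import MagicMock, patch

import pytest

def test_exception_branches_with_mocked_model(test_model):  # test_model: hypothetical fixture
    fake_model = MagicMock()
    fake_model._modules = None  # stub _modules as None, per the review suggestion
    # Patch the class where the module under test imports it, so the object
    # constructed inside make_model_config_json is the mock above.
    with patch(
        "opensearch_py_ml.ml_models.sentencetransformermodel.SentenceTransformer",
        return_value=fake_model,
    ):
        with pytest.raises(Exception):
            test_model.make_model_config_json()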

# if user input model_type and embedding_dimension, it will skip reading the config.json file
if model_type is None or embedding_dimension is None:
# if user input model_type/embedding_dimension/pooling_mode, it will skip this step.
model = SentenceTransformer(self.model_id)
Collaborator

Can we use model_name here? A customer might provide another model name here.

Contributor Author

I can change it to model_name, but model_name would still need to be a Hugging Face model_id for it to work (e.g., 'sentence-transformers/msmarco-distilbert-base-tas-b').

I was also wondering why we need model_name. I thought we just want to allow the user to set an arbitrary name for the model. But in that case line 1026 should still be self.model_id, because SentenceTransformer("Arbitrary Name") would not work. [Note that this is the Hugging Face class SentenceTransformer, not our class SentenceTransformerModel.] I'm not sure if I'm misunderstanding anything. Let me know if you still want me to change it to model_name.
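
To illustrate the point (hypothetical calls, not code from this PR): the Hugging Face SentenceTransformer constructor only accepts something it can resolve, i.e. a Hub model id or a local path, not an arbitrary display name:

from sentence_transformers import SentenceTransformer

SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")  # resolvable Hub model id: works
SentenceTransformer("Arbitrary Name")  # not a Hub id or local path: raises an error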

Collaborator

You were right. We can't use model_name. For some reason I mixed up model_name with model_id. Ignore my comment.

@dhrubo-os merged commit 7622af5 into opensearch-project:main on Jul 18, 2023
12 of 13 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 18, 2023
* Edit make_model_config_json function

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov (2)

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Correct linting

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

---------

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
(cherry picked from commit 7622af5)
dhrubo-os pushed a commit that referenced this pull request Jul 18, 2023
* Edit make_model_config_json function

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov (2)

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Correct linting

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

---------

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
(cherry picked from commit 7622af5)

Co-authored-by: Thanawan Atchariyachanvanit <[email protected]>