
Make make_model_config_json function more concise #191

Merged

Conversation

@thanawan-atc (Contributor) commented Jul 14, 2023

Description

Instead of scraping embedding_dimension, pooling_mode, and normalize_result from the config.json file, we can use SentenceTransformer attributes and built-in functions directly, as sketched below.
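
For context, a minimal sketch of the idea (not the exact PR code; the model id below is only an example):

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize

# Example model id; any sentence-transformers model follows the same module layout.
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")

# Built-in accessor instead of reading the dimension out of config.json.
embedding_dimension = model.get_sentence_embedding_dimension()

# The pooling module (index "1" in the standard layout) knows its own mode.
pooling_mode = model._modules["1"].get_pooling_mode_str()

# Normalization is enabled exactly when a Normalize module is attached (index "2").
normalize_result = len(model._modules) >= 3 and isinstance(model._modules["2"], Normalize)

print(embedding_dimension, pooling_mode, normalize_result)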

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
codecov bot commented Jul 14, 2023

Codecov Report

Merging #191 (ce0179e) into main (8784cc4) will increase coverage by 0.10%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##             main     #191      +/-   ##
==========================================
+ Coverage   90.96%   91.06%   +0.10%     
==========================================
  Files          37       37              
  Lines        4073     4052      -21     
==========================================
- Hits         3705     3690      -15     
+ Misses        368      362       -6     
Impacted Files Coverage Δ
...search_py_ml/ml_models/sentencetransformermodel.py 70.50% <88.88%> (-0.05%) ⬇️

if normalize_result is not None:
    model_config_content["model_config"]["normalize_result"] = normalize_result
else:
    normalize_result_json_file_path = os.path.join(folder_path, "2_Normalize")
    if os.path.exists(normalize_result_json_file_path):
if len(model._modules) >= 3 and isinstance(model._modules["2"], Normalize):
Collaborator

I think otherwise we can say normalize_result = False?

Contributor Author

Edited!
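
Presumably the edit turns this into an explicit True/False value along these lines (a sketch, not the exact committed code):

# Sketch: normalize_result defaults to False when no Normalize module is attached.
if normalize_result is not None:
    model_config_content["model_config"]["normalize_result"] = normalize_result
else:
    model_config_content["model_config"]["normalize_result"] = (
        len(model._modules) >= 3 and isinstance(model._modules["2"], Normalize)
    )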

    model_type = model._modules["0"].auto_model.__class__.__name__
    model_type = model_type.lower().rstrip("model")
except Exception as e:
    raise Exception(f"Raised exception while getting model_type: {e}")
Collaborator

Let's try to cover these exception branches. I think we can either mock the model or stub _modules as None so that execution goes into the except branch. Same for the other exceptions.

Contributor Author

I don't think we can do it that way, because the function loads a Hugging Face SentenceTransformer inside the function, so we cannot stub _modules as None in the test file.

Collaborator

I agree this will be a bit tricky to implement. We can skip this for now. Not urgent.
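
For reference, the kind of mocking the reviewer alludes to might look roughly like this, assuming the test can patch SentenceTransformer at the point where sentencetransformermodel.py imports it (the test name and the test_model fixture are hypothetical, and which except branch is hit depends on the order of lookups inside the function):

from unittest.mock import MagicMock, patch

import pytest

def test_exception_branches_with_mocked_model(test_model):  # test_model: hypothetical fixture
    fake_model = MagicMock()
    fake_model._modules = None  # stub _modules as None, per the review suggestion
    # Patch the class where the module under test imports it, so the object
    # constructed inside make_model_config_json is the mock above.
    with patch(
        "opensearch_py_ml.ml_models.sentencetransformermodel.SentenceTransformer",
        return_value=fake_model,
    ):
        with pytest.raises(Exception):
            test_model.make_model_config_json()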

# if user input model_type and embedding_dimension, it will skip reading the config.json file
if model_type is None or embedding_dimension is None:
# if user input model_type/embedding_dimension/pooling_mode, it will skip this step.
model = SentenceTransformer(self.model_id)
Collaborator

Can we use model_name here? A customer might provide another model name here.

Contributor Author

I can change it to model_name, but model_name would still need to be a Hugging Face model_id for it to work (e.g., 'sentence-transformers/msmarco-distilbert-base-tas-b').

I was also wondering why we need model_name. I thought we just want to allow the user to set an arbitrary name for the model. But in that case line 1026 should still be self.model_id, because SentenceTransformer("Arbitrary Name") would not work. [Note that this is the Hugging Face class SentenceTransformer, not our class SentenceTransformerModel.] I'm not sure if I'm misunderstanding anything. Let me know if you still want me to change it to model_name.
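
To illustrate the point (hypothetical calls, not code from this PR): the Hugging Face SentenceTransformer constructor only accepts something it can resolve, i.e. a Hub model id or a local path, not an arbitrary display name:

from sentence_transformers import SentenceTransformer

SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")  # resolvable Hub model id: works
SentenceTransformer("Arbitrary Name")  # not a Hub id or local path: raises an error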

Collaborator

You were right. We can't use model_name. For some reason I mixed up model_name with model_id. Ignore my comment.

@dhrubo-os merged commit 7622af5 into opensearch-project:main on Jul 18, 2023
12 of 13 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 18, 2023
* Edit make_model_config_json function

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov (2)

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Correct linting

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

---------

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
(cherry picked from commit 7622af5)
dhrubo-os pushed a commit that referenced this pull request Jul 18, 2023
* Edit make_model_config_json function

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Improve cov (2)

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

* Correct linting

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>

---------

Signed-off-by: Thanawan Atchariyachanvanit <[email protected]>
(cherry picked from commit 7622af5)

Co-authored-by: Thanawan Atchariyachanvanit <[email protected]>