
Staging to main: Fix bug in MAP and added new notebook programmatic execution #2035

Merged: 30 commits merged into main from staging on Dec 23, 2023

Conversation

miguelgfierro

Description

Related Issues

References

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

miguelgfierro and others added 20 commits October 30, 2023 22:59
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
* Not triggering unit tests on Draft PR

Signed-off-by: Jun Ki Min <[email protected]>

* Change a PR-triggering file to test

Signed-off-by: Jun Ki Min <[email protected]>

---------

Signed-off-by: Jun Ki Min <[email protected]>
* Announcement LF

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update email

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update README.md

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* security

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* license and contribution notice

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* update author link

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Add new code of conduct from LF

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Replacing references GRU4Rec to GRU

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Replacing references GRU4Rec to GRU

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Replacing references GRU4Rec in config files

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update references

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Delete conda.md

Signed-off-by: Jun Ki Min <[email protected]>

* refactor map_at_k and map to be the same as Spark's

Signed-off-by: Jun Ki Min <[email protected]>

* list of test failing to fix

Signed-off-by: Jun Ki Min <[email protected]>

* Update readme LF feedback @wutaomsft

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update NEWS.md

Co-authored-by: Andreas Argyriou <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update README.md

Co-authored-by: Andreas Argyriou <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Fix test errors, Refactor column check utils to be simpler

Signed-off-by: Jun Ki Min <[email protected]>

* Rename ranking tests to be _at_k suffixed

Signed-off-by: Jun Ki Min <[email protected]>

* Change test names in the test group

Signed-off-by: Jun Ki Min <[email protected]>

* add comment to mocked fn in a test

Signed-off-by: Jun Ki Min <[email protected]>

* 📝

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* remove unused input

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* 📝

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* no need to output the logs twice

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* packages

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* skipping flaky test

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Issue with TF

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Comment out the PR gate affected tests with the upgrade to TF>2.10.1

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Comment out the nightly builds affected tests with the upgrade to TF>2.10.1

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* 🐛

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Comment out the nightly builds affected tests with the upgrade to TF>2.10.1

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* revert the breaking tests with TF 2.10.1

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* temporary pin to TF=2.8.4

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update security tests

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>

* Update expected values to not use fixture

Signed-off-by: Jun Ki Min <[email protected]>

* list of test failing to fix

Signed-off-by: Jun Ki Min <[email protected]>

* Fix missing fixture error

Signed-off-by: Jun Ki Min <[email protected]>

---------

Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: Jun Ki Min <[email protected]>
Co-authored-by: miguelgfierro <[email protected]>
Co-authored-by: Andreas Argyriou <[email protected]>
Co-authored-by: Miguel Fierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
Signed-off-by: miguelgfierro <[email protected]>
@miguelgfierro

@loomlike I think some of the tests in the nightly builds haven't been updated:

@pytest.mark.notebooks
@pytest.mark.parametrize(
    "size, expected_values",
    [
        (
            "1m",
            {
                "map": 0.060579,
                "ndcg": 0.299245,
                "precision": 0.270116,
                "recall": 0.104350,
            },
        ),
        (
            "10m",
            {
                "map": 0.098745,
                "ndcg": 0.319625,
                "precision": 0.275756,
                "recall": 0.154014,
            },
        ),
    ],
)
def test_sar_single_node_functional(
    notebooks, output_notebook, kernel_name, size, expected_values
):
    notebook_path = notebooks["sar_single_node"]
    pm.execute_notebook(
        notebook_path,
        output_notebook,
        kernel_name=kernel_name,
        parameters=dict(TOP_K=10, MOVIELENS_DATA_SIZE=size),
    )
    results = sb.read_notebook(output_notebook).scraps.dataframe.set_index("name")[
        "data"
    ]

    for key, value in expected_values.items():
>       assert results[key] == pytest.approx(value, rel=TOL, abs=ABS_TOL)
E       assert 0.1850985029206066 == 0.060579 ± 5.0e-02
E         comparison failed
E         Obtained: 0.1850985029206066
E         Expected: 0.060579 ± 5.0e-02

@@ -194,7 +181,7 @@ def test_python_exp_var(rating_true, rating_pred):
         rating_pred=rating_true,
         col_prediction=DEFAULT_RATING_COL,
     ) == pytest.approx(1.0, TOL)
-    assert exp_var(rating_true, rating_pred) == pytest.approx(-6.4466, TOL)
+    assert exp_var(rating_true, rating_pred) == pytest.approx(-6.4466, 0.01)
@anargyri Nov 7, 2023

Does this mean we now get something between -6.38 and -6.51? It looks a bit strange, since nothing changed with explained variance, did it?

@loomlike Nov 8, 2023

I think this happened when I copied the expected output values from the fixture over to each test. This shouldn't have changed. I'll revert the code to what we had. Thank you for the good catch!

-        evaluator2 = SparkRatingEvaluation(df_true, df_pred)
-        assert evaluator2.exp_var() == target_metrics["exp_var"]
+        evaluator = SparkRatingEvaluation(df_true, df_pred)
+        assert evaluator.exp_var() == pytest.approx(-6.4466, 0.01)

Same question here. So both the Spark and Python evaluations now give slightly different results. Maybe something changed in the input dataframes?


This should use the same TOL for consistency.
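For context (an illustration, not code from the PR): the second positional argument of pytest.approx is a relative tolerance, so pytest.approx(-6.4466, 0.01) accepts values within roughly 1% of -6.4466. The sketch below mimics that accept/reject behavior with the standard library's math.isclose:

```python
import math

# pytest.approx(-6.4466, 0.01) treats 0.01 as a *relative* tolerance.
# math.isclose with rel_tol=0.01 shows the same accept/reject behavior
# for these values.
expected = -6.4466

assert math.isclose(-6.40, expected, rel_tol=0.01)      # within 1% of expected
assert not math.isclose(-6.30, expected, rel_tol=0.01)  # outside 1%
```

Using a shared TOL constant rather than a hardcoded 0.01 keeps all the metric tests under one tolerance policy, which is the consistency concern here.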

@loomlike loomlike self-assigned this Nov 8, 2023
@loomlike

@miguelgfierro To fix this issue, I need to change map_at_k() in the notebooks to use map(), which will conflict with your changes in PR #2031 where you're removing the glue parts:

# Record results with papermill for tests
import scrapbook as sb
sb.glue("map", rank_eval.map_at_k())  --> rank_eval.map()
...

So I think it would be good to wait until your PR gets merged into staging first and then fix the issues, if that's okay.

@miguelgfierro

So I think it would be good to wait until your PR gets merged into staging first and then fix the issues, if that's okay.

Got it, makes sense

@loomlike

@SimonYansenZhao Hi Simon, I found an issue in the execute_notebook function that I wanted you to confirm:

        if (
            "tags" in cell.metadata
            and "parameters" in cell.metadata["tags"]
            and cell.cell_type == "code"
        ):
            cell_source = cell.source
            modified_cell_source = (
                cell_source  # Initialize a variable to hold the modified source
            )
            for param, new_value in parameters.items():
                if (
                    isinstance(new_value, str)
                    and not (new_value.startswith('"') and new_value.endswith('"'))
                    and not (new_value.startswith("'") and new_value.endswith("'"))
                ):
                    # Check if the new value is a string and surround it with quotes if necessary
                    new_value = f'"{new_value}"'
                # # Check if the new value is a string and surround it with quotes if necessary
                # if isinstance(new_value, str):
                #     new_value = f'"{new_value}"'
                # Define a regular expression pattern to match parameter assignments and ignore comments
                pattern = re.compile(
                    rf"(\b{param})\s*=\s*([^#\n]+)(?:#.*$)?",
                    re.MULTILINE
                    # rf"\b{param}\s*=\s*([^\n]+)\b"
                )
                modified_cell_source = pattern.sub(rf"\1 = {new_value}", cell_source)  # <-- Here

In the code above, modified_cell_source = pattern.sub(rf"\1 = {new_value}", cell_source) always applies the substitution to the original cell_source. This means each parameter value is substituted into the original cell source string again, rather than into the result of the previous substitution (modified_cell_source). As a consequence, modified_cell_source ends up containing only the last parameter value that was updated.

To test and fix this, I changed the code to apply the pattern to modified_cell_source cumulatively, and added a test case (to do so, I pulled that part into a separate function, as I suggested earlier).

If those changes make sense to you, or if you have any comments, please let me know!
The changes are in the jumin/fix_nightly branch.

* Revert tests tolerance
* Fix notebook parameter parsing
* Add notebook utils tests to test groups
* Fix notebooks
* Fix notebook unit tests
* Update evaluation metrics name map. Handle None for exp_var
* Fix smoke tests
* cleanup
* Fix functional test errors
* make notebook parameter update function to be private
* Fix benchmark notebook bug
* fix remaining bugs
---------

Signed-off-by: Jun Ki Min <[email protected]>
@miguelgfierro miguelgfierro changed the title Staging to main: Fix bug in MAP Staging to main: Fix bug in MAP and added new notebook programmatic execution Dec 22, 2023
…t_cell

Fix benchmarks last cell to store value, not [value]

@loomlike loomlike left a comment


Tests are all greeeeeeen!

@miguelgfierro

Vamos!!!!

@miguelgfierro miguelgfierro merged commit 0d9d7c7 into main Dec 23, 2023
45 checks passed