Basic statistics allow computation on sparse data and add test #2095

md-shafiul-alam · 2024-10-08T18:23:01Z

Description

Add a comprehensive description of proposed changes

List associated issue number(s) if exist(s): #6 (for example)

Documentation PR (if needed): #1340 (for example)

Benchmarks PR (if needed): IntelPython/scikit-learn_bench#155 (for example)

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

Vika-F · 2024-10-09T12:03:56Z

sklearnex/basic_statistics/tests/test_basic_statistics.py

+    if weighted:
+        weights = gen.uniform(low=-0.5, high=1.0, size=row_count)
+        weights = weights.astype(dtype=dtype)
+    basicstat = BasicStatistics(result_options=["mean", "max", "sum"])


According to the onedal tests, "max" needs to be excluded as it contains bugs:
https://github.com/intel/scikit-learn-intelex/blob/main/onedal/basic_statistics/tests/test_basic_statistics.py#L273

icfaust (Oct 9, 2024): @olegkkruglov please post a link to the ticket associated with this error (just to make sure it wasn't lost).

olegkkruglov (Oct 9, 2024): I'm not sure I have it. I didn't add this skip.

icfaust (Oct 10, 2024): Sorry, my mistake, I didn't dig deep enough in the git blame. It turns out it was introduced in #1846 by @Vika-F. Do you know whether there was any follow-up work on the max issues after #1846, or do you remember what was going on?

md-shafiul-alam (Oct 10, 2024, edited): I observed that issue as well and have temporarily removed "max" from the tests for this PR.
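For readers following the thread, here is a minimal sketch of how reference values for the two options that remain in the test ("mean" and "sum") can be computed for sparse input. It is not the test code from this PR: `reference_stats`, the matrix shape, and the sparsity level are illustrative assumptions, and `BasicStatistics` itself is not called.

```python
import numpy as np
import scipy.sparse as sp


def reference_stats(X_csr):
    """Column-wise reference "sum" and "mean" for a CSR matrix.

    Implicit zeros count toward the mean, so the divisor is the
    total row count rather than the number of stored entries.
    """
    n_rows = X_csr.shape[0]
    col_sum = np.asarray(X_csr.sum(axis=0)).ravel()
    return {"sum": col_sum, "mean": col_sum / n_rows}


# Illustrative data only: a dense array with most entries zeroed out.
rng = np.random.default_rng(0)
dense = rng.uniform(-2.0, 2.0, size=(50, 8))
dense[rng.random(dense.shape) < 0.7] = 0.0
X = sp.csr_matrix(dense)

ref = reference_stats(X)

# The sparse reference must agree with the same statistics on the dense copy.
np.testing.assert_allclose(ref["sum"], dense.sum(axis=0), atol=1e-7)
np.testing.assert_allclose(ref["mean"], dense.mean(axis=0), atol=1e-7)
```

Comparing against the dense copy of the same data is one way to obtain ground truth without depending on the estimator under test.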

icfaust · 2024-10-09T12:08:41Z

Please add this to the run_to_run_stability sparse stability testing.
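As a rough illustration of what such a check involves, the sketch below assumes that run-to-run stability means repeated runs on the same sparse input return bit-identical results. `column_sums` is a hypothetical stand-in for the estimator call, and `is_run_to_run_stable` is not a helper from the repository.

```python
import numpy as np
import scipy.sparse as sp


def column_sums(X_csr):
    # Hypothetical stand-in for the estimator call under test.
    return np.asarray(X_csr.sum(axis=0)).ravel()


def is_run_to_run_stable(func, X, n_runs=3):
    """Return True if repeated calls on the same input match bit for bit."""
    baseline = func(X)
    return all(np.array_equal(baseline, func(X)) for _ in range(n_runs - 1))


# Illustrative sparse input: keep only values of at least 0.8.
rng = np.random.default_rng(42)
dense = rng.uniform(size=(100, 5))
dense[dense < 0.8] = 0.0
X = sp.csr_matrix(dense)

stable = is_run_to_run_stable(column_sums, X)
```

Exact equality (`np.array_equal`) is used rather than a tolerance, since the point of the check is that results do not vary between runs at all.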

samir-nasibli · 2024-10-11T01:26:42Z

sklearnex/basic_statistics/tests/test_basic_statistics.py

@@ -178,6 +181,53 @@ def test_multiple_options_on_random_data(
    assert_allclose(gtr_sum, res_sum, atol=tol)


+@pytest.mark.parametrize("queue", get_queues())


samir-nasibli (Oct 11, 2024): Please use _get_dataframes_and_queues instead.

md-shafiul-alam (Oct 11, 2024): Sparse data can't work with dataframes.

md-shafiul-alam added 6 commits October 8, 2024 09:27

basic-stat-sparse-test

d8bf640

lint

f1fb204

fix

49804f9

import

353cf0c

minor fix

7df2a3f

import

793ab23

md-shafiul-alam added the bug and enhancement labels and removed the bug label Oct 8, 2024

md-shafiul-alam changed the title ~~Basic statistics sparse fix and adding test~~ Basic statistics allow computation on sparse data and add test Oct 8, 2024

md-shafiul-alam added 2 commits October 8, 2024 14:42

add dense

f5210ea

exclude max

168a897

Vika-F reviewed Oct 9, 2024

md-shafiul-alam added 2 commits October 9, 2024 16:28

remove test for max

8f50d85

turn of weighted test

ecd34e3

samir-nasibli reviewed Oct 11, 2024

md-shafiul-alam added 4 commits October 10, 2024 19:05

test without weighted

49f9ad7

improve bs sparse test

32955d4

minor fix

2bf51fb

minor

f5eb5c0
