Add column correlations metric #1711
Merged
15 commits
a0026e9 Add column correlations metric (mike0sv)
4f21861 add dataset correlations (mike0sv)
e15d598 add dataset correlations (mike0sv)
21f099f fix lint (mike0sv)
c15c693 Merge branch 'main' into feature/matrix-result (mike0sv)
eff5a59 Add iter single values for dataframe value (mike0sv)
8a3bff0 Add dataframe value handling (mike0sv)
0380de9 Add examples (mike0sv)
7f870a6 fix lint (mike0sv)
f74d814 fix lint (mike0sv)
7fc2b36 Merge branch 'main' into feature/matrix-result (Liraim)
4a29ff9 Add bikes tests data into repo. (Liraim)
a7bb6c4 Fix bikes path. (Liraim)
14c7b87 Handle bikes errors. (Liraim)
4e4483a Move correlations metrics into separate notebook (before we fix visua… (Liraim)
@@ -0,0 +1,87 @@
from typing import List
from typing import Optional
from typing import Sequence
from typing import Tuple

from evidently.core.metric_types import BoundTest
from evidently.core.metric_types import DataframeValue
from evidently.core.metric_types import Metric
from evidently.core.report import Context
from evidently.legacy.metrics.data_quality.column_correlations_metric import ColumnCorrelationsMetric
from evidently.legacy.metrics.data_quality.column_correlations_metric import ColumnCorrelationsMetricResult
from evidently.legacy.metrics.data_quality.dataset_correlations_metric import DatasetCorrelationsMetric
from evidently.legacy.metrics.data_quality.dataset_correlations_metric import DatasetCorrelationsMetricResult
from evidently.legacy.model.widget import BaseWidgetInfo
from evidently.metrics._legacy import LegacyMetricCalculation


class ColumnCorrelations(Metric):
    column_name: str

    def get_bound_tests(self, context: "Context") -> Sequence[BoundTest]:
        return []


class LegacyColumnCorrelationsCalculation(
    LegacyMetricCalculation[
        DataframeValue,
        ColumnCorrelations,
        ColumnCorrelationsMetricResult,
        ColumnCorrelationsMetric,
    ],
):
    def display_name(self) -> str:
        return f"Correlations between {self.metric.column_name} column and all the other columns."

    def calculate_value(
        self, context: "Context", legacy_result: ColumnCorrelationsMetricResult, render: List[BaseWidgetInfo]
    ) -> Tuple[DataframeValue, Optional[DataframeValue]]:
        current_result = legacy_result.current
        current_correlations = next(iter(current_result.values()))
        current_df = current_correlations.get_pandas()
        current_value = DataframeValue(display_name=self.display_name(), value=current_df)
        current_value.widget = render
        reference_value = None
        if legacy_result.reference is not None:
            reference_result = next(iter(legacy_result.reference.values()))
            reference_df = reference_result.get_pandas()
            reference_value = DataframeValue(display_name=self.display_name(), value=reference_df)
            reference_value.widget = []
        return current_value, reference_value

    def legacy_metric(self) -> ColumnCorrelationsMetric:
        return ColumnCorrelationsMetric(column_name=self.metric.column_name)


class DatasetCorrelations(Metric):
    def get_bound_tests(self, context: "Context") -> Sequence[BoundTest]:
        return []


class LegacyDatasetCorrelationsCalculation(
    LegacyMetricCalculation[
        DataframeValue,
        DatasetCorrelations,
        DatasetCorrelationsMetricResult,
        DatasetCorrelationsMetric,
    ],
):
    def legacy_metric(self) -> DatasetCorrelationsMetric:
        return DatasetCorrelationsMetric()

    def calculate_value(
        self, context: "Context", legacy_result: DatasetCorrelationsMetricResult, render: List[BaseWidgetInfo]
    ) -> Tuple[DataframeValue, Optional[DataframeValue]]:
        current_result = legacy_result.current
        current_df = next(iter(current_result.correlation.values()))
        current_value = DataframeValue(display_name=self.display_name(), value=current_df)
        current_value.widget = render
        reference_value = None
        if legacy_result.reference is not None:
            reference_df = next(iter(legacy_result.reference.correlation.values()))
            reference_value = DataframeValue(display_name=self.display_name(), value=reference_df)
            reference_value.widget = []
        return current_value, reference_value

    def display_name(self) -> str:
        return """Calculate different correlations with target, predictions and features"""
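Both `calculate_value` methods above select a single entry from the legacy result with `next(iter(mapping.values()))`. The sketch below illustrates that idiom with plain dictionaries. The dictionary contents and the `first_value` helper are hypothetical stand-ins for illustration, not Evidently's actual result objects. The idiom returns the first value in insertion order, ignores any remaining entries, and raises `StopIteration` on an empty mapping unless a default is supplied.

```python
from typing import Dict, Optional, TypeVar

V = TypeVar("V")


def first_value(mapping: Dict[str, V], default: Optional[V] = None) -> Optional[V]:
    """Return the first value in insertion order, or `default` if the mapping is empty."""
    return next(iter(mapping.values()), default)


# Hypothetical stand-in for a result keyed by correlation kind.
correlations = {
    "pearson": [[1.0, 0.5], [0.5, 1.0]],
    "spearman": [[1.0, 0.4], [0.4, 1.0]],
}

picked = first_value(correlations)  # only the first ("pearson") entry is used
empty = first_value({})             # None instead of StopIteration
```

If the legacy result can hold several correlation kinds, only the first one would reach the new `DataframeValue` under this pattern, which may or may not be the intended behavior.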
@@ -0,0 +1,25 @@
=========================================
License
=========================================
Use of this dataset in publications must be cited to the following publication:

[1] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.

@article{
    year={2013},
    issn={2192-6352},
    journal={Progress in Artificial Intelligence},
    doi={10.1007/s13748-013-0040-3},
    title={Event labeling combining ensemble detectors and background knowledge},
    url={http://dx.doi.org/10.1007/s13748-013-0040-3},
    publisher={Springer Berlin Heidelberg},
    keywords={Event labeling; Event detection; Ensemble learning; Background knowledge},
    author={Fanaee-T, Hadi and Gama, Joao},
    pages={1-15}
}

=========================================
Contact
=========================================

For further information about this dataset please contact Hadi Fanaee-T ([email protected])
Binary file not shown.
@@ -0,0 +1,56 @@
import numpy as np
import pandas as pd

from evidently import BinaryClassification
from evidently import DataDefinition
from evidently import Dataset
from evidently import Report
from evidently.core.metric_types import DataframeValue
from evidently.metrics import ColumnCorrelations
from evidently.metrics.data_quality import DatasetCorrelations


def test_column_correlations():
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    ds = Dataset.from_pandas(df)

    metric = ColumnCorrelations(column_name="a")
    report = Report(metrics=[metric])

    run = report.run(ds)

    result = run.context.get_metric_result(metric)
    assert isinstance(result, DataframeValue)
    pd.testing.assert_frame_equal(result.value, pd.DataFrame([{"kind": "cramer_v", "column_name": "b", "value": 1.0}]))


def test_dataset_correlations():
    df = pd.DataFrame(
        {
            "my_target": [1, np.nan, 3] * 1000,
            "my_prediction": [1, 2, np.nan] * 1000,
            "feature_1": [1, 2, 3] * 1000,
            "feature_2": ["a", np.nan, "a"] * 1000,
        }
    )
    ds = Dataset.from_pandas(
        df,
        data_definition=DataDefinition(
            classification=[BinaryClassification(target="my_target", prediction_labels="my_prediction")]
        ),
    )

    metric = DatasetCorrelations()
    report = Report(metrics=[metric])

    run = report.run(ds)

    result = run.context.get_metric_result(metric)
    assert isinstance(result, DataframeValue)
    pd.testing.assert_frame_equal(
        result.value,
        pd.DataFrame(
            [{"my_target": 1, "my_prediction": np.nan}, {"my_target": np.nan, "my_prediction": 1}],
            index=["my_target", "my_prediction"],
        ),
    )
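`test_column_correlations` expects a `cramer_v` value of 1.0 for columns `a` and `b`. As a rough sanity check of that number, here is a standard-library sketch of the plain (uncorrected) Cramér's V statistic. It is an assumption that this matches what Evidently computes internally, since the library may apply its own column typing or corrections; the `cramers_v` function is written for this illustration only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # observed count per (x, y) pair
    x_counts = Counter(x)
    y_counts = Counter(y)
    # Pearson chi-squared: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for xv, x_n in x_counts.items():
        for yv, y_n in y_counts.items():
            expected = x_n * y_n / n
            chi2 += (joint.get((xv, yv), 0) - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return float("nan")  # undefined when either column is constant
    return math.sqrt(chi2 / (n * (k - 1)))


# Same values as columns "a" and "b" in the test above.
v = cramers_v([1, 2, 3], [4, 5, 6])
```

Every value of `a` pairs with exactly one value of `b`, so the chi-squared statistic reaches its maximum of `n * (k - 1)` and `v` comes out as approximately 1.0, in line with the expected frame in the test.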
Does this serialize correctly to JSON with arbitrary dataframes?
If we allow this as a result type, we should be ready to support all possible data of this type, which can be challenging.
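To make the concern concrete, the sketch below uses only the standard library and a hypothetical dict-based stand-in for a dataframe (`columns`, `index`, `data`). It is not Evidently's `DataframeValue` serializer, and `to_strict_json` is an illustrative helper. It shows one edge case that already appears in the test above: `NaN` cells are not valid in strict JSON and need explicit handling.

```python
import json
import math
from typing import Any, Dict, List

# Hypothetical stand-in: {"columns": [...], "index": [...], "data": [[...], ...]}
Frame = Dict[str, List[Any]]


def to_strict_json(frame: Frame) -> str:
    """Serialize the frame as strict JSON, mapping NaN cells to null."""
    cleaned = [
        [None if isinstance(cell, float) and math.isnan(cell) else cell for cell in row]
        for row in frame["data"]
    ]
    payload = {"columns": frame["columns"], "index": frame["index"], "data": cleaned}
    # allow_nan=False makes json.dumps raise ValueError on any remaining NaN or infinity.
    return json.dumps(payload, allow_nan=False)


nan = float("nan")
frame: Frame = {
    "columns": ["my_target", "my_prediction"],
    "index": ["my_target", "my_prediction"],
    "data": [[1, nan], [nan, 1]],
}

try:
    json.dumps(frame, allow_nan=False)
    strict_failed = False
except ValueError:
    strict_failed = True  # raw NaN is rejected in strict mode

encoded = to_strict_json(frame)
decoded = json.loads(encoded)
```

A general serializer would likely also need to handle non-string column labels, datetime values, and multi-level indexes, which is the kind of open-ended support the comment warns about.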