docs/book/component-guide/data-validators/README.md
Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ Data Validator are optional stack components provided by integrations. The follo
|[Deepchecks](deepchecks.md)| <p>data quality<br>data drift<br>model drift<br>model performance</p> | <p>tabular: <code>pandas.DataFrame</code><br>CV: <code>torch.utils.data.dataloader.DataLoader</code></p> | <p>tabular: <code>sklearn.base.ClassifierMixin</code><br>CV: <code>torch.nn.Module</code></p> | Add Deepchecks data and model validation tests to your pipelines |`deepchecks`|
|[Evidently](evidently.md)| <p>data quality<br>data drift<br>model drift<br>model performance</p> | tabular: `pandas.DataFrame`| N/A | Use Evidently to generate a variety of data quality and data/model drift reports and visualizations |`evidently`|
|[Great Expectations](great-expectations.md)| <p>data profiling<br>data quality</p> | tabular: `pandas.DataFrame`| N/A | Perform data testing, documentation and profiling with Great Expectations |`great_expectations`|
-
|[Whylogs/WhyLabs](whylogs.md)| data drift | tabular: `pandas.DataFrame`| N/A | Generate data profiles with whylogs and upload them to WhyLabs|`whylogs`|
+
|[Whylogs/WhyLabs](whylogs.md)| data drift | tabular: `pandas.DataFrame`| N/A | Generate data profiles with whylogs. Hosted WhyLabs platform is being discontinued after Apple's acquisition—see the integration page for OSS deployment options.|`whylogs`|
If you would like to see the available flavors of Data Validator, you can use the command:
docs/book/component-guide/data-validators/whylogs.md
Lines changed: 11 additions & 9 deletions
@@ -6,11 +6,13 @@ description: >-
# Whylogs
-
The whylogs/WhyLabs [Data Validator](./) flavor provided with the ZenML integration uses [whylogs](https://whylabs.ai/whylogs) and [WhyLabs](https://whylabs.ai) to generate and track data profiles, highly accurate descriptive representations of your data. The profiles can be used to implement automated corrective actions in your pipelines, or to render interactive representations for further visual interpretation, evaluation and documentation.
+
The whylogs/WhyLabs [Data Validator](./) flavor provided with the ZenML integration uses the open-source [whylogs](https://github.com/whylabs/whylogs) library together with the now open-sourced [WhyLabs platform](https://github.com/whylabs/whylabs-oss) to generate and track data profiles, highly accurate descriptive representations of your data. The profiles can be used to implement automated corrective actions in your pipelines, or to render interactive representations for further visual interpretation, evaluation and documentation.
+
+
> **Warning:** [WhyLabs was acquired by Apple](https://whylabs.ai/) and the hosted WhyLabs platform is being discontinued. While the whylogs library remains open source and the WhyLabs platform source code is publicly available, hosted deployments may no longer be accessible. Make sure to plan your usage of the integration accordingly and consider self-hosting the OSS platform if you still need WhyLabs features.
### When would you want to use it?
-
[Whylogs](https://whylabs.ai/whylogs) is an open-source library that analyzes your data and creates statistical summaries called whylogs profiles. Whylogs profiles can be processed in your pipelines and visualized locally or uploaded to the [WhyLabs platform](https://whylabs.ai/), where more in depth analysiscan be carried out. Even though [whylogs also supports other data types](https://github.com/whylabs/whylogs#data-types), the ZenML whylogs integration currently only works with tabular data in `pandas.DataFrame` format.
+
[Whylogs](https://github.com/whylabs/whylogs) is an open-source library that analyzes your data and creates statistical summaries called whylogs profiles. Whylogs profiles can be processed in your pipelines and visualized locally or uploaded to a WhyLabs deployment for more in-depth analysis. The official hosted WhyLabs service is being discontinued, but you can continue to operate a WhyLabs instance yourself by using the open-source release at [https://github.com/whylabs/whylabs-oss](https://github.com/whylabs/whylabs-oss). Even though [whylogs also supports other data types](https://github.com/whylabs/whylogs#data-types), the ZenML whylogs integration currently only works with tabular data in `pandas.DataFrame` format.
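
To make the idea of a "statistical summary" more concrete, here is a minimal, standard-library-only sketch of a mergeable per-column profile. It is an illustration of the concept, not the whylogs API or its internal data structures: `ColumnSummary`, `profile_table` and `merge_profiles` are hypothetical names introduced for this example, and real whylogs profiles track far richer statistics.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSummary:
    """Hypothetical mergeable summary of one numeric column (not a whylogs type)."""

    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        values = list(values)
        if not values:
            # Identity element for merge(): contributes nothing to any statistic.
            return cls(0, 0.0, math.inf, -math.inf)
        return cls(
            len(values), float(sum(values)), float(min(values)), float(max(values))
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else math.nan

    def merge(self, other):
        # Summaries combine without access to the raw rows.
        return ColumnSummary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


def profile_table(rows):
    """Summarize each numeric column of a list of dict rows."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: ColumnSummary.from_values(vals) for name, vals in columns.items()}


def merge_profiles(left, right):
    """Merge two table profiles column by column."""
    merged = dict(left)
    for name, summary in right.items():
        merged[name] = merged[name].merge(summary) if name in merged else summary
    return merged
```

Because each summary can be merged without the raw rows, profiles computed on separate batches can be combined later, which is the property that makes it practical to store or upload profiles instead of the data itself.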
You should use the whylogs/WhyLabs Data Validator when you need the following data validation features that are possible with whylogs and WhyLabs:
@@ -28,7 +30,7 @@ The whylogs Data Validator flavor is included in the whylogs ZenML integration,
zenml integration install whylogs -y
```
-
If you don't need to connect to the WhyLabs platform to upload and store the generated whylogs data profiles, the Data Validator stack component does not require any configuration parameters. Adding it to a stack is as simple as running e.g.:
+
If you don't need to connect to a WhyLabs deployment to upload and store the generated whylogs data profiles, the Data Validator stack component does not require any configuration parameters. Adding it to a stack is as simple as running e.g.:
-
Adding WhyLabs logging capabilities to your whylogs Data Validator is just slightly more complicated, as you also need to create a [ZenML Secret](https://docs.zenml.io/getting-started/deploying-zenml/secret-management) to store the sensitive WhyLabs authentication information in a secure location and then reference the secret in the Data Validator configuration. To generate a WhyLabs access token, you can follow [the official WhyLabs instructions documented here](https://docs.whylabs.ai/docs/whylabs-api/#creating-an-api-token).
+
Adding WhyLabs logging capabilities to your whylogs Data Validator is just slightly more complicated, as you also need to create a [ZenML Secret](https://docs.zenml.io/getting-started/deploying-zenml/secret-management) to store the sensitive WhyLabs authentication information in a secure location and then reference the secret in the Data Validator configuration. To generate a WhyLabs access token for a deployment that you host yourself, refer to the guidance in the [WhyLabs OSS repository](https://github.com/whylabs/whylabs-oss).
Then, you can register the whylogs Data Validator with WhyLabs logging capabilities as follows:
-
You'll also need to enable whylabs logging for your custom pipeline steps if you want to upload the whylogs data profiles that they return as artifacts to the WhyLabs platform. This is enabled by default for the standard whylogs step. For custom steps, you can enable WhyLabs logging by setting the `upload_to_whylabs` parameter to `True` in the step configuration, e.g.:
+
You'll also need to enable whylabs logging for your custom pipeline steps if you want to upload the whylogs data profiles that they return as artifacts to your WhyLabs deployment. This is enabled by default for the standard whylogs step. For custom steps, you can enable WhyLabs logging by setting the `upload_to_whylabs` parameter to `True` in the step configuration, e.g.:
```python
from typing import Annotated
@@ -104,7 +106,7 @@ You can [visualize whylogs profiles](whylogs.md#visualizing-whylogs-profiles) in
#### The whylogs standard step
-
ZenML wraps the whylogs/WhyLabs functionality in the form of a standard `WhylogsProfilerStep` step. The only field in the step config is a `dataset_timestamp` attribute which is only relevant when you upload the profiles to WhyLabs that uses this field to group and merge together profiles belonging to the same dataset. The helper function `get_whylogs_profiler_step` used to create an instance of this standard step takes in an optional `dataset_id` parameter that is also used only in the context of WhyLabs upload to identify the model in the context of which the profile is uploaded, e.g.:
+
ZenML wraps the whylogs/WhyLabs functionality in the form of a standard `WhylogsProfilerStep` step. The only field in the step config is a `dataset_timestamp` attribute, which is only relevant when you upload the profiles to a WhyLabs deployment: WhyLabs uses this field to group and merge profiles belonging to the same dataset. The helper function `get_whylogs_profiler_step` used to create an instance of this standard step takes in an optional `dataset_id` parameter that is also used only in the context of WhyLabs uploads, where it identifies the model for which the profile is uploaded, e.g.:
```python
from zenml.integrations.whylogs.steps import get_whylogs_profiler_step
@@ -149,7 +151,7 @@ You can view [the complete list of configuration parameters](https://sdkdocs.zen
The whylogs Data Validator implements the same interface as do all Data Validators, so this method forces you to maintain some level of compatibility with the overall Data Validator abstraction, which guarantees an easier migration in case you decide to switch to another Data Validator.
-
All you have to do is call the whylogs Data Validator methods when you need to interact with whylogs to generate data profiles. You may optionally enable whylabs logging to automatically upload the returned whylogs profile to WhyLabs, e.g.:
+
All you have to do is call the whylogs Data Validator methods when you need to interact with whylogs to generate data profiles. You may optionally enable whylabs logging to automatically upload the returned whylogs profile to your WhyLabs deployment, e.g.:
```python
@@ -191,7 +193,7 @@ def data_profiler(
profile = data_validator.data_profiling(
dataset,
)
-
# optionally upload the profile to WhyLabs, if WhyLabs credentials are configured
+
# optionally upload the profile to your WhyLabs deployment, if WhyLabs credentials are configured
data_validator.upload_profile_view(profile)
# validation post-processing (e.g. interpret results, take actions) can happen here
@@ -203,7 +205,7 @@ Have a look at [the complete list of methods and parameters available in the `Wh
#### Call whylogs directly
-
You can use the whylogs library directly in your custom pipeline steps, and only leverage ZenML's capability of serializing, versioning and storing the `DatasetProfileView` objects in its Artifact Store. You may optionally enable whylabs logging to automatically upload the returned whylogs profile to WhyLabs, e.g.:
+
You can use the whylogs library directly in your custom pipeline steps, and only leverage ZenML's capability of serializing, versioning and storing the `DatasetProfileView` objects in its Artifact Store. You may optionally enable whylabs logging to automatically upload the returned whylogs profile to your WhyLabs deployment, e.g.: