Releases: zenml-io/zenml
0.5.2
0.5.2 brings an improved post-execution workflow and lots of minor changes and upgrades for the developer experience when
creating pipelines. It also improves the Airflow orchestrator logic to accommodate more real-world scenarios. Check out the
low-level API guide for more details!
What's Changed
- Fix autocomplete for step and pipeline decorated functions by @schustmi in #144
- Add reference docs for CLI example functionality by @alex-zenml in #145
- Fix mypy integration by @schustmi in #147
- Improve Post-Execution Workflow by @schustmi in #146
- Fix CLI examples bug by @alex-zenml in #148
- Update quickstart example notebook by @alex-zenml in #150
- Add documentation images by @alex-zenml in #151
- Add prettierignore to gitignore by @alex-zenml in #154
- Airflow orchestrator improvements by @schustmi in #153
- Google colab added by @htahir1 in #155
- Tests for core and cli modules by @alex-zenml in #149
- Add Paperspace environment check by @alex-zenml in #156
- Step caching by @schustmi in #157
- Add documentation for pipeline step parameter and run name configuration by @schustmi in #158
- Automatically disable caching if the step function code has changed by @schustmi in #159
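The last two items describe step caching that is switched off when a step's code changes. A minimal sketch of that general idea follows; it is not ZenML's actual implementation. The names `cached_step`, `code_fingerprint` and `_CACHE` are hypothetical, and the sketch assumes the cache key can be derived from the function's compiled bytecode and constants.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical in-memory cache: (step name, code fingerprint, inputs) -> output.
_CACHE: Dict[Tuple[str, str, Tuple[Any, ...]], Any] = {}


def code_fingerprint(func: Callable[..., Any]) -> str:
    """Hash the bytecode and constants so that editing the body changes the key."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_step(func: Callable[..., Any]) -> Callable[..., Any]:
    """Reuse a stored output when the code and the inputs are both unchanged."""
    fingerprint = code_fingerprint(func)

    def wrapper(*args: Any) -> Any:
        key = (func.__name__, fingerprint, args)
        if key not in _CACHE:
            _CACHE[key] = func(*args)  # cache miss: run the step for real
        return _CACHE[key]

    return wrapper


calls: List[int] = []


@cached_step
def double(x: int) -> int:
    calls.append(x)  # record every real execution
    return x * 2


double(3)
double(3)  # served from the cache, the body does not run again
```

Because the fingerprint is part of the key, a step whose body has been edited never matches an old entry, which has the same effect as disabling the cache for that step.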
Full Changelog: 0.5.1...0.5.2
0.5.1
0.5.1 builds on top of the 0.5.0 release with quick bug fixes.
Overview
- Pipelines can now be run via a YAML file. #132
- The CLI now lets you pull examples directly from the GitHub examples folder. 🔥 Amazing @alex-zenml with #141!
- ZenML now has full mypy compliance. 🎉 Thanks @schustmi for #140!
- Numerous bug fixes and performance improvements. #136, @bcdurak great job with #142
- The Airflow orchestrator now bootstraps Airflow locally and spins it up before running pipelines.
- Added new docs with a low-level API guide. #143
Our roadmap goes into further detail on the timeline. Vote on the next features now.
We encourage every user (old or new) to start afresh with this release. Please go over our latest docs and examples to get the hang of the new system.
Auto-generated docs:
- Some random optimizations by @htahir1 in #129
- Add codecov threshold for PRs by @schustmi in #131
- remove misspelling of preprocessor by @alex-zenml in #134
- Quick edit of documentation by @alex-zenml in #133
- Very confusing codecov update by @htahir1 in #135
- Small bugs here and there by @htahir1 in #136
- Michael/type annotations by @schustmi in #137
- Hamza/docs upgrade by @htahir1 in #138
- add tests by @alex-zenml in #130
- Michael/mypy integration by @schustmi in #139
- First version of low level api guide by @htahir1 in #140
- CLI examples by @alex-zenml in #141
- Implement pipeline configuration with yaml file by @schustmi in #132
- Baris/improved performance by @bcdurak in #142
- Hamza/docs upgrade by @htahir1 in #143
Full Changelog: 0.5.0...0.5.1
0.5.0
This long-awaited ZenML release marks a seminal moment in the project's history. We present to you a complete
revamp of the internals of ZenML, with a fresh new design and API. While these changes are significant, and have been months
in the making, the original vision of ZenML has not wavered. We hope that the ZenML community finds the new
design choices easier to grasp and use, and we welcome feedback on the issues board.
Warning
0.5.0 is a complete API change from the previous versions of ZenML, and is a breaking upgrade. Fundamental
concepts have been changed, and therefore backwards compatibility is not maintained. Please use only this version
with fresh projects.
With such significant changes, we expect this release to also be breaking. Please report any bugs in the issue board, and
they should be addressed in upcoming releases.
Overview
- Introducing a new functional API for creating pipelines and steps. This is now the default mechanism for building ZenML pipelines. read more
- Steps now use Materializers to handle artifact serialization/deserialization between steps. This is a powerful change, and will be expanded upon in the future. read more
- Introducing the new Stack paradigm: Easily transition from one MLOps stack to the next with a few CLI commands. read more
- Introducing a new Artifact, Typing, and Annotation system, with pydantic (and dataclasses) support. read more
- Deprecating the pipelines_dir: Now individual pipelines will be stored in their metadata stores, making the metadata store a single source of truth. read more
- Deprecating the YAML config file: ZenML no longer natively compiles to an intermediate YAML-based representation. Instead, it compiles and deploys directly into the selected orchestrator's representation. While we do plan to support running pipelines directly through YAML in the future, it will no longer be the default route through which pipelines are run. read more about orchestrators here
Technical Improvements
- A completely new system design, please refer to the docs.
- Better type hints and docstrings.
- Auto-completion support.
- Numerous performance improvements and bug fixes, including a smaller dependency footprint.
What to expect in the next weeks and the new ZenML
Currently, this release is bare bones. We are missing some basic features which used to be part of ZenML 0.3.8 (the previous release):
- Standard interfaces for TrainingPipeline.
- Individual step interfaces like PreprocesserStep, TrainerStep, DeployerStep etc. need to be rewritten from within the new paradigm. They should be included in the non-RC version of this release.
- A proper production setup with an orchestrator like Airflow.
- A post-execution workflow to analyze and inspect pipeline runs.
- The concept of Backends will evolve into a simple mechanism of transitioning individual steps into different runners.
- Support for KubernetesOrchestrator, KubeflowOrchestrator, GCPOrchestrator and AWSOrchestrator is also planned.
- Dependency management including Docker support is planned.
Our roadmap goes into further detail on the timeline.
We encourage every user (old or new) to start afresh with this release. Please go over our latest docs
and examples to get the hang of the new system.
Onwards and upwards to 1.0.0!
0.3.8
preparing 0.3.8
0.3.7.1rc5
preparing for 0.3.7.1
0.3.7
0.3.7 is a much-needed, long-awaited, big refactor of the Datasources paradigm of ZenML. There are also bug fixes, improvements, and more!
For those upgrading from an older version of ZenML, we ask you to please delete your old pipelines dir and .zenml folders and start afresh with a zenml init.
If only working locally, this is as simple as:
cd zenml_enabled_repo
rm -rf pipelines/
rm -rf .zenml/
And then another ZenML init:
pip install --upgrade zenml
cd zenml_enabled_repo
zenml init
New Features
- The inner workings of the BaseDatasource have been modified along with the concrete implementations. Now, there is no relation between a DataStep and a Datasource: A Datasource holds all the logic to version and track itself via the new commit paradigm.
- Introduced a new interface for datasources, the process method, which is responsible for ingesting data and writing to TFRecords to be consumed by later steps.
- Datasource versions (snapshots) can be accessed directly via the commits paradigm: Every commit is a new version of data.
- Added JSONDatasource and TFRecordsDatasource.
Bug Fixes + Refactor
A big thanks to our new contributor @aak7912 for the help in this release with issue #71 and PR #75.
- Added an example for regression.
- compare_training_runs() now takes an optional datasource parameter to filter by datasource.
- Trainer interface refined to focus on run_fn rather than other helper functions.
- New docs released with a streamlined vision and coherent storyline: https://docs.zenml.io
- Got rid of unnecessary Torch dependency with base ZenML version.
0.3.6
0.3.6 is a more inwards-facing release as part of a bigger effort to create a more flexible ZenML. As a first step, ZenML now supports arbitrary splits for all components natively, freeing us from the train/eval split paradigm. Here is an overview of changes:
New Features
- The inner workings of the BaseTrainerStep, BaseEvaluatorStep and the BasePreprocesserStep have been modified along with their respective components to work with the new split_mapping. Now, users can define arbitrary splits (not just train/eval). E.g. doing a train/eval/test split is possible.
- Within the instance of a TrainerStep, the user has access to input_patterns and output_patterns, which provide the required URIs with respect to their splits for the input and output (test_results) examples.
- The built-in trainers are modified to work with the new changes.
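To make the idea of arbitrary named splits concrete, here is a minimal, library-independent sketch. It does not use ZenML's split_mapping configuration, whose exact format is not shown in these notes; the function name `split_by_ratio` and the weight dictionary are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_by_ratio(items: Sequence[T], weights: Dict[str, float]) -> Dict[str, List[T]]:
    """Cut a sequence into consecutive named splits proportional to the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    names = list(weights)
    splits: Dict[str, List[T]] = {}
    start = 0
    cumulative = 0.0
    for index, name in enumerate(names):
        cumulative += weights[name]
        if index == len(names) - 1:
            end = len(items)  # the last split takes whatever remains
        else:
            end = round(len(items) * cumulative / total)
        splits[name] = list(items[start:end])
        start = end
    return splits


# Three splits instead of the fixed train/eval pair.
splits = split_by_ratio(list(range(10)), {"train": 7, "eval": 2, "test": 1})
```

Any number of split names can be passed; nothing in the function is specific to train and eval.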
Bug Fixes + Refactor
A big thanks to our new super supporter @zyfzjsc988 for most of the feedback that led to bug fixes and enhancements for this release:
- #63: Now one can specify the ports on which ZenML opens its add-on applications.
- #64: Now there is a way to list integrations with the following code:
from zenml.utils.requirements_utils import list_integrations
list_integrations()
- Fixed #61: view_anomalies() breaking in the quickstart.
- Analytics is now opt-in by default, to get rid of the unnecessary prompt at zenml init. Users can still freely opt out by using the CLI:
zenml config analytics opt-out
Again, the telemetry data is fully anonymized and just used to improve the product. Read more here
0.3.5
This release finally brings model agnostic automatic evaluation to ZenML! Now you can easily use TFMA with any model type to produce evaluation visualizations. This means you can now use TFMA with PyTorch or Scikit - a big win for automated sliced evaluation! It also introduces a new language for differentiation between features, raw features, labels and predictions, in addition to solving a few big bugs in the examples directory! Read more below.
As has been the case in the last few releases, this release is yet another breaking upgrade.
For those upgrading from an older version of ZenML, we ask you to please delete your old pipelines dir and .zenml folders and start afresh with a zenml init.
If only working locally, this is as simple as:
cd zenml_enabled_repo
rm -rf pipelines/
rm -rf .zenml/
And then another ZenML init:
pip install --upgrade zenml
cd zenml_enabled_repo
zenml init
New Features
- Added a new interface into the trainer step called test_fn which is utilized to produce model predictions and save them as test results.
- Implemented a new evaluator step called AgnosticEvaluator which is designed to work regardless of the model type as long as you run the test_fn in your trainer step.
- The first two changes allow torch trainer steps to be followed by an agnostic evaluator step, see the example here.
- Proposed a new naming scheme, which is now integrated into the built-in steps, in order to make it easier to handle feature/label names.
- Modified the TorchFeedForwardTrainer to showcase how to use TensorBoard in conjunction with PyTorch.
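The reason saved test results make evaluation model agnostic is that the evaluator only needs labels and predictions, never the model itself. The sketch below illustrates this with plain Python; it is not the AgnosticEvaluator or TFMA, and the function name `sliced_accuracy`, the record layout and the "country" slice column are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def sliced_accuracy(
    records: Iterable[Mapping[str, Hashable]],
    slice_key: str,
    label_key: str = "label",
    prediction_key: str = "prediction",
) -> Dict[Hashable, float]:
    """Compute accuracy per slice from saved predictions, without any model."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for record in records:
        slice_value = record[slice_key]
        total[slice_value] += 1
        if record[label_key] == record[prediction_key]:
            correct[slice_value] += 1
    return {value: correct[value] / total[value] for value in total}


# Hypothetical test results, as any trainer (PyTorch, Scikit, ...) could save them.
test_results = [
    {"country": "DE", "label": 1, "prediction": 1},
    {"country": "DE", "label": 0, "prediction": 1},
    {"country": "US", "label": 1, "prediction": 1},
]

accuracy_by_country = sliced_accuracy(test_results, slice_key="country")
```

Since the function consumes only records, the same evaluation code applies no matter which framework produced the predictions.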
Bug Fixes + Refactor
- Refactored how ZenML treats relative imports for custom steps. Now, rather than doing absolute imports like:
from examples.scikit.step.trainer import MyScikitTrainer
One can also do the following:
from step.trainer import MyScikitTrainer
ZenML automatically figures out the absolute path of the module based on the root of the directory.
- Updated the Scikit Example, PyTorch Lightning Example, and GAN Example accordingly. They should now work according to their READMEs.
Big shout out to @SaraKingGH in issue #55 for raising the above issues!
0.3.4
This release is a big design change and refactor. It involves a significant change in the Configuration file structure, meaning this is a breaking upgrade.
For those upgrading from an older version of ZenML, we ask you to please delete your old pipelines dir and .zenml folders and start afresh with a zenml init.
If only working locally, this is as simple as:
cd zenml_enabled_repo
rm -rf pipelines/
rm -rf .zenml/
And then another ZenML init:
pip install --upgrade zenml
cd zenml_enabled_repo
zenml init
New Features
- Introduced another higher-level pipeline: The NLPPipeline. This is a generic NLP pipeline for a text-datasource based training task. Full example of how to use the NLPPipeline can be found here
- Introduced a BaseTokenizerStep as a simple mechanism to define how to train and encode using any generic tokenizer (again for NLP-based tasks).
- Introduced a new HuggingFace integration, with the first concrete implementation of the BaseTokenizerStep, i.e., the HuggingFaceTokenizer.
- Show-cased how to use HuggingFace with the ZenML TrainerStep in the NLP Example.
Bug Fixes + Refactor
- Significant change to imports: Imports are now much simpler and more user-friendly. E.g. instead of:
from zenml.core.pipelines.training_pipeline import TrainingPipeline
A user can simply do:
from zenml.pipelines import TrainingPipeline
The caveat is of course that this might involve a re-write of older ZenML code imports.
Note: Future releases are also expected to be breaking. Until announced, please expect that upgrading ZenML versions may cause older-ZenML generated pipelines to behave unexpectedly.
Special shout-out to @nicholasmaiot for major contributions to this release!
0.3.3
This release is a significant one as it includes the first version of the AWS integration. It allows you to use ZenML to launch an EC2 instance as an orchestrator and execute a ZenML pipeline possibly coupled with an S3 artifact store and RDS metadata store.
It is a new feature and it does not include any breaking changes.
To install ZenML with the AWS integration attached, run:
pip install --upgrade zenml[aws]
zenml init
New Features
- OrchestratorAWSBackend implemented to launch an EC2 instance as the orchestrator.
- While you are using the new orchestrator backend, you may use S3 and RDS.
- Implemented an example which covers the basic process if you would like to start testing it right away.
Bug Fixes + Refactor
- For more advanced use-cases, more examples will follow in the future.
- Numerous small bugs and refinements.