Allow custom log formats #3288
Conversation
src/zenml/orchestrators/utils.py (outdated)

@@ -40,6 +45,14 @@
if TYPE_CHECKING:
    from zenml.artifact_stores.base_artifact_store import BaseArtifactStore

ENV_VARS_TO_PROPAGATE = [
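As a rough illustration of the mechanism this list is meant to enable (the helper name and the exact list contents below are assumptions, not the PR's actual code), forwarding could look roughly like this:

import os
from typing import Dict

# Hypothetical list contents: only names mentioned in this thread are shown;
# the real ENV_VARS_TO_PROPAGATE may contain different entries.
ENV_VARS_TO_PROPAGATE = [
    "ZENML_LOGGING_VERBOSITY",
    "ZENML_LOGGING_FORMAT",
]


def collect_env_vars_to_propagate() -> Dict[str, str]:
    """Collect client-side values that could be passed to the orchestrator
    as runtime environment for step containers (illustrative helper)."""
    return {
        key: os.environ[key]
        for key in ENV_VARS_TO_PROPAGATE
        if key in os.environ
    }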
@stefannica Do you think this is a reasonable change, to automatically forward these logging related environment variables to the Docker containers that run steps? Or should I revert this, or add another environment variable/setting somewhere that allows specifying which environment variables get forwarded?
The concern I have is that some of these client-side env vars may override whatever is set in the container images or Docker settings that users define for containerized pipelines. How would a user change the logging verbosity for a containerized pipeline before this PR? And would that still work?
Yep, that is the case with the current implementation. Previously, the only way for users to configure the logging behaviour for steps in container images was by setting these environment variables in the Docker settings, which means they would be included in the Dockerfile.
My changes in this PR essentially pass the environment as runtime options via the orchestrator, which overrides the values present in the Docker image. So this would lead to different behaviour than previous versions of ZenML in cases where users have local settings that differ from the environment variables they set in the Docker settings, which I definitely agree is not the best solution.
I see a few ways to change this:
- Remove the automatic forwarding of any environment variable. This means that the logging format would only be applied client-side, and users who want it applied to step logs would also have to set it in the Docker settings.
- Only forward my new environment variable for the log format. This wouldn't change any existing behaviour, but would be very inconsistent in the way we treat these (logging) environment variables.
- Allow users to configure which environment variables from the client environment get forwarded (probably via a different environment variable or a pipeline configuration?). We could then still decide whether we want to default this to nothing, only the log format, or all logging environment variables.
What do you think?
@stefannica @bcdurak After a discussion with @htahir1 we decided to keep the new environment variable consistent with all the existing ones, and revisit this environment forwarding when we take a closer look at environments.
        The formatter.
    """
    if log_format := os.environ.get(ENV_ZENML_LOGGING_FORMAT, None):
        return logging.Formatter(fmt=log_format)
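For context, a minimal usage sketch, assuming ENV_ZENML_LOGGING_FORMAT resolves to an environment variable named ZENML_LOGGING_FORMAT (the exact name is an assumption here); any %-style logging.Formatter format string should work:

import logging
import os

# Assumption: the ENV_ZENML_LOGGING_FORMAT constant resolves to this variable name.
os.environ["ZENML_LOGGING_FORMAT"] = "%(asctime)s %(levelname)s %(name)s: %(message)s"

# logging.Formatter accepts the same %-style format string, so this mirrors
# what the snippet above would build from the environment variable.
formatter = logging.Formatter(fmt=os.environ["ZENML_LOGGING_FORMAT"])
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.warning("hello from a custom log format")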
@schustmi Do you think we could also add standard Python entry-point logic, so that custom Logger implementations can be injected?
It is a clean way to provide extensibility in a plugin fashion, just by pip-installing dependencies into the runtime environment.
Here is an example of how MLFlow does its plugin system.
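As a rough sketch of that idea (the entry-point group name and loading helper below are hypothetical, not an existing ZenML or MLflow API):

import logging
from importlib.metadata import entry_points

# Hypothetical entry-point group; a plugin package would declare e.g.
#   [project.entry-points."zenml.log_formatters"]
#   json = my_pkg.logging:JsonFormatter
LOG_FORMATTER_GROUP = "zenml.log_formatters"


def load_formatter_plugin(name: str) -> logging.Formatter:
    """Look up a registered logging.Formatter implementation by name."""
    # The group= keyword requires Python 3.10+.
    for ep in entry_points(group=LOG_FORMATTER_GROUP):
        if ep.name == name:
            formatter_cls = ep.load()
            return formatter_cls()
    raise ValueError(f"No log formatter plugin named {name!r} was found")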
I think there are many ways to extend this (this was supposed to just be a quick way to get some basic configurability). There's also the logging file config (https://docs.python.org/3/library/logging.config.html#logging-config-fileformat) that we could give access to, I assume, but I'm curious to hear what kind of plugin you'd like, both specifically for this logging feature and for ZenML in general :)
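For reference, the standard-library file-based configuration mentioned above works roughly like this (the logging.ini path and its contents are just example values):

import logging.config

# Example logging.ini in the fileConfig format linked above:
#
# [loggers]
# keys=root
#
# [handlers]
# keys=console
#
# [formatters]
# keys=simple
#
# [logger_root]
# level=INFO
# handlers=console
#
# [handler_console]
# class=StreamHandler
# level=INFO
# formatter=simple
# args=(sys.stdout,)
#
# [formatter_simple]
# format=%(asctime)s %(levelname)s %(name)s: %(message)s

logging.config.fileConfig("logging.ini", disable_existing_loggers=False)
logging.getLogger(__name__).info("configured from file")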
My company uses a custom JSON log format to ingest logs into our central logging infrastructure. Some of the fields come from environment variables, and some are added dynamically based on the context (e.g., if there is an HTTP context, we add the correlation_id). We can also reformat the exception stack trace.
We use the json-logging lib to do this, so we need to provide our own implementation of the CustomLogger class.
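To make the requirement concrete, a minimal standard-library sketch of that kind of formatter (it does not use the actual json-logging API; field names, environment variables, and the correlation-id lookup are illustrative assumptions):

import json
import logging
import os
import traceback


class JSONFormatter(logging.Formatter):
    """Illustrative JSON formatter with env-var and context-based fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Fields sourced from the environment (names are made up).
            "service": os.environ.get("SERVICE_NAME", "unknown"),
            "env": os.environ.get("DEPLOY_ENV", "local"),
        }
        # Context-dependent field, e.g. injected by an HTTP middleware.
        correlation_id = getattr(record, "correlation_id", None)
        if correlation_id:
            payload["correlation_id"] = correlation_id
        # Reformat the exception stack trace into a single JSON field.
        if record.exc_info:
            payload["exception"] = "".join(traceback.format_exception(*record.exc_info))
        return json.dumps(payload)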
Interesting, I just had a quick look at the library and it seems like it needs to be initialized, which then supplies a different log formatter to all loggers. Could you give me quick pseudo-code of your internal logging setup, so that I can make sure it's possible when designing more logging configurability?
@schustmi here's the pseudo-code / sequence we use when setting up the log facility (a rough sketch follows the list):
- Define a set of loggers to skip
- Define CustomLogger class (json_logging lib)
- Define function to get root logger names
- Define function to filter logger names based on the set of loggers to skip
- Define function to attach logger with handlers
- Define function to set up logger with parameters
- Clear existing handlers and set up new ones
- Attach loggers for main package and additional loggers if autodiscovery is enabled
- Set log level and propagation for the main logger
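A condensed standard-library sketch of that sequence (the json_logging/CustomLogger specifics are left out; logger names and the skip list are placeholders):

import logging

SKIP_LOGGERS = {"urllib3", "botocore"}  # placeholder skip list
MAIN_PACKAGE = "my_company_pkg"         # placeholder main package


def root_logger_names() -> set:
    """Collect the top-level names of all currently known loggers."""
    return {name.split(".")[0] for name in logging.root.manager.loggerDict}


def attach_handler(logger: logging.Logger, formatter: logging.Formatter) -> None:
    """Clear existing handlers and attach a fresh stream handler."""
    logger.handlers.clear()
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)


def setup_logging(level: int = logging.INFO, autodiscover: bool = True) -> None:
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    names = {MAIN_PACKAGE}
    if autodiscover:
        # Attach additional root loggers, filtered by the skip list.
        names |= {n for n in root_logger_names() if n not in SKIP_LOGGERS}
    for name in names:
        attach_handler(logging.getLogger(name), formatter)
    # Set log level and propagation for the main logger.
    main_logger = logging.getLogger(MAIN_PACKAGE)
    main_logger.setLevel(level)
    main_logger.propagate = False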
src/zenml/utils/dashboard_utils.py (outdated)

@@ -157,8 +158,17 @@ def show_dashboard(url: str) -> None:

 elif environment in (EnvironmentType.NATIVE, EnvironmentType.WSL):
     if constants.handle_bool_env_var(
-        constants.ENV_AUTO_OPEN_DASHBOARD, default=True
+        constants.ENV_AUTO_OPEN_DASHBOARD, default=False
With the current state, if they use the old variable and set it to False, this check will still return True. Here, I would still expect the old one to work, while throwing a warning.
Alternatively, if they configured the old one to be True and the new one to be False, this will still open the dashboard.
Not sure if this is the intended behavior, that's why I just wanted to note it down here. (IMO, if the new one is set, I would expect it to take precedence.)
Previously, the dashboard automatically opened unless you set the legacy environment variable to False. IMO, setting either of these environment variables to True should be a no-op; it's the default behaviour anyway.
With the current logic, there is no precedence: if either of the two is set to False, the dashboard will not open automatically. If the old one is used, it will warn that it's deprecated.
WDYT?
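For clarity, a minimal sketch of the combined check as described (the variable name strings and the parsing helper are assumptions that mirror, rather than copy, handle_bool_env_var):

import logging
import os

# The legacy constant appears in the diff above; both name strings here are assumptions.
LEGACY_VAR = "AUTO_OPEN_DASHBOARD"
NEW_VAR = "ZENML_AUTO_OPEN_DASHBOARD"


def bool_env_var(name: str, default: bool) -> bool:
    """Parse a boolean environment variable, falling back to a default."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.lower() in ("true", "1", "yes")


def should_auto_open_dashboard() -> bool:
    # No precedence: if either variable is explicitly set to False,
    # the dashboard is not opened automatically.
    if LEGACY_VAR in os.environ:
        logging.warning("%s is deprecated, use %s instead.", LEGACY_VAR, NEW_VAR)
    return bool_env_var(LEGACY_VAR, default=True) and bool_env_var(NEW_VAR, default=True)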
Co-authored-by: Barış Can Durak <[email protected]>
Describe changes
This PR adds an environment variable to allow specifying custom log formats. Additionally, we now propagate logging-related environment variables for remote pipeline runs.
Pre-requisites
Please ensure you have done the following:
- The branch is based on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing your branch to develop.
Types of changes