Enable lazy loading support for DataCatalog 2.0 on Kedro-Viz #2272
Signed-off-by: Sajid Alam <[email protected]>
I think with the new catalog we can significantly simplify the overall logic, which will be the same for default and lite modes:
- We still need to resolve dataset factory patterns, but we do it via the catalog.config_resolver.resolve_pattern() method, which returns the dataset configuration and does not require dataset initialization.
- The above step is the only step needed to display the dataset.
- Once the user clicks on the dataset, we get it from the catalog, then call preview and handle the failure if something is not installed.
- That way, we don't really need UnavailableDataset, and lite mode becomes default.
I see that the difficulty is in supporting different catalogs simultaneously and fitting different concepts together. Maybe we need to discuss how to apply the above concept to further transition to a new catalog while maintaining backward compatibility.
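To make the proposed flow concrete, here is a minimal sketch of it. It does not import Kedro: ToyCatalog, ToyConfigResolver, ToyCsvDataset, display_info and preview_on_click are hypothetical stand-ins, and only the attribute and method names config_resolver.resolve_pattern() are taken from the comment above. The real return value and error types may well differ.

```python
from typing import Any, Callable, Dict, List


class ToyConfigResolver:
    """Hypothetical stand-in: returns a dataset's config without building it."""

    def __init__(self, configs: Dict[str, Dict[str, Any]]) -> None:
        self._configs = configs

    def resolve_pattern(self, name: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored config.
        return dict(self._configs.get(name, {}))


class ToyCatalog:
    """Hypothetical stand-in for a catalog that builds datasets on demand."""

    def __init__(
        self,
        configs: Dict[str, Dict[str, Any]],
        factories: Dict[str, Callable[[], Any]],
    ) -> None:
        self.config_resolver = ToyConfigResolver(configs)
        self._factories = factories
        self.instantiated: List[str] = []  # records which datasets were built

    def get(self, name: str) -> Any:
        dataset_type = self.config_resolver.resolve_pattern(name)["type"]
        dataset = self._factories[dataset_type]()  # may raise if a dependency is missing
        self.instantiated.append(name)
        return dataset


class ToyCsvDataset:
    def preview(self) -> Dict[str, Any]:
        return {"columns": ["id", "price"], "rows": [[1, 9.5]]}


def build_with_missing_dependency() -> Any:
    # Simulates a dataset whose optional dependency is not installed.
    raise ImportError("optional dependency is not installed")


def display_info(catalog: ToyCatalog, name: str) -> Dict[str, Any]:
    """Step 1: everything needed to draw the node comes from the config."""
    config = catalog.config_resolver.resolve_pattern(name)
    return {"name": name, "type": config.get("type")}


def preview_on_click(catalog: ToyCatalog, name: str) -> Dict[str, Any]:
    """Step 2: build the dataset only when the user asks for a preview."""
    try:
        return {"preview": catalog.get(name).preview()}
    except ImportError as exc:
        return {"error": str(exc)}


catalog = ToyCatalog(
    configs={"cars": {"type": "ToyCsvDataset"}, "model": {"type": "ToyPickle"}},
    factories={
        "ToyCsvDataset": ToyCsvDataset,
        "ToyPickle": build_with_missing_dependency,
    },
)
```

In this sketch, calling display_info never instantiates a dataset, and a missing dependency only surfaces as an error payload when that particular dataset is clicked, which is why a separate placeholder class would not be needed.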
  @model_validator(mode="before")
  @classmethod
  def check_kedro_obj_exists(cls, values):
-     assert "kedro_obj" in values
+     # assert "kedro_obj" in values
Is kedro_obj not needed for TaskNode?
For the new catalogs, kedro_obj might be None for some nodes, so we skip this check. We store None for dataset_obj so we don’t cause an import or library load for the lazy loading approach.
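As a rough illustration of what tolerating a missing kedro_obj means for code that reads from a node, the sketch below uses a plain dataclass rather than the pydantic model in the diff. SketchDataNode and its dataset_type property are hypothetical names, not Kedro-Viz's actual classes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchDataNode:
    """Hypothetical node model where the underlying object is optional."""

    name: str
    kedro_obj: Optional[Any] = None  # None means "not loaded yet"

    @property
    def dataset_type(self) -> Optional[str]:
        # Guard every access: with lazy loading there may be no object to inspect.
        if self.kedro_obj is None:
            return None
        cls = type(self.kedro_obj)
        return f"{cls.__module__}.{cls.__qualname__}"
```

The trade-off is that every consumer of kedro_obj now has to handle None explicitly, instead of relying on a validator to guarantee the field is present.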
try:
    dataset_obj = self.catalog.get_dataset(dataset_name)
except DatasetError:
    dataset_obj = UnavailableDataset()
Let's test this out with kedro viz --lite: remove kedro-datasets with pip and see what happens.
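One way to reason about the expected behaviour before running that manual test is the self-contained sketch below. DatasetError, UnavailableDataset and SketchCatalog here are local stand-ins defined only for illustration (the real classes live in Kedro and Kedro-Viz and may behave differently), and get_dataset_or_placeholder is a hypothetical helper name wrapping the try/except from the diff.

```python
from typing import Any, Dict, Set


class DatasetError(Exception):
    """Local stand-in for the error raised when a dataset cannot be built."""


class UnavailableDataset:
    """Local stand-in placeholder used when the real dataset is not available."""

    def load(self) -> None:
        return None


class SketchCatalog:
    """Hypothetical catalog that fails for datasets with missing dependencies."""

    def __init__(self, datasets: Dict[str, Any], broken: Set[str]) -> None:
        self._datasets = datasets
        self._broken = broken

    def get_dataset(self, name: str) -> Any:
        if name in self._broken:
            raise DatasetError(f"cannot initialise dataset '{name}'")
        return self._datasets[name]


def get_dataset_or_placeholder(catalog: SketchCatalog, name: str) -> Any:
    # Same shape as the diff: fall back to a placeholder instead of crashing.
    try:
        return catalog.get_dataset(name)
    except DatasetError:
        return UnavailableDataset()


sketch_catalog = SketchCatalog(datasets={"cars": object()}, broken={"model"})
```

If the real catalog raised a different exception type when kedro-datasets is missing, this except clause would not catch it, which is presumably what the manual test is meant to reveal.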
Description
Related to: #2213
This PR integrates the new lazy-loading feature from Kedro’s updated DataCatalog API into Kedro-Viz.
Specifically, it checks if the project’s catalog is an instance of KedroDataCatalog. This allows Kedro-Viz to show dataset details without requiring full dataset installation. It also prevents unintended data loads if a dataset is configured to be lazy.
Development notes
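A minimal sketch of the type check described above, assuming the branch simply defers dataset creation for the new catalog. The two classes below are local stand-ins that share only their names with Kedro's catalogs, and supports_lazy_loading and dataset_obj_for_node are hypothetical helper names, not functions from this PR.

```python
from typing import Any, Dict, List, Optional


class DataCatalog:
    """Local stand-in for the older catalog: datasets are fetched eagerly."""

    def __init__(self, datasets: Dict[str, Any]) -> None:
        self._datasets = datasets
        self.fetched: List[str] = []  # records eager fetches

    def get_dataset(self, name: str) -> Any:
        self.fetched.append(name)
        return self._datasets[name]


class KedroDataCatalog:
    """Local stand-in for the new catalog that supports lazy loading."""

    def __init__(self, configs: Dict[str, Dict[str, Any]]) -> None:
        self.configs = configs


def supports_lazy_loading(catalog: Any) -> bool:
    return isinstance(catalog, KedroDataCatalog)


def dataset_obj_for_node(catalog: Any, name: str) -> Optional[Any]:
    if supports_lazy_loading(catalog):
        # Store None so nothing is imported or loaded until it is needed.
        return None
    return catalog.get_dataset(name)


old_catalog = DataCatalog({"cars": "eager dataset object"})
new_catalog = KedroDataCatalog({"cars": {"type": "SomeDataset"}})
```

With the stand-ins above, the old catalog returns a dataset object immediately while the new one yields None, so both code paths can coexist during the transition.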
QA notes