Start by ⭐️ starring the lakeFS open source project.
This repository includes the following Databricks notebooks, which you can run on your Databricks cluster:
- AWS Databricks Tutorial:
  - This notebook accompanies the blog post Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial.
- Unstructured Data ML Demo:
  - Use Case: Isolated Reproducible Unstructured Datasets for ML
  - This notebook also runs the unstructuredDataMLDemoSetup notebook internally.
- lakeFS installed and running on your local machine, on a server, or in the cloud. If you don't already have lakeFS running, either use lakeFS Cloud, which provides a lakeFS server on demand with a single click, or refer to the lakeFS Quickstart doc.
- Databricks server with the ability to run compute clusters on top of it.
- Configure your Databricks cluster to use the lakeFS Hadoop file system. For the configuration steps, read the blog post Databricks and lakeFS Integration: Step-by-Step Configuration Tutorial or the lakeFS documentation.
- Permissions to manage the cluster configuration, including adding libraries.
- Download these notebooks from GitHub and import them into your Databricks workspace.
Once you have completed the setup, open any notebook from the Databricks UI and follow its instructions.
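
As a rough illustration of the cluster configuration step, the sketch below assembles the Spark properties that the lakeFS Hadoop file system typically reads and prints them in the `key value` form used by a Databricks cluster's Spark config box. The helper names (`lakefs_spark_conf`, `render_cluster_conf`, `lakefs_uri`), the endpoint URL, the repository name, and the placeholder credentials are all hypothetical. The cluster also needs the lakeFS Hadoop file system library installed and, depending on your storage, credentials for the underlying object store. Treat the blog post and the lakeFS documentation as the authoritative source for the full list of settings.

```python
# Minimal sketch: build the Spark properties for the lakeFS Hadoop file system.
# All values below are placeholders (assumptions), not real endpoints or keys.

def lakefs_spark_conf(endpoint, access_key, secret_key):
    """Return Spark properties that route lakefs:// paths through lakeFS."""
    return {
        "spark.hadoop.fs.lakefs.impl": "io.lakefs.LakeFSFileSystem",
        # Strip a trailing slash so the endpoint is stored in one consistent form.
        "spark.hadoop.fs.lakefs.endpoint": endpoint.rstrip("/"),
        "spark.hadoop.fs.lakefs.access.key": access_key,
        "spark.hadoop.fs.lakefs.secret.key": secret_key,
    }

def render_cluster_conf(conf):
    """Format properties as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))

def lakefs_uri(repo, branch, path):
    """Build a lakefs://<repository>/<branch>/<path> URI."""
    return f"lakefs://{repo}/{branch}/{path.lstrip('/')}"

# Hypothetical endpoint and credentials; replace with your own values.
conf = lakefs_spark_conf(
    "https://example.lakefs.io/api/v1/", "<access-key>", "<secret-key>"
)
print(render_cluster_conf(conf))
print(lakefs_uri("example-repo", "main", "/data/events.parquet"))
```

The printed lines can be pasted into the cluster's Spark config, and a notebook can then read or write paths of the form returned by `lakefs_uri`. In practice, prefer Databricks secrets over plain-text keys in the cluster configuration.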