lakefs-mount-demo

Fast Data Loading and Reproducibility for Deep Learning Workloads with lakeFS Mount

Start by ⭐️ starring the lakeFS open source project.

This demo includes Jupyter notebooks that you can run on your local machine.

Prerequisites

  • Docker installed on your local machine
  • Watch this video to understand the use case as well as the demo.
  • Contact lakeFS to get the lakeFS Everest binary for Linux x86_64. Download the binary and save it on your laptop.
  • OPTIONAL: Contact lakeFS to get the token for Fluffy if you want to provision a lakeFS Enterprise server.

Setup

  1. Start by cloning this repository:

    git clone https://github.com/treeverse/lakeFS-samples && cd lakeFS-samples/01_standalone_examples/lakefs-mount-demo
  2. You now have two options:

    Run a Jupyter Notebook server with your existing lakeFS Server

    If you have already installed lakeFS or are using lakeFS Cloud, all you need to run is the Jupyter notebook server:

    docker compose up

    Once you've finished, run the following to remove all the containers:

    docker compose down

    Don't have a lakeFS Server or Object Store?

    If you want to provision a lakeFS Enterprise server and MinIO as your object store, plus Jupyter, first log in to the Treeverse Docker Hub account with the token you were granted, so that the proprietary Fluffy image can be retrieved:

    docker login -u externallakefs

    Then bring up the full stack:

    docker compose --profile local-lakefs-enterprise up
  3. Copy the Everest binary for Linux x86_64 that you saved on your laptop into the "lakeFS-samples/01_standalone_examples/lakefs-mount-demo" folder.
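The copy step can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `install_everest`, the file name `everest`, and the download location are hypothetical, and marking the binary executable is assumed to be needed rather than confirmed by this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of the demo): copy the Everest binary into
# the demo folder and mark it executable. The binary's file name and the
# paths passed in are assumptions; adjust them to match your download.
install_everest() {
    src="$1"        # where the binary was saved on your laptop
    demo_dir="$2"   # the lakefs-mount-demo folder cloned in step 1

    if [ ! -f "$src" ]; then
        echo "Everest binary not found at: $src" >&2
        return 1
    fi
    if [ ! -d "$demo_dir" ]; then
        echo "Demo folder not found at: $demo_dir" >&2
        return 1
    fi

    # Copy the binary, then make sure the copy can be executed.
    cp "$src" "$demo_dir/" || return 1
    chmod +x "$demo_dir/$(basename "$src")"
}
```

For example, if the binary was saved as `$HOME/Downloads/everest` (an assumed location), you could run `install_everest "$HOME/Downloads/everest" lakeFS-samples/01_standalone_examples/lakefs-mount-demo` from the directory where you cloned the repository.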

URLs and login details

Demo Instructions

The demo includes the following three notebooks. Open any notebook from the JupyterLab UI and follow the instructions.

  1. The "lakeFS Mount Demo" notebook demonstrates how to mount lakeFS datasets on a laptop or server as a local filesystem.
  2. The "lakeFS Mount Demo with Git Integration" notebook demonstrates the lakeFS Mount feature and how it integrates with Git. In this demo, Git version-controls your code while lakeFS version-controls your data and model.
  3. The "lakeFS Hugging Face Mount Demo" notebook demonstrates the lakeFS Mount feature with a Hugging Face dataset.