Tackling pain points of computational notebooks #1223

Closed
7 tasks done
josevalim opened this issue Jun 12, 2022 · 8 comments
Labels
discussion Needs to be discussed before moving forward

Comments

@josevalim
Contributor

josevalim commented Jun 12, 2022

The paper "What's wrong with computational notebooks" highlights several pain points. We list them below and annotate the ones already tackled:

  • Setup - Loading and cleaning data from multiple sources and platforms is a tortuous, multi-step, manual process.
    Our Comment: Our goal is that smart cells provide the path for automating workflows such as setup. Cleaning data is supported via Kino.Explorer. External data (files and URLs) is on the roadmap (see Support mime drag and drop #1604).

  • Manage Code - Managing code without software engineering support results in “dependency hell” with ad hoc workarounds that only go so far.
    Our Comment: Code assistance, dependency management, and testing are built-in and improve on every release.

  • Archival - Preserving the history of changes and states within and between notebooks is unsupported, leading to unnecessary rework.
    Our Comment: Livebooks are designed to be persisted and versioned with SCM software.

  • Security - Maintaining data confidentiality and access control is an ad hoc, manual process where errors can leak private client data.
    Our Comment: Secret management is built-in.

  • Share and Collaborate - Sharing data or parts of the notebook interactively and at different levels—demo/reports, review/comment, collaborative editing—is generally unsupported.
    Our Comment: Collaboration is mostly tackled. Not only is Livebook itself collaborative, but you can build collaborative applications with it too.

  • Reproduce and Reuse - Replicating results or reusing parts of code is infeasible because of high levels of customization and environment dependencies.
    Our Comment: Livebooks are designed to be fully reproducible. Smart cells as well as Hex packages are two of the existing mechanisms for reuse. More granular mechanisms may be added later.

  • Notebooks as Products - Deploying to production requires significant cleanup and packaging of libraries—DevOps skills that are outside the core skill set of data scientists.
    Our Comment: App deployment is already available and we are working on one-click deployment to Hugging Face, Fly.io, etc.

The following points were important considerations in Livebook's design, but limitations may still surface as usage increases. Feedback on how Livebook is affected and how it can improve is welcome:

  • Reliability - Scaling to large datasets is unsupported, causing kernel crashes and inconsistent data.

  • Explore and Analyze - An unending cycle of copy-paste and tweaking bits of code made worse by feedback latency and kernel crashes.

@josevalim changed the title from "Pain points of computation notebooks" to "Tackling pain points of computational notebooks" on Jun 12, 2022
@josevalim added the discussion (Needs to be discussed before moving forward) label on Jun 12, 2022
@samrose

samrose commented Oct 26, 2022

Update: see comment #1223 (comment)
@josevalim I have been working on a basic packaging of livebook for our company (https://floxdev.com) here: https://github.com/flox-examples/livebook

I will also then upstream this to https://github.com/nixos/nixpkgs

The builder for elixir in the nix community is discussed at https://nixos.org/manual/nixpkgs/stable/#sec-beam and is usually mixRelease which basically performs a prod mix release of the project, and then passes it into the nix store to make it reproducible in nix builds. This actually works great for nearly every elixir and phoenix project, and makes elixir and nix (and flox) a powerful combo!

One hitch with livebook is that it has a file, env.sh, with the following content:

export RELEASE_NODE=livebook_server
export RELEASE_MODE=interactive

cookie_path="${RELEASE_ROOT}/releases/COOKIE"
if [ ! -f $cookie_path ]; then
  cat /dev/urandom | env LC_ALL=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1 > $cookie_path
fi

When the release is run from the nix store after it has been built with nix (via mix release), this error appears because the nix store is read-only:

$ result/bin/livebook start
/nix/store/6hgfah3hgiayzb34jjx95939wr4z39jd-livebook-0.0.0/releases/0.6.3/env.sh: line 6: /nix/store/6hgfah3hgiayzb34jjx95939wr4z39jd-livebook-0.0.0/releases/COOKIE: Permission denied

I might just patch the livebook source at build time in the package to remove this issue. However, I wanted to bring this to the livebook project's attention, to see if you know of an existing way to stop this script from being triggered (maybe some other env var that can be set?)

One big payoff with nix and flox is that once we can package and run it well, not only is the notebook reproducible, but so is the entire graph of dependencies one might need to run it on any linux or macos machine.

@Munksgaard

I've had the same problem as @samrose. I have resorted to creating a fork of livebook with a patch that removes the relevant lines from env.sh: https://github.com/Munksgaard/livebook/blob/nix/fix_cookies.patch

The result is that the branch in question can be built using nix-build and that the resulting release can be run without any issues, by explicitly setting RELEASE_COOKIE:

RELEASE_COOKIE=/var/lib/livebook/.cookie result/bin/livebook start

Seems to work here.
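
A minimal sketch of the workaround above, for anyone who wants the cookie to survive restarts. As far as I understand mix release, RELEASE_COOKIE holds the cookie value itself rather than a path to a file, so this keeps the cookie in a writable state directory and passes its contents through the environment. The helper name load_or_create_cookie and the /var/lib/livebook directory are assumptions for illustration, not part of Livebook.

```shell
# Hypothetical helper (not part of Livebook): keep the cookie in a writable
# directory outside the read-only nix store and print its value.
load_or_create_cookie() {
  state_dir="$1"
  cookie_file="$state_dir/.cookie"
  mkdir -p "$state_dir"
  # Generate a 32-character alphanumeric cookie only if none exists yet.
  if [ ! -s "$cookie_file" ]; then
    (
      umask 077  # keep the cookie readable by the owner only
      LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1 > "$cookie_file"
    )
  fi
  cat "$cookie_file"
}

# Usage (the state directory is a placeholder):
#   RELEASE_COOKIE="$(load_or_create_cookie /var/lib/livebook)" result/bin/livebook start
```

This still relies on the patched env.sh (or some other way of skipping cookie file creation inside the store).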

@jonatanklosko
Member

@josevalim perhaps we should opt-out of creating the cookie file if RELEASE_COOKIE is set?
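
A rough sketch of what that opt-out could look like, reusing the cookie logic from the env.sh quoted earlier in the thread. This is an illustration under assumptions, not the change that was actually merged: it is wrapped in a hypothetical function, ensure_cookie, so it can be tried in isolation, whereas in env.sh the body would presumably run inline.

```shell
# Hypothetical sketch: skip cookie file creation when RELEASE_COOKIE is set.
ensure_cookie() {
  # A cookie supplied through the environment means nothing needs to be
  # written, so a read-only release directory is never touched.
  if [ -n "${RELEASE_COOKIE:-}" ]; then
    return 0
  fi

  cookie_path="${RELEASE_ROOT}/releases/COOKIE"
  if [ ! -f "$cookie_path" ]; then
    # Same generation as the original env.sh: 32 random alphanumeric characters.
    LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1 > "$cookie_path"
  fi
}
```

With a guard like this, running from the nix store would only fail when RELEASE_COOKIE is unset and the COOKIE file does not already exist.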

@josevalim
Contributor Author

@jonatanklosko sounds good to me!

@samrose

samrose commented May 23, 2023

fwiw, if anyone comes across this while searching for how to package livebook with nix:

The approach here, which uses mix escript.build, is the best way to package livebook via nix

https://github.com/hauleth/nix-elixir/blob/master/pkgs/livebook.nix

@Munksgaard

May I suggest packaging that up for nixpkgs? Or, if you don't want to, can I?

@samrose

samrose commented May 30, 2023

@Munksgaard you don’t need my permission, go for it! And if you do it, thank you.

@Munksgaard

Munksgaard commented May 30, 2023

Done! I might look into adding it as a service afterwards.
