Support different package indexes for html-wasm notebooks #3831

Open
damienrj opened this issue Feb 18, 2025 · 10 comments
Labels
enhancement New feature or request

Comments

@damienrj

Description

Hello,
On our company network we cannot access PyPI directly; instead, we use our own index. This means we can't use a WASM notebook while on the company network, because packages fail to install. Having a way to support more than one package index would be great.

Suggested solution

Enable notebooks to use different package indexes.

Alternative

No response

Additional context

ModuleNotFoundError: No module named 'markdown'
The module 'markdown' is included in the Pyodide distribution, but it is not installed.
You can install it by calling:
  await micropip.install("markdown") in Python, or
  await pyodide.loadPackage("markdown") in JavaScript
See https://pyodide.org/en/stable/usage/loading-packages.html for more details.
@damienrj damienrj added the enhancement New feature or request label Feb 18, 2025
@mscolnick
Contributor

mscolnick commented Feb 18, 2025

Does your index mirror PyPI? This is a bit tricky, since many of the pre-built, WASM-compatible packages from Pyodide also exist on PyPI. Will these packages exist on your index?

@damienrj
Author

Yeah, we have a full mirror of PyPI (really, it is a pass-through), and it also lets us use private internal packages that are not on PyPI. But it sounds like internal packages might not work with Pyodide and WASM either.

@mscolnick
Contributor

mscolnick commented Feb 18, 2025

@damienrj the internal packages would work if they are pure-python (i.e. no C or Rust bindings) and you publish a wheel for them
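
As a rough illustration of the "pure-python wheel" requirement: wheel filenames encode Python, ABI, and platform tags, and a pure-Python wheel normally ends in `none-any`. The helper below is a minimal sketch that inspects only the filename; `is_pure_python_wheel` is a hypothetical name, and it does not verify the wheel's contents.

```python
def is_pure_python_wheel(filename: str) -> bool:
    """Return True if a wheel filename carries the 'none' ABI and 'any' platform tags."""
    if not filename.endswith(".whl"):
        return False
    # Wheel filenames look like: name-version[-build]-python-abi-platform.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        return False
    _python_tag, abi_tag, platform_tag = parts[-3:]
    return abi_tag == "none" and platform_tag == "any"


# Hypothetical filenames, used only for illustration.
print(is_pure_python_wheel("internal_pkg-1.2.0-py3-none-any.whl"))
print(is_pure_python_wheel("numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.whl"))
```

A wheel with compiled extensions would instead carry an interpreter-specific ABI tag and a platform tag, which is what the second filename shows.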

@damienrj
Author

damienrj commented Feb 18, 2025 via email

@mscolnick
Contributor

This might be a decent refactor. There are two options for implementation:

  1. we could allow passing an index through marimo config, wiring it all the way to pyodide/micropip, but then we can't use our pre-generated lockfile (which helps with performance by resolving dependencies faster)
  2. we could allow passing a lockfile URL, but then you are responsible for generating that

For timeline:

  1. not sure when we would get to this, but we would help if you're interested in contributing it.
  2. we could help with this and show you how we generate these lockfiles (the generator is not open source, mostly to avoid additional maintenance/support/requests), but we're happy to share learnings (my calendar)

@damienrj
Author

damienrj commented Feb 18, 2025

Yeah, that makes sense. Given that this is more of an enterprise need, I think #2 would make the most sense so that it does not interfere with non-enterprise setups. I will demo it more with the VPN off first, and I am happy to take a stab at contributing. However, under my company policy I am limited to contributing to the open-source part of the code, and you mentioned that the lockfile generation is closed source. (Also, thanks for your replies.)

@Ryanphoenix

If it were possible to implement both of these options, that would be a great help for some of my use cases on an internal company network as well. We have CDN and PyPI mirrors, but being able to specify a custom lockfile, Pyodide distribution, and index argument when exporting the WASM would be extremely useful. My current workaround is a super hacky one: using regex to edit the generated files so they point to the Pyodide files, .whl files, and lockfile in a local folder that gets packaged and deployed alongside all of the marimo WASM files. It also entails manually downloading each extra .whl I want to include and then editing the lockfile to add the new wheels' info.
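
The manual lockfile edit described above could be scripted roughly as follows. This is a minimal sketch, not marimo's or Pyodide's tooling: `add_wheel_to_lock` is a hypothetical helper, and the entry fields (`file_name`, `install_dir`, `sha256`, `package_type`, `imports`, `depends`) are an assumption based on the layout of entries in `pyodide-lock.json`, so they should be checked against the lockfile shipped with the Pyodide version in use.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Optional


def add_wheel_to_lock(
    lock: dict,
    wheel_path: Path,
    name: str,
    version: str,
    imports: List[str],
    depends: Optional[List[str]] = None,
) -> dict:
    """Add an entry for a local wheel to an in-memory lockfile dict."""
    digest = hashlib.sha256(wheel_path.read_bytes()).hexdigest()
    # Field names are assumed to mirror existing entries in pyodide-lock.json.
    lock.setdefault("packages", {})[name.lower()] = {
        "name": name,
        "version": version,
        "file_name": wheel_path.name,
        "install_dir": "site",
        "sha256": digest,
        "package_type": "package",
        "imports": imports,
        "depends": depends or [],
    }
    return lock


def update_lock_file(lock_path: Path, wheel_path: Path, name: str, version: str) -> None:
    """Read the lockfile from disk, add the wheel entry, and write it back."""
    lock = json.loads(lock_path.read_text(encoding="utf-8"))
    add_wheel_to_lock(lock, wheel_path, name, version, imports=[name])
    lock_path.write_text(json.dumps(lock), encoding="utf-8")
```

Here `update_lock_file` assumes the import name matches the package name, which is not true for every package; pass `imports` explicitly through `add_wheel_to_lock` when they differ.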

@mscolnick
Contributor

@Ryanphoenix the regex is not a bad solution. Our front-end assets are already pre-built, so it would be hard to parameterize this, and the actual implementation may use regexes as well.

The lockfile is a bit trickier. You could completely remove it and see if that works (it might append the correct index).

If you are open to sharing your snippet, maybe it's something that can help for our implementation.

@Ryanphoenix

@mscolnick yeah, happy to share the snippet for the regex. I ended up pulling down a version of the Pyodide distro and using its lockfile (plus the marimo-specific bits). It's a VERY hacky way around things. But being able to say "here's my PyPI mirror, the link to my specific (compatible) Pyodide distro, and my specific lockfile" when generating the WASM, with some regex behind the scenes that overwrites the defaults with the user's values, would at least take a few steps out. It would also mean I don't have to upload all of the .whl files in the /assets/ folder alongside the other marimo-generated assets. Give me a few minutes to scrub the code into a slimmed-down state and I'll toss it in here.

@Ryanphoenix

Ryanphoenix commented Feb 27, 2025

This is my super hacky way of doing it. It was pretty much trial and error: use GitLab CI/CD to deploy the page, look to see which resource was failing to load, and then find a regex pattern for it. I hand-jammed this on my personal computer, so I might've accidentally introduced a typo here or there, but hopefully it gets you into the right ballpark!

import os
import re

# Local directory containing the Pyodide files (includes pyodide-lock.json)
local_pyodide_dir = "pyodide-0-26-2/pyodide"

# Folder containing the generated files.
# GitLab CI/CD hosts static pages from the "public" directory.
output_dir = "public"

# Patterns for the CDN URLs; found largely by trial and error until the static pages worked
cdn_pattern = r"https://cdn\.jsdelivr\.net/pyodide/[^\s]+/full/pyodide\.asm\.js"
cdn_template_pattern = r"https://cdn\.jsdelivr\.net/pyodide/v\$\{[^}]+\}/full"
cdn_version_pattern = r"https://cdn\.jsdelivr\.net/pyodide/\$\{[^}]+\}/full"
# Assumes the v= placeholder uses the same ${...} template form as pyodide=
lock_file_pattern = r"https://wasm\.marimo\.app/pyodide-lock\.json\?v=\$\{[^}]+\}&pyodide=\$\{[^}]+\}"

# Local replacement targets
local_file_path = f"./{local_pyodide_dir}/pyodide.asm.js"
local_dir_path = f"./{local_pyodide_dir}"
local_lock_path = f"./{local_pyodide_dir}/pyodide-lock.json"


def replace_and_log(pattern, replacement, text, file_path):
    """Replace every match of pattern in text, printing each replacement made."""
    def _sub(match):
        print(f"Replacement made in {file_path}: {match.group(0)} -> {replacement}")
        return replacement
    return re.sub(pattern, _sub, text)


# Iterate through the files in the output directory
for root, dirs, files in os.walk(output_dir):
    for file in files:
        if file.endswith((".html", ".js")):
            file_path = os.path.join(root, file)
            with open(file_path, "r+", encoding="utf-8", errors="ignore") as f:
                content = f.read()
                # Replace the full pyodide.asm.js URL first, then the templated base URLs
                new_content = replace_and_log(cdn_pattern, local_file_path, content, file_path)
                new_content = replace_and_log(cdn_template_pattern, local_dir_path, new_content, file_path)
                new_content = replace_and_log(cdn_version_pattern, local_dir_path, new_content, file_path)
                new_content = replace_and_log(lock_file_pattern, local_lock_path, new_content, file_path)

                # Rewrite the file in place only if something changed
                if new_content != content:
                    f.seek(0)
                    f.write(new_content)
                    f.truncate()
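
Before running a rewrite like this over real build output, it may help to try a pattern against a sample string. The sketch below exercises only the templated-CDN pattern; the sample JavaScript line is hypothetical and is not taken from marimo's actual generated assets.

```python
import re

# Same idea as the templated-CDN pattern in the script above.
cdn_template_pattern = r"https://cdn\.jsdelivr\.net/pyodide/v\$\{[^}]+\}/full"
local_pyodide_dir = "pyodide-0-26-2/pyodide"


def localize(text: str) -> str:
    """Point templated jsDelivr Pyodide URLs at the local Pyodide directory."""
    # A function replacement avoids backslash/group-reference surprises in the path.
    return re.sub(cdn_template_pattern, lambda m: f"./{local_pyodide_dir}", text)


# Hypothetical line of generated JavaScript, for illustration only.
sample = "const base = `https://cdn.jsdelivr.net/pyodide/v${version}/full/`;"
print(localize(sample))
```

Text that does not match the pattern passes through unchanged, so a file with no CDN references is left as it was.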
