Support for multiple versions of a package #470

Open
JCGoran opened this issue Dec 7, 2022 · 9 comments

@JCGoran
Contributor

JCGoran commented Dec 7, 2022

Problem Description

Currently, pdoc only supports one version of a given Python package.

Proposal

I would like to see (optional) support for multiple versions of a package in pdoc. It's possible that some users of a package do not use the version whose current documentation is available, and having the option to switch versions would greatly aid in usability, especially if some objects were removed/renamed between different package versions.

Alternatives

I think Sphinx has support for this feature, but I have not tested it.

Additional context

None.

@mhils
Member

mhils commented Dec 7, 2022

I absolutely agree that this is a useful feature, but I don't necessarily think it's something that should be natively supported by pdoc. Here's what I would do instead:

  1. Upload your docs to S3-compatible object storage with a directory structure like this:

    1.0.0/...
    2.0.0/...
    latest/...
    

    Future uploads can be done as part of CI.

  2. Write a small JavaScript snippet that enumerates the bucket for all versions (see here for some similar code).

  3. Add some HTML/JS to pdoc's template which adds a version selector based on the JavaScript snippet.

It'd be awesome to have a custom template in examples/ with a snippet that does 2)! 😃
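
As a minimal sketch of step 2, and as an alternative to enumerating the bucket from JavaScript, the CI job could write a small manifest next to the version directories at upload time, which the version selector would then fetch. The file name `versions.json` and the helper names `list_versions` and `write_manifest` are assumptions made for illustration; none of this is part of pdoc.

```python
import json
from pathlib import Path


def list_versions(docs_root: Path) -> list[str]:
    """Return the names of the version directories directly under docs_root."""
    names = [p.name for p in docs_root.iterdir() if p.is_dir()]
    # Keep the "latest" alias first, then the remaining names in reverse order.
    # Plain string sorting is a simplification: it misorders 10.0.0 vs 9.0.0.
    rest = sorted((n for n in names if n != "latest"), reverse=True)
    return (["latest"] if "latest" in names else []) + rest


def write_manifest(docs_root: Path) -> Path:
    """Write docs_root/versions.json (hypothetical name) for a selector to fetch."""
    manifest = docs_root / "versions.json"
    manifest.write_text(json.dumps(list_versions(docs_root)), encoding="utf-8")
    return manifest
```

The template's JavaScript would then only need to fetch this one file instead of listing the bucket, at the cost of regenerating it on every upload.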

@dAnjou

dAnjou commented May 21, 2023

Hello 👋 (TL;DR below)

While I agree that this functionality is not necessarily in the scope of pdoc, I'm afraid there's a bit more to it than meets the eye. The problem with your suggestion is that I'd never be able to change the design and layout of the docs.

I am actually already publishing docs for multiple versions of https://gitlab.com/dAnjou/fs-code here: https://danjou.gitlab.io/fs-code (well, just two versions for now). Much like you're describing, I have a CI job triggering the whole thing via a Python script, and I have a bit of JavaScript in a customized pdoc template. All the code is here: https://gitlab.com/dAnjou/fs-code/-/blob/main/docs/.

What I do is iterate over the version tags, check each one out, and regenerate the docs every time, so that the pages keep looking the same. It works okay so far, but there's a rather unpleasant hack in there 😞

In the Python script that's generating the docs, I happen to use my very same library for which I'm generating the docs, which means it needs to be installed.

Now it seems that the installed module always takes precedence over a local package directory.

When I say codefs, I get this:

/Users/dAnjou/Projects/fs-code/.venv/lib/python3.10/site-packages/pdoc/extract.py:123: RuntimeWarning: 'codefs' may refer to either the installed Python module or the local file/directory with the same name. pdoc will document the installed module, prepend './' to force documentation of the local file/directory.
 - Module location: /Users/dAnjou/Projects/fs-code/src/codefs/__init__.py
 - Local file/directory: /private/var/folders/68/qf6k_9s17vd3yyh97mmy_mz40000gr/T/tmporitzu14HEAD/codefs

And when I then say ./codefs, I get this:

/Users/dAnjou/Projects/fs-code/.venv/lib/python3.10/site-packages/pdoc/extract.py:145: RuntimeWarning: pdoc cannot load 'codefs' because a module with the same name is already imported in pdoc's Python process. pdoc will document the loaded module from /Users/dAnjou/Projects/fs-code/src/codefs/__init__.py instead.

So, to hack my way around this, I'm currently doing a poetry install in my CI job, which includes an editable installation of my library. In the Python script, for each checked-out version I replace the current src directory. Only then do I run pdoc, because at that point it can use the installed module.

(Don't mind the subprocess stuff. That's still from when I was using pdoc 9 and using it as a library had some rough edges, which are all fixed now, so I'm in the process of migrating - also why I'm writing this comment here now.)

TL;DR

So, from what I can see, the core issue is that you basically cannot use pdoc as a library in a Python script that's supposed to generate docs for multiple versions of your code, if that code is also "imported in pdoc's Python process".

Imagine someone wanted to write a Python library that does exactly what OP is asking, as a pdoc add-on, for example. Ironically, you could not use pdoc to document multiple versions for this library, because it would always prefer the installed module.

@mhils
Member

mhils commented May 22, 2023

@dAnjou: For your specific use case, it's probably easier to do the "check out another version" part in a bash script and then invoke your make-docs.py script from there. Having multiple versions of the same module in a Python process is usually a recipe for disaster. :)

@dAnjou

dAnjou commented May 22, 2023

> For your specific use case, it's probably easier to do the "check out another version" part in a bash script and then invoke your make-docs.py script from there.

I don't think it matters whether I check out the version in a bash script or in the Python script.

> Having multiple versions of the same module in a Python process is usually a recipe for disaster.

Yes, that's the core problem, and I acknowledge that my situation might be an edge case, because I happen to use my library for checking out the version. But any situation where you want to use the latest version of your own code in the docs script would fail.

And it still makes me wonder whether there's a way for pdoc to extract code structure and doc strings without importing the module 🤔

@mhils
Member

mhils commented May 22, 2023

> I don't think it matters whether I check out the version in a bash script or in the Python script.

Well the idea would be that you have a fresh Python interpreter for each version.

> But it still makes me wonder whether there's a way for pdoc to extract code structure and doc strings without importing the module 🤔

pdoc heavily relies on dynamic analysis (as opposed to static analysis), so the answer is a resounding no unfortunately.

FWIW pdoc.extract.invalidate_caches may be an alternative here. In either case, my recommendation would be not to overcomplicate things. Render once into an S3 bucket and then be done with it. :)
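
The "fresh Python interpreter for each version" idea can be sketched roughly as follows, assuming pdoc is installed in the environment of the calling script and each version is already checked out into its own directory. The helper names `pdoc_command` and `build_version` are hypothetical; the only pdoc features used are running it as `python -m pdoc`, the `-o` output-directory option, and the `./` prefix that the warning above mentions for forcing the local directory.

```python
import subprocess
import sys
from pathlib import Path


def pdoc_command(module_path: str, out_dir: Path) -> list[str]:
    """Build a command line that runs pdoc in a new interpreter."""
    # sys.executable reuses the current environment's Python;
    # -o is pdoc's output-directory option.
    return [sys.executable, "-m", "pdoc", module_path, "-o", str(out_dir)]


def build_version(checkout: Path, module_path: str, out_dir: Path) -> None:
    """Render the docs for one checked-out version in its own process."""
    # out_dir is resolved first because the child runs with cwd=checkout.
    subprocess.run(
        pdoc_command(module_path, out_dir.resolve()),
        cwd=checkout,
        check=True,
    )
```

Because every call starts a new process, no module from a previous version can already be imported when pdoc loads the next one. For example, `build_version(Path("checkouts/1.0.0"), "./codefs", Path("docs/1.0.0"))` would render one version, where both paths are placeholders.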

@dAnjou

dAnjou commented May 22, 2023

> Well the idea would be that you have a fresh Python interpreter for each version.

I can still do it in a Python script if I run it in an environment that doesn't have my library installed. Of course, I then cannot use the library in that script either, but I could use Dulwich directly, for example.

> pdoc heavily relies on dynamic analysis (as opposed to static analysis), so the answer is a resounding no unfortunately.

Totally understand that 👍

> In either case, my recommendation would be not to overcomplicate things. Render once into an S3 bucket and then be done with it. :)

That's not an option for me. I want to be able to change the design and layout, and it should be applied to all versions already published.

@JCGoran
Contributor Author

JCGoran commented May 25, 2023

After some messing around with Bash scripting, I managed to get multiple versions to work by using the following steps, which I'm writing down in case it helps anyone:

  1. figure out which versions to document; I opted for documenting all of the tags + the master branch, and stored this in an env variable VERSIONS (space separated)
  2. I've added a drop-down menu to the custom template with the code below; note that since I use GH pages, redirectURL and match have a package variable (also set from the env variable PACKAGE) since the docs will always be generated for [USERNAME].github.io/[PACKAGE]/, so those parts are GH-specific, and should be modified appropriately for other platforms (I am also assuming that the full path starts with /[PACKAGE]/[VERSION]/, so that the drop-down reflects the currently selected version regardless of which sub-page the user is at):
{% set versions = env.get("VERSIONS", "").strip().split(" ") %}
{% set package = env.get("PACKAGE", "test") %}
{% block nav_footer %}
    <footer>
      <label for="page-select">Version:</label>
      <select id="page-select" onchange="redirectToPage()">
        <option value="">Select an option</option>
        {% for item in versions %}
            <option value="{{ item }}">{{ item }}</option>
        {% endfor %}
      </select>

      <script>
        function redirectToPage() {
          var select = document.getElementById("page-select");
          var selectedOption = select.options[select.selectedIndex].value;
          if (selectedOption !== "") {
            var redirectURL = "/{{ package }}/" + selectedOption + "/index.html";
            window.location.href = redirectURL;
          }
        }
        // Set the default value of the dropdown to the selected option
        window.onload = function() {
          var select = document.getElementById("page-select");
          var currentURL = window.location.pathname;
          var match = currentURL.match(/^\/{{ package }}\/([^/]+)\/.+$/);
          if (match && match[1]) {
            select.value = match[1];
          }
        };
      </script>
    </footer>
{% endblock %}
  3. build the docs for each version; I git reset && git checkout only the source files, since this way, the script for building the docs is not affected, while it can still remain part of the repo
  4. publish via a GH action. Note that in order to be able to check out the tags in the CI, one should add the following to the YAML file:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0
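
Steps 1 and 3 above could look roughly like this when driven from Python rather than Bash. This is only a sketch: the helper names and the `src` directory are assumptions, and `git checkout <ref> -- <path>` is used here as one way to restore only the source files, which is not necessarily the exact command sequence described above.

```python
import subprocess


def versions_value(tags: list[str], branches: list[str]) -> str:
    """Join tags and extra branches into the space-separated VERSIONS value."""
    return " ".join([*tags, *branches])


def git_tags() -> list[str]:
    """List the tags of the repository in the current directory."""
    result = subprocess.run(
        ["git", "tag", "--list"], capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def checkout_sources(ref: str, source_dir: str) -> list[str]:
    """Command that restores only source_dir from ref.

    Files outside source_dir (such as the docs build script) stay untouched.
    Note that files present in the working tree but absent from ref are not
    deleted by this command.
    """
    return ["git", "checkout", ref, "--", source_dir]
```

The string returned by `versions_value(git_tags(), ["master"])` would then be exported as `VERSIONS` before running pdoc, so that the template's `split(" ")` sees one entry per version.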

@adigitoleo

@JCGoran Thanks for the informative write-up on how to achieve this. Can you clarify what the directory structure needs to look like when building for multiple versions? I'm a bit unclear about what the output directory should be for pdoc after I checkout a particular tag in the CI script.

@JCGoran
Contributor Author

JCGoran commented Sep 23, 2024

> Can you clarify what the directory structure needs to look like when building for multiple versions?

Unfortunately I haven't touched the project using this particular setup in a while, but the general setup is described in this file, which I just run as bash generate_docs.sh -t, which generates the docs for all of the git tags.

The rough idea is as follows:

  • find all of the git tags and sort them
  • add any additional branches (like master) to the final list
  • for each tag, git checkout it, and build the docs under something like docs/[VERSION] (note that this part is very destructive so I only run it fully in the CI where I can't nuke anything important)
  • finally, create a file docs/index.html which redirects to the latest stable version (so usually not master) when loading the docs
  • since I wanted all of the versions navigable on the docs website (otherwise there'd be no point in this exercise), I needed to modify the default pdoc template with some HTML and custom JS (see here https://github.com/JCGoran/fitk/blob/db8efab706875cd51909aac95b69ea4f3987eed2/templates/module.html.jinja2#L20-L52). You can use pdoc's -t [DIRECTORY] flag for this, where [DIRECTORY] is the dir where the templates are located (don't remember if the name of the template file has to be exactly module.html.jinja2)
  • publish the docs dir (since it has all of the versions)
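
The tag sorting and the redirecting docs/index.html from the list above can be sketched as follows. This is an illustration under assumptions, not the contents of generate_docs.sh: the helper names are hypothetical, tags are assumed to look like `v1.2.3` or `1.2.3`, and pre-release suffixes are not handled.

```python
import re
from pathlib import Path


def version_key(tag: str) -> tuple[int, ...]:
    """Turn 'v1.10.2' or '1.10.2' into (1, 10, 2) so tags sort numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", tag))


def latest_stable(tags: list[str]) -> str:
    """Pick the tag with the highest numeric version."""
    return max(tags, key=version_key)


def redirect_page(version: str) -> str:
    """Minimal HTML page that immediately redirects to docs/[VERSION]/."""
    return (
        "<!doctype html>\n"
        f'<meta http-equiv="refresh" content="0; url=./{version}/index.html">\n'
    )


def write_index(docs_dir: Path, tags: list[str]) -> Path:
    """Write docs_dir/index.html pointing at the latest stable version."""
    index = docs_dir / "index.html"
    index.write_text(redirect_page(latest_stable(tags)), encoding="utf-8")
    return index
```

Sorting on the extracted integers rather than on the raw strings matters once a component reaches two digits, since `"v1.10.0"` would otherwise sort before `"v1.9.3"`.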

You can see the final result here (the most notable difference w.r.t. the default pdoc template is the "Version" drop-down on the bottom of the left sidebar, which is clickable and redirects to the right version properly).

pdoc's handling of images was (is?) a bit cumbersome, and as a result the generate_docs.sh script is a bit convoluted, but I am content with the results. Note that I haven't implemented caching in the CI, which means if you have many versions, it may be a bit slow to build.
