Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ZConfig.loader.openPackageResource with pkg_resources? #28

Open
jamadden opened this issue Mar 16, 2017 · 11 comments
Open

Replace ZConfig.loader.openPackageResource with pkg_resources? #28

jamadden opened this issue Mar 16, 2017 · 11 comments

Comments

@jamadden
Copy link
Member

The openPackageResource function is fairly complicated. It may or may not support things like zipped imports and all the complicated ways that namespace packages work these days (it's not immediately clear to me).

Can we replace its guts with pkg_resources? Something like this:

    try:
        import pkg_resources
        return pkg_resources.resource_stream(package, path)
    except Exception as e:
        raise ZConfig.SchemaResourceError(
                    "error opening schema component: " + repr(e),
                    filename=path,
                    package=package)

Running the test suite with that shows only one real failure: There is a module that deliberately manipulates its own __path__ to point to a non-package subdirectory. I know that's technically allowed, but it's also discouraged these days. If we still need to support that we could have that in a small fallback handler---that would still let us remove a third of the code.

Thoughts?

@freddrake
Copy link
Contributor

That would add a dependency on setuptools, which ZConfig doesn't have now. I know there's a plan for pkg_resources to become (more?) independent (which I support), but I'm not entirely sure of the current status of that, or if Python 2.7 is on the radar for packaging folks these days.

I have a slight preference for avoiding a dependency on setuptools, but I'll admit it's slight, especially after going back to take a look at that function.

@jamadden
Copy link
Member Author

That would add a dependency on setuptools, which ZConfig doesn't have now.

True. Though in practice it's probably there anyway. If pip is installed, setuptools is installed.

I know there's a plan for pkg_resources to become (more?) independent (which I support), but I'm not entirely sure of the current status of that,

There is, yes. pypa/setuptools#863 says its blocked on not requiring setuptools to be self-installable, which was accomplished earlier this year in pypa/setuptools#581 when setuptools started taking dependencies on other projects (packaging, six and appdirs so far, most of which actually seem to be dependencies of pkg_resources). So I'd say we're closer than ever. Asymptotically, maybe, but still 😄

I have a slight preference for avoiding a dependency on setuptools, but I'll admit it's slight, especially after going back to take a look at that function.

It's not a real urgent need for me right now. But I did have reason to notice how complex it was when I was trying to debug a problem with it (Python 2 and non-absolute __path__ for modules imported from the current directory break it...but that also breaks pkg_resources too). And then I noticed that it hasn't been substantially changed since it was written in 2005 (except for some minor work in 2013 for the Python 3 port). I know the state of the import world has moved on since then (PEP420, etc) so I would guess there are corners we no longer support...but again, that's just a guess, not a current concrete need.

@freddrake
Copy link
Contributor

Though in practice it's probably there anyway. If pip is installed, setuptools is installed.

Perhaps. Though there are still those of us happy to live in a pip-free world.

Some parts of openPackageResource could be simplified using pkgutil. Whether that buys enough cleansing, I don't know offhand.

@jamadden
Copy link
Member Author

pkgutil.find_data gives us more or less the same thing as pkg_resources.resource_string, albeit with a different extensibility model. They both fail in the "find a data file that's just in a directory on the module's __path__" test case. We'd have to write that fallback in either case. It does let us drop the pkg.__loader__-based branch of the code, though.

@jamadden
Copy link
Member Author

This is what it looks like when all the tests pass using pkgutil (and some unified error handling). The complexity is to handle __path__. I'm not sure how much better this is...

def openPackageResource(package, path):
    data = None
    v, tb = (None, None)
    extra_path = []

    def make_error(msg):
        v = ZConfig.SchemaResourceError(
            msg,
            filename=path,
            package=package,
            path=extra_path)
        tb = sys.exc_info()[2]
        return v, tb

    try:
        data = pkgutil.get_data(package, path)
    except Exception as e:
        v, tb = make_error("schema component not found: " + repr(e))

    if data is None:
        __import__(package)
        pkg = sys.modules[package]
        extra_path.extend(pkg.__path__)

        loader = pkgutil.get_loader(pkg)
        for pkg_path in extra_path:
            loadpath = os.path.join(pkg_path, path)
            try:
                data = loader.get_data(loadpath)
            except Exception as e:
                v, tb = make_error("error opening schema component: " + repr(e))
            else:
                break


    if data is None:
        if v is not None:
            try:
                reraise(type(v), v, tb)
            finally:
                del tb
        raise make_error("schema component not found")[0]

    assert data is not None
    return StringIO(data.decode('utf-8'))

@freddrake
Copy link
Contributor

Ouch. Right, not a clear win.

@jamadden
Copy link
Member Author

I've been thinking about it, and I guess my question at this point is: Why is this __path__ handling necessary? Can it be deprecated and dropped, or just dropped? I think I can see why it used to be useful, but I also think it's dangerous nowadays.

Here's the test that uses it. Given the directory structure:

ZConfig/
   __init__.py
   tests/
     __init__.py
     library/
       __init__.py
       thing/
          __init__.py
          extras/
              extras.xml

The test wants to be able to do <import package="ZConfig.tests.library.thing" file="extras.xml" />. That is, it wants to "magically" find extras.xml inside a package it's not actually inside. This is accomplished by having ZConfig/tests/library/thing/__init__.py do __path__.append('extras').

The natural way to write that would be <import package="ZConfig.tests.library.thing.extras" file="extras.xml" />. Under Python 3, ZConfig.tests.libarry.thing.extras is an importable package even without the presence of an __init__.py (under Python 2 one would have to create an __init__.py).

Or, to not change the <import> one would move extras.xml inside the thing directory directly.

So given those two different ways to do this same thing, why would one want this extra indirection of extending __path__?

My best guess is trying to avoid conflicts with (namespace) packages and single-file modules. Suppose you were distributing a single-file module inside a namespace, let's say zope.fizzbin, distributed as zope/fizzbin.py. You'd like people to be able to do <import package="zope.fizzbin" /> to read zope/component.xml. But what if someone else, say zope.tongo was also a single-file module and also wanted to enable <import package="zope.tongo" />? If you both tried to distribute zope/component.xml you'd conflict. Solution: distribute zope/extras/component.xml and have fizzbin.py do __path__.append('extras').

If both zope/fizzbin.py and zope/tongo.py are part of the same source directory or distribution, this works out ok because they can perfectly coordinate and be sure to not both use the same directory name. One might use fizzbin_extras and one might use tongo_extras (leaving aside why they wouldn't use a single unified schema file).

But let's suppose they're not part of the same source directory or distribution and both just use extras.

In the world where every distribution is installed as a separate egg---meaning buildout and old versions of setuptools---this will work fine. You'll wind up with zope.fizzbin.egg/zope/extras/component.xml and zope.tongo.egg/zope/extras/component.xml, and fizzbin.py and tongo.py will both have a separate loader object pointed to their respective egg directories. Heck, in this model, they could both just distribute component.xml directly, not adjust __path__, and be done with it.

However, there's a problem. Modern versions of setuptools and pip have stopped installing namespace packages as separate eggs. Instead, they're all merged into one directory ("zope" here). Conflicting data files are simply overwritten and whatever distribution was installed last wins. There will be a zope/extras/component.xml, but which one it is depends on which distribution was installed last.

The only way to be sure that <import package="zope.fizzbin" /> and <import package="zope.tongo" /> work with all installation methods is to make them both actual packages, i.e., zope/fizzbin/__init__.py and zope/tongo/__init__.py. The __path__ trick is a ticking time-bomb for namespace packages.

I looked through the zopefoundation and Plone packages on github. I didn't locate any uses of it for this purpose, but I only did a spot check.

Thoughts? I'm probably missing something.

@freddrake
Copy link
Contributor

Agree that messing with the __path__ of a namespace package is a bad idea; I can't imagine I was particularly expecting that. At this point, I'd rather write that import as

<import
    package="ZConfig.tests.library.thing"
    file="extras/extras.xml"
    />

and not do anything to manually update __path__.

Unfortunately, importlib.resources doesn't like that.

@freddrake
Copy link
Contributor

Of course, importlib.resources doesn't use the package's __path__ either. Using the package's __loader__.open_resource() doesn't either. open_resource does work with 'extras/extras.xml' as the argument, however.

Needless to say, my frustration with Python package-resources APIs remains strong.

@freddrake
Copy link
Contributor

I'm not aware of any actual ZConfig component.xml that isn't located directly in a normal package, alongside the init.py file. If we're able live with that constraint, switching to importlib.resources seems reasonable, but would be strictly be backward incompatible, necessitating a bump in the first part of the version number. importlib_resources is available to support Python versions that don't have importlib.resources.

I'm not sure how best to determine if anyone has a component that would not work using this, however.

@mgedmin
Copy link
Member

mgedmin commented Sep 24, 2019

I'm not sure how best to determine if anyone has a component that would not work using this, however.

The scream test is traditional: release a new version with this change and see if any users scream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants