Adding the argument target_wrapper to hydra.utils.instantiate to support recursive type checking #2880

Open
wants to merge 5 commits into
base: main


JulesGM commented Apr 2, 2024

Motivation

beartype and other libraries offer runtime typechecking with function and class decorators, like so:

from beartype import beartype
from dataclasses import dataclass

@beartype
@dataclass
class ExampleDataclass:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

inst = ExampleDataclass("a name", 123.32, 23)
# Is ok

inst = ExampleDataclass(123, None, 23)
# Breaks

In this PR, we suggest adding an argument called target_wrapper to hydra.utils.instantiate. It receives a callable, which is applied to each _target_ before that target is itself called. This makes it possible to decorate every _target_ in a configuration at once. (It's possible that the argument should be called target_decorator.)
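To make the proposed semantics concrete, here is a minimal sketch on plain dicts. toy_instantiate and _resolve are hypothetical helpers, not Hydra's implementation: the real hydra.utils.instantiate also handles DictConfig, _args_, _partial_, _convert_ and more. The only point illustrated is where the wrapper is applied: each resolved target, including nested ones, goes through target_wrapper right before it is called.

```python
import datetime
import importlib
from typing import Any, Callable, Optional

Wrapper = Callable[[Callable[..., Any]], Callable[..., Any]]


def _resolve(path: str) -> Callable[..., Any]:
    """Import a dotted path such as "datetime.timedelta"."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def toy_instantiate(node: Any, target_wrapper: Optional[Wrapper] = None) -> Any:
    """Hypothetical, heavily simplified stand-in for hydra.utils.instantiate."""
    if isinstance(node, list):
        return [toy_instantiate(item, target_wrapper) for item in node]
    if not isinstance(node, dict):
        return node
    # Instantiate children first, so nested _target_s are wrapped as well.
    kwargs = {
        key: toy_instantiate(value, target_wrapper)
        for key, value in node.items()
        if key != "_target_"
    }
    if "_target_" not in node:
        return kwargs
    target = _resolve(node["_target_"])
    if target_wrapper is not None:
        target = target_wrapper(target)  # decorate the target before calling it
    return target(**kwargs)


# Usage: record which targets went through the wrapper.
seen = []


def recording_wrapper(target):
    seen.append(target.__name__)
    return target


cfg = {
    "_target_": "builtins.dict",
    "short": {"_target_": "datetime.timedelta", "minutes": 5},
    "long": {"_target_": "datetime.timedelta", "days": 1},
}
result = toy_instantiate(cfg, target_wrapper=recording_wrapper)
print(result)
print(seen)  # ['timedelta', 'timedelta', 'dict']
```

With the proposed argument, recording_wrapper would be replaced by something like beartype.beartype, and every target in the tree would be checked without touching its source.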

One way to use this is with something like beartype: with a single, non-invasive line of code, runtime type-checking would be applied to every part of the code instantiated through _target_ that carries type annotations. This would be incredibly helpful for us.

It would look like this:

hydra.utils.instantiate(cfg, target_wrapper=beartype.beartype)

See the test plan below for a fuller example.

Other kinds of wrappers are easy to imagine, e.g. for profiling: every target could be wrapped with a profiler before instantiation happens.
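A profiling wrapper of that kind could look roughly like this. It is a sketch assuming the proposed API, where it would be passed as hydra.utils.instantiate(cfg, target_wrapper=profile_target); profile_target and TIMINGS are hypothetical names, and a real profiler would likely report somewhere other than a module-level list.

```python
import fractions
import functools
import time
from typing import Any, Callable, List, Tuple

# Hypothetical global store of (target name, elapsed seconds) measurements.
TIMINGS: List[Tuple[str, float]] = []


def profile_target(target: Callable[..., Any]) -> Callable[..., Any]:
    """Return a callable that times every call of `target` (class or function)."""
    name = getattr(target, "__qualname__", repr(target))

    # updated=() avoids copying a class's attributes onto the wrapper function.
    @functools.wraps(target, updated=())
    def timed(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return target(*args, **kwargs)
        finally:
            TIMINGS.append((name, time.perf_counter() - start))

    return timed


# Usage: the object returned by the target is unchanged; only timing is recorded.
timed_fraction = profile_target(fractions.Fraction)
value = timed_fraction(3, 4)
print(value, TIMINGS)
```

Because the wrapper returns whatever the target returns, instantiated objects keep their original types; only the construction call is observed.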

One could also add recursive support for OmegaConf.structured with the following, which is very cool:

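import dataclasses
import attrs
from omegaconf import OmegaConf


def apply_structured(_target_):
    # Only wrap targets whose instances OmegaConf.structured can convert.
    if dataclasses.is_dataclass(_target_) or attrs.has(_target_):
        def factory(*args, **kwargs):
            # Build the object, then turn it into a typed (structured) DictConfig.
            return OmegaConf.structured(_target_(*args, **kwargs))

        return factory

    # Leave every other target untouched.
    return _target_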

and then

hydra.utils.instantiate(cfg, target_wrapper=apply_structured)

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Here is a standalone test with beartype:

from pathlib import Path
import dataclasses
import tempfile

import beartype
import hydra
from hydra.errors import InstantiationException

# Wrapper applied to every _target_ through the proposed `target_wrapper` argument.
CHECKER = beartype.beartype

@dataclasses.dataclass
class DataclassExample:
    argument_integer_example: int


# `test.DataclassExample` assumes this file is saved as test.py.
config_example_good = """
_target_: test.DataclassExample
argument_integer_example: 123
"""
config_name_good = "config_name_good"


# "123aa" is parsed as a string, so the beartype-wrapped constructor should reject it.
config_example_error = """
_target_: test.DataclassExample
argument_integer_example: 123aa
"""
config_name_error = "config_name_bad"


def doer(config_name, file_content, should_fail):
    with tempfile.TemporaryDirectory() as temp_dir:
        config_path = Path(temp_dir) / f"{config_name}.yaml"
        config_path.write_text(file_content)

        with hydra.initialize_config_dir(version_base=None, config_dir=temp_dir):
            cfg = hydra.compose(config_name=config_name)

            try:
                instantiated = hydra.utils.instantiate(
                    cfg,
                    target_wrapper=CHECKER,
                )
            except InstantiationException as e:
                if not should_fail:
                    raise RuntimeError(f"`{config_name}` was not supposed to fail: {e}") from e
                print(f"`{config_name}` failed as expected: {e}")
                return

            print(instantiated)
            if should_fail:
                raise RuntimeError(f"`{config_name}` was supposed to fail but didn't")


def test_compatible_config():
    doer(config_name_good, config_example_good, should_fail=False)


def test_incompatible_config():
    doer(config_name_error, config_example_error, should_fail=True)


if __name__ == "__main__":
    test_compatible_config()
    test_incompatible_config()


@facebook-github-bot added the CLA Signed label Apr 2, 2024
@JulesGM
Author

JulesGM commented Apr 2, 2024

@leycec you might find this interesting.

@odelalleau
Collaborator

Makes sense to me. But it seems to me (from a distance -- I'm not super familiar with the instantiation code) that it could be implemented as a _target_wrapper_ argument that could be provided either in the config or when calling instantiate(). If that's the case, it would fit the current API better (IMO).
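The config-or-call-site idea boils down to a precedence rule, sketched below. Everything here is hypothetical: Hydra has no _target_wrapper_ config key today, and pick_target_wrapper, _resolve and identity are illustrative names. The sketch assumes a wrapper passed when calling instantiate() wins over the config value, in the same way kwargs passed to instantiate() override config values.

```python
import functools
import importlib
from typing import Any, Callable, Mapping, Optional

Wrapper = Callable[[Callable[..., Any]], Callable[..., Any]]


def _resolve(path: str) -> Any:
    """Import a dotted path such as "functools.lru_cache"."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def pick_target_wrapper(
    node: Mapping[str, Any], call_wrapper: Optional[Wrapper] = None
) -> Optional[Wrapper]:
    """Hypothetical precedence rule for a config-level `_target_wrapper_` key.

    A wrapper passed at call time overrides the one named in the config.
    """
    if call_wrapper is not None:
        return call_wrapper
    path = node.get("_target_wrapper_")
    return _resolve(path) if path is not None else None


def identity(target):
    return target


node = {"_target_": "fractions.Fraction", "_target_wrapper_": "functools.lru_cache"}
print(pick_target_wrapper(node) is functools.lru_cache)            # True
print(pick_target_wrapper(node, call_wrapper=identity) is identity)  # True
```

This only covers a single node; how such a key should combine across nested configs is a separate design question.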

@JulesGM
Author

JulesGM commented Apr 2, 2024

@Jasha10 you might be interested in this.

@Jasha10
Collaborator

Jasha10 commented Apr 2, 2024

@JulesGM are you familiar with the hydra-zen project, and specifically with https://mit-ll-responsible-ai.github.io/hydra-zen/generated/hydra_zen.instantiate.html? The hydra_zen.instantiate function takes a _target_wrapper_ argument that seems similar.

@leycec

leycec commented Apr 4, 2024

Luv it. Thanks so much for pinging me on, @JulesGM. I have now learned many things. I have learned about the appropriately named OmegaConf, which I now realize I have been waiting for my entire life. I have also learned that @patrick-kidger of jaxtyping fame is still right about everything.

Interestingly, jaxtyping offers a similar runtime type-checker-agnostic @jaxtyping.jaxtyped(typechecker=beartype.beartype) API to that of the proposed hydra.utils.instantiate(cfg, target_wrapper=beartype.beartype) API:

# Import both the annotation and the `jaxtyped` decorator from `jaxtyping`
from jaxtyping import Array, Float, jaxtyped

# Use your favourite typechecker: usually one of the two lines below.
from typeguard import typechecked as typechecker
from beartype import beartype as typechecker

# Type-check a function
@jaxtyped(typechecker=typechecker)
def batch_outer_product(x: Float[Array, "b c1"],
                        y: Float[Array, "b c2"]
                      ) -> Float[Array, "b c1 c2"]:
    return x[:, :, None] * y[:, None, :]

So far so good. You're in good company, @JulesGM. But I've been wondering... can we eventually simplify and streamline the process of selecting competing runtime type-checkers, or is literally every Python package ever going to define its own non-orthogonal proprietary API for selecting them? It's the latter, isn't it? I'm sighing. You can almost feel the hot fetid breath I'm exhaling all over your keyboard. 😮‍💨

If we accept the status quo and do nothing, user headaches are guaranteed to explode combinatorially. Currently, users have to manually notify every Python package of their preferred runtime type-checker with an API unique to that package. Instead, users should be able to trivially, publicly, and globally notify every Python package of their preferred runtime type-checker all at once with just a single Python statement. Yet we're now facing the exact opposite scenario.

Introducing...

anytype: Utopia Never Seemed So Far Away

I'd make anytype. But I can barely make @beartype. The core conceit is simple, though: it's QtPy, but for runtime type-checking. In a nutshell, anytype would:

  • Be a thin high-level abstraction layer over lower-level runtime type-checkers.
  • Only support features commonly supported by all runtime type-checkers.
  • Provide a uniform API for performing runtime type-checking.
  • Provide a configuration API for selecting between supported runtime type-checkers.
  • Internally delegate all runtime type-checking to the currently configured type-checker.

Downstream packages like Hydra and jaxtyping would then simply import anytype and use that high-level API without concern for the currently configured type-checker. For example, the @anytype.anytype decorator would be a shim for either the @beartype.beartype or comparable @typeguard.typechecked decorators: e.g.,

from anytype import anytype  # <-- type-checker-agnostic decorator for the win
from dataclasses import dataclass

@anytype  # <-- you just won the internet. congrats
@dataclass
class ExampleDataclass:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

inst = ExampleDataclass("a name", 123.32, 23)
# Is ok

inst = ExampleDataclass(123, None, 23)
# Breaks

Downstream users and apps would then also import and configure anytype to use their preferred runtime type-checker. For example, the public anytype.configure() function might accept a checker parameter whose value is an AnytypeChecker enumeration member identifying the desired runtime type-checker: e.g.,

# Probably in the "{your_app}.__init__" subpackage to ensure this happens early:
from anytype import AnytypeChecker, configure
configure(checker=AnytypeChecker.beartype)  # <-- feel the hot claws as they rake your codebase

Of course, nobody has time, energy, inclination, money, or sufficient goodwill towards humanity. Nobody will ever do that. I am nobody, too. Still, a utopian dreamer dreams. If not here on GitHub, then where? Nordic Gods above, where!?


i am very tired and must now lie on a door

Would You Like to Know More About Managed Democracy and anytype?

If so, hit this feature request at the @beartype issue tracker. Let's collaborate and listen. Let someone else do this for all of us.

@rsokl
Contributor

rsokl commented Apr 5, 2024

Heh, this is crazy timing! I just added this feature to hydra-zen literally four days ago. It is extremely powerful; e.g., it makes it easy to add a pydantic parsing layer to a hydra(-zen) app (and adding beartype support would be straightforward as well!).
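To illustrate what a "parsing layer" means without depending on pydantic, here is a minimal stdlib-only sketch. coerce_args is a hypothetical wrapper, not pydantic's or hydra-zen's behavior: it converts each argument to its annotated int, float or str type before calling the target, and leaves everything else (including string annotations) alone. Used as the target wrapper, it would turn string values into the annotated numeric types instead of merely rejecting them.

```python
import dataclasses
import functools
import inspect
from typing import Any, Callable

# Only these simple annotations are coerced in this sketch.
_COERCIBLE = (int, float, str)
_NAMED_KINDS = (
    inspect.Parameter.POSITIONAL_ONLY,
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def coerce_args(target: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical parsing wrapper: convert each argument to its annotated
    builtin type (int, float or str) before calling `target`."""
    signature = inspect.signature(target)

    @functools.wraps(target, updated=())
    def parse_then_call(*args: Any, **kwargs: Any) -> Any:
        bound = signature.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            param = signature.parameters[name]
            expected = param.annotation  # string annotations are left alone
            if (
                param.kind in _NAMED_KINDS
                and expected in _COERCIBLE
                and not isinstance(value, expected)
            ):
                bound.arguments[name] = expected(value)  # may raise ValueError
        return target(*bound.args, **bound.kwargs)

    return parse_then_call


@dataclasses.dataclass
class Optimizer:
    lr: float
    steps: int


opt = coerce_args(Optimizer)(lr="0.001", steps="100")
print(opt)  # Optimizer(lr=0.001, steps=100)
```

A validating wrapper such as beartype.beartype would reject the same call instead; which behavior is wanted is exactly the kind of choice the wrapper argument leaves to the user.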

that it could be implemented as a target_wrapper argument that could be either provided in the config or when calling instantiate()

I think it should live in instantiate. Otherwise this grows the complexity of configs in a substantial way, and probably without much upside.

With this feature, it would be very nice to be able to completely disable Hydra/omegaconf's type-checking features and just let this validation layer be responsible for everything.

@rsokl
Contributor

rsokl commented Apr 5, 2024

Btw @JulesGM you should check out hydra-zen sometime 😄 It is really nice for ML workflows and eliminates a lot of manual labor and boilerplate from using Hydra

@odelalleau
Collaborator

that it could be implemented as a target_wrapper argument that could be either provided in the config or when calling instantiate()

I think it should live in instantiate. Otherwise this grows the complexity of configs in a substantial way, and probably without much upside.

Just to be clear, my suggestion would provide the additional flexibility of specifying the wrapper in the config but this wouldn't be mandatory: you could still provide it as a kwarg when calling instantiate(), just like you can override any other config setting.

@rsokl
Contributor

rsokl commented Apr 5, 2024

I understand; I just think that adding this to the config abstraction introduces a degree of complexity that leads users down design paths they ought not take. You also get weird behavior when nested configs each have their own target wrapper... People will start asking for non-recursive wrappers, etc.

@JulesGM
Author

JulesGM commented Apr 7, 2024

We are planning to use the ConfigStore approach for schemas and most of the type checking, but it feels very annoying (and increases the maintenance work as the parameters change) to have to write dataclasses for things that already have a _target_ with valid type annotations.

We feel that having a basic version of target_wrapper in hydra-core would be a good place to start to cover that, without having to go all the way to adopting hydra-zen. Maybe the debate over whether the wrapper can also be set in configs can happen further down the road as usage evolves?

@rsokl
Contributor

rsokl commented Apr 17, 2024

without having to go all the way with the inclusion of hydra-zen

fwiw hydra-zen's only installation dependencies are hydra-core and typing-extensions, so it is very lightweight. And all you would need to do is use hydra_zen.instantiate in place of hydra.utils.instantiate -- that's it; you don't need to change any of your other Hydra code.

I'm not opposed at all to having this be added to hydra-core, just want to point out that this is also a trivially easy path forward. I just need to officially release v0.13.0, which I can probably do this week.

@Jasha10
Collaborator

Jasha10 commented Apr 17, 2024

all you would need to do is use hydra_zen.instantiate in place of hydra.utils.instantiate

@JulesGM would that work for you?

@rsokl
Contributor

rsokl commented May 1, 2024

hydra-zen v0.13.0 was released yesterday: https://mit-ll-responsible-ai.github.io/hydra-zen/changes.html#v0-13-0

6 participants