Idea: new prioritisation scheme for pyconvert rules #661

@cjdoris

Current status

Currently a pyconvert rule consists of:

  • the source Python type t, which the rule can convert from;
  • the target Julia type T, which the rule can convert to;
  • the priority of the rule; and
  • the function func implementing the rule.

When pyconvert(R, x) runs, it first filters the list of rules according to t and T (roughly pyisinstance(x, t) and typeintersect(R, T) != Union{}). The rules are then ordered first by priority, then by the specificity of t, then by the order the rules were defined.

The priorities are:

  • jlwrap: for converting wrapped Julia objects by simply unwrapping them;
  • array: for array-like objects (buffers, numpy arrays, ...);
  • canonical: for the canonical conversion for a type, e.g. float to Float64;
  • normal: for all other reasonable conversions.

The priorities are a bit of a hack to work around the fact that ordering by specificity of t isn't quite right. For example, we always want to convert Julia objects by unwrapping them, so their rules need to come first: even if the object also happens to be a Mapping and we are converting to Dict, we don't want to use the generic Mapping to Dict rule. Similarly, if the object is array-like, we want to convert by getting at the underlying memory instead of using the generic Sequence to Array rule.

The proposal

So my proposal is to remove priority and add:

  • the scope julia type S, which must be a supertype of T.

We further filter rules by S: a rule applies only if R <: S, except that if R isa Union then only one component of the union has to be a subtype of S.
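
As a rough illustration of this scope filter, here is a minimal sketch in plain Julia. It is not PythonCall code: in_scope is a hypothetical name, and Base.uniontypes is an unexported Base helper used here to split a Union into its components.

```julia
# Hypothetical helper, not part of PythonCall: does the requested type R
# fall within a rule's scope S?
# Base.uniontypes (unexported) returns the components of a Union, or a
# one-element vector for any other type.
in_scope(R, S) = any(r -> r <: S, Base.uniontypes(R))

in_scope(Float32, Number)                   # true:  Float32 <: Number
in_scope(Any, Number)                       # false: Any is not a subtype of Number
in_scope(Union{Integer,Missing}, Missing)   # true:  the Missing component matches
in_scope(Union{Integer,Missing}, Float32)   # false: no component matches
```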

For ordering rules, we no longer order by priority, just by specificity of t and insertion order.

We also ignore type(x).mro() and only use strict specificity (issubclass(t1, t2)). That is, rules form a DAG with this partial ordering, which we flatten using insertion order to break ties.
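
One possible reading of "flatten the DAG using insertion order" is sketched below. Everything in it is an assumption made for illustration: Python classes are modelled by Julia abstract types (ObjectT, SequenceT, ListT) with <: standing in for issubclass, rules are reduced to a (t, label) pair because T and S do not affect ordering, and flatten_rules is a hypothetical name.

```julia
# Stand-ins for Python classes; `<:` plays the role of issubclass.
# (Assumption: single inheritance is enough for this sketch.)
abstract type ObjectT end
abstract type SequenceT <: ObjectT end
abstract type ListT <: SequenceT end

# t1 is a strict subclass of t2.
strictly_below(t1, t2) = t1 <: t2 && t1 != t2

# Repeatedly take the earliest-inserted rule that no remaining rule beats on
# specificity of t. Rules with equal or unrelated t keep their insertion order.
function flatten_rules(rules)
    remaining = collect(rules)
    ordered = eltype(remaining)[]
    while !isempty(remaining)
        i = findfirst(r -> !any(q -> strictly_below(q.t, r.t), remaining), remaining)
        push!(ordered, remaining[i])
        deleteat!(remaining, i)
    end
    return ordered
end

# Rules in insertion order; two of them share t = SequenceT.
rules = [
    (t = ObjectT,   label = "object rule"),
    (t = SequenceT, label = "sequence rule A"),
    (t = ListT,     label = "list rule"),
    (t = SequenceT, label = "sequence rule B"),
]

order = [r.label for r in flatten_rules(rules)]
# order == ["list rule", "sequence rule A", "sequence rule B", "object rule"]
```

The two sequence rules stay in the order they were defined, while the list rule moves ahead of them only because its t is a strict subclass.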

You are only allowed to create rules where you "own" either t or S.

Discussion

This means you can only have S=Any if you own t. You can think of S=Any as being canonical priority or higher. PythonCall will continue to "own" the Python standard library, and most rules in PythonCall will have S=Any. The exception is for some things currently in the normal priority. For example, we convert None to Nothing canonically but can also go to Missing. In the new system, the rules will have T=Nothing, S=Any and T=S=Missing, so you get Nothing by default but can get Missing if you ask for it. Similarly, tuple canonically converts to Tuple but can also go to Array; these rules will become T=Tuple, S=Any and T=Array, S=AbstractArray, so you will get an Array if you specify Array or AbstractArray.

If you don't own t, then you must own S. This lets you define e.g. a generic conversion rule from list to some new MyArray you invented. But the rule is only used if you specify pyconvert(MyArray, x). Doing pyconvert(AbstractArray, x) or pyconvert(Any, x) will not use the rule. Hence we have well-scoped rules, avoid piracy, and avoid cases where the conversion rules applied depend on which packages are loaded.

In particular, since passing Python objects to Julia in JuliaCall normally uses pyconvert(Any, x), only rules created by the "owner" of pytype(x) are applied. This makes passing Python values around predictable - some third-party package defining their list to MyArray rule will not affect how list gets passed to Julia by default.

By ignoring the MRO of the passed Python object, we ignore issues with the arbitrary ordering of types in the MRO. Our proposal guarantees that if you have an applicable rule with t=t1 then it can only be overridden by a later rule with t=t2 if t2 is a strict subclass of t1. Currently it can be overridden if t2 is completely unrelated but just happens to be higher up the MRO.
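
The difference can be made concrete with a toy model. In this sketch the class hierarchy, the hand-written MRO, and the idea that the current scheme ranks rules by MRO position are all assumptions made for illustration; classes are plain symbols and issubclass is a local stand-in for Python's function of the same name.

```julia
# Hypothetical Python hierarchy: class X(A, B), where A and B are unrelated.
bases = Dict(:X => [:A, :B], :A => [:object], :B => [:object], :object => Symbol[])
mro_X = [:X, :A, :B, :object]   # what type(x).mro() would list for an X

# Recursive stand-in for Python's issubclass over the `bases` table.
issubclass(a, b) = a == b || any(p -> issubclass(p, b), bases[a])

# Two applicable rules in insertion order: the t=B rule was defined first.
rules = [(t = :B, label = "rule for B"), (t = :A, label = "rule for A")]

# Assumed model of MRO-based ordering: whichever t appears earlier in the MRO wins.
by_mro = [r.label for r in sort(rules; by = r -> findfirst(==(r.t), mro_X))]
# by_mro == ["rule for A", "rule for B"]: the later rule overrides the earlier one

# Proposed ordering: the later rule only moves ahead if its t is a strict subclass.
later_overrides = issubclass(:A, :B) && :A != :B
proposed = [r.label for r in (later_overrides ? reverse(rules) : rules)]
# proposed == ["rule for B", "rule for A"]: insertion order is kept
```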

I think this scheme is sufficiently general to encode rules in the priority order users will want. When adding a rule, you must own t or S. If you own t then it will be more specific than anything else anyway. If you own S then you have to opt in to using the rule, as in pyconvert(S, x), in which case only your rules pass the filter. If you instead do pyconvert(Union{Foo,S}, x) then whether you get a Foo or an S depends on insertion order, but if Foo came from a parent package, then you should rightly get a Foo, which will be the case because its rules were defined first. So basically insertion order prevents overwriting rules from earlier-loaded packages. This does mean the output type can be import-order-dependent, but only where there are unions, and this case is inherently ambiguous so we have to pick something arbitrarily anyway.

What about jlwrap and array?

We will have rules like t=juliacall.AnyValue, T=S=Any and t=<buffer>, T=PyArray, S=Any. Provided we define these first, they will be applied first unless a rule for a more specific t is defined.

Worked examples

Here are some rules for t=list:

  • T=PyArray, S=Any: canonical conversion to a PyArray, used if you specify converting to PyArray or AbstractArray or Any.
  • T=Array, S=DenseArray: used if you specify converting to Array or DenseArray, but AbstractArray gets you a PyArray.
  • T=Set, S=AbstractSet: used if you specify converting to Set or AbstractSet.
  • T=Tuple, S=Tuple: used if you specify converting to Tuple.

Some rules for t=None:

  • T=Nothing, S=Any: canonical
  • T=Missing, S=Missing: specify Missing (or Union{Missing, Foo})

Some rules for t=float:

  • T=Float64, S=Any: canonical
  • T=Float32, S=Float32: specify another float type
  • T=Number, S=Number: specify another non-float number type such as Integer
  • T=Missing, S=Missing (for NaN)
  • T=Nothing, S=Nothing (for NaN)

Some examples for converting a float:

  • to Any: only rule 1 applies (filtering on S)
  • to Float32: rules 2 and 3 apply (rule 1 ignored due to T, others due to S) so rule 2 is tried first.
  • to Integer: only rule 3 applies (rule 1 is filtered out by T, the others by S).
  • to Union{Integer, Missing}: rules 3 and 4 apply (rule 1 is filtered out by T, the others by S) so rule 3 is tried first.

Say some package defines myfloat <: float and adds a rule for it:

  • T=BigFloat, S=Any

Examples converting a myfloat:

  • to Any: rule 1 and the new rule apply. New rule more specific in t, so use new rule.
  • to AbstractFloat or Number: pretty much the same.
  • to BigFloat: the new rule and rule 3 apply (BigFloat <: Number). New rule more specific in t, so use new rule.
  • to Float32: new rule doesn't apply, so as above rule 2 is first.
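
The float and myfloat examples above can be replayed with a small sketch of the proposed filtering and ordering. This is a toy model under stated assumptions, not PythonCall's implementation: Rule, in_scope and select_rules are hypothetical names, the Python types float and myfloat are modelled by the Julia abstract types PyFloatT and MyFloatT (with <: standing in for issubclass), and Base.uniontypes is an unexported Base helper.

```julia
# Stand-ins for the Python types float and myfloat.
abstract type PyFloatT end
abstract type MyFloatT <: PyFloatT end

struct Rule
    t::Type        # source (stand-in Python) type
    T::Type        # target Julia type
    S::Type        # scope Julia type
    name::String
end

# R is in scope if it, or one component of it when it is a Union, is <: S.
in_scope(R, S) = any(r -> r <: S, Base.uniontypes(R))

# tx is the stand-in Python type of x, R the requested Julia type.
function select_rules(rules, tx, R)
    remaining = filter(rules) do r
        tx <: r.t && in_scope(R, r.S) && typeintersect(R, r.T) != Union{}
    end
    ordered = Rule[]
    while !isempty(remaining)
        # Earliest-inserted rule that no remaining rule beats on specificity of t.
        i = findfirst(r -> !any(q -> q.t <: r.t && q.t != r.t, remaining), remaining)
        push!(ordered, remaining[i])
        deleteat!(remaining, i)
    end
    return [r.name for r in ordered]
end

rules = [
    Rule(PyFloatT, Float64,  Any,     "rule 1"),
    Rule(PyFloatT, Float32,  Float32, "rule 2"),
    Rule(PyFloatT, Number,   Number,  "rule 3"),
    Rule(PyFloatT, Missing,  Missing, "rule 4"),
    Rule(PyFloatT, Nothing,  Nothing, "rule 5"),
    Rule(MyFloatT, BigFloat, Any,     "new rule"),
]

select_rules(rules, PyFloatT, Any)                     # ["rule 1"]
select_rules(rules, PyFloatT, Float32)                 # ["rule 2", "rule 3"]
select_rules(rules, PyFloatT, Integer)                 # ["rule 3"]
select_rules(rules, PyFloatT, Union{Integer,Missing})  # ["rule 3", "rule 4"]
select_rules(rules, MyFloatT, Any)                     # ["new rule", "rule 1"]
select_rules(rules, MyFloatT, BigFloat)                # ["new rule", "rule 3"]
select_rules(rules, MyFloatT, Float32)                 # ["rule 2", "rule 3"]
```

In this model, converting a myfloat to BigFloat also lets rule 3 through the filter, because T=Number intersects BigFloat, but the new rule is still tried first since its t is more specific.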

Pros and cons

Pros:

  • Strict ownership of rules - avoids piracy.
  • Return type of pyconvert more predictable.
  • Clearer semantics/rule ordering than currently.
  • The number of applicable rules is massively cut down by filtering on S (usually to 1).
  • Where more than one rule remains, ordering by t should then be mostly unique. Insertion order is mainly needed to disambiguate unions, plus special rules like those for buffers and jlwrap.
  • Easy to "opt in" to a conversion rule by being more specific about what you are converting to (see the MyArray example above).

Cons:

  • People might still pirate (i.e. make rules with S=Any for which they don't own t).
  • pyconvert(Union{AbstractArray,MyArray}, x) does not do what you might expect (use the generic AbstractArray rules plus the special MyArray rule) because the union gets normalised down to AbstractArray first, so the MyArray rule is never considered. You need to take more specific unions like Union{PyArray,Array,MyArray} which is annoying. We could make a helper function to create such a union for you.
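
The union normalisation mentioned in the last point can be checked directly in Julia. In this sketch MyArray is a hypothetical stand-in type that is never instantiated, in_scope is a hypothetical helper modelling the proposed scope filter, and Array is used in place of the more specific union suggested above because PyArray is not available outside PythonCall.

```julia
# Hypothetical array type standing in for the MyArray of the example.
struct MyArray <: AbstractVector{Float64} end

# Hypothetical scope filter: one component of R must be a subtype of S.
# Base.uniontypes is an unexported Base helper.
in_scope(R, S) = any(r -> r <: S, Base.uniontypes(R))

Union{AbstractArray,MyArray} == AbstractArray     # true: MyArray is absorbed
in_scope(Union{AbstractArray,MyArray}, MyArray)   # false: no MyArray component is left
in_scope(Union{Array,MyArray}, MyArray)           # true: the more specific union keeps it
```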

Rejected ideas

  • I considered ordering also based on specificity of T (more specific wins) and S (less specific, i.e. more canonical, wins).

    If we use S then in the float example we prefer the generic Number rule over the specific Float32 rule. But if we use T then a juliacall.DictValue has rules t=juliacall.AnyValue, T=Any, S=Any and t=Mapping, T=Dict, S=Any and the latter rule will be preferred, which isn't what we want.

    The current proposal only alters ordering a little - namely by removing priority and ignoring MRO - and relying on more aggressive filtering from S.

  • We could allow add_rule to not always add at the end of the list. Instead it could specify one or more existing rules that the new rule must appear above. Rejected because, as explained in the discussion section, the existing proposal is sufficient for any sensible rule definitions.
