Current status
Currently a `pyconvert` rule consists of:
- the source Python type `t`, which the rule can convert from;
- the target Julia type `T`, which the rule can convert to;
- the `priority` of the rule; and
- the function `func` implementing the rule.
When `pyconvert(R, x)` runs, it first filters the list of rules according to `t` and `T` (roughly `pyisinstance(x, t)` and `typeintersect(R, T) != Union{}`). The rules are then ordered first by `priority`, then by the specificity of `t`, then by the order the rules were defined.
The priorities are:
- `jlwrap`: for wrapped Julia objects, converted by just unwrapping them;
- `array`: for array-like objects (buffers, numpy arrays, ...);
- `canonical`: for the canonical conversion for a type, e.g. `float` to `Float64`;
- `normal`: for all other reasonable conversions.
The priorities are a bit of a hack to work around the fact that ordering by specificity of `t` isn't quite right. For example, we always want to convert Julia objects by unwrapping them, so their rules need to come first: even if the object also happens to be a `Mapping` and we are converting to `Dict`, we don't want to use the generic `Mapping` to `Dict` rule. And if the object is array-like, we want to convert by getting at the underlying memory instead of using the generic `Sequence` to `Array` rule.
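As a rough illustration of the current behaviour described above, here is a minimal Python sketch. It is not PythonCall's implementation: Julia types are modelled as names in a toy tree, `JlWrapper` and `JlDict` are hypothetical stand-ins for wrapped Julia objects, and "specificity of `t`" is assumed to mean position in `type(x).mro()`.

```python
from dataclasses import dataclass

# Toy slice of Julia's type tree (child -> parent), an assumption for illustration.
PARENT = {"Dict": "AbstractDict", "AbstractDict": "Any"}

def issubtype(a, b):
    """Toy `a <: b`: walk up the tree from a until b or Any is reached."""
    while a != b:
        if a == "Any":
            return False
        a = PARENT[a]
    return True

def intersects(R, T):
    """Crude stand-in for `typeintersect(R, T) != Union{}` on a tree of types."""
    return issubtype(R, T) or issubtype(T, R)

PRIORITY = {"jlwrap": 0, "array": 1, "canonical": 2, "normal": 3}

@dataclass(frozen=True)
class Rule:
    t: type        # source Python type
    T: str         # target Julia type (toy name)
    priority: str

class JlWrapper:                 # hypothetical base class for wrapped Julia objects
    pass

class JlDict(dict, JlWrapper):   # hypothetical wrapped Julia Dict that is also a dict
    pass

def ordered_rules(rules, R, x, use_priority=True):
    mro = type(x).mro()
    hits = [r for r in rules if isinstance(x, r.t) and intersects(R, r.T)]
    def key(r):
        prio = PRIORITY[r.priority] if use_priority else 0
        return (prio, mro.index(r.t))
    # sorted() is stable, so rules with equal keys keep definition order.
    return sorted(hits, key=key)

rules = [Rule(dict, "Dict", "canonical"), Rule(JlWrapper, "Any", "jlwrap")]
x = JlDict()
first_with_priority = ordered_rules(rules, "Dict", x)[0]
first_without = ordered_rules(rules, "Dict", x, use_priority=False)[0]
print(first_with_priority.priority)  # -> jlwrap
print(first_without.priority)        # -> canonical
```

Without the priority key, the generic `dict` rule would be tried first simply because `dict` comes earlier in the MRO of `JlDict`, which is the situation the priorities paper over.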
The proposal
So my proposal is to remove `priority` and add:
- the scope Julia type `S`, which must be a supertype of `T`.
We further filter rules by `S` (`R <: S`, except if `R isa Union` then just one component has to match).
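The scope filter can be sketched in a few lines of Python. This is only a model: Julia types are toy names in a parent map, a union is a `frozenset` of names, and `Foo` is a placeholder type.

```python
# Toy slice of Julia's type tree (child -> parent), an assumption for illustration.
PARENT = {"Nothing": "Any", "Missing": "Any", "Foo": "Any"}

def issubtype(a, b):
    """Toy `a <: b` for non-union types."""
    while a != b:
        if a == "Any":
            return False
        a = PARENT[a]
    return True

def in_scope(R, S):
    """`R <: S`, or, when R is a union, at least one component `<: S`."""
    components = R if isinstance(R, frozenset) else {R}
    return any(issubtype(c, S) for c in components)

print(in_scope("Any", "Any"))                           # -> True
print(in_scope("Any", "Missing"))                       # -> False
print(in_scope("Missing", "Missing"))                   # -> True
print(in_scope(frozenset({"Missing", "Foo"}), "Missing"))  # -> True
```

A rule with `S=Any` is therefore always in scope, while a rule with a narrower `S` only becomes a candidate when the requested type names `S` or something below it.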
For ordering rules, we no longer order by priority, just by specificity of `t` and insertion order.
We also ignore `type(x).mro()` and only use strict specificity (`issubclass(t1, t2)`). That is, rules form a DAG with this partial ordering, which we flatten using insertion order to break ties.
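One possible reading of "flatten using insertion order to break ties" is a greedy topological sort: repeatedly take the earliest-inserted rule that no remaining rule is strictly more specific than. The sketch below assumes that reading; the classes `A`, `B` and `C` are made up for the example.

```python
def strictly_more_specific(t1, t2):
    """True when t1 is a strict subclass of t2."""
    return t1 is not t2 and issubclass(t1, t2)

def flatten(rule_types):
    """Strict subclasses first; insertion order breaks ties between unrelated types."""
    remaining = list(rule_types)  # insertion order
    ordered = []
    while remaining:
        # Earliest-inserted type with no strictly more specific type still waiting.
        nxt = next(
            t for t in remaining
            if not any(strictly_more_specific(u, t) for u in remaining)
        )
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

class A: pass
class B: pass
class C(A, B): pass   # MRO is C, A, B, object

# Rules were added for B first, then A, then C.
print([t.__name__ for t in flatten([B, A, C])])  # -> ['C', 'B', 'A']
```

Note that `A` precedes `B` in the MRO of `C`, yet `B` is tried first here because the two are unrelated and the `B` rule was inserted earlier.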
You are only allowed to create rules where you "own" either `t` or `S`.
Discussion
This means you can only have `S=Any` if you own `t`. You can think of `S=Any` as being `canonical` priority or higher. PythonCall will continue to "own" the Python standard library, and most rules in PythonCall will have `S=Any`. The exception is for some things currently in the `normal` priority. For example we convert `None` to `Nothing` canonically but can also go to `Missing`. In the new system, the rules will have `T=Nothing, S=Any` and `T=S=Missing`, so you generically get `Nothing` but can get `Missing` if you ask for it. Similarly `tuple` canonically converts to `Tuple` but can also go to `Array`, the rules for which will become `T=Tuple, S=Any` and `T=Array, S=AbstractArray`, so you will get an `Array` if you specify `Array` or `AbstractArray`.
If you don't own `t`, then you must own `S`. This lets you define e.g. a generic conversion rule for `list` to some new `MyArray` you invented. But the rule is only used if you specify `pyconvert(MyArray, x)`. Doing `pyconvert(AbstractArray, x)` or `pyconvert(Any, x)` will not use the rule. Hence we have well-scoped rules, avoid piracy, and avoid cases where the conversion rules applied depend on which packages are loaded.
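The opt-in behaviour for `MyArray` can be checked with a small Python model. It assumes a toy Julia type tree, a canonical `list` rule with `T=PyArray, S=Any`, and approximates `typeintersect` by "one type is a subtype of the other"; none of this is PythonCall's actual code.

```python
# Toy slice of Julia's type tree (child -> parent), an assumption for illustration.
PARENT = {"PyArray": "AbstractArray", "MyArray": "AbstractArray", "AbstractArray": "Any"}

def issubtype(a, b):
    """Toy `a <: b` for non-union types."""
    while a != b:
        if a == "Any":
            return False
        a = PARENT[a]
    return True

# Rules for t=list as (T, S) pairs, in insertion order.
LIST_RULES = [
    ("PyArray", "Any"),      # canonical rule, owned by whoever owns `list`
    ("MyArray", "MyArray"),  # third-party rule, owner of MyArray
]

def applicable(R):
    """Keep rules whose T overlaps R and whose scope S contains R."""
    return [
        (T, S) for (T, S) in LIST_RULES
        if (issubtype(R, T) or issubtype(T, R)) and issubtype(R, S)
    ]

print(applicable("MyArray"))        # -> [('MyArray', 'MyArray')]
print(applicable("AbstractArray"))  # -> [('PyArray', 'Any')]
print(applicable("Any"))            # -> [('PyArray', 'Any')]
```

Loading the third-party package changes nothing for callers that ask for `AbstractArray` or `Any`; only an explicit request for `MyArray` reaches its rule.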
In particular, since passing Python objects to Julia in JuliaCall normally uses `pyconvert(Any, x)`, only rules created by the "owner" of `pytype(x)` are applied. This makes passing Python values around predictable - some third-party package defining their `list` to `MyArray` rule will not affect how `list` gets passed to Julia by default.
By ignoring the MRO of the passed Python object, we ignore issues with the arbitrary ordering of types in the MRO. Our proposal guarantees that if you have an applicable rule with `t=t1` then it can only be overridden by a later rule with `t=t2` if `t2` is a strict subclass of `t1`. Currently it can be overridden if `t2` is completely unrelated but just happens to be higher up the MRO.
I think this scheme is sufficiently general to encode rules in the priority order users will want. When adding a rule, you must own `t` or `S`. If you own `t` then it will be more specific than anything else anyway. If you own `S` then you have to opt in to using the rule, like `pyconvert(S, x)`, in which case only your rules pass the filter. Or if you do `pyconvert(Union{Foo,S}, x)` then whether you get a `Foo` or an `S` depends on insertion order, but if `Foo` came from a parent package, then you should rightly get a `Foo`, which will be the case because its rules were defined first. So basically insertion order prevents overwriting rules from earlier-loaded packages. This does mean the output type can be import-order-dependent, but only where there are unions, and this case is inherently ambiguous so we have to pick something arbitrarily anyway.
What about `jlwrap` and `array`?
We will have rules like `t=juliacall.AnyValue, T=S=Any` and `t=<buffer>, T=PyArray, S=Any`. Provided we define these first, they will be applied first unless a rule for a more specific `t` is defined.
Worked examples
Here are some rules for `t=list`:
1. `T=PyArray, S=Any`: canonical conversion to a `PyArray`, used if you specify converting to `PyArray` or `AbstractArray` or `Any`.
2. `T=Array, S=DenseArray`: used if you specify converting to `Array` or `DenseArray`, but `AbstractArray` gets you a `PyArray`.
3. `T=Set, S=AbstractSet`: used if you specify converting to `Set` or `AbstractSet`.
4. `T=Tuple, S=Tuple`: used if you specify converting to `Tuple`.
Some rules for `t=None`:
1. `T=Nothing, S=Any`: canonical
2. `T=Missing, S=Missing`: specify `Missing` (or `Union{Missing, Foo}`)
Some rules for `t=float`:
1. `T=Float64, S=Any`: canonical
2. `T=Float32, S=Float32`: specify another float type
3. `T=Number, S=Number`: specify another non-float number type such as `Integer`
4. `T=Missing, S=Missing` (for NaN)
5. `T=Nothing, S=Nothing` (for NaN)
Some examples for converting a `float`:
- to `Any`: only rule 1 applies (filtering on `S`)
- to `Float32`: rules 2 and 3 apply (rule 1 ignored due to `T`, others due to `S`) so rule 2 is tried first.
- to `Integer`: only rule 3 applies (filtering on `S`).
- to `Union{Integer, Missing}`: rules 3 and 4 apply (filtering on `S`) so rule 3 is tried first.
Say some package defines `myfloat <: float` and adds a rule for it:
- `T=BigFloat, S=Any`
Examples converting a `myfloat`:
- to `Any`: rule 1 and the new rule apply. The new rule is more specific in `t`, so use the new rule.
- to `AbstractFloat` or `Number`: pretty much the same.
- to `BigFloat`: the new rule and rule 3 apply (rule 1 ignored due to `T`). The new rule is more specific in `t`, so it is tried first.
- to `Float32`: new rule doesn't apply, so as above rule 2 is first.
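The `float` and `myfloat` examples above can be replayed with a small Python model of the proposed pipeline (filter on `t`, `T` and `S`, then order by strict specificity of `t` with insertion order as tie-break). Everything here is a sketch under assumptions: the Julia type tree is a toy, unions are `frozenset`s of names, and `typeintersect` is approximated by subtype checks in either direction.

```python
from dataclasses import dataclass

# Toy slice of Julia's type tree (child -> parent), an assumption for illustration.
PARENT = {
    "Float64": "AbstractFloat", "Float32": "AbstractFloat", "BigFloat": "AbstractFloat",
    "AbstractFloat": "Real", "Integer": "Real", "Real": "Number", "Number": "Any",
    "Missing": "Any", "Nothing": "Any",
}

def issubtype(a, b):
    while a != b:
        if a == "Any":
            return False
        a = PARENT[a]
    return True

def parts(R):
    return R if isinstance(R, frozenset) else frozenset({R})

def intersects(R, T):   # stand-in for typeintersect(R, T) != Union{}
    return any(issubtype(c, T) or issubtype(T, c) for c in parts(R))

def in_scope(R, S):     # R <: S, or one union component <: S
    return any(issubtype(c, S) for c in parts(R))

@dataclass(frozen=True)
class Rule:
    name: str
    t: type
    T: str
    S: str

class myfloat(float):
    pass

RULES = [
    Rule("1", float, "Float64", "Any"),
    Rule("2", float, "Float32", "Float32"),
    Rule("3", float, "Number", "Number"),
    Rule("4", float, "Missing", "Missing"),
    Rule("5", float, "Nothing", "Nothing"),
    Rule("new", myfloat, "BigFloat", "Any"),
]

def candidates(R, x):
    hits = [r for r in RULES
            if isinstance(x, r.t) and intersects(R, r.T) and in_scope(R, r.S)]
    ordered = []
    while hits:
        # Earliest rule with no strictly more specific `t` still waiting.
        nxt = next(r for r in hits
                   if not any(u.t is not r.t and issubclass(u.t, r.t) for u in hits))
        ordered.append(nxt)
        hits.remove(nxt)
    return [r.name for r in ordered]

print(candidates("Any", 1.5))                              # -> ['1']
print(candidates("Float32", 1.5))                          # -> ['2', '3']
print(candidates("Integer", 1.5))                          # -> ['3']
print(candidates(frozenset({"Integer", "Missing"}), 1.5))  # -> ['3', '4']
print(candidates("Any", myfloat(1.5)))                     # -> ['new', '1']
print(candidates("Float32", myfloat(1.5)))                 # -> ['2', '3']
```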
Pros and cons
Pros:
- Strict ownership of rules - avoids piracy.
- Return type of `pyconvert` more predictable.
- Clearer semantics/rule ordering than currently.
- The number of applicable rules is massively cut down by filtering on `S` (usually to 1).
- Where more than one rule applies, ordering by `t` should then be mostly unique. Insertion order mainly serves to disambiguate unions, plus special rules such as those for buffers and `jlwrap`.
- Easy to "opt in" to a conversion rule by being more specific about what you are converting to (see the `MyArray` example above).
Cons:
- People might still pirate (i.e. make rules with `S=Any` for which they don't own `t`).
- `pyconvert(Union{AbstractArray,MyArray}, x)` does not do what you might expect (use the generic `AbstractArray` rules plus the special `MyArray` rule) because the union gets normalised down to `AbstractArray` first, so the `MyArray` rule is never considered. You need to take more specific unions like `Union{PyArray,Array,MyArray}`, which is annoying. We could make a helper function to create such a union for you.
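To make the union pitfall concrete, here is a toy Python model of the normalisation step. It mimics the way Julia simplifies `Union{AbstractArray,MyArray}` to `AbstractArray` by dropping components that are subtypes of another component; the type tree and helper names are assumptions for illustration.

```python
# Toy slice of Julia's type tree (child -> parent), an assumption for illustration.
PARENT = {
    "PyArray": "AbstractArray", "Array": "AbstractArray",
    "MyArray": "AbstractArray", "AbstractArray": "Any",
}

def issubtype(a, b):
    while a != b:
        if a == "Any":
            return False
        a = PARENT[a]
    return True

def normalise(components):
    """Drop union components that are strict subtypes of another component."""
    comps = set(components)
    return frozenset(
        c for c in comps
        if not any(c != d and issubtype(c, d) for d in comps)
    )

def in_scope(R, S):
    """Scope filter for a (normalised) union R: one component `<: S` suffices."""
    return any(issubtype(c, S) for c in R)

broad = normalise({"AbstractArray", "MyArray"})
narrow = normalise({"PyArray", "Array", "MyArray"})
print(sorted(broad))                # -> ['AbstractArray']
print(in_scope(broad, "MyArray"))   # -> False
print(sorted(narrow))               # -> ['Array', 'MyArray', 'PyArray']
print(in_scope(narrow, "MyArray"))  # -> True
```

Once `MyArray` has been absorbed into `AbstractArray`, no component is a subtype of `S=MyArray`, so the rule is filtered out; listing only concrete types keeps it in scope.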
Rejected ideas
- I considered ordering also based on specificity of `T` (more specific wins) and `S` (less specific, i.e. more canonical, wins).

  If we use `S` then in the `float` example we prefer the generic `Number` rule over the specific `Float32` rule. But if we use `T` then a `juliacall.DictValue` has rules `t=juliacall.AnyValue, T=Any, S=Any` and `t=Mapping, T=Dict, S=Any` and the latter rule will be preferred, which isn't what we want.

  The current proposal only alters ordering a little - namely by removing priority and ignoring MRO - and relies on more aggressive filtering from `S`.

- We could allow `add_rule` to not always add at the end of the list. It could specify one or more existing rules that it must appear above. Rejected because, as explained in the discussion section, the existing proposal is sufficient for any sensible rule definitions.