Getting the domain/support of objects, and domain type hierarchy #146
Comments
Can you give a concrete example? The ones I can think of (eg matrix functions with |
Hmm, I don't quite get one you could extend a function defined in |
Sorry, I should have said "without a hard dependency in it". I meant via package extensions (falling back to a direct dependency or Requires on Julia <v1.9). |
I was thinking about functions like For functions like |
I think you need to include the type in For example, the domain of
domain(::typeof(sqrt), ::T) where T <: Real = HalfLine{T}()
domain(::typeof(sqrt), ::T) where T <: Complex = ℂ / NegativeOpenHalfLine{T}() |
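To make the type-dependent idea concrete, here is a minimal self-contained sketch. It assumes a query function named domain that takes the argument type rather than a value, and it uses two made-up stand-in types (NonNegativeHalfLine, SlitComplexPlane) instead of the real DomainSets.jl types, so nothing here should be read as the actual package API:

```julia
# Hypothetical stand-in set types, for illustration only
# (not the DomainSets.jl definitions).
struct NonNegativeHalfLine{T} end   # all real x with x >= 0
struct SlitComplexPlane{T} end      # complex plane minus the open negative real axis

# Membership tests for the two stand-in sets.
Base.in(x::Real, ::NonNegativeHalfLine) = x >= 0
Base.in(z::Complex, ::SlitComplexPlane) = !(imag(z) == 0 && real(z) < 0)

# Hypothetical query function: the domain of sqrt depends on the argument type.
domain(::typeof(sqrt), ::Type{T}) where {T<:Real} = NonNegativeHalfLine{T}()
domain(::typeof(sqrt), ::Type{T}) where {T<:Complex} = SlitComplexPlane{T}()
```

With this, 2.0 in domain(sqrt, Float64) holds while -1.0 in domain(sqrt, Float64) does not, and the complex method excludes the branch cut.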
How about
I had that in my initial draft of this proposal, but then I thought that maybe giving the type is too much and too little - for distributions and measures, we don't need it, they have a well-defined and unique domain/support. For So we could, initially, limit ourselves to cases that are clear-cut? Or we could do |
The analogue of "domain" for arrays is just called |
I'm not against just |
I think we should let users use namespaces (DomainSets.domain etc) if they need to avoid confusion. Users also use size, axes, length etc. as names for variables, so this isn’t a good argument. I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions. |
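A rough sketch of what such a function-only package could look like, using two modules in one file to stand in for two packages. The module names DomainsCoreSketch and DistributionsSketch, the UnitUniform type, and the tuple return value are all made up for illustration:

```julia
# Hypothetical API-only package: declares the functions, implements nothing.
module DomainsCoreSketch
    function domain end
    function support end
end

# Hypothetical downstream package: adds a method to the shared function
# without depending on any package that defines concrete domain types.
module DistributionsSketch
    import ..DomainsCoreSketch

    struct UnitUniform end

    # The support is returned as a plain (lo, hi) tuple here, purely for
    # illustration; which type to return is exactly the open question.
    DomainsCoreSketch.support(::UnitUniform) = (0.0, 1.0)
end
```

Both packages then extend the same function object, so there is no name clash between a Distributions-owned support and a DomainSets-owned one.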
Ownership of functions is certainly an issue here. Is there any consensus about using package extensions rather than packages like DomainSetsCore or JuliaApproximationCore? I'm starting to think we could meaningfully do with both (and that |
In general I like that approach, having lightweight API-only packages, I think it enables us to take full advantage of multiple dispatch across packages. In this specific case though, I'm not sure - Distributions would then actually have to have a hard dependency on DomainSets - it couldn't implement I guess Distributions wouldn't accept DomainSets as a dependency, because it pulls in StaticArrays, which Distributions has avoided so far.
I agree in general, but with major packages like Distributions I think one should try to avoid name clashes. |
Are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package. So domain(d) would just error and say “load DomainSets.jl” if it’s not loaded |
Yes, that wouldn't be a problem, since there would be a clear contract between "DomainSetsCore" and DomainSets.
But it wouldn't always, from the user point of view: not if DomainSets is loaded implicitly via some other dependency path. And if DomainSets should vanish from that (possibly deep) dependency path, then the code that worked before would break, even if upper version bounds are used correctly. I know we have this in some corners of the ecosystem, but I don't think it's a good pattern. We do require code to state all of its dependencies explicitly in Julia (unlike Python), and I think that has been very beneficial to the stability of the ecosystem. |
How about this (if the Distributions maintainers are Ok with it):
CC @devmotion, I think we should ask you for some input here as well. |
It seems Distributions.jl has its own interval type called Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called |
Thanks @daanhb , that's great! Could DomainSetsCore also provide a function This got me thinking, though - just a proposal, I don't mean to be presumptuous: if we change a few things around anyway, should we maybe start dropping the term "domain" when it comes to the type of the sets themselves, in the future? And use terms like "domain" and "support" only in the name of functions that get such sets for objects/functions? At the moment, we have |
I like the name Note I added support to Infinite arrays in Julia see https://github.com/JuliaLang/julia/blob/master/test/testhelpers/InfiniteArrays.jl So I'm all for allowing infinite sets but we just need to be more careful and possibly add a |
Hmm, I did not see a discussion about names coming up here. Points, to name just one example, are not infinite sets. And if I may short-circuit the discussion, inheritance from The word domain is used in many meanings, only one of which is "the domain of a function". The word is in itself also associated with sets. Some take it to mean connected open subsets, though even then people also talk about closed domains. In other contexts one might solve a PDE "on a domain", in which case it could be nearly anything. In complex analysis there are regions (e.g. there is a ComplexRegions package). And so on. There is no single good word. Back to the issue at hand: I think that while DomainSets could be used to describe the domain or the support of a function (which to me is a subset of its domain on which it is nonzero), it should not define the concept as the package itself is not about functions. It is a coincidence that a function with the same name "domain" is defined in DomainSetsCore. I see no conceptual inconsistency in extending the meaning of that function to mean "the domain of a function" when applied to function objects. A function like "support" on the other hand might live in JuliaApproximationCore :-) |
My feeling is that |
Thanks for the clarifications @daanhb !
No, sorry - maybe I should have opened a separate issue. In any case, I certainly didn't mean to seem pushy here. I was just thinking about how we can express things like domains and supports better in packages like MeasureBase/MeasureTheory (they currently use some custom types). And about possible proposals in regard to Distributions to get more consistency in this area. IntervalSets + DomainSets seemed like a good fit. In any case, from there on I was just thinking about "is a" vs. "role in relationship to" in regard to the terms "domain" vs. "set" in the type hierarchy. And I saw that the Readme begins with "DomainSets.jl is a package designed to represent simple infinite sets". So I thought maybe the term "domain" in the type hierarchy had historical reasons.
Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without Again, I certainly didn't mean to push too far here.
In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like |
Thanks for the feedback @devmotion ! Question: Are there currently any plans to extend |
Good point. We could suggest defining that more closely in the language docs? :-) So far I haven't seen anything too "exotic" with
❤️ |
It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps. I was hoping to be able to move everything related to functions and maps into a separate package (#92), or to use existing packages. |
Sure, dispatching on a common supertype is beneficial, as is being able to reuse code written for abstract types. I just think both of these advantages are limited in the context of domains, and we've actively been trying to get away from having a common type. (It would be a funny way to settle the |
I'm so with you, here! I wish we had a |
Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and I'm just thinking about how far There are other cases where we don't have a common supertype, for good reason. Common supertypes for iterable objects for example - and we shouldn't have one, iterability isn't always the primary property of an object. So So I fully agree: Just because objects implement a common API that doesn't automatically make them candidates for a common type hierarchy. We don't have multiple inheritance or similar in the language and should choose super-types very carefully. But, to take full advantage of multiple dispatch we should build common type hierarchies where they also make sense semantically, and where we don't run into "but it's just as much an A as a B" situations. In this case I would argue that the primary property of a set, and the domains and things we talk about here, is their, well, "set-likeness". :-) That's why, IMHO, building down from |
The issue is that looking at https://github.com/JuliaLang/julia/blob/master/base/abstractset.jl there's basically no code that will still work for infinite sets 😅 Here's an experiment to detect the interface:
julia> struct MySet <: AbstractSet{Int} end
julia> Set(MySet())
ERROR: MethodError: no method matching length(::MySet)
julia> Base.length(::MySet) = 1
julia> Set(MySet())
ERROR: MethodError: no method matching iterate(::MySet)
julia> Base.iterate(::MySet, k...) = iterate((1,), k...)
julia> Set(MySet())
Set{Int64} with 1 element:
  1
So the interface for
So the issue for domains is that we lose (2) and potentially (4) no longer makes sense. We could make a major PR into Julia to add overloading say We could take the ArrayLayouts.jl approach and have a parallel implementation of everything. I'm not sure I'd recommend it either. |
Sure, but abstract types don't necessarily need to provide generic implementations/methods for the functions associated with them, right? What there should be, is contracts/semantics of the interface (and yes, we should write those down explicitly way more often :-) ). If you build an And sure, infinite sets won't support I would say it's very similar for sets: Finite sets have a length and support iteration, infinite sets don't. But what they all share, what makes them sets, is the ability to check if they contain an element in a (somewhat) efficient manner: And of course we'll want to support operations like (It should actually be pretty easy to add a generic (We could even define iteration for certain infinite sets, for hypercubes for example, we could use quasirandom sequences like Sobol to implement infinite iteration. That could actually be useful in practice.) |
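As a small illustration of the "membership is the shared contract" argument, here is a sketch of an infinite set that subtypes AbstractSet and implements only in, together with generic code that needs nothing else. RayFrom and count_members are hypothetical names invented for this example:

```julia
# Hypothetical infinite set: all real numbers x with x >= lo.
# It deliberately defines neither length nor iterate.
struct RayFrom{T<:Real} <: AbstractSet{T}
    lo::T
end

# Membership is the only operation this set supports.
Base.in(x::Real, s::RayFrom) = x >= s.lo

# Generic code that relies only on `in`, so it accepts finite sets
# (like Base.Set) and the infinite set above alike.
count_members(xs, s::AbstractSet) = count(x -> x in s, xs)
```

The same count_members call works for Set([2, 3, 4]) and for RayFrom(0.0). Anything that iterates the set (including printing it at the REPL) would still fail for RayFrom, which is the trade-off discussed in this thread.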
You raise good points but one can argue the other way here. It's great that the community can agree on using |
Yes, that's a good example - for an Image (though the current approach is certainly practical) one wouldn't necessarily say that its primary property is that it is an array of pixels. An alternative design could, for example, be images as subtypes of I would argue that the situation is different here though - what we really want here is sets, in the mathematical sense, first and foremost. An interval is a set (hence "IntervalSets"), so why not make it a subtype of
True, though maybe they should, in some cases. But it depends, if it's about 3D geometries, one may be interested in the surface or the volume, so But looking at what IntervalSets and DomainSets currently do, I would say that the package names, documentation and use make it clear that this is really about sets directly, and that |
Valid point, the types in DomainSets could be. |
Care to elaborate on that? |
I think that's not a problem. To me, the main purpose of an abstract type is not maximum code reuse, and certainly not at the highest level. Sure, it's good to have generic implementations for many functionalities directly for the abstract type, but it's not a must. And I would definitely suggest that we propose a more generic implementation for at least
Personally I would say it doesn't kill the idea - it's a general problem we have in many areas after all. I would say defaulting to
True, though maybe that assumption can change. It has worked out well with arrays, I'd say that's encouraging.
Yes, that's indeed a bit more tricky - it touches on the old "parameter type vs. generated type and what should eltype be" debate for
One very easy example is distributions/measures. Many (log-)pdf/density methods first check if the point is within the support. But in complex systems, one may want to pass that support on, so having a common type is good (so function arguments don't always have to be untyped - types carry meaning, also to the reader of the code).
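A minimal sketch of that pattern, with made-up names throughout (SupportInterval, UniformSketch, support_of and logdensity are not the Distributions.jl or MeasureBase API): the density first checks membership in the support and only then evaluates.

```julia
# Hypothetical support type: a closed interval [lo, hi].
struct SupportInterval
    lo::Float64
    hi::Float64
end
Base.in(x::Real, s::SupportInterval) = s.lo <= x <= s.hi

# Hypothetical uniform distribution on [a, b].
struct UniformSketch
    a::Float64
    b::Float64
end

# The support is a first-class object that can be passed on to other code.
support_of(d::UniformSketch) = SupportInterval(d.a, d.b)

# Log-density: check the support first, return -Inf outside of it.
logdensity(d::UniformSketch, x::Real) =
    x in support_of(d) ? -log(d.b - d.a) : -Inf
```

Because support_of returns an object rather than just answering a yes/no question, downstream code can accept it as a typed argument.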
Yes, exactly - just like
Sure, and I agree we should turn this every possible way and check very carefully if make things |
@hyrodium and @aplavin , can I pull in you here for an "IntervalSets" perspective? The thread above is kinda long, but the gist is (apart from the question where query functions like |
Thanks for engaging the discussion :-)
Alright, let's entertain the notion that a common implementation could be provided.
Let's say I agree with `eltype(d::Domain{T}) == T` for pragmatic reasons
Has it? (genuine question)
I can write
function my_personal_in(x, this_argument_should_be_a_domain)
    checkdomain(this_argument_should_be_a_domain)
    in(x, this_argument_should_be_a_domain)
end
and catch user errors and make my intention specific. I can still specialize the function for concrete domains that have a better implementation. Dispatch allows me to write another function like this:
function my_personal_in(x, this_argument_should_not_be_a_domain)
    # what is this function about?
end
but I can code my way around this. Using dispatch in this way arguably makes my code less readable, so I'd rather have less of this in packages than more.
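checkdomain isn't spelled out in the comment; one way it could look, assuming a hypothetical opt-in trait isdomain (both the trait and the UnitBallSketch type are illustrative only, not part of DomainSets.jl):

```julia
# Hypothetical opt-in trait: types declare themselves to be domains.
isdomain(x) = false

# A made-up domain type that does not subtype any common supertype.
struct UnitBallSketch end
isdomain(::UnitBallSketch) = true
Base.in(x::Real, ::UnitBallSketch) = abs(x) <= 1

# Throws an ArgumentError unless the argument opted in via the trait.
checkdomain(d) =
    isdomain(d) || throw(ArgumentError("expected a domain, got $(typeof(d))"))

function my_personal_in(x, d)
    checkdomain(d)      # catch user errors early
    return in(x, d)     # then fall through to the ordinary membership test
end
```

Calling my_personal_in(1, [1, 2]) raises an ArgumentError because Vector never opted in, even though in is defined for arrays.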
Yet I had missed that by the way. The |
I haven't done much with InfiniteArrays.jl, but Chris Rackauckas told me recently that the SciML people are definitely making use of it now.
Yes, but I think
I agree.
Yes - I don't know if I'm happy about that ... but I don't think that should keep us from using I would say, the "primary" thing that makes them what they are is:
Those are also the primary functions that always have to be implemented for subtypes, so I think that fits together. |
@oschulz I don't see how stuff like intervals being And if such a function is needed anyway, seems more straightforward to tell users to use |
Sure, I have no doubt it is useful, I also use it. We're wondering here about the necessity of inheritance from |
Ah - I would say passing it through a lot of generic code that doesn't require arrays to be finite sized, but that does require an If we get rid of all type hierarchies, we lose the power of multiple dispatch, or else have to write everything trait-style, which will be a lot less readable and a lot more complex too, in the long run. And people will constantly have to define all kinds of |
Right, perhaps this wasn't the best of examples. I'll try to think of something more constructive, such as a generic fallback for |
Safe flight! :-) When you're on the ground again, maybe an argument in the other direction: I think we went through at least a few advantages of subtyping |
This is the list of methods taking AbstractSet in Base:
I think the decision to subtype shouldn't be based on the name of the type (it has "Set" in there and intervals are sets in maths), but on the semantic that follows from methods defined on such a type. Similar with |
Well, I would argue again that the criterion should not be if current generic methods for
I agree, it shouldn't just depend on the name, the actual primary nature of the type should be the deciding factor, I think. And the semantics have to fit, but I don't think those are necessarily defined by what the (quite few) generic methods for
Yes - and also here, quite a few of the generic method implementations in You're right, with |
Mathematically speaking, everything must be a set in ZFC, so I think the term "set" in |
Are you sure you are talking about methods, not functions? I looked at a few of those, they don't seem feasible to generalize to intervals (ie make
But, if subtyping AbstractSet, this should stay. As for functions themselves, they are defined for way more types than |
But where does it say that? |
Uh, I hadn't seen that, that is ugly. Hm, in Base it says
So However, it is defined like that in Base, and |
Which others (apart from the very ugly
Which ones? The interface of |
Oh, lots of methods are a bad fit even if we just consider intervals! Sorry I'm always using intervals as a concrete example to keep in mind :)
# just a few methods defined on abstractsets
filter(pred, s::AbstractSet) = mapfilter(pred, push!, s, emptymutable(s))
function mapfilter(pred, f, itr, res)
    for x in itr
        pred(x) && f(res, x)
    end
    res
end
setdiff(s::AbstractSet, itrs...) = setdiff!(copymutable(s), itrs...)
function hash(s::AbstractSet, h::UInt)
    hv = hashs_seed
    for x in s
        hv ⊻= hash(x)
    end
    hash(hv, h)
end
Better to ask if there are any meaningful methods that are a good fit.
It's better to have interval/rectangle/... And again, I think there are very few, if any, actual advantages to this subtyping. Do you have a specific concrete example of code that becomes simpler or more generic with |
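For comparison, a rough sketch of what interval-specific versions of such operations could look like when they work from the endpoints instead of iterating over elements. SketchInterval is a made-up type for illustration, not the IntervalSets.jl API:

```julia
# Hypothetical closed interval type, for illustration only.
struct SketchInterval{T<:Real}
    lo::T
    hi::T
end

# Membership is a bounds check; no iteration over elements is needed.
Base.in(x::Real, d::SketchInterval) = d.lo <= x <= d.hi

# An interval with lo > hi contains no points.
Base.isempty(d::SketchInterval) = d.lo > d.hi

# The intersection of two closed intervals is computed from the endpoints.
Base.intersect(a::SketchInterval{T}, b::SketchInterval{T}) where {T} =
    SketchInterval(max(a.lo, b.lo), min(a.hi, b.hi))

# Hash from the endpoints instead of xor-ing the hashes of all elements.
Base.hash(d::SketchInterval, h::UInt) =
    hash(d.hi, hash(d.lo, hash(:SketchInterval, h)))
```

None of these methods could be inherited from the iteration-based AbstractSet fallbacks quoted above, so each one has to be written by hand whether or not the type subtypes AbstractSet.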
I'm not sure, and this is why I wrote "I think". |
If you guys think
I would still argue that the term "set" is better than "domain". While "domain" is sometimes understood to represent a set itself, I would say that way more often "domain" is used to state a relationship between a set and another mathematical object, or mathematical structure (e.g. "integral domain") that combines sets with operations. "mathematical set", however, is very clear. But, while I think something like the proposal above would work well, I would still ask the question if |
Ah, sorry - yes, though I think less a fundamental problem with the language but more a lack of documentation in some places. :-) On the other hand, I think we can take it as an opportunity - I'd say AbstractSet is fairly open, and we could propose additions to clarify what it means. And we could even propose deprecating |
I did a bit of digging - the idea to introduce using isless for sets seems to trace back to the year 2012 - maybe one might call it a "primordial sin"? It looks like the unicode operators for subset didn't exist back then. |
I definitely like the proposal for a special supertype more. Is that basically the DomainSetsCore proposal, just with a different name?
There are other common types that can naturally represent "sets of elements". Even Base provides lots of set operations for arrays. And there also are UniqueArrays.jl and other packages with structs that are fundamentally "sets". That's why I'm saying there is little benefit in using |
Please do - if it doesn't make sense for intervals then it certainly won't make sense for more complex sets! :-) I don't think When it comes to what the generic implementations for them do, I'd point to |
Are they really though? A set by definition has no implied ordering, but arrays are required not to reshuffle their elements. I think this is a fundamental difference. |
"Sorted array" type is basically a variant of the "set" datastructure implementation. Aside from arrays, there are also Dictionaries.jl with their own "set" type. And surely lots more! |
Not quite, I think. The DomainSetsCore proposal looks more like trait-centric interface. I would propose something more type-centric. I think sets are such a fundamental concept (like arrays), that going with (and encouraging people to subscribe to) a type hierarchy would work very well. |
I can see that you guys aren't happy with the |
We currently don't have a common function to query the domain or support of something (a distribution, measure, ML model, etc.) yet, right? We could add something like getdomain(something) and getsupport(some_function).
For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.). We could do getdomain(f, shp::ValueShapes.AbstractValueShape) for the kind of stuff ValueShapes currently covers (could extend that).
Distributions currently offers support, but it's limited to univariate distributions. Having getdomain/getsupport in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.
Would a getdomain/getsupport API be welcome in DomainSets (I could do a PR)?
(Edit: Also some discussion about domain/set type hierarchies further down.)
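A minimal sketch of the shape-dependent variant, under stated assumptions: VectorShapeSketch is a made-up stand-in for a ValueShapes-style shape object, NonNegativeOrthant is a made-up domain type, and sumsqrt is an example function invented here. Only the name getdomain comes from the proposal.

```julia
# Hypothetical stand-in for a ValueShapes-style shape: a vector of length n.
struct VectorShapeSketch
    n::Int
end

# Hypothetical domain type: vectors of length n with all entries >= 0.
struct NonNegativeOrthant
    n::Int
end
Base.in(x::AbstractVector{<:Real}, d::NonNegativeOrthant) =
    length(x) == d.n && all(>=(0), x)

# The proposed query function, declared without methods.
function getdomain end

# Example function whose domain depends on the shape of its argument.
sumsqrt(x) = sum(sqrt, x)
getdomain(::typeof(sumsqrt), shp::VectorShapeSketch) = NonNegativeOrthant(shp.n)
```

Here getdomain(sumsqrt, VectorShapeSketch(3)) contains [1.0, 4.0, 9.0] but rejects both [1.0, -1.0, 2.0] and any vector of the wrong length.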