Refactoring of codebase #35

juliohm · 2020-04-12T14:10:19Z

Dear all,

In this issue I would like to discuss a refactoring of LearnBase.jl to accommodate more general problems under transfer learning settings. Before I can do this, I would like to get your feedback on a few minor changes. These changes should facilitate a holistic view of the interface, and should help shape the workflow that developers are expected to follow (see #28).

Below are a few suggestions of improvement that I would like to consider.

Suggestions of improvement

Split the main LearnBase.jl file into smaller source files with more specific concepts. For example, I'd like to review the Cost interface in a separate file called costs.jl. Similarly, we could move the data orientation interface to a separate file orientation.jl and include these two files in LearnBase.jl.
Can we get rid of all exports in the module? I understand that this module is intended for use by developers who would import LearnBase; const LB = LearnBase in their code. Exporting all the names in LearnBase.jl can lead to problems downstream like the fact that LossFunctions.jl was not exporting the abstract SupervisedLoss type, and then users of LossFunctions.jl would also need to import LearnBase.jl just to get access to the name. My suggestion here is to define the interface without exports. And then each package in JuliaML can export the relevant concepts.
The interface for learning models is currently spread over various different Julia ecosystems. In most cases, there are two functions that developers need to implement (e.g. fit/predict, model/update, fit/transform). I would like to do a literature review on the existing approaches, and generalize this to transfer learning settings. This generalization shouldn't force users to subtype their models from some Model type. A traits-based interface is ideal for developers who want to plug their models after the fact, and developers interested in fitting entire pipelines (e.g. AutoMLPipeline.jl).

I would like to start addressing (1) and (2) in the following weeks. In order to address (3) I need more time to investigate and brainstorm a more general interface.

The text was updated successfully, but these errors were encountered:

juliohm · 2020-04-15T13:44:02Z

Issues (1) and (2) have been addressed. The next step in the roadmap is a literature review on existing interfaces for learning models.

CarloLucibello · 2020-04-23T09:18:20Z

Here @oxinabox outlines the current problems with JuliaML ecosystem, I'll paste them below for convenience:

Broken up into many subpackages, because package manager/loading used to be really slow
A lot of use of types for things, because no constant folding
A lot of effort to make using kwargs not required because of them not being specialized on.
Doesn't support Tables.jl (because that would not be created for several more years.)
Excessive use of @inferred in tests because that seems a good idea at the time but now I think can be more chill.

CarloLucibello · 2020-04-23T09:26:47Z

Following those suggestions, I would ditch most of the types and type hierarchies in this repo, in order to improve interoperability with the ecosystem. E.g.

We don't need an ObsDim type, we can work fine with an obsdims keyword argument
Also the AggMode module can go, in favour of keyword arguments and closures

function mse_loss(ŷ, y; agg=mean)
    return agg((ŷ .- y).^2)     
end

mse_loss(ŷ, y)
mse_loss(ŷ, y, agg=sum)
mse_loss(ŷ, y, agg=x->sum(x, dims=2))  # partial reduction
mse_loss(ŷ, y, agg=x->mean(w .* x))  # weighted mean
mse_loss(ŷ, y, agg=identity)   # no aggregation. We could also support `nothing` here

juliohm · 2020-04-23T12:38:20Z

I really love this idea @CarloLucibello , we definitely need a new minor release with these improvements in the near future. I am finishing a cycle now with LearnBase.jl + LossFunctions.jl, but we should definitely come back to this issue here and simplify the code even further with dims and agg arguments 💯

My only divergence from the example above is the lack of types for losses. I think we should find a design in between where the type L2DistLoss is still available with a set of traits attached to it. Similar to the Distances.jl interface:

value(L2DistLoss(), yhat, y, agg=mean)

It is important to have explicit types for the losses because then we have flexibility to manipulate the concept.

joshday · 2020-04-27T21:34:48Z

@juliohm the latest LearnBase release breaks things downstream (PenaltyFunctions). Where did prox and similar functions go?

juliohm · 2020-04-27T22:00:50Z

@joshday, sorry, I remember prox and maybe prox! were not documented in LearnBase.jl. After doing a grep in packages inside JuliaML I couldn't find use cases. Could you please explain what they mean? I can submit a PR with an appropriate docstring reintroducing the names, and tag a patch release if you want.

joshday · 2020-04-27T22:16:03Z

It's the proximal mapping/operator (https://en.wikipedia.org/wiki/Proximal_operator)

Both prox and prox! are used in PenaltyFunctions.

No need for a PR, but could you add them back and make a new release?

juliohm · 2020-04-27T22:27:39Z

Of course. Will do it now 👍

AriMKatz · 2020-05-08T12:37:27Z

Where do you see the MLJ and flux ecosystems in all this ?

juliohm · 2020-05-08T13:06:30Z

I cannot speak for everyone in JuliaML, but I can share my personal view on it. Hopefully at some point in the near future we will be able to lay out a foundational API that is free from specific academic or enterprise organizations, and that is organically evolved by the Julia community as a whole. We have many efforts spread around Flux, MLJ, JuliaStats, and so on, and I hope they will converge at some point to a carefully designed Julian API that all these projects can share. Some projects like MLJ are great to get things moving, and are also great as a playground to try out design ideas. However, I do feel that MLJ for instance is overusing macros, is not taking full advantage of Julia generics when it follows the simplistic fit/predict approach, and is overly complex for users with a beginner to moderate background in ML. My personal opinion is that we need to brainstorm a simple yet expressive API that does not compromise the experience of new Julia users and that can also handle very advanced usages. It is a challenging goal, but I think that is what the Julia community is about, pushing the boundaries of things.

Stay tuned, I should come back with some updates on this effort in the following weeks hopefully.

AriMKatz · 2020-05-08T15:49:46Z

Interesting. Looking forward to hearing more.

juliohm added the enhancement label Apr 12, 2020

juliohm assigned oxinabox and joshday Apr 12, 2020

oxinabox removed their assignment Apr 13, 2020

This was referenced Apr 13, 2020

Reorganize codebase JuliaML/LossFunctions.jl#130

Merged

Refactor codebase #36

Merged

juliohm mentioned this issue Apr 15, 2020

Remove dependencies: StatsBase and Distributions #27

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring of codebase #35

Refactoring of codebase #35

juliohm commented Apr 12, 2020

juliohm commented Apr 15, 2020

CarloLucibello commented Apr 23, 2020 •

edited

Loading

CarloLucibello commented Apr 23, 2020 •

edited

Loading

juliohm commented Apr 23, 2020 •

edited

Loading

joshday commented Apr 27, 2020

juliohm commented Apr 27, 2020

joshday commented Apr 27, 2020

juliohm commented Apr 27, 2020

AriMKatz commented May 8, 2020

juliohm commented May 8, 2020

AriMKatz commented May 8, 2020

Refactoring of codebase #35

Refactoring of codebase #35

Comments

juliohm commented Apr 12, 2020

Suggestions of improvement

juliohm commented Apr 15, 2020

CarloLucibello commented Apr 23, 2020 • edited Loading

CarloLucibello commented Apr 23, 2020 • edited Loading

juliohm commented Apr 23, 2020 • edited Loading

joshday commented Apr 27, 2020

juliohm commented Apr 27, 2020

joshday commented Apr 27, 2020

juliohm commented Apr 27, 2020

AriMKatz commented May 8, 2020

juliohm commented May 8, 2020

AriMKatz commented May 8, 2020

CarloLucibello commented Apr 23, 2020 •

edited

Loading

CarloLucibello commented Apr 23, 2020 •

edited

Loading

juliohm commented Apr 23, 2020 •

edited

Loading