Machine learning API #797
Thanks for putting this together. I'd also like to see us build some kind of machine learning functionality on top of Onyx. I attempted to build a Clojure wrapper for TensorFlow that would let you build TensorFlow graphs using idiomatic Clojure data structures and patterns (much like Onyx). The problem is that most of TensorFlow's useful functionality is baked into the Python client library and is not available through the data model or the C bindings. One of the huge advantages Onyx has over everything else out there is that its API and its data model are equivalent, so any ML functionality we build on Onyx should adhere to that principle (I think that goes without saying). In my experience, most of my machine learning projects perform best with either random forests or (more recently) XGBoost. The other algorithms might be nice for academic purposes, but if we want to build something that the 30 or so people who know both Clojure and machine learning will actually use, we should probably focus on those two algorithms. That said, I'm open to suggestions, and I don't think we should rule out neural networks entirely: let's focus on XGBoost and random forests first, then look at tackling neural networks to some degree after that.
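To make the "API and data model are equivalent" point concrete, here is a minimal sketch in Python (Onyx itself is Clojure, so dicts and lists stand in for Clojure maps and vectors). The workflow is a list of [from, to] edges and the catalog is a list of task maps, with keys mirroring Onyx's :onyx/name and :onyx/type. The task names and the "ml/..." keys are hypothetical, not an existing Onyx or onyx-ml API. Because the job is plain data, validating or tuning it is just ordinary data manipulation.

```python
# Hypothetical sketch: an Onyx-style ML job described purely as data.
# Python dicts/lists stand in for Clojure maps/vectors.
# The "ml/..." keys and task names are invented for illustration only.

workflow = [
    ["read-rows", "train-forest"],
    ["train-forest", "write-model"],
]

catalog = [
    {"onyx/name": "read-rows", "onyx/type": "input"},
    {"onyx/name": "train-forest", "onyx/type": "function",
     "ml/algorithm": "random-forest", "ml/n-trees": 100},
    {"onyx/name": "write-model", "onyx/type": "output"},
]


def tasks_in(workflow):
    """Every task name mentioned in the workflow edges."""
    return {task for edge in workflow for task in edge}


def missing_catalog_entries(workflow, catalog):
    """Workflow tasks without a catalog entry; an empty set means the job is consistent."""
    named = {entry["onyx/name"] for entry in catalog}
    return tasks_in(workflow) - named


def with_hyperparameter(catalog, task, key, value):
    """Return a new catalog with one task's parameter changed; the input is not mutated."""
    return [{**entry, key: value} if entry["onyx/name"] == task else entry
            for entry in catalog]
```

For example, with_hyperparameter(catalog, "train-forest", "ml/n-trees", 500) yields a retuned job without touching any code that consumes it, which is the property that was hard to get from TensorFlow's Python-centric API.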
I agree, random forests are one of the more useful algorithms out there and should cover a lot of ground. Going one level of abstraction higher: does onyx-core need any additional functionality to make developing these kinds of algorithms work? What are your thoughts on that? Given Onyx's flexible nature, I'm fairly certain most of this can be offloaded to a separate library; perhaps we should just bite the bullet, start working on that, and see where it takes us.
ML isn't exactly my sweet spot, but do let us know if you'd like estimates of how difficult it would be to support any changes to core, or suggestions on how to structure the library.
Does anyone have links to any particularly helpful papers on random forests or XGBoost? I'm having a hard time finding anything that gives a clear-cut explanation of the algorithms.
Hi! There's this monograph about random forests and their many variants. Then there is Chapter 15 of The Elements of Statistical Learning, which is very good. Unfortunately, for XGBoost I can only suggest Chen's paper.
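For anyone who wants the core idea in code before reading the references: a random forest trains many trees, each on a bootstrap sample (rows drawn with replacement) and a random subset of features, then predicts by majority vote. The sketch below is deliberately simplified, assuming depth-1 trees (decision stumps) and a feature subset drawn once per tree rather than at every split as in the standard algorithm. All function names are illustrative.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in an iterable."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Choose the (feature, threshold) split among feature_ids with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:
        # No valid split (e.g. constant features): always predict the majority label.
        label = majority(labels)
        return (0, 0.0, label, label)
    _, f, threshold, l_lab, r_lab = best
    return (f, threshold, l_lab, r_lab)


def predict_stump(stump, row):
    f, threshold, l_lab, r_lab = stump
    return l_lab if row[f] <= threshold else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature subsets: one stump per bootstrap sample."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feats = rng.sample(range(d), n_features)    # random feature subset for this tree
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority(predict_stump(s, row) for s in forest)
```

Replacing fit_stump with a full decision-tree learner that re-samples features at each split brings this closer to the standard random forest; XGBoost instead builds trees sequentially, each fitting the errors of the previous ones, so it does not share the independent-trees structure shown here.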
As discussed on Clojurians, I would like to use this issue to start describing some machine learning functionality that could be achieved with Onyx, and/or what such an API would look like.
Scope
First of all, I think Onyx should not try to support every kind of machine learning algorithm out there; the area is crowded, and some types of algorithms are better suited to Onyx than others.
From my list of things I would like Onyx to better facilitate, I'm going to pick a few examples I'm personally very familiar with. As such it's heavily biased, but it's a good starting point for the discussion:
And I'm specifically leaving these out of scope:
Workflow
Let's distinguish two different ways we could use Onyx for ML:
The typical workflow for training an ML algorithm looks as follows:
Design
I'm not 100% sure about the actual API yet, but I can already see a few patterns here:
I'm not sure whether these things are in or out of scope for Onyx, or belong in an onyx-ml plugin library; we would have to explore this further.