WIP Write Generalized Linear Machine class #5006
Great stuff, I've added some comments. While this is WIP, could you make this PR a draft?
Everything is added and the unit test works fine; the gist has been added for the first unit test. However, the meta example test fails. Output: https://pastebin.com/fw5aS77U
It seems that when creating the labels in the meta example it treats them as
You also have a conflict with the data submodule. I think you need to rebase against the develop branch.
We are getting there! :) Once the meta example is fixed, you should also add the meta example serialization result to the
I think there are two options: wait for the work @LiuYuHui is doing right now implementing the label encoding at Machine level, or you can use the
I think use
I think regression_labels cannot be used, since the error in the gist seems
to be thrown by regression_labels itself.
On Thu, 30 Jul 2020 at 11:57, Heiko Strathmann wrote:
> I think use regression_labels for now, and as this should be done before
> the other refactor.
So basically it is not possible to manually create RegressionLabels without calling regression_labels, right? If that is the case, @Hephaestus12, I think we should remove the GLM meta example from here and open a new PR with it. That way we could merge this GLM code directly and add the meta example later on (since the issue is not with the GLM itself).
Ah yes, it isn't implemented, even though it would be an easy thing to add, but beyond this PR. In that case I would wait until the
I see. @Hephaestus12 could you please move the new data files and the GLM meta example to another PR and remove them from here? Then, could you also change this PR from "draft" to the normal mode? We can then have a final look, and if the tests pass we merge.
7299dd2 to 64a224d
src/shogun/machine/GLM.h
Outdated
class GLMCostFunction
{
public:
	friend class GLM;
This is not needed anymore, right?
I'll have to make another function public but yes, we can remove the friend class.
Looks good to me. macOS failures seem unrelated to this patch. Once the rest is green (and the above comment is resolved) we can merge.
{
	if (!data->has_property(FP_DOT))
		error("Specified features are not of type CDotFeatures");
	set_features(std::static_pointer_cast<DotFeatures>(data));
is there a reason why the features are added to the state here?
No particular reason, I did it this way as it was done like this in LinearMachine:
shogun/src/shogun/machine/LinearMachine.cpp
Lines 70 to 76 in ec557d8
if (data)
{
	if (!data->has_property(FP_DOT))
		error("Specified features are not of type CDotFeatures");
	set_features(std::static_pointer_cast<DotFeatures>(data));
}
Should I remove it?
I am not sure; you'll have to check whether this has side effects that you rely on. @LiuYuHui, do you know if this is required? You had to refactor this code recently in the feature branch.
SGVector<float64_t> w_old = m_w.clone();

auto X = get_features()->get_computed_dot_feature_matrix();
auto y = regression_labels(get_labels())->get_labels();
This should be fixed at some point, I guess once we have the LabelEncoder in Machine; otherwise you are performing a potentially expensive operation in each iteration.
It seems that RegressionLabels is what LabelEncoder lacks; currently, LabelEncoder only supports MulticlassLabels and BinaryLabels.
True, but it should be possible to add it?
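As a rough illustration of the concern raised in this thread, the sketch below converts the labels once before the training loop and reuses the cached vector in every iteration. `LabelSource`, `to_regression` and `run_iterations` are hypothetical stand-ins invented for this example; they are not Shogun types or functions, and the conversion counter exists only to make the cost visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a label container whose conversion to
// real-valued labels is potentially expensive (not a Shogun class).
struct LabelSource
{
	std::vector<int> raw;
	mutable int conversions = 0;

	// Copies the labels into a vector of doubles and counts the call.
	std::vector<double> to_regression() const
	{
		++conversions;
		return std::vector<double>(raw.begin(), raw.end());
	}
};

// Converts the labels once, then reuses the result in each iteration.
double run_iterations(const LabelSource& labels, int max_iter)
{
	const std::vector<double> y = labels.to_regression(); // single conversion
	double acc = 0.0;
	for (int it = 0; it < max_iter; ++it)
	{
		// Placeholder for the per-iteration work that reads y.
		for (std::size_t i = 0; i < y.size(); ++i)
			acc += y[i];
	}
	return acc;
}
```

With this layout the conversion count stays at one regardless of `max_iter`, whereas converting inside the loop would pay the cost on every pass.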
src/shogun/machine/GLM.h
Outdated
class GLM : public RandomMixin<IterativeMachine<LinearMachine>>
{
public:
	// friend class GLMCostFunction;
If you want to remove code, just remove it. Don't just comment it out. Please don't commit this kind of change.
Comment removed
lgtm, can merge it now and do minor fixes related to the machine refactor later on
Can we merge this?
Merging this so that @Hephaestus12 can use it to continue working on the influenza app. Any fixes, like cleaning up the distributions and optimiser, can be done in the future. Thank you!
Yaay 💯
#5005 #5000
This is the basic framework for the Generalized Linear Machine class.
This class is supposed to implement the following distributions:
BINOMIAL,
GAMMA,
SOFTPLUS,
PROBIT,
POISSON
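For orientation, here is a minimal sketch of how four of the names listed above are conventionally mapped from the linear predictor to a conditional mean. The `Distribution` enum and `inverse_link` function are hypothetical names made up for this example, and the mappings are the textbook choices (logistic, softplus, normal CDF, exponential), which may differ from what this PR or the reference library actually does. GAMMA is left out because the description does not say which link it uses.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical enum mirroring four names from the PR description;
// not the type used in the actual GLM class.
enum class Distribution
{
	BINOMIAL,
	SOFTPLUS,
	PROBIT,
	POISSON
};

// Maps the linear predictor eta = w.x + bias to the conditional mean.
double inverse_link(Distribution d, double eta)
{
	switch (d)
	{
	case Distribution::BINOMIAL:
		return 1.0 / (1.0 + std::exp(-eta)); // logistic sigmoid
	case Distribution::SOFTPLUS:
		return std::log1p(std::exp(eta)); // log(1 + exp(eta))
	case Distribution::PROBIT:
		return 0.5 * std::erfc(-eta / std::sqrt(2.0)); // standard normal CDF
	case Distribution::POISSON:
		return std::exp(eta); // canonical log link
	}
	throw std::invalid_argument("unknown distribution");
}
```

Keeping the mapping in one function means the gradient code only needs the mean and its derivative, so adding a distribution is a matter of adding one case.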
The code was written with the PyGLMNet library as a reference.
However, I have only written code for the Poisson distribution so far.
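To make the Poisson case concrete, the following is a minimal sketch of fitting a Poisson regression by plain batch gradient descent. It assumes the canonical log link and no regularisation, and it is not the code from this PR nor the reference library's algorithm; `PoissonFit`, `poisson_loss` and `fit_poisson` are hypothetical names introduced for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a fitted model; not a Shogun type.
struct PoissonFit
{
	std::vector<double> w; // one weight per feature
	double bias = 0.0;
};

// Linear predictor eta = w.x + bias for one sample.
static double linear_predictor(const PoissonFit& m, const std::vector<double>& x)
{
	double eta = m.bias;
	for (std::size_t j = 0; j < m.w.size(); ++j)
		eta += m.w[j] * x[j];
	return eta;
}

// Mean negative log-likelihood under a log link, dropping the log(y!)
// constant: (1/n) * sum_i [exp(eta_i) - y_i * eta_i].
double poisson_loss(
    const std::vector<std::vector<double>>& X, const std::vector<double>& y,
    const PoissonFit& m)
{
	double loss = 0.0;
	for (std::size_t i = 0; i < X.size(); ++i)
	{
		const double eta = linear_predictor(m, X[i]);
		loss += std::exp(eta) - y[i] * eta;
	}
	return loss / static_cast<double>(X.size());
}

// Batch gradient descent; the gradient of the loss above is
// (1/n) * sum_i (exp(eta_i) - y_i) * x_i for w, and the same without x_i for the bias.
PoissonFit fit_poisson(
    const std::vector<std::vector<double>>& X, const std::vector<double>& y,
    double learning_rate, int max_iter)
{
	PoissonFit m;
	if (X.empty())
		return m;
	const std::size_t n = X.size();
	const std::size_t d = X[0].size();
	m.w.assign(d, 0.0);

	for (int it = 0; it < max_iter; ++it)
	{
		std::vector<double> grad_w(d, 0.0);
		double grad_b = 0.0;
		for (std::size_t i = 0; i < n; ++i)
		{
			const double residual = std::exp(linear_predictor(m, X[i])) - y[i];
			for (std::size_t j = 0; j < d; ++j)
				grad_w[j] += residual * X[i][j];
			grad_b += residual;
		}
		for (std::size_t j = 0; j < d; ++j)
			m.w[j] -= learning_rate * grad_w[j] / static_cast<double>(n);
		m.bias -= learning_rate * grad_b / static_cast<double>(n);
	}
	return m;
}
```

Calling `fit_poisson(X, y, 0.05, 5000)` on a small, well-scaled data set is a reasonable starting point; the step size has to be small enough for the exponential mean not to blow up, which is one reason practical implementations add safeguards on top of this basic update.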
THIS IS A WORK IN PROGRESS
This PR is meant to start a discussion about the implementation of the GLM and to get some feedback on my code.
@lgoetz
@geektoni
TODO
SGObject test is failing.
FeatureDispatchCRTP .