Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Write Generalized Linear Machine class #5006

Merged
merged 31 commits into from
Aug 4, 2020

Conversation

Hephaestus12
Copy link
Contributor

@Hephaestus12 Hephaestus12 commented Apr 14, 2020

#5005 #5000
This is the basic framework for the Generalized Linear Machine class.
This class is supposed to implement the following distributions
BINOMIAL,
GAMMA,
SOFTPLUS,
PROBIT,
POISSON

The code has been written keeping in mind this reference: PyGLMNet library
However, I have only written code for the Poisson distribution till now.

THIS IS A WORK IN PROGRESS

This PR is so that a discussion can be held about the implementation of the GLM and so that Some feedback can be obtained for my code.
@lgoetz
@geektoni

TODO

  • Write code.
  • Add basic test.
  • Add gradient test.
  • Link github gists for generating data.
  • Check why the SGObject test is failing.
  • Use FeatureDispatchCRTP.

@vigsterkr
Copy link
Member

great stuff i've added some comments... until this is WIP could you make this PR a draft actually?

@Hephaestus12 Hephaestus12 marked this pull request as draft April 14, 2020 10:56
@Hephaestus12
Copy link
Contributor Author

Everything added and the unit test works fine, the gist added for the first unit test.

However, the meta example test fails. Output: https://pastebin.com/fw5aS77U

@geektoni
Copy link
Contributor

Everything added and the unit test works fine, the gist added for the first unit test.

However, the meta example test fails. Output: https://pastebin.com/fw5aS77U

It seems that when creating the labels in the meta example it treats them as MulticlassLabels rather than RegressionLabels. Is there any way to force the label's creation to adopt a certain type of features? @gf712 @karlnapf

You have also a conflict with the data submodule. I think you need to rebase against the develop branch.

@geektoni
Copy link
Contributor

We are getting there! :) once the meta example is fixed, you should also add the meta example serialization result to the data submodule such to enable the integration tests.

@gf712
Copy link
Member

gf712 commented Jul 30, 2020

Everything added and the unit test works fine, the gist added for the first unit test.
However, the meta example test fails. Output: https://pastebin.com/fw5aS77U

It seems that when creating the labels in the meta example it treats them as MulticlassLabels rather than RegressionLabels. Is there any way to force the label's creation to adopt a certain type of features? @gf712 @karlnapf

You have also a conflict with the data submodule. I think you need to rebase against the develop branch.

I think there are two options: wait for the work @LiuYuHui is doing right now implementing the label encoding at Machine level, or you can use the regression_labels free function in the GLM train and apply routines to convert any labels to RegressionLabels if possible.

@karlnapf
Copy link
Member

I think use regression_labels for now, and as this should be done before the other refactor.

@geektoni
Copy link
Contributor

geektoni commented Jul 30, 2020 via email

@geektoni
Copy link
Contributor

So basically it is not possible to manually create RegressionLabels without calling regression_labels, right?

If that is the case, @Hephaestus12 I think we should remove the glm meta example from here and open a new PR with it. That way we could merge directly this GLM code and then add the meta example later on (since the issue with it is not with the GLM itself).

@gf712
Copy link
Member

gf712 commented Jul 30, 2020

Ah yes, it isn't implemented, even though it would be an easy thing to add, but beyond this PR. In that case I would wait until the LabelEncoder is implemented in Machine (or work on it in another pr)

@geektoni
Copy link
Contributor

Ah yes, it isn't implemented, even though it would be an easy thing to add, but beyond this PR. In that case I would wait until the LabelEncoder is implemented in Machine (or work on it in another pr)

I see. @Hephaestus12 could you please move the new data files and the glm meta example on another PR and remove them from here? Then, could you also change this PR from ”draft” to the normal mode?

We can then have a final look and if tests pass we merge.

@Hephaestus12 Hephaestus12 force-pushed the feature/implementGLM branch from 7299dd2 to 64a224d Compare July 30, 2020 21:13
@Hephaestus12 Hephaestus12 marked this pull request as ready for review July 30, 2020 21:14
class GLMCostFunction
{
public:
friend class GLM;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to make another function public but yes, we can remove the friend class.

@geektoni
Copy link
Contributor

Looks good to me. MacOS failures seem unrelated to this patch. Once the rest is green (and the above comment is resolved) we can merge.

@geektoni geektoni requested review from gf712, karlnapf and geektoni July 31, 2020 07:39
{
if (!data->has_property(FP_DOT))
error("Specified features are not of type CDotFeatures");
set_features(std::static_pointer_cast<DotFeatures>(data));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why the features are added to the state here?

Copy link
Contributor Author

@Hephaestus12 Hephaestus12 Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason, I did it this way as it was done like this in LinearMachine:

if (data)
{
if (!data->has_property(FP_DOT))
error("Specified features are not of type CDotFeatures");
set_features(std::static_pointer_cast<DotFeatures>(data));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, you'll have to see if this has side effects that you rely on. @LiuYuHui do you know if this is required? Because you had to refactor this code recently in the feature branch

SGVector<float64_t> w_old = m_w.clone();

auto X = get_features()->get_computed_dot_feature_matrix();
auto y = regression_labels(get_labels())->get_labels();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be fixed at some point. I guess when we have the LabelEncoder in Machine, otherwise you are performing a potentially expensive operation in each iteration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like that RegressionLabels is what LabelEncoder lacks, currently, LabelEncoder only support MulticlassLabels and BinaryLabels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but it should be possible to add it?

class GLM : public RandomMixin<IterativeMachine<LinearMachine>>
{
public:
// friend class GLMCostFunction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to remove code, just remove it. Don’t just comment it out. Please don’t commit this kind of changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment removed

@Hephaestus12 Hephaestus12 requested a review from vigsterkr August 2, 2020 11:16
Copy link
Member

@gf712 gf712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, can merge it now and do minor fixes related to the machine refactor later on

@Hephaestus12
Copy link
Contributor Author

Can we merge this?

@gf712 gf712 merged commit 8e5e509 into shogun-toolbox:develop Aug 4, 2020
@gf712
Copy link
Member

gf712 commented Aug 4, 2020

Merging this so that @Hephaestus12 can use it to continue working on the influenza app. Any fixes, like cleaning up the distributions and optimiser, can be done in the future. Thank you!

@Hephaestus12
Copy link
Contributor Author

Yaay 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants