What data was the performance measured on? #2

Open
hengji-liu opened this issue May 31, 2017 · 5 comments

Comments

@hengji-liu

What data was the performance measured on? Why are there only fifty thousand records?
How does it perform on the actual test data?

@PENGZhaoqing
Owner

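The machine didn't have enough memory at the time, so I trained and tested on only part of the dataset and never ran the whole thing.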

@hengji-liu
Author

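Oh, okay...
The training and test sets in this dataset have different distributions, and the test set also contains additional service and attack types,
so I'd really like to see how it performs on the test set.
I'm working on this task myself, and my first attempt came out about the same as the top entries from '99...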
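
One way to quantify the mismatch described above is to list the categorical values that occur in the test set but never in the training set. The following is a minimal sketch, not code from this repository: the helper name `unseen_categories`, the column names, and the toy records are all hypothetical placeholders rather than real values from the dataset.

```python
from collections import Counter


def unseen_categories(train_rows, test_rows, column):
    """Count test-set values of `column` that never occur in the training rows."""
    seen = {row[column] for row in train_rows}
    return Counter(row[column] for row in test_rows if row[column] not in seen)


# Hypothetical toy records; the values are placeholders, not real dataset entries.
train = [
    {"service": "svc_a", "label": "normal"},
    {"service": "svc_b", "label": "attack_x"},
    {"service": "svc_a", "label": "attack_y"},
]
test = [
    {"service": "svc_a", "label": "normal"},
    {"service": "svc_c", "label": "attack_z"},
    {"service": "svc_c", "label": "attack_x"},
]

for col in ("service", "label"):
    print(col, dict(unseen_categories(train, test, col)))
```

A large count of unseen values for a column suggests that a score measured on a held-out slice of the training data will overstate performance on the real test set.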

@PENGZhaoqing
Owner

Impressive, matching the top entries on a first attempt. What model are you using?

@hengji-liu
Author

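For now just a Random Forest, and I haven't even tuned the parameters...
All I can say is that RF itself is very strong... I'm just a newbie who was handed a dragon-slaying sword.
Next I plan to add AdaBoost and Extra Trees and build a stacking ensemble.
Actually I think XGBoost and GBDT would probably do slightly better, but I tried them and training took a bit too long.
I'm only doing this for fun, so I'm not using those two.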
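
The stacking plan mentioned above could be sketched with scikit-learn's `StackingClassifier` roughly as follows. This is an illustration under stated assumptions, not the commenter's actual code: the data is synthetic (`make_classification`), the logistic-regression meta-learner is an assumed choice the thread does not specify, and all hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real task would load the intrusion-detection records instead.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners named in the thread; the meta-learner is an assumption.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # out-of-fold predictions from the base learners train the meta-learner
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because the meta-learner is fit on cross-validated predictions, training costs roughly `cv + 1` fits per base learner, which matters if training time is already a concern.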

@PENGZhaoqing
Owner

PENGZhaoqing commented May 31, 2017

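If you want to play around, try a real competition, it's more fun. Tencent recently started a conversion-rate competition at http://algo.tpai.qq.com/ , so go give it a casual try when you have time, haha.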
