h2o4gpu: Genetic algorithm with Random Forest regression produces error: terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: out of memory
#789
Open
Geerthy11 opened this issue
Jul 24, 2019
· 1 comment
I am working on feature selection using a genetic algorithm (GA) with a random forest (RF) regression model (h2o4gpu.RandomForestRegressor). The number of estimators is 100; all other parameters are left at their defaults. The GA fitness function is the RF model's MAE. My dataset is 1.51 MB with dimensions 4000 × 44. Every time I run the program, it fails after a certain number of iterations (around 30-40) with one of the following errors:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: out of memory
Aborted (core dumped)
terminate called after throwing an instance of 'dmlc::Error'
what(): [08:58:38] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/tree/../common/device_helpers.cuh: 422: out of memory
Stack trace:
[bt] (0) /conda/envs/rapids/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x24) [0x7f3f0b07fcb4]
[bt] (1) /conda/envs/rapids/xgboost/libxgboost.so(+0x3267e2) [0x7f3f0b2a57e2]
[bt] (2) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal >::EvaluateSplits(std::vector<int, std::allocator<int> >, xgboost::RegTree const&, unsigned long)+0x1041) [0x7f3f0b2b48b1]
[bt] (3) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, xgboost::RegTree*, dh::AllReducer*)+0x131e) [0x7f3f0b2c7dfe]
[bt] (4) /conda/envs/rapids/xgboost/libxgboost.so(+0x34a201) [0x7f3f0b2c9201]
[bt] (5) /conda/envs/rapids/bin/../lib/libgomp.so.1(GOMP_parallel+0x42) [0x7f3f1c5bee92]
[bt] (6) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x918) [0x7f3f0b2bae98]
[bt] (7) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::DMatrix, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >)+0xa81) [0x7f3f0b105791]
[bt] (8) /conda/envs/rapids/xgboost/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::ObjFunction)+0xd65) [0x7f3f0b106c95]
Aborted (core dumped)
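The GA feature-selection loop described at the top of this issue can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the code that produced the errors: the function names, population size, truncation selection, single-point crossover, and bit-flip mutation are all hypothetical choices. In the real setup, `fitness` would fit a model (for example `h2o4gpu.RandomForestRegressor(n_estimators=100)`) on the selected columns and return its validation MAE; that part is left out so the example runs without a GPU.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = List[bool]  # one bit per feature column: True = column is selected


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, the fitness metric named in the issue."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ga_feature_selection(
    n_features: int,                   # must be >= 2 for single-point crossover
    fitness: Callable[[Mask], float],  # lower is better, e.g. validation MAE
    pop_size: int = 20,                # hypothetical GA settings
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    rng = random.Random(seed)
    cache: Dict[Tuple[bool, ...], float] = {}

    def score(m: Mask) -> float:
        # Each uncached call is one full model fit, so avoid refitting masks already scored.
        key = tuple(m)
        if key not in cache:
            cache[key] = fitness(m)
        return cache[key]

    def ensure_nonempty(m: Mask) -> Mask:
        if not any(m):
            m[rng.randrange(n_features)] = True  # never evaluate an empty feature subset
        return m

    def crossover(a: Mask, b: Mask) -> Mask:
        cut = rng.randrange(1, n_features)  # single-point crossover
        return a[:cut] + b[cut:]

    def mutate(m: Mask) -> Mask:
        # Bit-flip mutation: each bit flips independently with probability mutation_rate.
        return ensure_nonempty([(not b) if rng.random() < mutation_rate else b for b in m])

    population = [ensure_nonempty([rng.random() < 0.5 for _ in range(n_features)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=score)[: pop_size // 2]  # keep the best half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=score)
```

Because the best half always survives, the best score never gets worse from one generation to the next. Note that every uncached fitness call is a separate model fit inside the same process, so a run like this can perform up to several hundred fits before it finishes.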
The following are the specifications:
Ubuntu 16.04.6 LTS
Python 3.6.8
CUDA 10.2 / cuDNN 7.4.1
GPU model: Quadro GV100
NVIDIA Docker version: 18.09.6
RAM: 125 GB
h2o4gpu was installed from the pip wheel for CUDA 10.0 (https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/stable/ai/h2o/h2o4gpu/0.3-cuda10/h2o4gpu-0.3.2-cp36-cp36m-linux_x86_64.whl)
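For scale, a back-of-envelope estimate of the raw training matrix is below. It assumes the 4000 × 44 dataset is held as a dense float32 or float64 array and uses the Quadro GV100's 32 GB of HBM2; it ignores model and histogram buffers, so it only suggests that the input data alone is a tiny fraction of GPU memory.

```python
ROWS, COLS = 4000, 44  # dataset dimensions reported in the issue


def matrix_bytes(rows: int, cols: int, itemsize: int) -> int:
    """Size in bytes of a dense rows x cols array with the given element size."""
    return rows * cols * itemsize


f32 = matrix_bytes(ROWS, COLS, 4)  # float32: 704,000 bytes
f64 = matrix_bytes(ROWS, COLS, 8)  # float64: 1,408,000 bytes
GV100_BYTES = 32 * 1024**3         # Quadro GV100: 32 GB HBM2 (treated as GiB here)

print(f"float32: {f32 / 1e6:.3f} MB, float64: {f64 / 1e6:.3f} MB")
print(f"float64 matrix as a fraction of GPU memory: {f64 / GV100_BYTES:.2e}")
```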
Any suggestions on how to resolve this issue would be appreciated.