This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

LightGBM on dimension reduced dataset

Kamil A. Kaczmarek edited this page Jul 10, 2018 · 4 revisions

whale 🐳

Feature Extraction

  • truncated svd projection
  truncated_svd__n_components: 50
  truncated_svd__n_iter: 10
  • pca projection
  pca__n_components: 100
  • fast ica projection
  fast_ica__n_components: 15
  • factor analysis
  factor_analysis__n_components: 50
  • gaussian random projection
  gaussian_random_projection__n_components: 50
  gaussian_projection__eps: 0.1

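Note: the eps parameter turned out not to matter; 0.01, 0.1 and 1.0 all gave exactly the same results. This is expected if the projection is scikit-learn's GaussianRandomProjection, where eps is only used when n_components is set to 'auto'; here n_components is fixed at 50.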

  • sparse random projection
  sparse_random_projection__n_components: 50
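
The settings above can be sketched with scikit-learn transformers. This is a minimal sketch, not the repository's actual pipeline code: it assumes the config keys map onto the scikit-learn classes of the same name, and `build_projections`, the `random_state` values and the random toy matrix `X` are hypothetical. It also checks the eps observation, using 0.01, 0.1 and 0.5 as illustrative values.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis, FastICA, TruncatedSVD
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)


def build_projections(random_state=0):
    """Return one transformer per projection, using the n_components listed above.

    Assumption: each config key prefix corresponds to the scikit-learn class
    with the matching name.
    """
    return {
        "truncated_svd": TruncatedSVD(
            n_components=50, n_iter=10, random_state=random_state
        ),
        "pca": PCA(n_components=100, random_state=random_state),
        "fast_ica": FastICA(n_components=15, random_state=random_state),
        "factor_analysis": FactorAnalysis(
            n_components=50, random_state=random_state
        ),
        "gaussian_random_projection": GaussianRandomProjection(
            n_components=50, eps=0.1, random_state=random_state
        ),
        "sparse_random_projection": SparseRandomProjection(
            n_components=50, random_state=random_state
        ),
    }


# Toy data standing in for the real dataset (hypothetical shape).
rng = np.random.RandomState(0)
X = rng.rand(300, 200)

# Fit each projection and record the shape of the reduced matrix.
projected = {
    name: projection.fit_transform(X)
    for name, projection in build_projections().items()
}
shapes = {name: Z.shape for name, Z in projected.items()}

# With a fixed n_components and the same seed, eps does not change the
# random matrix, so the projected data is identical for every eps value.
outputs = [
    GaussianRandomProjection(
        n_components=50, eps=eps, random_state=0
    ).fit_transform(X)
    for eps in (0.01, 0.1, 0.5)
]
eps_has_no_effect = all(np.allclose(outputs[0], other) for other in outputs[1:])
```

Each entry of `projected` has one column per component, so the outputs can be used on their own or stacked side by side as model features.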

Model and results

  • lightGBM truncated svd 1.56 CV
  • lightGBM pca 1.55 CV
  • lightGBM fast ica 1.57 CV
  • lightGBM factor analysis 1.51 CV
  • lightGBM gaussian random projection 1.63 CV
  • lightGBM sparse random projection 1.47 CV
  • lightGBM projections (all) 1.47 CV
  • lightGBM projections best (sparse random projection + factor analysis + truncated svd + fast ica) 1.448 CV
  • lightGBM projections second best (sparse random projection) 1.452 CV
  • lightGBM raw + projections (second best) 1.393 CV
  • lightGBM projections (second best) + aggregations 1.345 CV
  • lightGBM raw + projections (second best) + aggregations 1.3416 CV 1.41 CV
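
The "raw + projections (second best) + aggregations" feature set could be assembled roughly as follows. This is a sketch under several assumptions, not the code behind the scores above: "aggregations" is taken to mean row-wise summary statistics, the CV metric is taken to be RMSE, and `row_aggregations`, `build_features`, `cv_rmse` and the toy data are hypothetical. A `Ridge` model stands in for LightGBM so that the sketch depends only on scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.random_projection import SparseRandomProjection


def row_aggregations(X):
    """Row-wise summary statistics (an assumed choice of aggregations)."""
    return np.column_stack(
        [
            X.mean(axis=1),
            X.std(axis=1),
            X.min(axis=1),
            X.max(axis=1),
            (X != 0).sum(axis=1),  # number of non-zero entries per row
        ]
    )


def build_features(X, n_components=50, random_state=0):
    """Stack raw columns, a sparse random projection and row aggregations."""
    projection = SparseRandomProjection(
        n_components=n_components, random_state=random_state
    )
    Z = projection.fit_transform(X)
    return np.hstack([X, Z, row_aggregations(X)])


def cv_rmse(model, features, y, n_splits=5, random_state=0):
    """Mean RMSE over shuffled K-fold splits (metric is an assumption)."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = cross_val_score(
        model, features, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


# Toy regression data standing in for the real dataset.
rng = np.random.RandomState(0)
X = rng.rand(200, 120)
y = X[:, :5].sum(axis=1) + 0.1 * rng.randn(200)

features = build_features(X)  # 120 raw + 50 projected + 5 aggregation columns
score = cv_rmse(Ridge(alpha=1.0), features, y)
```

To evaluate LightGBM instead of the stand-in, an estimator with the scikit-learn interface such as `lightgbm.LGBMRegressor()` can be passed as `model`.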

Pipeline diagram

pipeline-solution-4