Modern data sets are often described by far too many variables for practical model building. Most of these variables are irrelevant to the prediction task, and their relevance is usually not known in advance. Overlarge feature sets carry several disadvantages. One is purely technical: large feature sets slow down algorithms and consume excessive resources. Another is more important: many machine learning algorithms lose accuracy when the number of variables is significantly higher than optimal. Selecting a small (ideally optimal) feature set that ensures the best possible predictive modeling results is therefore desirable for practical reasons.
This repository provides R code implementing four variable importance assessment techniques, each based on a popular machine learning or statistical modeling algorithm (see the usage sketches after the lists below):
- Random Forest - Boruta
- Decision Tree / Logistic Regression / Linear Regression - Recursive Feature Elimination (RFE)
- Decision Tree - Recursive PARTitioning (RPART)
- Linear Regression - LMG
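As a quick illustration of the first two techniques, here is a minimal sketch using the built-in mtcars data set; this is an assumption for demonstration only, not the repository's exact scripts.

```r
# Minimal sketch (assumption: built-in mtcars data, mpg as the response);
# not the repository's exact scripts.
library(Boruta)
library(caret)

set.seed(42)

# Boruta: all-relevant feature selection built on Random Forest
boruta_fit <- Boruta(mpg ~ ., data = mtcars)
print(boruta_fit)                        # decision for each variable
print(getSelectedAttributes(boruta_fit)) # variables confirmed as important

# RFE: recursive feature elimination, here with linear-regression functions
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)
rfe_fit <- rfe(mtcars[, -1], mtcars$mpg,
               sizes = c(2, 4, 6, 8), rfeControl = ctrl)
print(predictors(rfe_fit))               # variables retained by RFE
```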
The following R packages are used in the development:
- install.packages('Boruta') for Random Forest (Boruta)
- install.packages('caret') for RFE and RPART
- install.packages('relaimpo') for LMG
- install.packages('randomForest') for Random Forest implementation
- install.packages('TH.data') for datasets used in other packages
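For the remaining two techniques, a similar minimal sketch (again assuming the built-in mtcars data, not the repository's exact scripts) might look like:

```r
library(caret)
library(relaimpo)

set.seed(42)

# RPART: fit a decision tree via caret and read off variable importance
rpart_fit <- train(mpg ~ ., data = mtcars, method = "rpart")
print(varImp(rpart_fit))

# LMG: average contribution of each regressor to R^2 over all orderings
# (predictors wt, hp, disp, drat chosen arbitrarily for illustration)
lm_fit <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
print(calc.relimp(lm_fit, type = "lmg", rela = TRUE))
```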
References
- Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to Bivariate and Multivariate Analysis. Glenview, IL: Scott, Foresman.
- Therneau, T. M., & Atkinson, E. J. (1997). An introduction to recursive partitioning using the RPART routines (Vol. 61, p. 452). Mayo Foundation: Technical report.
- Granitto, P. M., Furlanello, C., Biasioli, F., & Gasperi, F. (2006). Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2), 83-90.
- Kursa, M. B., Jankowski, A., & Rudnicki, W. R. (2010). Boruta - a system for feature selection. Fundamenta Informaticae, 101(4), 271-285.
Feedback and Questions
Hope you find this code useful. As it turned out, the different algorithms flagged different parameters as important, or at least the degree of importance varied. This need not be a conflict: each algorithm offers a different perspective on how a parameter can be useful, depending on how the underlying estimation function relates the independent and dependent variables.
Please send your questions, comments and feedback to:
Mail me 👉 [email protected]