This repository contains my personal notes, examples, and code implementations based on the book "Effective XGBoost" by Matt Harrison. Its purpose is to document my journey as I study how a master practices his craft, so that I can glean insights into building models for my own projects.
The book is a comprehensive guide to using the XGBoost library for building machine learning models. It covers the basics of decision trees, the fundamental component of an XGBoost model, and discusses the tradeoffs involved in creating a predictive model. It also presents best practices for the XGBoost library, shows how to use related libraries to improve a model, and includes examples and exercises for practicing with XGBoost and understanding its features.
The book is organized into chapters that cover topics such as data preprocessing, hyperparameter tuning, model evaluation, and model interpretation. It also discusses advanced topics such as feature interactions, SHAP values, and model deployment. Throughout, the author provides practical code snippets that show how to use XGBoost effectively. The chapters are:
- Introduction
- Datasets
- Exploratory Data Analysis
- Tree Creation
- Stumps on Real Data
- Model Complexity & Hyperparameters
- Tree Hyperparameters
- Random Forest
- XGBoost
- Early Stopping
- XGBoost Hyperparameters
- Hyperopt
- Step-wise Tuning with Hyperopt
- Do you have enough data?
- Model Evaluation
- Training For Different Metrics
- Model Interpretation
- xgbfir (Feature Interactions Reshaped)
- Exploring SHAP
- Better Models with ICE, Partial Dependence, Monotonic Constraints, and Calibration
- Serving Models with MLFlow
- Conclusion