mhahsler/Introduction_to_Data_Mining_R_Examples

R Companion for Introduction to Data Mining

R and the tidyverse are very popular for data mining. This repository contains slides and documented R examples to accompany several chapters of the popular data mining textbook:

Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 1st or 2nd edition.

The slides and examples are used in my course CS 5/7331 Data Mining taught at SMU and will be regularly updated and improved. The code examples are now compiled into the free online book An R Companion for Introduction to Data Mining, which is published under the Creative Commons Attribution license, so you can share and adapt the examples freely. Please open an issue for corrections or to suggest improvements.

Content

Companion Chapter | Lecture Slides | Free Textbook Chapter
--- | --- | ---
1. Introduction | PDF, PowerPoint | -
2. Data | PDF, PowerPoint | -
2.5. Exploring Data | PDF, PowerPoint | Web Chapter Exploring Data
3. Classification: Basic Concepts | PDF, PowerPoint | 3. Classification
4. Classification: Alternative Techniques | PDF, PowerPoint | -
5. Association Analysis: Basic Concepts | PDF, PowerPoint | 5. Association Analysis
6. Association Analysis: Advanced Concepts | - | -
7. Cluster Analysis: Basic Concepts | PDF, PowerPoint | 7. Cluster Analysis
8. Regression | - | Appendix D
9. Logistic Regression | - | (covered in Chapter 4.6)

Interactive Help

  • Ask the R Wizard (GPT) to explain R code and help with writing code.

Software Requirements

You need a working installation of R.

Each book chapter uses a set of packages that must also be installed. The installation is done directly in R, and the installation code can be found at the beginning of each chapter.
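
As a rough illustration of what such an installation step can look like, the sketch below installs only the packages that are not yet present. The helper names missing_packages() and install_missing() are hypothetical and are not taken from the book; the package list in the commented call is only an example. Use the code at the beginning of each chapter for the actual package lists.

```r
# Hypothetical helper: return the requested packages that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing (needs a CRAN mirror and
# an internet connection) and invisibly return the names that were requested.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Example call with an illustrative package list (commented out so that
# sourcing this file does not install anything):
# install_missing(c("tidyverse"))
```

Checking for missing packages first avoids reinstalling packages every time a chapter is run.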

Statement of Need

The textbook Introduction to Data Mining has been one of the most popular choices for learning and teaching data mining concepts. Some of the most important chapters have been made available for free by the authors on the book's website. One of the authors also provides Python Jupyter notebooks with examples, but complete R code examples were still needed. Given the R community's interest in data analysis, data science, and machine learning, and the broad support of R packages for data mining, there was a noticeable gap, which this learning resource fills. It targets advanced undergraduate and graduate students and can be used as a component of a first introduction to data mining.

Instructor Resources

  • PowerPoint presentation files for a data mining course can be found in the repository directory slides. The slides have an R symbol at the bottom whenever there are R code examples available.
  • Datasets for projects can be found at https://www.kaggle.com/datasets.

How To Cite This Book

Michael Hahsler (2024). An R Companion for Introduction to Data Mining. figshare. DOI: 10.6084/m9.figshare.26750404, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/

License

All code and documents in this repository are licensed under the Creative Commons Attribution 4.0 International License.

For questions please contact Michael Hahsler.