DataMiningPractice

Data mining is often described as the art of combing through data to discover hidden patterns, connections, and trends. It draws on methods at the intersection of machine learning, statistics, programming, and AI. Like any skill, it requires practice and knowledge, and these short forays into data mining served as exercises to build the muscles needed for more complex projects later on. The purpose of this repository is twofold: to present my experience and skills with data mining, and to serve as a home for reusable code to be used later.

Every notebook includes a data exploration section and a conclusion, and some also define classes and functions that make the models easy to reproduce.

This project was completed in collaboration with Dblash, Joshua Dobbins, and DonnaMulkern.

Iris Dataset Exploration

Using the well-known Iris dataset, this notebook explores techniques for initial data exploration: determining how the Iris features relate to each other and obtaining general summary statistics for each feature.
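A minimal sketch of this kind of first pass, assuming pandas and scikit-learn's bundled copy of the dataset (`load_iris`). The function name `explore_iris` is hypothetical and not taken from the notebook:

```python
import pandas as pd
from sklearn.datasets import load_iris


def explore_iris():
    """Return (summary, correlations) for the four Iris measurement features."""
    iris = load_iris(as_frame=True)
    # iris.frame holds the four feature columns plus an integer 'target' column.
    features = iris.frame[iris.feature_names]

    # count, mean, std, min, quartiles, and max for every feature
    summary = features.describe()
    # pairwise Pearson correlation between features (pandas' default method)
    correlations = features.corr()
    return summary, correlations
```

Printing `correlations.round(2)` is a quick way to spot strongly related pairs, such as petal length and petal width.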

Iris Clustering

This notebook applies multiple clustering techniques to the same Iris dataset, uses PCA (Principal Component Analysis), and visualizes the results. It also compares the clustering techniques against each other to determine which is best suited to this dataset.
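A sketch of one way to run and compare several clusterers, assuming scikit-learn. The three algorithms, the silhouette and adjusted Rand metrics, and the function name `compare_clusterings` are illustrative choices, not necessarily the ones used in the notebook:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def compare_clusterings(n_clusters=3, random_state=0):
    """Cluster the standardized Iris features with several algorithms and score each.

    Returns (scores, X_2d): per-algorithm metrics and a 2-D PCA projection for plotting.
    """
    iris = load_iris()
    X = StandardScaler().fit_transform(iris.data)

    # PCA is used here only to project the data onto two components for plotting;
    # the clustering itself runs on all four standardized features.
    X_2d = PCA(n_components=2).fit_transform(X)

    models = {
        "k-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state),
        "agglomerative (Ward)": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
        "gaussian mixture": GaussianMixture(n_components=n_clusters, random_state=random_state),
    }

    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        scores[name] = {
            # Internal metric: cohesion vs. separation, no ground truth needed.
            "silhouette": silhouette_score(X, labels),
            # External metric: agreement with the true species labels (1.0 = perfect).
            "adjusted_rand": adjusted_rand_score(iris.target, labels),
        }
    return scores, X_2d
```

To visualize, scatter `X_2d[:, 0]` against `X_2d[:, 1]` and color the points by each algorithm's labels.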

Iris Review

Rather than clustering, this notebook explores classification techniques applied to the Iris dataset, using 17 classification methods from the sklearn library. The classifiers are again evaluated against each other, and each method is outlined along with its advantages and disadvantages.
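A sketch of the comparison pattern, assuming scikit-learn and stratified 5-fold cross-validation. Only five of the many available classifiers are shown, and the function name `compare_classifiers` is hypothetical:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(cv_splits=5, random_state=0):
    """Return mean cross-validated accuracy per classifier, best first."""
    X, y = load_iris(return_X_y=True)
    cv = StratifiedKFold(n_splits=cv_splits, shuffle=True, random_state=random_state)

    # Distance- and margin-based models get a scaler inside the pipeline, so the
    # scaling is fit only on each training fold (no leakage into the test fold).
    candidates = {
        "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
        "decision tree": DecisionTreeClassifier(random_state=random_state),
        "gaussian naive Bayes": GaussianNB(),
    }

    scores = {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in candidates.items()
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Using the same `cv` splitter for every model means each classifier is scored on identical folds, which keeps the comparison fair.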

California Housing Regression

This notebook investigates regression models using the California Housing dataset, with Median House Value as the target variable. Eight models and five methods are compared, again all from sklearn.
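A sketch of a cross-validated regression comparison, assuming scikit-learn. The four models and the function name `compare_regressors` are illustrative, not the notebook's exact set. The function takes any feature matrix `X` and target `y`, so it can be fed the California Housing data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def compare_regressors(X, y, cv_splits=5, random_state=0):
    """Return mean cross-validated R^2 and RMSE for several regressors."""
    cv = KFold(n_splits=cv_splits, shuffle=True, random_state=random_state)
    candidates = {
        "linear regression": LinearRegression(),
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "decision tree": DecisionTreeRegressor(random_state=random_state),
        "random forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
    }

    results = {}
    for name, model in candidates.items():
        cv_result = cross_validate(
            model, X, y, cv=cv, scoring=("r2", "neg_root_mean_squared_error")
        )
        results[name] = {
            "r2": float(np.mean(cv_result["test_r2"])),
            # sklearn reports errors as negative scores; flip the sign for RMSE.
            "rmse": float(-np.mean(cv_result["test_neg_root_mean_squared_error"])),
        }
    return results
```

For the California data, `X, y = fetch_california_housing(return_X_y=True)` from `sklearn.datasets` supplies the eight features and the median house value target (the data is downloaded on first use).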

Mushroom Stew

Using the data in [expanded.csv](https://github.com/pogags/DataMiningPractice/blob/main/expanded.csv), this notebook determines what a forager should look for when picking mushrooms to ensure a safe and appealing stew. Each record in the dataset represents a mushroom, and the data has 22 features and 1 binary target class indicating whether the mushroom is edible or poisonous. The notebook uses both data exploration and classification techniques to determine the best way to forage, and includes a bonus section for picking safe mushrooms if the forager in question has lost their sense of smell (scent generally had the highest feature importance).
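A sketch of how per-feature importance can be computed for categorical data like this, and how a feature such as scent can be excluded for the "no sense of smell" scenario. It assumes pandas and scikit-learn, a random forest as the classifier, and one-hot encoding. The function name and the example column names (`odor`, `class`) are hypothetical, since the actual headers of `expanded.csv` are not given here:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def feature_importance_by_column(df, target, drop=(), random_state=0):
    """Return (importance per original feature, mean 5-fold CV accuracy).

    `drop` lists features to exclude, e.g. ("odor",) for a forager with no sense
    of smell. Assumes original column names do not contain "=".
    """
    features = df.drop(columns=[target, *drop])
    # One-hot encode every categorical feature; columns become "feature=value".
    X = pd.get_dummies(features, prefix_sep="=", dtype=int)
    y = df[target]

    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    model.fit(X, y)

    # Sum the importances of each one-hot column back onto its source feature.
    importances = pd.Series(model.feature_importances_, index=X.columns)
    by_feature = importances.groupby(lambda col: col.split("=", 1)[0]).sum()
    return by_feature.sort_values(ascending=False), accuracy
```

Calling it once with all features and again with `drop=("odor",)` shows how much accuracy is lost without scent, and which remaining features take over.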

Repository Files

.gitignore
README.md
california_housing_regression.ipynb
expanded.csv
iris_clustering.ipynb
iris_dataset_exploration.ipynb
iris_review.ipynb
mushroom_stew.ipynb
requirements.txt