k-nearest neighbours (k-NN) is a machine learning algorithm that classifies a new point by a majority vote among the k labelled points closest to it.
Although parts of this repository call it k-means, the model implemented here is a k-NN classifier that predicts the favourite sport in a state of India.
util_methods.py - Contains the linear algebra methods used by the model.
fav_sport.py - Run this file to get the output of the model. It contains the following methods:
- major_cluster(labels) : Takes a list of labels and returns the label with the most occurrences.
- knn_classify(k, labeled_points, new_point) : Driver method that classifies new_point from labeled_points, based on the value of k.
- plot_cities() : Plots the cities by latitude and longitude, with a marker for each favourite sport.
- classify_and_plot_grid() : Plots the graph of the model's predictions.
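The two core methods above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual code: the `distance` helper, the tie-breaking rule, and `sample_points` are hypothetical, and `labeled_points` is assumed to be a list of `(point, label)` pairs.

```python
import math
from collections import Counter


def distance(p, q):
    """Euclidean distance between two points of equal length (assumed helper)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def major_cluster(labels):
    """Return the most frequent label.

    Assumes labels are ordered from nearest to farthest. On a tie,
    the farthest label is dropped and the vote is repeated.
    """
    counts = Counter(labels)
    winner, winner_count = counts.most_common(1)[0]
    num_winners = sum(1 for c in counts.values() if c == winner_count)
    if num_winners == 1:
        return winner
    return major_cluster(labels[:-1])


def knn_classify(k, labeled_points, new_point):
    """Classify new_point by a vote among its k nearest labelled points."""
    by_distance = sorted(labeled_points,
                         key=lambda pair: distance(pair[0], new_point))
    k_nearest_labels = [label for _, label in by_distance[:k]]
    return major_cluster(k_nearest_labels)


# Made-up points and labels, for illustration only.
sample_points = [
    ((0.0, 0.0), "cricket"),
    ((1.0, 0.0), "cricket"),
    ((0.0, 1.0), "football"),
    ((5.0, 5.0), "football"),
    ((6.0, 5.0), "football"),
]

print(knn_classify(3, sample_points, (0.2, 0.1)))
```

With an even k a vote can tie. Dropping the farthest neighbour and voting again is one common way to resolve this; the repository may handle ties differently.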
The code has been written to be as readable as possible.
You can modify the code to build your own classifier.
For example, instead of states (a list used in the fav_sport.py file), you could use the languages that people speak around the world.
Feel free to reach out for any suggestions or help!
Credits: Joel Grus and his book, Data Science from Scratch.