Content Enhancement : KNN #14

Open
AM1CODES opened this issue Apr 3, 2021 · 2 comments

Comments

@AM1CODES
Member

AM1CODES commented Apr 3, 2021

The below-mentioned additions are required -

  • Math - We need to add a solved example to show the working of the algorithm. To be precise, a pen-and-paper example of the math behind the algorithm, with the help of images, like we have used for the other algorithms.
@sudharsansrini

sudharsansrini commented Aug 31, 2021

KNN - K Nearest Neighbors

The KNN algorithm is mainly used for classifying and predicting values in Machine Learning. We can use it for supervised (labeled) Machine Learning problems such as classification and regression.

KNN
K - the number of neighbors to consider
NN - Nearest Neighbors

Put simply, KNN finds the K nearest neighbors of a data point and assigns the point the class that occurs most often among those neighbors.

[Image: example of classifying a data point by its K nearest neighbors]
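As a rough illustration, here is a minimal from-scratch sketch of that idea in Python (the function name and sample data are made up for this example):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # -> 0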

We can choose the value of K by computing the mean error rate for each candidate K and plotting the results as a graph.
A good K is a value where the mean error rate is lowest.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Mean error rate on the test set for each candidate K
error = []
for i in range(1, 40):
    model = KNeighborsClassifier(n_neighbors=i)
    model.fit(x_train, y_train)
    pred_i = model.predict(x_test)
    error.append(np.mean(pred_i != y_test))

plt.figure(figsize=(15, 15))
plt.plot(range(1, 40), error, color="red", linestyle="dashed",
         marker="o", markerfacecolor="blue", markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K value')
plt.ylabel('Mean Error')
plt.show()
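One way to pick K from this curve programmatically (a small addition, assuming the error list from the snippet above) is:

best_k = int(np.argmin(error)) + 1  # +1 because the K values start at 1
print("Best K:", best_k)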

Finding K -> deciding how many neighbors we have to consider around the data point.

[Figure: mean error rate plotted against K value]

Choose the value of K with the lowest error rate.
After choosing the K value, you may have a question:
How does it find the K nearest points?

In KNN, the distance between data points is calculated using the Minkowski distance.

D(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p)

Based on the Minkowski distance metric, we classify the data points.
p = 1: Manhattan distance
p = 2: Euclidean distance
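As a minimal sketch (NumPy only; the vectors here are made-up sample values), the Minkowski distance for different p values could be computed like this:

import numpy as np

def minkowski(x, y, p):
    # (sum of |x_i - y_i|^p) ^ (1/p)
    return np.sum(np.abs(x - y) ** p) ** (1 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
print(minkowski(x, y, 1))  # Manhattan: 5.0
print(minkowski(x, y, 2))  # Euclidean: sqrt(13) ≈ 3.61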

Manhattan Distance

d(x, y) = Σᵢ |xᵢ − yᵢ|

This is the distance between real-valued vectors, computed as the sum of their absolute differences.
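Using the minkowski sketch above with p = 1 gives exactly this; equivalently, in NumPy: np.sum(np.abs(x - y)).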

Euclidean Distance

d(x, y) = √(Σᵢ (xᵢ − yᵢ)²)

The Euclidean distance is calculated as the square root of the sum of the squared differences between a new point x and an existing point y.
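This is the Minkowski sketch with p = 2; equivalently, in NumPy: np.sqrt(np.sum((x - y) ** 2)).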

Hamming Distance

d(x, y) = Σᵢ [xᵢ ≠ yᵢ]

It is used for categorical variables. If the value x and the value y are the same, the distance d is 0; otherwise, the distance is 1. Over a vector of categorical features, the contributions of the individual positions are summed.
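A quick sketch with made-up categorical values:

import numpy as np

x = np.array(["red", "green", "blue"])
y = np.array(["red", "blue", "blue"])
# Count the positions where the categorical values differ
print(np.sum(x != y))  # 1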

The above metrics are used to compute the distances and thereby find the K closest neighbors of a particular data point.
From the classes of those neighbors, we classify the data point with the majority class.
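Putting it together with the earlier scikit-learn snippet (assuming x_train, y_train, x_test, and best_k from above):

from sklearn.neighbors import KNeighborsClassifier

# Fit the final model with the chosen K and predict on the test set
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(x_train, y_train)
predictions = final_model.predict(x_test)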

Thank you!

@AM1CODES
Member Author

Hey @sudharsansrini, the content that you mentioned above looks really amazing, and we could incorporate it into the main platform. I am pretty sure that writing this took a good amount of time, and I request you to make a PR for the same, because otherwise it won't get counted as your contribution. Feel free to join our Discord if you need any help with how to contribute and such.
