
chunwangpro/A-case-study-retrieving-and-measuring-similarity-on-Wikipedia-articles


Retrieving Wikipedia articles

In this task, we used nearest neighbors and clustering to retrieve documents that may interest a user, based on an analysis of their text. We explored two document representations: word counts and TF-IDF. We also built a Jupyter notebook for retrieving Wikipedia articles about famous people.

We then dug deeper into this application: we compared results obtained with word counts and with TF-IDF, explored the retrieval results for various famous people, and familiarized ourselves with the code needed to build a retrieval system.
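To make the difference between the two representations concrete, here is a minimal pure-Python sketch on a toy corpus. The three sentences, the helper names (`word_counts`, `tf_idf`, `cosine_distance`, `nearest`), and the `log(N / df)` weighting are illustrative assumptions, not code taken from the notebook; other TF-IDF variants smooth or normalize differently.

```python
import math
from collections import Counter


def word_counts(text):
    """Bag-of-words representation: map each token to its number of occurrences."""
    return Counter(text.lower().split())


def tf_idf(all_counts):
    """Reweight raw counts by log(N / df), with df = number of documents containing the word."""
    n_docs = len(all_counts)
    doc_freq = Counter()
    for counts in all_counts:
        doc_freq.update(counts.keys())  # each document adds at most 1 per word
    return [
        {word: tf * math.log(n_docs / doc_freq[word]) for word, tf in counts.items()}
        for counts in all_counts
    ]


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, vectors):
    """Index of the document closest to vectors[query], excluding the query itself."""
    others = [i for i in range(len(vectors)) if i != query]
    return min(others, key=lambda i: cosine_distance(vectors[query], vectors[i]))


# Toy corpus (hypothetical): documents 0 and 1 share only common words,
# documents 0 and 2 share the informative ones.
docs = [
    "the president of the country signed the law in the capital",
    "the singer of the band played the song in the park",
    "the president signed a law",
]

counts = [word_counts(d) for d in docs]
weighted = tf_idf(counts)

print(nearest(0, counts))    # 1: raw counts are dominated by the repeated "the"
print(nearest(0, weighted))  # 2: TF-IDF gives "the" zero weight, so shared topic words decide
```

With raw counts, the frequent word "the" pulls the two unrelated sentences together; once words that occur in every document are down-weighted, the sentence about the same topic becomes the nearest neighbor.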

  • Data: people_wiki.sframe

    If you are using pandas and scikit-learn instead, read people_wiki.csv.

  • Code: Retrieving Wikipedia articles.ipynb
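For readers taking the pandas and scikit-learn route, a retrieval pipeline could be sketched roughly as follows. This is not the notebook's code: the helper names `load_people` and `build_retriever`, the column names `name` and `text`, and the choice of brute-force cosine distance are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neighbors import NearestNeighbors


def load_people(path="people_wiki.csv"):
    """Read the articles; assumed to provide one row per person."""
    return pd.read_csv(path)


def build_retriever(people, vectorizer, name_col="name", text_col="text"):
    """Fit a vectorizer and a nearest-neighbors index, and return a query function.

    The column names are assumptions; adjust them to match the CSV.
    """
    matrix = vectorizer.fit_transform(people[text_col])
    model = NearestNeighbors(metric="cosine", algorithm="brute").fit(matrix)
    names = people[name_col].to_numpy()

    def query(name, k=5):
        matches = np.flatnonzero(names == name)
        if matches.size == 0:
            raise KeyError(name)
        pos = int(matches[0])
        # Ask for one extra neighbor because the article itself is returned at distance 0.
        n_neighbors = min(k + 1, len(names))
        distances, indices = model.kneighbors(matrix[pos], n_neighbors=n_neighbors)
        hits = [
            (names[i], float(d))
            for d, i in zip(distances[0], indices[0])
            if i != pos
        ]
        return hits[:k]

    return query


# Usage sketch (not executed here because it needs the data file):
# people = load_people()
# by_counts = build_retriever(people, CountVectorizer())
# by_tfidf = build_retriever(people, TfidfVectorizer())
# by_counts(some_name), by_tfidf(some_name)  # compare the two rankings
```

Building one retriever per vectorizer keeps the comparison fair: both use the same distance metric and differ only in the document representation. Note that scikit-learn's `TfidfVectorizer` smooths and normalizes by default, so its weights will not match turicreate's exactly.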

About

Using turicreate to compare the retrieval results of k-nearest-neighbor (kNN) models built on word counts and on TF-IDF.
