Practical exercises in information retrieval algorithms.
Project | Description |
---|---|
Sekitei | Focused web crawler algorithm: extracting URL features, clustering URLs, and distributing the crawl quota between sites based on already known URLs |
Indexation | Modeling the data storage of an inverted search index in Python |
Antispam | Feature engineering approach to spam detection based on gradient boosting |
Duplicates | Duplicate detection algorithm based on minshingles |
Sentence Boundaries | Feature engineering approach to sentence boundary detection based on a random forest |
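For Sekitei, the first step (turning URLs into features) might look roughly like the sketch below. It assumes each URL is described by path-segment and query-parameter features, and that only features occurring in at least a `min_share` fraction of URLs are kept for clustering. The feature names, `url_features`, `frequent_features`, and the threshold are hypothetical. The clustering and quota-distribution steps are not shown.

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qsl


def url_features(url: str) -> set[str]:
    """Describe one URL as a set of string features (hypothetical naming scheme)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    features = {f"segments:{len(segments)}"}  # path depth
    for i, seg in enumerate(segments):
        features.add(f"segment_name_{i}:{seg}")  # exact segment value
        features.add(f"segment_len_{i}:{len(seg)}")  # segment length
        if seg.isdigit():
            features.add(f"segment_[0-9]_{i}:1")  # purely numeric segment
        elif re.search(r"\d", seg):
            features.add(f"segment_substr[0-9]_{i}:1")  # digits inside text
        ext = re.search(r"\.(\w+)$", seg)
        if ext:
            features.add(f"segment_ext_{i}:{ext.group(1)}")  # file extension
    for key, value in parse_qsl(parsed.query, keep_blank_values=True):
        features.add(f"param_name:{key}")
        features.add(f"param:{key}={value}")
    return features


def frequent_features(urls: list[str], min_share: float = 0.1) -> list[str]:
    """Keep features seen in at least `min_share` of the URLs (hypothetical threshold)."""
    counts = Counter(f for url in urls for f in url_features(url))
    threshold = min_share * len(urls)
    return sorted(f for f, c in counts.items() if c >= threshold)
```

Each URL can then be encoded as a binary vector over the selected features and passed to a clustering algorithm. Which algorithm the project uses is not stated here.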
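For Indexation, a minimal in-memory inverted index could look like the sketch below. Each term maps to a sorted list of document ids, an AND query intersects posting lists, and postings can be stored compactly as variable-byte-encoded gaps. The `\w+` tokenization, the function names, and the choice of VarByte compression are assumptions, not necessarily what the project implements.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def build_index(docs: list[str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(tokenize(text)):
            index[term].append(doc_id)  # ids are appended in increasing order
    return dict(index)


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index: dict[str, list[int]], query: str) -> list[int]:
    """Documents containing every query term; shortest posting lists first."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for plist in postings[1:]:
        result = intersect(result, plist)
    return result


def varbyte_encode(doc_ids: list[int]) -> bytes:
    """Store gaps between sorted ids, 7 bits per byte; the high bit marks a number's last byte."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap, prev = doc_id - prev, doc_id
        chunk = [gap & 0x7F]
        gap >>= 7
        while gap:
            chunk.append(gap & 0x7F)
            gap >>= 7
        chunk[0] |= 0x80  # least significant group is written last and flagged
        out.extend(reversed(chunk))
    return bytes(out)


def varbyte_decode(data: bytes) -> list[int]:
    ids, n, prev = [], 0, 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:  # end of the current gap
            prev += n
            ids.append(prev)
            n = 0
    return ids
```

Starting the intersection from the shortest posting list keeps intermediate results small, and storing gaps instead of raw ids keeps most numbers within a single byte.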
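For Antispam, the feature-engineering part might start with simple statistics of a page's text, such as those below. The particular features, `spam_features`, and the `common_words` list are assumptions. The resulting vectors would then be fed to a gradient-boosting classifier (for example scikit-learn's `GradientBoostingClassifier`), which is not shown.

```python
import re
import zlib


def spam_features(text: str, common_words: set[str]) -> dict[str, float]:
    """A few hypothetical content features of a page's visible text."""
    words = re.findall(r"\w+", text.lower())
    n = len(words)
    raw = text.encode("utf-8")
    return {
        "num_words": float(n),
        # keyword-stuffed pages tend to repeat a small vocabulary
        "unique_word_share": len(set(words)) / n if n else 0.0,
        "avg_word_len": sum(len(w) for w in words) / n if n else 0.0,
        # share of words that belong to a list of frequent "normal" words
        "common_word_share": sum(w in common_words for w in words) / n if n else 0.0,
        # highly repetitive text compresses unusually well
        "compression_ratio": len(raw) / len(zlib.compress(raw)) if raw else 0.0,
    }
```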
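Assuming minshingles here means the usual MinHash scheme (hash every word shingle with several hash functions and keep the minimum per function), the Duplicates approach can be sketched as follows. The shingle length `k`, the number of hash functions, and salting SHA-1 with a seed to obtain a family of hash functions are all assumptions. The share of matching positions in two signatures estimates the Jaccard similarity of the two shingle sets.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Set of k-word shingles; texts of at most k words become a single shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) <= k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(shingle: str, seed: int) -> int:
    """One member of a hash family: SHA-1 of the shingle salted with a seed."""
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minshingles(text: str, num_hashes: int = 64, k: int = 3) -> list[int]:
    """MinHash signature: the minimum hash over all shingles, per hash function."""
    sh = shingles(text, k)
    if not sh:
        return []
    return [min(seeded_hash(s, seed) for s in sh) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of equal positions, an estimate of the Jaccard similarity."""
    if not sig_a or not sig_b:
        return 0.0
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two documents would be reported as duplicates when the estimated similarity exceeds some threshold; the threshold the project uses is not given here.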
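For Sentence Boundaries, one common framing treats every `.`, `!` or `?` as a candidate boundary and describes it with features of its context. A classifier such as scikit-learn's `RandomForestClassifier` is then trained on labeled candidates. The feature set and the small `ABBREVIATIONS` list below are hypothetical, not the project's actual features.

```python
import re

# Hypothetical abbreviation list; a real one would be much larger.
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "etc", "vs"}


def candidate_features(text: str) -> list[tuple[int, dict[str, float]]]:
    """Position and feature dict for each '.', '!' or '?' in the text."""
    rows = []
    for m in re.finditer(r"[.!?]", text):
        pos = m.start()
        prev = re.search(r"(\w+)$", text[:pos])  # word right before the mark
        prev_word = prev.group(1) if prev else ""
        rest = text[pos + 1:].lstrip()
        next_char = rest[:1]  # first non-space character after the mark
        rows.append((pos, {
            "is_period": float(m.group() == "."),
            "prev_is_abbrev": float(prev_word.lower() in ABBREVIATIONS),
            "prev_is_initial": float(len(prev_word) == 1 and prev_word.isupper()),
            "prev_len": float(len(prev_word)),
            "next_is_upper": float(next_char.isupper()),
            "next_is_digit": float(next_char.isdigit()),
            "followed_by_space": float(text[pos + 1:pos + 2].isspace()),
            "is_last": float(rest == ""),
        }))
    return rows
```

Before training, each feature dict would be turned into a vector with a fixed key order and paired with a gold label from an annotated corpus.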