Commit
Update sklearn model on 0.18 - see GH-3
$ ./soft404/train.py text_items_big
Most common languages in data: [('zh-cn', 143533), ('en', 117488), ('ko', 23013), ('ja', 11624), ('fr', 8772), ('de', 8533), ('it', 6847), ('pt', 5491), ('', 4918), ('vi', 3399)]
Using only data for "en" language
117484 pages, 26464 domains, 0.28 404 pages
Training vectorizer...
117484/117484 [10:18<00:00, 189.91it/s]
Building numeric features...
117484/117484 [02:45<00:00, 708.33it/s]
Training and evaluating...
105735 in train, 11749 in test
AUC 0.992 ± 0.007
AUC_text 0.992 ± 0.005
AUC_text_full 0.992 ± 0.005
F1 0.963 ± 0.013
F1_text 0.958 ± 0.012
F1_text_full 0.958 ± 0.014
selected_features 3000.000 ± 0.000