Update sklearn model on 0.18 - see GH-3
$ ./soft404/train.py text_items_big
Most common languages in data:
[('zh-cn', 143533),
 ('en', 117488),
 ('ko', 23013),
 ('ja', 11624),
 ('fr', 8772),
 ('de', 8533),
 ('it', 6847),
 ('pt', 5491),
 ('', 4918),
 ('vi', 3399)]
Using only data for "en" language
117484 pages, 26464 domains, 0.28 404 pages
Training vectorizer...
117484/117484 [10:18<00:00, 189.91it/s]
Building numeric features...
117484/117484 [02:45<00:00, 708.33it/s]
Training and evaluating...
105735 in train, 11749 in test
AUC   0.992 ± 0.007
AUC_text 0.992 ± 0.005
AUC_text_full 0.992 ± 0.005
F1    0.963 ± 0.013
F1_text 0.958 ± 0.012
F1_text_full 0.958 ± 0.014
selected_features 3000.000 ± 0.000
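
The train/test sizes and the "mean ± spread" metric lines in the log above are consistent with a 10-fold cross-validation over the 117484 English pages, although the log does not state the number of folds or how the spread is computed. The following is a minimal sketch under those assumptions, not the actual code of `train.py`: `fold_sizes` and `summarize` are hypothetical helpers, the fold count of 10 is a guess, the spread is taken to be a population standard deviation, and the scores passed to `summarize` are made-up placeholders rather than results from this run.

```python
from statistics import mean, pstdev


def fold_sizes(n_samples, n_folds):
    """Test-fold sizes when n_samples are split as evenly as possible."""
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` folds get one additional sample.
    return [base + 1 if i < extra else base for i in range(n_folds)]


def summarize(scores):
    """Format per-fold scores as 'mean ± spread'.

    The spread here is the population standard deviation; the log
    does not say which spread measure train.py reports.
    """
    return f"{mean(scores):.3f} ± {pstdev(scores):.3f}"


n_pages = 117484                  # "117484 pages" from the log
sizes = fold_sizes(n_pages, 10)   # 10 folds is an assumption
test_size = max(sizes)            # size of the larger test folds
train_size = n_pages - test_size  # the rest is used for training

print(f"{train_size} in train, {test_size} in test")

# Placeholder per-fold scores, purely illustrative (not from the log).
placeholder_scores = [0.99, 0.98, 1.00]
print("example", summarize(placeholder_scores))
```

With an even 10-way split, the larger folds hold 11749 pages and leave 105735 for training, which matches the "105735 in train, 11749 in test" line. A split that keeps each domain within a single fold could yield similar sizes, so this match does not show which splitter was used.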
lopuhin committed Jan 13, 2017
1 parent 5869020 commit d066986
Showing 1 changed file with 0 additions and 0 deletions.
Binary file modified soft404/clf.joblib
Binary file not shown.
