timnugent/naive-bayes
Naive Bayes Classifier
----------------------

(c) Tim Nugent

Compile by running 'make'. Uses std=c++11 - on older compilers you may need to change this to 'std=c++0x' in the Makefile.

Run all tests with 'make test'

Run the train/classify tool as follows:

./nb_classify -d 1 training.data test.data

Training and test data should be in svm-light/libsvm format, e.g.

+1 1:4.12069 2:11.3896 3:18.5742 4:2.85764 5:53.4406
-1 1:3.14565 2:17.4338 3:19.3353 4:2.63431 5:56.4233

Sparse data is not (yet) supported.

The -d flag is the decision rule, option 1 = Gaussian (default), 2 = Multinomial and 3 = Bernoulli.
-v flag turns verbose mode on - use this to see the classification results.
The -a flag passes the smoothing parameter used in the multinomial decision rule (default 1.0, Laplace smoothing).

Gaussian example
----------------

Generate Gaussian training and test data as follows:

./generate_gaussian_data 5000 +1 5.3 2.0 12.8 3.1 18.2 1.0 1.2 3.4 56.2 2.3 > training.data
./generate_gaussian_data 5000 -1 7.3 1.0 10.8 1.1 24.1 2.3 0.8 1.2 53.1 0.8 >> training.data
./generate_gaussian_data 1000 +1 5.3 5.0 12.8 5.1 18.2 4.0 1.2 5.4 56.2 4.3 > test.data
./generate_gaussian_data 1000 -1 7.3 4.0 10.8 4.1 24.1 5.3 0.8 4.2 53.1 2.8 >> test.data

Arguments are the number of examples, the class label, then pairs of means and standard deviations. Note that I bumped up the standard deviations in the test data to make classification harder.
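
To make the generator arguments concrete, here is a minimal C++11 sketch of how one dense example line could be produced from a class label and a list of (mean, standard deviation) pairs. It is an illustration only and is not taken from generate_gaussian_data; the function names format_example and sample_gaussian_example are hypothetical, and the sketch assumes each feature is drawn independently.

```cpp
#include <cstddef>
#include <random>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Format one dense example in svm-light/libsvm style: "<label> 1:v1 2:v2 ...".
// Hypothetical helper for illustration.
std::string format_example(const std::string& label,
                           const std::vector<double>& values) {
    std::ostringstream out;
    out << label;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out << ' ' << (i + 1) << ':' << values[i];  // feature indices are 1-based
    }
    return out.str();
}

// Draw one example: feature i is sampled independently from a normal
// distribution with the i-th (mean, standard deviation) pair.
// Assumption: this mirrors the argument description, not the tool's source.
std::string sample_gaussian_example(
        const std::string& label,
        const std::vector<std::pair<double, double>>& mean_sd,
        std::mt19937& rng) {
    std::vector<double> values;
    values.reserve(mean_sd.size());
    for (const auto& p : mean_sd) {
        std::normal_distribution<double> dist(p.first, p.second);
        values.push_back(dist(rng));
    }
    return format_example(label, values);
}
```

Calling sample_gaussian_example("+1", {{5.3, 2.0}, {12.8, 3.1}}, rng) in a loop would yield lines in the same shape as the training data above, with one feature per mean/standard deviation pair.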

Then train and classify as follows:

./nb_classify -d 1 training.data test.data

Multinomial example
-------------------

Example based on table 13.1 from here:

http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

echo "+1 1:2 2:1 3:0 4:0 5:0 6:0" > training.data
echo "+1 1:2 2:0 3:1 4:0 5:0 6:0" >> training.data
echo "+1 1:1 2:0 3:0 4:1 5:0 6:0" >> training.data
echo "-1 1:1 2:0 3:0 4:0 5:1 6:1" >> training.data
echo "+1 1:3 2:0 3:0 4:0 5:1 6:1" > test.data

Then train and classify as follows:

./nb_classify -d 2 training.data test.data

Bernoulli example
-----------------

Example taken from here:

http://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn-note07-2up.pdf

echo "+1 1:1 2:0 3:0 4:0 5:1 6:1 7:1 8:1" > training.data
echo "+1 1:0 2:0 3:1 4:0 5:1 6:1 7:0 8:0" >> training.data
echo "+1 1:0 2:1 3:0 4:1 5:0 6:1 7:1 8:0" >> training.data
echo "+1 1:1 2:0 3:0 4:1 5:0 6:1 7:0 8:1" >> training.data
echo "+1 1:1 2:0 3:0 4:0 5:1 6:0 7:1 8:1" >> training.data
echo "+1 1:0 2:0 3:1 4:1 5:0 6:0 7:1 8:1" >> training.data
echo "-1 1:0 2:1 3:1 4:0 5:0 6:0 7:1 8:0" >> training.data
echo "-1 1:1 2:1 3:0 4:1 5:0 6:0 7:1 8:1" >> training.data
echo "-1 1:0 2:1 3:1 4:0 5:0 6:1 7:0 8:0" >> training.data
echo "-1 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0" >> training.data
echo "-1 1:0 2:0 3:1 4:0 5:1 6:0 7:1 8:0" >> training.data
echo "+1 1:1 2:0 3:0 4:1 5:1 6:1 7:0 8:1" > test.data
echo "-1 1:0 2:1 3:1 4:0 5:1 6:0 7:1 8:0" >> test.data

./nb_classify -d 3 training.data test.data

Generating dummy data
---------------------

See above for Gaussian data. For multinomial data:

./generate_multinomial_data 5000 +1 2 3 4 3 5 > training.data
./generate_multinomial_data 5000 -1 1 4 3 2 4 >> training.data

The five weights are used to generate integers with probability weight/sum of weights.
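
The "weight/sum of weights" rule can be sketched in C++11 as follows. This is an illustration under assumptions, not the code in generate_multinomial_data: the function names are hypothetical, and how the tool turns individual draws into feature values is not specified here. The sketch only shows the normalisation and a single weighted draw using the standard library's std::discrete_distribution.

```cpp
#include <random>
#include <vector>

// Convert weights to probabilities: p_i = w_i / (w_1 + ... + w_n).
// Hypothetical helper for illustration.
std::vector<double> weights_to_probabilities(const std::vector<double>& weights) {
    double total = 0.0;
    for (double w : weights) total += w;
    std::vector<double> probs;
    probs.reserve(weights.size());
    for (double w : weights) probs.push_back(w / total);
    return probs;
}

// Draw one 0-based index, where index i is chosen with probability
// weights[i] / sum(weights). std::discrete_distribution normalises
// the weights internally, so they need not sum to one.
int sample_index(const std::vector<double>& weights, std::mt19937& rng) {
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

For the +1 weights above (2 3 4 3 5) the sum is 17, so the five probabilities are 2/17, 3/17, 4/17, 3/17 and 5/17.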

For Bernoulli data:

./generate_bernoulli_data 5000 +1 0.7 0.1 0.5 > training.data
./generate_bernoulli_data 5000 -1 0.4 0.2 0.6 >> training.data

The three probabilities are used to generate binary variables.

Note that the prior class probabilities are calculated based on their ratios in the training data.
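
As a rough illustration of priors taken from class ratios, the following C++11 sketch counts how often each label occurs and divides by the number of training examples. The function name class_priors is hypothetical and the code is not taken from nb_classify; it only shows the arithmetic the note describes.

```cpp
#include <map>
#include <string>
#include <vector>

// Estimate P(class) as (examples with that label) / (all examples).
// Hypothetical helper for illustration; returns an empty map for no input.
std::map<std::string, double> class_priors(const std::vector<std::string>& labels) {
    std::map<std::string, double> priors;
    if (labels.empty()) return priors;
    for (const auto& label : labels) {
        priors[label] += 1.0;  // first pass: raw counts per label
    }
    for (auto& kv : priors) {
        kv.second /= static_cast<double>(labels.size());  // counts -> ratios
    }
    return priors;
}
```

With the Bernoulli example's training set (six +1 lines and five -1 lines) this gives priors of 6/11 and 5/11. With 5000 generated examples per class, both priors are 0.5.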
About
Naïve Bayes classifier with gaussian, multinomial and bernoulli decision rules [machine learning][statistics]