Email Spam Classifier using SVM

Run the code

This project implements the following email preprocessing and normalization steps:

• Lower-casing: The entire email is converted to lower case.

• Stripping HTML: All HTML tags are removed from the emails.

• Normalizing URLs: All URLs are replaced with the text "httpaddr".

• Normalizing Email Addresses: All email addresses are replaced with the text "emailaddr".

• Normalizing Numbers: All numbers are replaced with the text "number".

• Normalizing Dollars: All dollar signs ($) are replaced with the text "dollar".

• Word Stemming: Words are reduced to their stemmed form.

• Removal of non-words: Non-words and punctuation have been removed.
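
The steps above can be sketched as a short pipeline. This is an illustrative Python sketch, not the project's own MATLAB/Octave code. The function names `preprocess_email` and `simple_stem`, the regular expressions, and the order in which the substitutions run are all assumptions. In particular, `simple_stem` is a crude suffix stripper standing in for a real stemming algorithm such as Porter's.

```python
import re


def simple_stem(word):
    """Crude suffix stripper (assumption: stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_email(text):
    """Normalize raw email text and return a list of stemmed tokens."""
    text = text.lower()                                        # lower-casing
    text = re.sub(r"<[^<>]+>", " ", text)                      # strip HTML tags
    text = re.sub(r"(http|https)://[^\s]*", "httpaddr", text)  # URLs
    text = re.sub(r"[^\s]+@[^\s]+", "emailaddr", text)         # email addresses
    text = re.sub(r"[0-9]+", "number", text)                   # numbers
    text = re.sub(r"[$]+", "dollar", text)                     # dollar signs
    # Split on anything that is not a letter or digit; this drops
    # punctuation and other non-word characters.
    tokens = re.split(r"[^a-z0-9]+", text)
    return [simple_stem(t) for t in tokens if t]


tokens = preprocess_email(
    "Visit <b>http://example.com</b> NOW! Email me@x.org, win $100."
)
print(tokens)
```

Note that "$100" becomes the single token "dollarnumber" in this sketch, because the number and the dollar sign are replaced in place before the text is split into tokens.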

The vocabulary list was built by selecting all words that occur at least 100 times in the spam corpus, resulting in a list of 1899 words.
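
A minimal sketch of this kind of frequency-based selection is shown below, again in illustrative Python rather than the project's MATLAB/Octave code. The function name `build_vocab`, its `min_count` parameter, and the toy corpus are assumptions made for the example. It assumes each email has already been preprocessed into a list of tokens and counts total occurrences across the whole corpus.

```python
from collections import Counter


def build_vocab(tokenized_emails, min_count=100):
    """Return the sorted list of words seen at least min_count times."""
    # Count every token occurrence across all emails in the corpus.
    counts = Counter(tok for email in tokenized_emails for tok in email)
    return sorted(word for word, n in counts.items() if n >= min_count)


# Toy corpus (hypothetical); a low threshold keeps the example small.
corpus = [["win", "dollar", "now"], ["win", "now"], ["win"]]
print(build_vocab(corpus, min_count=2))
```

With the default threshold of 100 the toy corpus yields an empty vocabulary; on the real spam corpus the same threshold is what produces the 1899-word list described above.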
