A Naive Bayes classifier is a simple and effective method for filtering spam emails. It is based on Bayes' theorem and assumes that the features (words) in an email are conditionally independent of each other given the class (spam or ham). Despite this simplifying assumption, Naive Bayes often performs well in practice.
Spam emails are unsolicited messages sent in bulk to a large number of recipients. They are a nuisance to email users and a serious problem for email service providers. Spam filters are designed to detect spam emails and prevent them from reaching users' inboxes.
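To make the independence assumption concrete, here is a minimal sketch of a word-count Naive Bayes classifier. It is not the project's own implementation: the function names (`train`, `log_score`, `classify`), the four toy messages, and the use of add-one (Laplace) smoothing are all assumptions made for illustration. Because words are treated as independent given the class, the score for a class is simply the log prior plus a sum of per-word log likelihoods.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train(docs):
    """Count words per class. `docs` is a list of (tokens, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for tokens, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def log_score(tokens, label, model):
    """Log prior plus summed log likelihoods of the known words in `tokens`."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    # Assumes every class has at least one training document (else log(0)).
    score = math.log(doc_counts[label] / total_docs)
    for tok in tokens:
        if tok not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one (Laplace) smoothing avoids zero probabilities.
        score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
    return score

def classify(tokens, model):
    """Return the label with the higher log score."""
    return max(LABELS, key=lambda label: log_score(tokens, label, model))

# Toy training data, invented for this example only.
toy_docs = [
    ("win free money now".split(), "spam"),
    ("free prize claim now".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("project notes and schedule".split(), "ham"),
]
model = train(toy_docs)
print(classify("free money".split(), model))
print(classify("meeting schedule".split(), model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once real emails with hundreds of words are scored.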
The dataset used in this project is the SpamAssassin Public Corpus. It contains 3,375 ham (non-spam) emails and 1,500 spam emails. The emails are stored in two directories: `spam` and `easy_ham`. The `spam` directory contains 500 spam emails and the `easy_ham` directory contains 2,375 ham emails. The dataset is stored in the `SpamData` directory.
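One way to read such a directory layout into labelled examples is sketched below. It assumes that `spam` and `easy_ham` sit directly inside `SpamData` and that each file holds one raw email; the helper names `load_emails` and `load_corpus` are hypothetical and not part of the project.

```python
from pathlib import Path

def load_emails(directory):
    """Return the raw text of every file in `directory`, sorted by file name."""
    texts = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # latin-1 maps every byte to a character, so decoding never fails
            # on emails with mixed or unknown encodings.
            texts.append(path.read_text(encoding="latin-1"))
    return texts

def load_corpus(root="SpamData"):
    """Return a list of (text, label) pairs; assumes root/spam and root/easy_ham exist."""
    spam = load_emails(Path(root) / "spam")
    ham = load_emails(Path(root) / "easy_ham")
    return [(text, "spam") for text in spam] + [(text, "ham") for text in ham]
```

Calling `load_corpus()` from the directory that contains `SpamData` would return one `(text, label)` pair per email, and counting the labels is a quick way to check the result against the directory sizes quoted above.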