delphine milan: The Basics Of Bayesian Spam Filtering

Bayesian spam filtering has become a popular way to distinguish between legitimate emails and spam, using Bayesian statistical methods. It works by classifying messages into categories: based on the contents of a message, a Bayesian spam filter calculates the probability that the message is spam. Such filters are much more robust than ordinary content-based filters, and they produce very few false positives.
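As a rough illustration of the calculation involved, here is Bayes' theorem applied to a single word. The symbols are chosen for this sketch and are not taken from any particular filter: S means "the message is spam", H means "the message is legitimate", and W means "the message contains a given word".

```latex
P(S \mid W) = \frac{P(W \mid S)\, P(S)}{P(W \mid S)\, P(S) + P(W \mid H)\, P(H)}
```

In words: the more often a word turns up in spam compared with good email, the more its presence pushes the probability towards spam.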
Normally, when you receive an email, one look tells you whether it is spam or not. To your eyes, there is almost no chance of spam passing for a good email. What if spam filters worked the same way?
Bayesian Spam Filters
Bayesian spam filters are what are known as scoring content-based spam filters. They try to work the way your eye does in identifying spam emails, by looking for words and other characteristics that typify spam. Every characteristic typical of spam is assigned a score, and the total spam score for the whole message is computed. Depending on the type of Bayesian spam filter you are using, it may also look for legitimate email characteristics, thereby lowering the total score.
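The scoring idea can be sketched in a few lines of Python. This is only a minimal illustration, not how any specific product works: the token table `TOKEN_SPAM_PROB` and the functions `tokenize` and `spam_score` are hypothetical names, the probabilities are made up, and spam and good email are assumed to be equally likely beforehand. Tokens typical of spam raise the score, tokens typical of good email lower it.

```python
import math
import re

# Hypothetical per-token spam probabilities (made-up values for illustration);
# a real filter would learn these from your own mail.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "winner": 0.95,
    "free": 0.90,
    "invoice": 0.20,
    "meeting": 0.05,
}


def tokenize(text):
    """Split a message into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def spam_score(text, token_probs=TOKEN_SPAM_PROB):
    """Combine per-token probabilities into one spam probability.

    Assumes the tokens are independent (the "naive" Bayes assumption)
    and that spam and good mail are equally likely a priori.
    """
    log_odds = 0.0
    for token in set(tokenize(text)):
        p = token_probs.get(token)
        if p is None:
            continue  # unknown tokens contribute no evidence either way
        # Spam-like tokens add positive log-odds, good-mail tokens negative.
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For example, `spam_score("You are a WINNER, claim your free prize")` comes out close to 1, while `spam_score("Invoice for the meeting")` comes out close to 0. A message with no known tokens scores exactly 0.5, meaning the filter has no opinion.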
The basic difference between Bayesian spam filters and other simple scoring content-based spam filters is that Bayesian filters build the list of characteristics themselves, whereas the others depend on a manually built list.
You start with a sizable collection of emails you have identified as spam, and another collection of good emails. The filter looks at both the legitimate and the spam emails and calculates the probability with which various characteristics appear in each. Bayesian spam filters may look at:
- The words in the message body
- The headers (message paths and senders)
- The word pairs and phrases
- HTML code, such as colors
- Where a particular phrase appears (meta information)
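The training step described above might look roughly like this in Python. It is a simplified sketch under stated assumptions: it looks only at words in the message body (not headers, phrases or HTML), the function names `tokenize` and `train` are invented for this example, spam and good email are treated as equally likely beforehand, and add-one smoothing is used so that a word seen in only one pile does not get a probability of exactly 0 or 1.

```python
from collections import Counter
import re


def tokenize(text):
    """Return the set of distinct lowercase tokens in a message body."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


def train(spam_messages, good_messages):
    """Estimate P(spam | token) for every token seen in either pile.

    Counts how many messages in each pile contain each token, then
    applies Bayes' theorem with equal priors for spam and good mail.
    """
    spam_counts = Counter(t for m in spam_messages for t in tokenize(m))
    good_counts = Counter(t for m in good_messages for t in tokenize(m))
    n_spam, n_good = len(spam_messages), len(good_messages)

    probs = {}
    for token in set(spam_counts) | set(good_counts):
        # Add-one smoothing: fraction of messages in each pile with the token.
        p_token_given_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_given_good = (good_counts[token] + 1) / (n_good + 2)
        probs[token] = p_token_given_spam / (p_token_given_spam + p_token_given_good)
    return probs
```

With two spam messages such as "free prize winner" and "free money now", and two good ones such as "meeting at noon" and "lunch meeting tomorrow", the word "free" ends up with a spam probability of 0.75 and "meeting" with 0.25. Because the table is built from your own mail, it reflects what spam and good email look like for you.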
The Problems With Scoring Content Based Filters
Though scoring-based spam filters work well, they do run into certain problems, and the ordinary ones more so than the Bayesian spam filters. These are some of the problems faced:
Scoring content-based spam filters build a list of characteristics from the spam and the good emails they are given. To build a good list of spam characteristics, mail needs to be collected from hundreds of sources (email addresses). This can reduce the accuracy of the filter, because the characteristics of good email are different for each person. And if spammers make an effort to make their mails look like genuine ones, the filtering characteristics may have to be corrected manually, which takes a great deal of effort.
--------------------------------------------------------------------------------------------------------------
The author is an administrator and technical expert involved in developing security and performance-enhancing software such as Registry Cleaner, Anti Spyware and Window Cleaner.

delphine milan

Sunday, April 20, 2008
