Statistical Filtering

Statistical filtering uses the Bayesian spam filtering technique to calculate the probability that a message is spam based on its contents. Unlike simple content-based filters, Bayesian spam filtering learns from both spam and good mail by examining each word in the body of an e-mail message. Each word within a message is compared against known spam and non-spam word counts, and assigned a value that reflects how likely the word is to indicate spam. Then, the entire message is assigned a probability based on the combined values of all its words. If a message is identified as spam, you can choose to delete it, forward it to an e-mail address, or insert an X-Header into it. Words that contain non-alphabetic characters, such as numbers, are treated differently from other words. For more information, see Identifying Wildcards in E-mail.
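The scoring described above can be illustrated with a minimal sketch of a generic naive Bayes word filter. This is not the product's implementation: the function names, the add-one smoothing, the equal prior for spam and good mail, and the tokenizer (which keeps only alphabetic runs and does not reproduce the special handling of words with non-alphabetic characters) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a message body into lowercase alphabetic words (simplified, hypothetical rule)."""
    return re.findall(r"[a-z]+", text.lower())

def train(messages):
    """Count, for each word, how many training messages contain it."""
    counts = Counter()
    for body in messages:
        counts.update(set(tokenize(body)))
    return counts

def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | word), assuming equal priors and add-one smoothing."""
    p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_given_spam / (p_word_given_spam + p_word_given_ham)

def message_spam_probability(body, spam_counts, ham_counts, n_spam, n_ham):
    """Combine the per-word values into one probability for the whole message."""
    log_odds = 0.0
    for word in set(tokenize(body)):
        p = word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham)
        # Summing log-odds is equivalent to multiplying probabilities,
        # but avoids numeric underflow on long messages.
        log_odds += math.log(p) - math.log(1.0 - p)
    # Clamp so that math.exp cannot overflow on extreme scores.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Tiny, made-up training sets used only to exercise the functions.
spam_mail = ["win free money now", "free prize click now"]
good_mail = ["meeting agenda for monday", "lunch on monday"]
spam_counts = train(spam_mail)
ham_counts = train(good_mail)

score = message_spam_probability(
    "free money", spam_counts, ham_counts, len(spam_mail), len(good_mail)
)
print(f"spam probability: {score:.2f}")
```

In this sketch, a message scoring above a chosen threshold (for example 0.5) would be treated as spam, at which point an action such as deleting, forwarding, or tagging the message would apply.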

To reduce the chance that legitimate messages are identified as spam, you can create a host-specific exclude list. The exclude list contains words that you do not want included in the statistical analysis because they are just as likely to appear in non-spam messages as in spam messages. The exclude list is stored in the exclude-list.txt file, which is located in the domain's directory.
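As a rough illustration of how an exclude list affects the analysis, the sketch below removes excluded words before any scoring would take place. The file layout (one word per line, case-insensitive, blank lines ignored) and the function names are assumptions made for this example; the actual format of exclude-list.txt is not described here.

```python
from pathlib import Path

def load_exclude_list(path):
    """Read an exclude list, assuming one word per line (hypothetical format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def drop_excluded(words, exclude):
    """Return only the words that should take part in the statistical analysis."""
    return [word for word in words if word.lower() not in exclude]

# Example with an in-memory exclude set instead of a file on disk.
exclude = {"invoice", "newsletter"}
words = ["Free", "invoice", "offer", "Newsletter"]
print(drop_excluded(words, exclude))
```

Words removed this way contribute nothing to a message's score, which is why the list is best kept to terms that appear about equally often in spam and in legitimate mail.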

Advanced Statistical Filtering

The advanced statistical filtering options control the underlying behavior of the statistical filtering component. These options are intended for experienced administrators who want to fine-tune antispam filtering.