Statistical Filter Options (Content Filtering)
How to get here
- From the home page, click the > tab.
- In the Domains list, select a domain. The page appears.
- In the left navigation pane, click . The page appears.
- Click . The Statistical Filter page appears.
Use Statistical Filtering to create and maintain the mail domain specific word list, specify the action to take when spam is identified, and specify whether to use the primary mail domain's word counts or create new ones.
Statistical Filtering uses the Bayesian spam filtering technique to calculate the probability that a message is spam based on its contents. Each word in an e-mail message is examined and evaluated according to how often it has appeared in spam and non-spam e-mail. The message as a whole is then evaluated from all of these word values to determine whether it is likely to be spam.
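This help page does not document IMail Server's exact formulas. As a minimal sketch of the general Bayesian technique described above, the following Python assumes a Graham-style approach: each word's probability is estimated from its spam and non-spam counts (normalized by the number of messages in each category), clamped to the 1 to 99% range, and the word probabilities are combined with the naive Bayes product rule. The function names and the clamping range are assumptions for illustration; the 0.90 threshold mirrors the default of the spam probability option on this page.

```python
from math import prod


def word_spam_probability(spam_count: int, ham_count: int,
                          spam_total: int, ham_total: int) -> float:
    """Estimate P(spam | word) from the word's counts in spam and non-spam mail.

    Counts are divided by the number of spam and non-spam messages seen
    (spam_total, ham_total) so a larger spam corpus does not skew the result.
    """
    spam_freq = spam_count / spam_total if spam_total else 0.0
    ham_freq = ham_count / ham_total if ham_total else 0.0
    if spam_freq + ham_freq == 0.0:
        # Never seen: the caller should fall back to the new-word probability.
        raise ValueError("word has no counts")
    return spam_freq / (spam_freq + ham_freq)


def combine(probabilities: list[float]) -> float:
    """Combine per-word probabilities into one message probability (naive Bayes)."""
    # Clamp so a single 0% or 100% word cannot dominate or cause division by zero
    # (assumed 1-99% range, as in Graham-style filters).
    clamped = [min(max(p, 0.01), 0.99) for p in probabilities]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


def is_spam(probabilities: list[float], threshold: float = 0.90) -> bool:
    """A message is spam when its combined probability exceeds the threshold."""
    return combine(probabilities) > threshold
```

For example, two words at 90% combine to about 98.8%, which exceeds 90%, while a single word at 60% stays below an 80% threshold, in line with the example given for the spam probability option.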
Shows the currently selected domain. From the drop-down list, you can select any of the domains available to this administrative user account.
Set the following options to configure statistical filtering.
- Click to create a new word to filter for the current domain.
- Click a word or phrase, then click to modify it.
- Select a phrase that you want to delete from the domain, then click to delete the phrase.
- Click or to sort the word list.
If the Word list has multiple pages, you can use the page navigation control which appears below the list.
Select the action to take when statistical filtering identifies a message as spam:
- Immediately deletes the message.
- Forwards the message to an e-mail address entered in the text box to the right of this option. By default, messages are sent to the root address and stored in a mailbox called "bulk".
- (default). Inserts an X-Header into the message indicating that the message was identified as spam by statistical filtering. For more information, see Spam X-Header Explanations.
- Moves the message to the user's mailbox specified in the text box to the right of this option. If the mailbox does not exist, it is created.
- No action is performed on messages identified as spam by the statistical filter.
We recommend that you select the option instead of until you know that the antispam options are set up correctly.
For more spam options, see Using Delivery Rules to Filter Spam.
- If selected, the subject of a message that is identified as spam by the statistical filter will be modified to begin with .
These options control the underlying functionality of the statistical filtering feature and depend on each other to identify spam effectively. If a significant number of legitimate messages are being identified as spam (false positives), or a significant number of spam messages are not being identified (false negatives), you may need to adjust these options.
The default settings are appropriate for most systems. We strongly advise that ONLY experienced administrators modify these settings. Setting these options too high or too low could hinder IMail Server's ability to identify spam.
- (default value is 40%). The probability that a new (previously unseen) word is spam. Enter a value between 0 and 100%.
The higher the value, the more likely a new word will be treated as if it had previously appeared in spam messages. The lower the value, the more likely a new word will be treated as if it had previously appeared in non-spam messages. For example, if you enter 0, every new word is treated as non-spam. If you enter 100%, every new word is identified as spam.
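Unseen words can be handled by falling back to the configured new-word value, as in this minimal sketch. It assumes learned word probabilities are kept in a simple dictionary; `lookup_probability` and `DEFAULT_NEW_WORD_PROBABILITY` are hypothetical names, and only the 40% default comes from this page.

```python
DEFAULT_NEW_WORD_PROBABILITY = 0.40  # documented default: 40%


def lookup_probability(word: str, learned: dict[str, float],
                       new_word_probability: float = DEFAULT_NEW_WORD_PROBABILITY) -> float:
    """Return the learned spam probability for word, or the new-word value if unseen."""
    if not 0.0 <= new_word_probability <= 1.0:
        raise ValueError("new-word probability must be between 0 and 1 (0% to 100%)")
    return learned.get(word, new_word_probability)


# Hypothetical learned table; "stop" has never been seen.
learned = {"meeting": 0.05, "unsubscribe": 0.95}
p = lookup_probability("stop", learned, new_word_probability=0.20)
print(f"spam {p:.0%}, non-spam {1 - p:.0%}")  # spam 20%, non-spam 80%
```

At 0 every unseen word leans fully toward non-spam and at 1.0 fully toward spam; values at or below the recommended 40% tilt unseen words toward legitimate mail.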
We recommend that this value not be set higher than 40%. Setting this option at 40% or less biases the statistical analysis in favor of legitimate e-mail, which reduces the likelihood of a false positive.
Example: If this option is set to 20%, a new word is treated as having appeared in spam e-mail 20% of the time and in non-spam e-mail 80% of the time.
- (default value is 90%). The closer the value is to 100%, the less likely it is that spam will be caught. The closer the value is to 0, the greater the probability that you will have false positives. Enter a value between 0 and 100%.
This option sets the minimum probability percentage at which a message is identified as spam. Messages with probability values below this value are identified as non-spam; messages with probability values above it are identified as spam.
Example: Suppose this option is set to 80%. If an e-mail message is processed and the combined probability of all the word values within it is 60%, the message is identified as non-spam because it does not reach the 80% benchmark.
Example: If the word "stop" appears in an e-mail for the first time, it is considered a new word and is assigned a probability of 40% (the probability that a new word is spam). If you have "spam calculated probability exceeds" set to 90%, "stop" is not considered to be spam. For "stop" to be considered spam, its probability would have to increase from 40% to 90%.
- (default value is 15). The number of individual words within each e-mail that are used to calculate the probability that the e-mail is spam. You can enter any value in this text box; however, entering anything above 25 may have unpredictable results.
Each word within an e-mail has two word counts: the number of times the word has occurred in spam and the number of times it has occurred in non-spam. From these counts, a spam probability is computed for the word. This setting selects the words whose probabilities deviate most from that of an average word; these can be strongly spam words or strongly non-spam words.
Example: Suppose this option is set to 15. Because an average word has a spam probability of 50% (equally likely to be spam or non-spam), the fifteen words farthest from 50% are used. A word with a spam probability of 5% will most likely be used, as will a word with a spam probability of 90%. A word with a probability of 45% will most likely not be used.
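The word-selection rule in the example above can be sketched in Python as follows, assuming words are ranked purely by their distance from 50% (this page gives no tie-breaking rule). `most_interesting` is a hypothetical name; the default of 15 mirrors the documented default.

```python
import heapq


def most_interesting(probabilities: dict[str, float], n: int = 15) -> list[str]:
    """Return the n words whose spam probability is farthest from the neutral 50%."""
    # heapq.nlargest keeps a heap of size n, so the ranking work grows with n.
    return heapq.nlargest(n, probabilities,
                          key=lambda word: abs(probabilities[word] - 0.5))


probs = {"free": 0.90, "meeting": 0.05, "the": 0.45}
print(most_interesting(probs, n=2))  # ['meeting', 'free']
```

The 5% and 90% words are kept and the 45% word is dropped. Because the cost of ranking grows with n, larger values add per-message work, which is consistent with the performance note for this option.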
The value for this option can greatly affect the performance of statistical filtering. The greater the value, the more time is spent determining which words to evaluate within a message. As a result, statistical filtering takes longer to calculate the e-mail probability, and mail processing takes longer.
Related Topics
About Statistical Filtering
Creating Separate antispam-table.txt Files for Multiple Email Domains
Installing Updated phrase.txt File
Setting Premium Filter Antispam Options