HTML Filtering Example of Scanning E-mail

To show how HTML filtering improves spam identification, the example below runs one HTML spam message through the filters twice: first through statistical and phrase filtering only, and then through HTML filtering as well. In this message the spammer inserted bogus HTML tags inside words to hide them from spam filters. The statistical filtering log entries show that IMail Server recognized only fragments such as "agra" and "udies". When the same message was run through HTML filtering, the log entries show that whole words such as "viagra" and "thousands" were recognized:

Original Message
Date: Tue, 8 Apr 2003 16:04:09 -0400
Message-Id: <TestUser@ipswitch.com>
Mime-Version: 1.0
Content-Type: text/html; charset=us-ascii
From: "Test User" <TestUser@ipswitch.com>
Reply-To: <TestUser@ipswitch.com>
To: TestUser2@ipswitch.com
Subject: hello there
X-Mailer: <IMail v8.00>

<!W>VIA<!Z>GRA<!E> N<!l>o<!k>w<!g> a<!y>v<!b>a<!Z>I<!Y>l<!X>a<!N>b<!Q>l<!V>e<!H> f<!J>o<!I>r<!D> a<!S> l<!O>o<!I>w <!A>c<!Z>o<!X>s<!S>t<!J> t<!N>h<!X>e<!U> e<!L>ff<!V>ec<!W>tiv<!Z>ene<!E>ss<!I> <!K>o<!G>f<!Y><!F>V<!I>I<!F>AGRA<!C> has<!U> be<!D>en<!L> p<!Z>r<!B>o<!W>ven<!V>
t<!Z>i<!I>m<!M>e a<!H>nd<!E> tim<!U>e a<!H>g<!G>a<!B>in <!W>in
<!I>cl<!O>i<!D>ni<!O>c<!F>a<!K>l<!I> s<!Y>t<!K>udies <!C>w<!F>i<!F>th
t<!F>h<!M>ous<!K>and<!J>s o<!J>f<!B> p<!H>ati<!J>ent<!N>s<!J>.<!Y><!C>

Results When E-mail Is Scanned Only with Statistical Filtering
05:23 10:18 SMTP(02940000) word = agra, probability = 0.990000
05:23 10:18 SMTP(02940000) word = udies, probability = 0.400000

Results When E-mail Is Scanned with Statistical and HTML Filtering
05:23 10:24 SMTP(09380000) word = viagra, probability = 0.911599
05:23 10:24 SMTP(09380000) word = thousands, probability = 0.796194
05:23 10:24 SMTP(09380000) word = proven, probability = 0.748141
05:23 10:24 SMTP(09380000) word = patients, probability = 0.718994
05:23 10:24 SMTP(09380000) word = been, probability = 0.285162
05:23 10:24 SMTP(09380000) word = again, probability = 0.309129
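
The difference between the two sets of results can be illustrated with a short Python sketch. This is not IMail Server's actual implementation. It assumes that, without HTML filtering, each bogus tag acts as a word break, and that HTML filtering deletes the tags before the text is split into words. The names TAG_PATTERN, WORD_PATTERN, and tokenize are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical pattern: anything shaped like a tag, including bogus ones such as <!W>.
TAG_PATTERN = re.compile(r"<[^>]*>")
# Hypothetical tokenizer rule: a word is a run of lowercase letters.
WORD_PATTERN = re.compile(r"[a-z]+")


def tokenize(body, html_filtering):
    """Split a message body into lowercase words.

    With html_filtering=True the tags are deleted, so the letters on
    either side of a tag join back into one word. With
    html_filtering=False each tag is treated as a word break (an
    assumption made for this sketch), which leaves only fragments.
    """
    replacement = "" if html_filtering else " "
    text = TAG_PATTERN.sub(replacement, body)
    return WORD_PATTERN.findall(text.lower())


# A fragment of the sample message shown above.
SAMPLE = "<!F>V<!I>I<!F>AGRA<!C> has<!U> be<!D>en<!L> p<!Z>r<!B>o<!W>ven<!V>"

print(tokenize(SAMPLE, html_filtering=False))
# ['v', 'i', 'agra', 'has', 'be', 'en', 'p', 'r', 'o', 'ven']
print(tokenize(SAMPLE, html_filtering=True))
# ['viagra', 'has', 'been', 'proven']
```

Under these assumptions, the unfiltered pass yields the fragment "agra" and the filtered pass yields the whole words "viagra", "been", and "proven", which matches the pattern seen in the log entries above.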