Saturday, August 13, 2005

Spam Filters

About six months ago, I switched to Mozilla Thunderbird for e-mail. Until then, with my old e-mail client, I'd been using SpamAssassin, a highly effective spam filter that includes a Bayesian classifier. But Thunderbird had its own built-in Bayesian filter, so I thought I'd give it a try.

I trained Thunderbird's filter to recognize spam and non-spam— easy as clicking on a button— and over time it did learn. Very slowly. It got better, but it never got very aggressive: I'd say after six months, Thunderbird's built-in filter was catching about 80% of the spam I was receiving.

Fortunately Thunderbird also lets you define your own manual filters. I'd set up some of these early on, and I've been adding more along the way. But just over the past week or two, I've become very active at adding new manual spam filters. And so within these past couple of weeks, the percentage of spam I catch has risen to probably 98% or better. The rare piece of spam that now slips through into my inbox, I examine to see what additional manual filters would prevent it from slipping through again.

Of course one of the best ways to catch spam is to use the anti-spam headers that are added to your e-mail en route. If X-IMAIL-SPAM-STATISTICS is 0.9 or greater, then into the spam bucket it goes, and likewise for a couple of other such anti-spam headers.
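For anyone who would rather script this kind of rule than click through a filter dialog, here is a rough Python sketch of the header check. It assumes the header value is a bare number, which may not match what the mail server really writes, and the function name is my own invention.

```python
from email import message_from_string

SPAM_THRESHOLD = 0.9  # cutoff mentioned in the post


def header_score_is_spam(raw_message,
                         header="X-IMAIL-SPAM-STATISTICS",
                         threshold=SPAM_THRESHOLD):
    """Return True if the anti-spam header holds a score >= threshold.

    Assumption: the header value is a plain decimal number. A missing
    or unparseable header is treated as "not spam" by this rule.
    """
    msg = message_from_string(raw_message)
    value = msg.get(header)
    if value is None:
        return False
    try:
        score = float(value.strip())
    except ValueError:
        return False
    return score >= threshold
```

Treating an unreadable header as "not spam" is a deliberate choice here: a rule that cannot read its input should leave the decision to the other filters.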

Spammers often send out e-mail to a dozen or so addresses at my small local ISP, for some reason not explicitly including my e-mail address in the list. So I catch another large portion of my spam by setting up a manual filter for "To: or Cc: contains my ISP's domain but not myemail."
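The "sent to my neighbors but not to me" rule can be sketched in a few lines of Python. The address and domain below are hypothetical placeholders, not my real ones, and the function name is made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses

MY_ADDRESS = "me@example.net"  # hypothetical placeholder
ISP_DOMAIN = "example.net"     # hypothetical placeholder


def sent_to_neighbors_only(raw_message):
    """True if To:/Cc: names someone at the ISP but not my own address."""
    msg = message_from_string(raw_message)
    pairs = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    recipients = [addr.lower() for _name, addr in pairs if addr]
    at_isp = any(addr.endswith("@" + ISP_DOMAIN) for addr in recipients)
    return at_isp and MY_ADDRESS not in recipients
```

Mail that reaches me only through Bcc: would also trip this rule, so it is best paired with an address-book exception.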

To this I've been adding local e-mail addresses to which I know neither I nor any of my regular correspondents would be sending e-mail. The rule "if To: or Cc: contains one of these addresses, then send to spam bucket" catches the occasional spam sent both to my address and to the otherwise unknown jimbob.

A few frequent spammers are foolhardy enough to use their own domain, or some easily recognizable variant of it. Thus "From: contains emsemail" or "From: contains somespecial" gets junked. And one spammer keeps changing "From:", but keeps sending to one of my e-mail addresses using a specific and very distinctive mailer: thus any e-mail sent to this particular address of mine, with X-Mailer equal to Apple Mail (2.728), gets tossed out.
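Here is roughly how those two sender rules might look in Python. The "From:" fragments and the X-Mailer string come from the rules above; the targeted address is a hypothetical stand-in, and the function name is my own.

```python
from email import message_from_string

BAD_SENDER_FRAGMENTS = ("emsemail", "somespecial")  # from the rules above
TARGETED_ADDRESS = "old-address@example.net"        # hypothetical stand-in
BAD_MAILER = "Apple Mail (2.728)"                   # from the rule above


def sender_rule_matches(raw_message):
    """True if From: contains a known fragment, or if the message went
    to the targeted address with the distinctive X-Mailer value."""
    msg = message_from_string(raw_message)
    sender = (msg.get("From") or "").lower()
    if any(fragment in sender for fragment in BAD_SENDER_FRAGMENTS):
        return True
    to_field = (msg.get("To") or "").lower()
    mailer = (msg.get("X-Mailer") or "").strip()
    return TARGETED_ADDRESS in to_field and mailer == BAD_MAILER
```

Requiring both the address and the mailer keeps the second rule narrow, so ordinary Apple Mail users writing to my other addresses are left alone.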

Then there are the telltale words in the subject line: "invest," "rolex," "whore," "mortgage," "low cost," "specials," "medica" (catches variations such as "medical" or "medication"), etc.
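The subject-line check amounts to a case-insensitive substring search. A minimal sketch, with a function name of my own choosing:

```python
# Keywords taken from the list above; matching is by substring.
SUBJECT_KEYWORDS = ("invest", "rolex", "whore", "mortgage",
                    "low cost", "specials", "medica")


def subject_is_spammy(subject):
    """True if any keyword occurs anywhere in the subject, ignoring case."""
    lowered = subject.lower()
    return any(keyword in lowered for keyword in SUBJECT_KEYWORDS)
```

Substring matching is what lets "medica" cover "medical" and "medication", but it cuts both ways: "invest" will also catch a subject containing "investigation".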

And scanning the text of the e-mail for various combinations of the terms "confidential," "business," "proposal," and "risk," catches many e-mails from so-called Nigerian scams. You know, "Sir, please forgive my contacting you with this confidential business proposal, I am in receipt of $23.8 million in the Pyramid National Bank of Lagos, Nigeria, from estate of late Evangelist Lanson Purple who died in a plane crash leaving no heirs, and I need your assistance in transfering this money out of the country. Please send me your bank account number, no risk to you in this venture, and you will retain 25% of funds as your fee..."
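One way to express "various combinations" of those terms is to count how many of them show up and flag the message past some cutoff. The cutoff of three below is purely my assumption for the sketch, not the actual combinations in my filters.

```python
import re

SCAM_TERMS = ("confidential", "business", "proposal", "risk")
MIN_TERMS = 3  # assumed cutoff, chosen only for illustration


def looks_like_advance_fee_scam(body, min_terms=MIN_TERMS):
    """True if at least min_terms of the telltale words appear in the body."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    hits = sum(1 for term in SCAM_TERMS if term in words)
    return hits >= min_terms
```

Matching whole words rather than substrings keeps "risk" from firing on "asterisk" or "brisk".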

Then there's the distant professional colleague who sends me (and also dozens if not hundreds of others) a cheery little weekly newsletter. I'm not sure how I got on her mailing list, since I doubt she would ever have been able to recognize me if she met me on the street. However, her weekly musings always have the same subject line, and so rather than embarrass her by telling her to take my name off her blasted mailing list, I've simply added a manual spam filter for any email from her with the usual subject line.

I'm about to give the same treatment to another monthly newsletter I receive, five or six pages of text which somehow manage to run to well over one megabyte. Not that this makes as much difference now that I've got DSL. But I can't for the life of me figure how this brief newsletter regularly manages to run to over a meg— I've tried resaving the document once I receive it, and my word processor has no trouble reducing it, in the very same document format, to less than 5% of the original size in which it is e-mailed out.

Of course, anyone who's already in my address book will get through to me no matter what. As for the rest, with my manual filters Thunderbird is getting better and better at sorting out the wheat from the chaff. I hardly ever have a false positive. And the amount of my spam being routed straight into File 13 is rapidly approaching 100%.
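The order of precedence matters here: the address book is consulted first, and only then do the spam rules get a say. A small sketch of that ordering, with hypothetical names and an address book assumed to hold lowercase addresses:

```python
from email.utils import parseaddr


def classify(from_header, address_book, spam_rules_matched):
    """Return "inbox" or "spam"; known senders always win.

    address_book: a set of lowercase e-mail addresses (an assumption
    of this sketch). spam_rules_matched: whether any filter fired.
    """
    _name, addr = parseaddr(from_header)
    if addr.lower() in address_book:
        return "inbox"
    return "spam" if spam_rules_matched else "inbox"
```

For example, `classify("Pat <pat@example.org>", {"pat@example.org"}, True)` lands in the inbox even though a spam rule fired.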

By the way, overall I really like Mozilla Thunderbird. If you're looking for an e-mail alternative to that virus conduit known as Micro$oft Outlook Express, you might want to check out the secure and user-friendly Thunderbird.

