TechSoup.org The place for nonprofits, charities, and libraries

Thursday 12/2/04 -- Blocking Spam at the Inbox

  • When all else fails, or when there is no "all else", there are still things you yourself can do to help block spam at your Inbox. There are a number of different types of software designed to help you get those messages out of your Inbox. These solutions generally work with an email reading program installed on your computer, instead of a web-based email interface (though if anyone knows of any solutions for web-based mail, please add them).

    One type of program gets set up as a "virtual" mail server. You install it on your computer and give it your POP3 username and password. It downloads and filters the messages on its own, then stores them. You then configure your email client to retrieve your mail from this program instead of from your normal mail server. One free program of this type that I like is MailWasher: http://www.snapfiles.com/get/mailwasher.html
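Here is a rough sketch of the download-and-filter half of that kind of program, written in Python with the standard `poplib` and `email` modules. It assumes a POP3 server that accepts SSL on the default port 995. The host name, the credentials, and the `is_spam` test you pass in are all placeholders. The sketch also leaves out the step where the real program hands the filtered mail back to your email client.

```python
import email
import poplib
from email.message import Message
from typing import Callable, List, Tuple


def fetch_all(host: str, user: str, password: str) -> List[bytes]:
    """Download every message from a POP3 mailbox (messages stay on the server)."""
    conn = poplib.POP3_SSL(host)  # default POP3-over-SSL port, 995
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        raw = []
        for i in range(1, count + 1):  # POP3 message numbers start at 1
            _resp, lines, _octets = conn.retr(i)
            raw.append(b"\r\n".join(lines))
        return raw
    finally:
        conn.quit()


def sort_messages(raw_messages: List[bytes],
                  is_spam: Callable[[Message], bool]) -> Tuple[List[Message], List[Message]]:
    """Split raw messages into (inbox, junk) using whatever spam test is supplied."""
    inbox: List[Message] = []
    junk: List[Message] = []
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        (junk if is_spam(msg) else inbox).append(msg)
    return inbox, junk
```

Usage would look something like `inbox, junk = sort_messages(fetch_all("pop.example.org", "me", "secret"), my_spam_test)`, where `my_spam_test` is a hypothetical function you supply. It could use any of the filtering types described below.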

    Another option is software that acts as a plug-in to your normal email client. Many major email clients have plug-ins that filter messages on demand (you can filter them on arrival, or any time you like). I have had good success with SpamBayes (http://spambayes.sourceforge.net/) for Outlook (mentioned in another post here), though there are a number of others out there.

    Some email clients have junk mail filtering built in. Mozilla Thunderbird (http://www.mozilla.org/products/thunderbird/), available for most major operating systems, has built-in Bayesian filtering (more on that later). Microsoft Outlook has had built-in junk mail filtering for some time, though it has been primarily rule-based.

    Types of filtering
    - Word Matching: You can set up your own junk mail filters with this method in any mail client that supports rules. The idea is that a message is flagged as possible spam if it contains certain words, such as "prescription drugs", "XXX", "make money", or "click here". This method worked well at first. Nowadays, however, most junk email adds random characters or spaces that confuse filtering programs while staying readable to humans, such as "p.r,e.s,c.r,i.p,t.i,o.n d r u g s".
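Here is a minimal Python sketch of a word-matching rule that strips punctuation and spaces before comparing, so the disguised spelling above still matches. The phrase list is just the examples from this post. There is a catch: squashing everything together also creates matches that were never there. That is exactly the kind of false positive word matching is prone to.

```python
import re
from typing import Iterable, Optional

# Example phrases from the post; a real rule list would be much longer.
BLOCKED_PHRASES = ["prescription drugs", "xxx", "make money", "click here"]


def normalize(text: str) -> str:
    """Lower-case and keep only letters and digits.

    This undoes tricks like "p.r,e.s,c.r,i.p,t.i,o.n d r u g s", but it also
    glues together words that were meant to be separate.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def first_match(body: str, phrases: Iterable[str] = BLOCKED_PHRASES) -> Optional[str]:
    """Return the first blocked phrase found in the normalized body, or None."""
    squashed = normalize(body)
    for phrase in phrases:
        if normalize(phrase) in squashed:
            return phrase
    return None
```

For example, `first_match("Cheap p.r,e.s,c.r,i.p,t.i,o.n d r u g s here")` returns `"prescription drugs"`. However, the innocent `first_match("I can't click; here is the file")` also returns `"click here"`.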

    - Sender: Whitelisting, blacklisting, and graylisting are other ways to control which messages reach your inbox. Whitelisting is probably the most effective, while at the same time being the most restrictive -- only email addresses you specify may send you mail. Blacklisting is the opposite -- email addresses you specify may NOT send you email (all others can). Graylisting is a newer technique that temporarily delays the delivery of email to an address. The idea is that proper mail servers will handle the delay and try again, while spammers may not care to wait around and retry. A more detailed description can be found here: http://www.sneakemail.com/info.pl?sel=grey (from the Sneakemail service, though there are other implementations)
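The three sender-based approaches can be sketched together in Python. This is an illustration only. `SenderPolicy` is a made-up name, and the 300-second retry delay is an assumed value. Real graylisting runs inside the mail server during delivery: the "tempfail" answer corresponds to a temporary (4xx) refusal, and the server keys it on the connecting IP address, the sender, and the recipient.

```python
import time
from typing import Dict, Optional, Set, Tuple

# (connecting IP, envelope sender, recipient): the usual graylisting key.
Triplet = Tuple[str, str, str]


class SenderPolicy:
    """Hypothetical whitelist / blacklist / graylist check for one mailbox."""

    def __init__(self, whitelist: Set[str], blacklist: Set[str],
                 delay: float = 300.0, strict_whitelist: bool = False) -> None:
        self.whitelist = {a.lower() for a in whitelist}
        self.blacklist = {a.lower() for a in blacklist}
        self.delay = delay                    # assumed retry window, in seconds
        self.strict_whitelist = strict_whitelist
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return "accept", "reject", or "tempfail" (try again later)."""
        now = time.time() if now is None else now
        sender = sender.lower()
        if sender in self.whitelist:
            return "accept"
        if sender in self.blacklist:
            return "reject"
        if self.strict_whitelist:
            return "reject"  # pure whitelisting: only listed senders get through
        # Graylisting: remember the first attempt and refuse until the delay
        # has passed. A proper mail server retries; many spam tools never do.
        key = (ip, sender, recipient.lower())
        first = self.first_seen.setdefault(key, now)
        return "accept" if now - first >= self.delay else "tempfail"
```

With `strict_whitelist=True`, the policy behaves like the pure whitelisting described above. Otherwise, unknown senders fall through to graylisting.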

    - Bayesian: Bayesian filtering looks at each "word" (collection of characters separated on either side by spaces) in an email message, and makes a guess as to how likely it is that a message containing that word is junk email compared to how likely it is that it is NOT junk email. Confused? Let me give an example:

    Let's say I receive two messages containing the word "mortgage". One is from my bank and has important information about my home loan. The other is a junk email message. On its own, the word "mortgage" may appear in legitimate messages 50% of the time and in spam 50% of the time, so it tells us very little. But looking at all the other words in each message together, we find that the legitimate message includes "jsmith@mybank.com" (email addresses are considered words), "elm street" (the street my house is on), and "12340987" (my account number). The spam message includes words like "unsubscribe", "spam" (as in "this message is not spam"), or "click" and "here". By examining how likely each word is to appear in a spam or a legitimate message, the filter gives the message an overall score. You can generally set your own thresholds for which scores count as spam and which do not. (Still confused? Look here for a more in-depth explanation: http://email.about.com/cs/bayesianfilters/a/bayesian_filter.htm).
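Here is one common way to turn per-word likelihoods into a single score, roughly the approach popularized by Paul Graham's "A Plan for Spam". It assumes the words are independent and that spam and legitimate mail are equally likely to begin with. This is a simplified version: s_w and h_w are the number of spam and legitimate training messages containing word w, and N_s and N_h are the total number of spam and legitimate training messages.

```latex
p_w = \frac{s_w / N_s}{s_w / N_s + h_w / N_h}
\qquad
P(\text{spam} \mid w_1, \dots, w_n) =
  \frac{\prod_{i=1}^{n} p_{w_i}}
       {\prod_{i=1}^{n} p_{w_i} + \prod_{i=1}^{n} \left(1 - p_{w_i}\right)}
```

A word like "mortgage" with p_w = 0.5 multiplies the numerator and both terms of the denominator by the same factor of 0.5, so it cancels out. Only the telling words (the account number, "unsubscribe", "click") push the score toward one side or the other.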

    Bayesian filtering requires some "training", in which you tell it which messages are junk and which are not, so results may not be immediate. However, many anti-spam activists claim that Bayesian filtering is the most effective way to keep junk email to a minimum over the long term.
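To show what the training step actually does, here is a minimal Python sketch of a trainable filter. It counts how many spam and legitimate messages contain each word, estimates each word's spam probability from those counts, and combines the probabilities assuming independence and equal priors. The `BayesFilter` name, the 0.01/0.99 clamp, and treating unseen words as neutral are all choices made for this illustration. They do not describe how SpamBayes, Thunderbird, or any other product works internally.

```python
import math
from collections import Counter
from typing import Set


def tokens(text: str) -> Set[str]:
    """Words as defined above: runs of characters separated by spaces."""
    return set(text.lower().split())


class BayesFilter:
    """Hypothetical minimal Bayesian spam filter."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()  # word -> spam messages containing it
        self.ham_counts: Counter = Counter()   # word -> legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Record one message you have marked as junk or not junk."""
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens(text))
            self.n_ham += 1

    def word_prob(self, word: str) -> float:
        """P(spam | word), clamped to [0.01, 0.99]; unseen words are neutral."""
        s = self.spam_counts[word] / max(self.n_spam, 1)
        h = self.ham_counts[word] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text: str) -> float:
        """Combined spam probability, computed with logs to avoid underflow."""
        log_spam = log_ham = 0.0
        for word in tokens(text):
            p = self.word_prob(word)
            log_spam += math.log(p)
            log_ham += math.log(1.0 - p)
        # prod(p) / (prod(p) + prod(1 - p)) == 1 / (1 + exp(log_ham - log_spam))
        diff = max(-700.0, min(700.0, log_ham - log_spam))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(diff))
```

After training on the two "mortgage" messages from the example, `word_prob("mortgage")` is 0.5, while a message containing "click here" scores close to 1. You would then treat anything above your chosen threshold (say 0.9) as junk.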

    What software do you use to do filtering on your computer? Do you have a preferred type of filtering? Is it worth the time and effort to train a Bayesian filter, or do you prefer an "install and go" program?
  • Here is a little example of why simple word matching is far more error-prone than it even appears on the surface. I work for a very large multinational. We often get asked things like, "Why can't you stop the lolita porn?" Well, people in Western Europe and North America forget that Lolita is a legitimate name for a person. Blocking any message that contains the word "lolita" would totally screw over people named Lolita. And sure enough, a check of the company address book reveals that we actually do have people named Lolita working for the company.
  • I just started a trial of MailWasher Pro to see if I could make a dent in the Korean, Chinese, and/or game-based spam that has continued, unabated, for more than four years.

    For the trial, I turned off all the sbl-rbl DULL lists on my mail server. I also disabled all my other rule-based or IP-based blocks.

    Most Korean ISPs (as discussed elsewhere) make little effort to stem the spam originating from their customers. KISA and its spamcop email address do very little as well.

    The MailWasher test has gone very well so far, and its posting to the related DULL and DUNS lists is quick.

    Note: I have also found that bl.spamcop.net and sbl-rbl@spamhaus.org both often block email that I actually wish to receive.
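Blocklists like bl.spamcop.net are checked with an ordinary DNS lookup on the IP address of the server delivering the message. You reverse the four numbers, append the list's zone, and see whether the resulting name resolves (a listing conventionally answers with an address in 127.0.0.x). Here is a rough Python sketch using only the standard library. Treating any lookup failure as "not listed" is an assumption made here, and in practice the mail server normally runs this check itself.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "bl.spamcop.net") -> str:
    """Build the lookup name: reversed IPv4 octets followed by the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "bl.spamcop.net") -> bool:
    """True if the zone returns a 127.x.x.x answer for this IP address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name (or lookup failed): treated as not listed
    return answer.startswith("127.")
```

Because these lists are keyed on the sending server's IP address rather than on the sender's email address, mail you actually want gets blocked whenever it passes through a listed server. That fits the experience described above.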