DoC Computing Support Group


Revision 5 as of 2008-09-11 11:19:36

Email Spam Filtering

What is the Spam problem, and what is BrightMail?

In recent years, unsolicited bulk emails (spam) have become an increasing problem. Many people are finding their inboxes flooded with increasingly offensive spam - perhaps 100 spam messages per useful message.

Dealing with this problem is tricky because there is no objective definition of what constitutes a spam email, so reliably categorising a message as spam or non-spam is hard and has no single perfect solution. However, the problem is now receiving serious attention worldwide, and several systems are available which use a variety of techniques to detect spam.

CSG have investigated several spam-detection techniques in the past, including SpamAssassin and Dspam, and have now settled on a commercial anti-spam and anti-virus package called BrightMail from Symantec. BrightMail automatically downloads new anti-spam and anti-virus tests, updating itself every few minutes. It claims to largely solve the spam problem by detecting the bulk of spam mail (but not all) while having an astonishingly low 1-in-a-million false positive rate.

What does CSG's installation of BrightMail do?

On both Departmental mail servers the Exim mail server software now hands all email messages to BrightMail for checking before Exim's normal mail processing starts. BrightMail runs all its tests on the message and comes to a verdict.

BrightMail verdicts, and Exim's response to them, can be summarised as:

  • Non-spam: leave unmarked; deliver normally.
  • Known virus: delete it silently!
  • Virus checking failure [rare]: mark it by prefixing "[WARNING: NOT VIRUS CHECKED]" onto the Subject line, then deliver normally!
  • Spam content - 99.9999% certain: mark it by adding the headers X-BrightMail-Spam-Flag: YES and X-Spam-Flag: YES, then deliver normally.
  • Spam from a blocked (recently blacklisted) IP address: mark it by adding the headers X-BrightMail-Spam-Blocked: YES and X-Spam-Flag: YES, then deliver normally.
  • Might be spam [rare]: mark it by adding the headers X-BrightMail-Spam-Maybe: YES and X-Spam-Flag: YES, then deliver normally.

Note that the X-Spam-Flag: YES header is added to maintain backwards compatibility with the previous SpamAssassin system.

Any of these headers may be used by your .forward file to handle spam specially; we'll cover this shortly.
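
As a rough illustration of how the individual headers could be used, the Exim filter fragment below files the rarer "might be spam" verdict separately from the other spam verdicts. It is a sketch, not a rule supplied by CSG: the folder names Spam-Maybe and Spam and the IMAP directory ~/IMAP are assumptions, and the fragment is meant to sit inside a .forward file that is already an Exim filter.

```
# Sketch only: treat the "might be spam" verdict differently.
# Such messages carry both X-BrightMail-Spam-Maybe and X-Spam-Flag,
# so the more specific header has to be tested first.
# Folder names and the ~/IMAP directory are assumptions.
if $h_X-BrightMail-Spam-Maybe: is "YES"
then
        save IMAP/Spam-Maybe
        finish
elif $h_X-Spam-Flag: is "YES"
then
        save IMAP/Spam
        finish
endif
```

Keeping the uncertain verdicts in their own folder makes it easier to skim them for legitimate mail before deleting anything.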

Please understand clearly that this is all that BrightMail does. It scans messages, deletes viruses, and then (optionally) adds some of the headers as shown above. It does not delete spam email, store spam messages in a different place, register spam messages with off-site spam databases, or anything else like that.

So what happens to messages marked as Spam?

This is where you come in. BrightMail adds the X-BrightMail-* and X-Spam-* headers whether you like it or not. But unless you do something, messages marked as spam will still come straight into your Inbox mail folder (your .email file) just as they always did, albeit invisibly marked with the spam headers. This document assumes that you are not happy with this and want the spam singled out and dealt with separately.

You could do this mail filtering either (portably) in the Exim mail server or (not portably) in the mail client: an increasing number of email clients provide some form of rule-based mail filtering, but each differs significantly. This document only describes the Exim server-side approach. To do server-side mail filtering with Exim, you need to construct an Exim .forward file containing a suitable mail filtering rule.

Don't worry if you've never heard of a .forward file before. To help people set up their spam filtering, we have dropped in a default .forward file for everyone who didn't already have one. Alternatively, if you want to read up on the Exim .forward file, we have written a general purpose guide describing Exim's forward file. You're welcome to read it for more information, but if all you care about is spam checking, read on.

Ok, so what are my options for handling messages marked as spam?

First, bear in mind that BrightMail is based on heuristics and may mis-characterize an occasional email as spam. Symantec quote a 1-in-a-million false positive rate, and deliberately allow some probable spam through rather than falsely characterize non-spam as spam. Despite this bias, it is still possible that BrightMail may occasionally categorise an important legitimate email as spam. We have had a report of several legitimate emails from a Quantum Physics mailing list being marked erroneously as spam!

The three basic choices for dealing with spam messages are:

  • Trust BrightMail's 99.9999% certainty and delete all messages marked as spam:

    • This is an extreme option, which may cause you to lose the occasional non-spam email. Do this at your own risk.
  • Save all messages marked as spam into a single IMAP folder:
    • This is our preferred option. Here we assume that you use IMAP and that your IMAP mail folder directory is called ~/IMAP. Don't worry if your IMAP mail folder directory is called something else, e.g. ~/Mail; that's fine too. All we care about is that you know the name of your IMAP folder directory, because you'll need this information later on. The only problem with this option is that your spam folder will rapidly grow enormous, so you must still check it periodically and bulk delete whole swathes of spam. As one way of keeping the size limited, we are considering an optional scheme which you can sign up to whereby, automatically at midnight on the first of each month, your current Spam folder becomes your previous Spam folder.

  • Save each month's spam messages into a different IMAP folder:
    • This option has the advantage of allowing you to easily delete all of a previous month's spam by simply deleting an entire IMAP folder. Again, an automatic opt-in service is being considered where old spam folders are deleted periodically.
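
The first two choices might be expressed as Exim filter rules along the following lines. This is only a sketch: the folder name Spam and the directory ~/IMAP are assumptions, and the fragment belongs inside a .forward file that is already an Exim filter.

```
# Option 2 (sketch): file everything marked as spam in a single folder.
# Assumes your IMAP folder directory is ~/IMAP; "Spam" is an assumed name.
if $h_X-Spam-Flag: is "YES"
then
        save IMAP/Spam
        finish
endif

# Option 1 (at your own risk): to discard marked messages instead,
# replace the "save" and "finish" lines above with the single command
#       seen finish
# which stops filtering and tells Exim the message has been dealt with,
# so it is not delivered anywhere.
```

Use one variant or the other, not both: once a message has been saved and filtering has finished, no later rule sees it.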

Setting spam filtering up using spamhelp

Now that you've decided what you want to do with messages marked as spam, and know (if relevant) the answers to supplementary questions like which directory contains your IMAP folders, you are ready to run a utility called spamhelp which will assist you.

spamhelp does not actually set up or modify your .forward file; it creates a suggested .forward file for you and tells you what to do. If you do not already have a .forward file, then it will suggest that you cut and paste a couple of simple Linux commands to make your .forward file live.

If, on the other hand, you do already possess a .forward file, then it will simply show you an appropriate mail filtering rule and recommend that you use your expertise to insert the rule in an appropriate place in your .forward file. As a convenience, it will also create a complete .forward-format file which would just perform spam checking. You can then either insert the spam-filtering rule into your existing .forward or merge your existing mail filtering rules into the new standard structure .forward-format file. We recommend the latter option in cases of doubt.
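
To illustrate what inserting the rule in an appropriate place can mean, here is a sketch of a complete filter file that combines a spam rule with one other rule. It is not the file spamhelp generates: the mailing-list rule, the folder names, the ~/IMAP directory and the choice to put the spam rule first are all assumptions made for illustration.

```
# Exim filter
# The line above must be the first line of the file; without it Exim
# treats .forward as a plain list of forwarding addresses.

# Spam rule, placed first (an assumed ordering) so that marked messages
# are filed before any other rule looks at them.
if $h_X-Spam-Flag: is "YES"
then
        save IMAP/Spam
        finish
endif

# Hypothetical pre-existing rule: file one mailing list in its own folder.
if $h_List-Id: contains "example-list"
then
        save IMAP/Lists
        finish
endif
```

Messages that match neither rule fall through to the end of the file and are delivered normally.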

To run spamhelp, simply log in to any Linux machine (such as shell3) using your SSH client, then run /vol/linux/bin/spamhelp and answer the questions. For example, one possible run-through of spamhelp is as follows:

Welcome to spamhelp.

- Ok, you already have a .forward file

- Please see http://www.doc.ic.ac.uk/csg/faqs/email/spam/ for a
  full discussion of CSG's spam-marking system and what options you have.

What do you want to do with emails marked as spam?

1. Delete them forever
2. Save them in a separate IMAP folder
3. Save each month's spam in a separate IMAP folder

Please make your choice (or CTRL-C to quit): 3

Ok, saving each month's spam messages into a separate IMAP folder is a
very good option.

- If you've followed our advice to the letter, then your IMAP directory
  (where all the IMAP folders are stored) should be called IMAP - you may
  have entered this into your mail client as this as IMAP, ~/IMAP or /IMAP
  or even the old, no longer supported, /homes/dcw/IMAP.

Please enter the name of your IMAP folder directory [default IMAP]: Wibble
No directory ~/Wibble found.  Please try again [default IMAP]: Mail

- Ok, your IMAP directory is Mail

- Here's the relevent spam filtering rule you need:

#
#       BrightMail filtering rule:
#       - file all messages marked as spam by BrightMail into
#         this month's spam folder (Spam-YYYY-MM)
#       all your IMAP folders live in ~/Mail
#
if $h_X-Spam-Flag: is "YES"
then
        if      $tod_log matches "^(....-..)-"
        then
                save Mail/Spam-$1
                finish
        endif
endif

- You need to insert the above section into your .forward file,
  at an appropriate place.  Read the http://www.doc.ic.ac.uk/csg/faqs/email/spam
  documentation for a discussion of where the most appropriate place might be.

- Even though you already have a .forward file, I've created an example
  .forward file containing the above section, currently called
  /homes/dcw/EXAMPLE_FORWARD, for you to look at.

- If you're not confident to make this change, email help@doc.ic.ac.uk
  and we'll be happy to assist you.

As it says above, if this all sounds too complicated, contact CSG for assistance - we'll be happy to help!