DoC Computing Support Group



Email: Filtering out Spam in DoC

What is the Spam problem, and what is BrightMail?

In recent years, unsolicited bulk emails (spam) have become an increasing problem. Across the world, there are now probably 500 spam emails sent per useful message. Without a good anti-spam system, you get to read all these!

Dealing with this problem is tricky because there is no objective definition of what constitutes a spam email, so reliably categorising a message as spam or non-spam is hard and has no single perfect solution. However, the problem now receives serious attention world-wide, and several systems are available which use a variety of techniques to detect spam.

These notes describe the DoC email system. However, please note that the Department has just decided to migrate to the College Exchange system over the next few months. The College Exchange system handles spam in an entirely different way, which these notes do not cover. They will be rewritten later, once many DoC users are using Exchange. Our Using Email page has some information about both email systems, and some links to Exchange information.

CSG have investigated several spam-detection techniques in the past - SpamAssassin and Dspam - and have now settled on a commercial anti-spam and anti-virus package called BrightMail from Symantec. BrightMail automatically downloads new anti-spam and anti-virus tests - updating itself every few minutes - and claims to largely solve the spam problem by detecting about 95% of all spam mail while having an astonishingly low 1-in-a-million false positive rate.

What does CSG's installation of BrightMail do?

On both Departmental mail servers the Exim mail server software now hands all email messages to BrightMail for checking before Exim's normal mail processing starts. BrightMail runs all its tests on the message and comes to a verdict.

BrightMail verdicts, and Exim's response to them, can be summarised as:

BrightMail Verdict | Exim Response
Non-spam | Leave unmarked; deliver normally
Known virus | Delete it silently!
Virus checking failure [rare] | Mark it by prefixing "[WARNING: NOT VIRUS CHECKED]" onto the Subject line; then deliver normally!
Spam content - 99.9999% certain | Mark it by adding the headers X-BrightMail-Spam-Flag: YES and X-Spam-Flag: YES, then deliver normally
Spam from a blocked (recently blacklisted) IP address | Mark it by adding the headers X-BrightMail-Spam-Blocked: YES and X-Spam-Flag: YES, then deliver normally
Might be spam [rare] | Mark it by adding the headers X-BrightMail-Spam-Maybe: YES and X-Spam-Flag: YES, then deliver normally

Note that the X-Spam-Flag: YES header is added to maintain backwards compatibility with the previous SpamAssassin system.

Any of these headers may be used by your .forward file in order to handle spam specially; we'll cover this shortly.

Please understand clearly that this is all that BrightMail does. It scans messages, deletes viruses, and then (optionally) adds some of the headers as shown above. It does not delete spam email, store spam messages in a different place, register spam messages with off-site spam databases, or anything else like that.

So what happens to messages marked as Spam?

After the X-BrightMail-* and X-Spam-* headers are added, your email is sent to your home directory server, where a separate copy of Exim delivers the email by running through your ~/.forward file. This file contains mail processing rules that determine what should happen to incoming email. We have made sure that every DoC user has a ~/.forward file. By default, it diverts marked spam into your Spam IMAP folder (i.e. the file ~/IMAP/Spam), although of course you may change what it does.
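
As a rough illustration, a default rule of the kind described above might look like the following sketch. This is not a copy of the file CSG installs: it assumes the rule tests the X-Spam-Flag header and saves to IMAP/Spam, which matches the folder named above. The first line is the marker that Exim requires before it will treat a .forward file as a filter.

```
# Exim filter
# Sketch only: the real default ~/.forward may differ.
# Divert anything BrightMail marked as spam into the Spam IMAP folder.
if $h_X-Spam-Flag: is "YES"
then
    save IMAP/Spam
    finish
endif
```

Mail that does not match the condition falls off the end of the file and is delivered normally.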

Don't worry if you've never heard of a .forward file before - you can stick to using the default one we've dropped in. However, if you're interested and would like to read up on the Exim .forward file, we have written a general purpose guide describing Exim's forward file - you're welcome to read it for more information, but if all you care about is spam checking, read on.

Note that this mail filtering happens portably on the server side, independent of what email client you use. Many email clients also provide some type of rule-based mail filtering, but each differs significantly. This document only describes the portable, server-side Exim approach.

Ok, so what are my options for handling messages marked as spam?

First, bear in mind that BrightMail is based on heuristics and may mis-characterize an occasional email as spam. Symantec quote a 1-in-a-million false positive rate, and deliberately allow some probable spam through rather than falsely characterize non-spam as spam. Despite this bias, it is still possible that BrightMail may occasionally categorise an important legitimate email as spam. We have had a report of several legitimate emails from a Quantum Physics mailing list being marked erroneously as spam; well, it's a point of view:-)!

The three basic choices for dealing with spam messages are:

  • Trust BrightMail's 99.9999% certainty and delete all messages marked as spam:

    • This is an extreme option, which may cause you to lose the occasional non-spam email. Do this at your own risk. If you really want to do this, edit your ~/.forward file from a Linux machine (or via an ssh/PuTTY connection to a Linux machine such as shell1) and replace the existing spam rule with:

    if $h_X-Spam-Flag: is "YES"
    then
          seen finish   # WOW!  DELETE ALL MARKED SPAM.
    endif
  • Save all messages marked as spam into a single IMAP folder called Spam:
    • This is our preferred option, and it is the default behaviour if you haven't modified your .forward file. Your IMAP mail folder directory is called ~/IMAP, and the default spam folder name is Spam. The only problem with this option is that your spam folder will rapidly grow enormous, so you must still check it periodically and bulk delete whole swathes of spam. We have set up a spam folder rotation scheme, whereby every week your current Spam folder becomes your previous Spam folder, automatically at midnight on the Sunday/Monday transition. Look at your ~/.folderrotate file to see the rule that does this.

  • Save each month's spam messages into a different IMAP folder:
    • This option has the advantage of allowing you to easily delete all of a previous month's spam by simply deleting an entire IMAP folder. Ask us if you're interested in this option; it's slightly more complicated to set up.
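
For the per-month option, one possible approach is sketched below. It is an assumption, not CSG's own recipe: it builds the folder name from Exim's $tod_log variable (the current date and time in the form 2009-06-17 16:45:15) and keeps only the first seven characters to get a year-month suffix. The "Spam-" folder name prefix is made up for illustration.

```
# Sketch only: save each month's spam in its own folder,
# e.g. IMAP/Spam-2009-06 (the "Spam-" prefix is hypothetical).
if $h_X-Spam-Flag: is "YES"
then
    save IMAP/Spam-${substr_0_7:$tod_log}
    finish
endif
```

Whether a newly created folder shows up automatically in your mail client depends on the client and its folder subscriptions, so it is worth checking with CSG before relying on this.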

Customizing your spam filtering - some hints

As we have said above, BrightMail marks messages as spam with a very high degree of certainty. So there aren't many spam customizations needed.

However, there are still a few things you can customize. Here are a few hints and examples:

  1. Fixing specific false-positives:
    • Suppose you regularly find that BrightMail characterises as spam a certain type of email that you wish to receive. For example, the previous SpamAssassin detection scheme characterised all Easyjet booking confirmation emails as spam. If such a case occurs in future, you can add specific pre-spam filtering rules to your .forward file. The condition of such a rule matches a specific characteristic of the email (e.g. the From: or maybe the Subject: header), and its action tells Exim to stop processing the .forward rules and deliver the email normally (to your .email file in your home directory, aka your IMAP "Inbox"). For instance:

            # Easyjet booking confirmations look like spam to spamassassin,
            # so pre-spam filter them
            if $h_From: contains "@easyjet.co.uk"
            then
              finish          # stop processing and deliver normally
            endif
  2. You might also decide to have separate rules matching each of the X-BrightMail-Spam-* headers. If so, use the following basic structure:

      # Separate Spam/Blocked/Maybe messages into 3 different IMAP folders.
      # i.e. you might want to:
      #       - divert to brightmail-spam folder if spam.
      #       - divert to brightmail-blocked folder if blocked.
      #       - divert to brightmail-maybe folder if may be spam.
      #
      if $h_X-BrightMail-Spam-Flag: is "YES"
      then
          save IMAP/brightmail-spam
          finish
      elif $h_X-BrightMail-Spam-Blocked: is "YES"
      then
          save IMAP/brightmail-blocked
          finish
      elif $h_X-BrightMail-Spam-Maybe: is "YES"
      then
          save IMAP/brightmail-maybe
          finish
      endif

      In particular you might decide that the X-BrightMail-Spam-Maybe case should be delivered normally. In this case, the final comment should say:

      #      - deliver normally to Inbox if may be spam.
      And the final elif should simply read:
      elif $h_X-BrightMail-Spam-Maybe: is "YES"
      then
          finish # deliver normally
      endif
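
Putting the hints above together, a complete ~/.forward might be ordered as in the following sketch. The whitelist address is the Easyjet example from hint 1, and the headers and folder names are those from hint 2. The ordering matters: the pre-spam rule must come before the spam rules, otherwise the message will already have been diverted. The test command in the trailing comment assumes an exim binary is on your path on the machine where you edit the file, and some-saved-message is a placeholder for any message saved to a file.

```
# Exim filter
# Sketch only: adapt folder names and addresses to taste.

# 1. Pre-spam rule: always deliver these normally.
if $h_From: contains "@easyjet.co.uk"
then
    finish
endif

# 2. Spam rules: definite and blocked spam go to folders,
#    "maybe" spam is delivered normally to the Inbox.
if $h_X-BrightMail-Spam-Flag: is "YES"
then
    save IMAP/brightmail-spam
    finish
elif $h_X-BrightMail-Spam-Blocked: is "YES"
then
    save IMAP/brightmail-blocked
    finish
elif $h_X-BrightMail-Spam-Maybe: is "YES"
then
    finish
endif

# To dry-run this file against a saved message without delivering anything:
#   exim -bf ~/.forward < some-saved-message
```

In filter test mode Exim only reports which actions it would take, so a dry run is a safe way to check a new rule before any real mail depends on it.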
 
 

guides/email/spam (last edited 2009-06-17 16:45:15 by dcw)