DoC Computing Support Group


Differences between revisions 7 and 8
Revision 7 as of 2009-05-15 15:45:08
Size: 12068
Editor: dcw
Comment:
Revision 8 as of 2009-06-16 14:39:55
Size: 9139
Editor: dcw
Comment:
Line 5: Line 5:
In recent years, unsolicited bulk emails (spam) have become an increasing problem. Many people are finding their inboxes flooded with increasingly offensive spam - perhaps 300 spam messages per useful message. In recent years, unsolicited bulk emails (spam) have become an increasing problem. Across the world, there are now probably 500 spam emails sent per useful message.  Without a good anti-spam system,
you get to read all these!
Line 9: Line 10:
CSG have investigated several spam-detection techniques - [[http://spamassassin.apache.org|SpamAssassin]] and Dspam in the past - and have now settled on a commercial anti-spam and anti-virus package called [[http://www.symantec.com/business/brightmail-antispam|BrightMail]] from Symantec. !BrightMail automatically downloads new anti-spam and anti-virus tests - updating itself every few minutes - and claims to largely solve the spam problem by detecting the bulk of spam mail - but not all - while having an astonishingly low 1-in-a-million false positive rate. These notes describe the DoC email system, however please note that the Dept has just decided to migrate to using the College Exchange system over the next few months. Spam is handled in an entirely different
way by the College Exchange system - these notes do not cover that. These notes will be rewritten later once many DoC users are using Exchange. Take a look at our
[[services/email|Using Email]] page here for some info about both email systems, and some links to Exchange information.

CSG have investigated several spam-detection techniques in the past - [[http://spamassassin.apache.org|SpamAssassin]] and Dspam - and have now settled on a commercial anti-spam and anti-virus package called [[http://www.symantec.com/business/brightmail-antispam|BrightMail]] from Symantec. !BrightMail automatically downloads new anti-spam and anti-virus tests - updating itself every few minutes - and claims to largely solve the spam problem by detecting about 95% of all spam mail while having an astonishingly low 1-in-a-million false positive rate.
Line 34: Line 39:
We have made sure that every DoC user has a `~/.forward` file, by default this will divert marked Spam into your `Spam` IMAP folder, ie. the file `~/IMAP/Spam`, although you may change what this does. We have made sure that every DoC user has a `~/.forward` file, by default this will divert marked Spam into your `Spam` IMAP folder (ie. the file `~/IMAP/Spam`) although, of course, you may change what this does.
Line 45: Line 50:
First, bear in mind that !BrightMail is based on heuristics and may mis-characterize an occasional email as spam. Symantec quote a 1-in-a-million false positive rate, and deliberately allow some probable spam through rather than falsely characterize non-spam as spam. Despite this bias, it is still possible that !BrightMail may occasionally categorise an important legitimate email as spam. We have had a report of several legitimate emails from a Quantum Physics mailing list being marked erroneously as spam! First, bear in mind that !BrightMail is based on heuristics and may mis-characterize an occasional email as spam. Symantec quote a 1-in-a-million false positive rate, and deliberately allow some probable spam through rather than falsely characterize non-spam as spam. Despite this bias, it is still possible that !BrightMail may occasionally categorise an important legitimate email as spam. We have had a report of several legitimate emails from a Quantum Physics mailing list being marked erroneously as spam; well, it's a point of view:-)!
Line 57: Line 62:

    As one way of keeping the size limited, we are considering an optional scheme which you can sign up to whereby every month your current Spam folder becomes your previous Spam folder, automatically at midnight on the first of the month.
    We have setup a spam folder rotation scheme, whereby every week your current Spam folder becomes your previous Spam folder, automatically at midnight on the Sunday/Monday transition, look at
    your `~/.folderrotate` file to see the rule that does this.
Line 62: Line 67:
    This option has the advantage of allowing you to easily delete all a previous month's spams by simply deleting an entire IMAP folder.

    Again, an automatic opt-in service is being considered where old spam folders are deleted periodically.

== Setting spam filtering up using spamhelp ==
Now you've decided what you want to do with messages marked as spam, and know (if relevant) the answers to supplementary questions like which directory contains your IMAP folders, you are ready to run a utility called `spamhelp` which will assist you.

This does not actually set up/modify your `.forward` file, but will create a suggested `.forward` file for you and tell you what to do. If you do not already have a `.forward` file, then it will suggest that you cut and paste a couple of simple Linux commands to make your `.forward` file live.

If, on the other hand, you do already possess a `.forward` file, then it will simply show you an appropriate mail filtering rule and recommend that you use your expertise to insert the rule in an appropriate place in your `.forward` file. As a convenience, it will also create a complete `.forward`-format file which would just perform spam checking. You can then either insert the spam-filtering rule into your existing `.forward` or merge your existing mail filtering rules into the new standard structure `.forward`-format file. We recommend the latter option in cases of doubt.

To run `spamhelp`, simply log in to any Linux machine - such as `shell3` - using your SSH client, then run `/vol/linux/bin/spamhelp` and answer the questions. For example, one possible run-through of `spamhelp` is as follows:

{{{
Welcome to spamhelp.

- Ok, you already have a .forward file

- Please see http://www.doc.ic.ac.uk/csg/faqs/email/spam/ for a
  full discussion of CSG's spam-marking system and what options you have.

What do you want to do with emails marked as spam?

1. Delete them forever
2. Save them in a separate IMAP folder
3. Save each month's spam in a separate IMAP folder

Please make your choice (or CTRL-C to quit): 3

Ok, saving each month's spam messages into a separate IMAP folder is a
very good option.

- If you've followed our advice to the letter, then your IMAP directory
  (where all the IMAP folders are stored) should be called IMAP - you may
  have entered this into your mail client as this as IMAP, ~/IMAP or /IMAP
  or even the old, no longer supported, /homes/dcw/IMAP.

Please enter the name of your IMAP folder directory [default IMAP]: Wibble
No directory ~/Wibble found. Please try again [default IMAP]: Mail

- Ok, your IMAP directory is Mail

- Here's the relevent spam filtering rule you need:

#
# BrightMail filtering rule:
# - file all messages marked as spam by BrightMail into
# this month's spam folder (Spam-YYYY-MM)
# all your IMAP folders live in ~/Mail
#
if $h_X-Spam-Flag: is "YES"
then
        if $tod_log matches "^(....-..)-"
        then
                save Mail/Spam-$1
                finish
        endif
endif

- You need to insert the above section into your .forward file,
  at an appropriate place. Read the http://www.doc.ic.ac.uk/csg/faqs/email/spam
  documentation for a discussion of where the most appropriate place might be.

- Even though you already have a .forward file, I've created an example
  .forward file containing the above section, currently called
  /homes/dcw/EXAMPLE_FORWARD, for you to look at.

- If you're not confident to make this change, email help@doc.ic.ac.uk
  and we'll be happy to assist you.
}}}

As it says above, if this all sounds too complicated, contact CSG for assistance -- we'll be happy to help!
    This option has the advantage of allowing you to easily delete all a previous month's spams by simply deleting an entire IMAP folder. Ask us if you're interested in this option, it's slightly
    more complicated to setup.

Email Spam Filtering

What is the Spam problem, and what is BrightMail?

In recent years, unsolicited bulk emails (spam) have become an increasing problem. Across the world, there are now probably 500 spam emails sent per useful message. Without a good anti-spam system, you get to read all these!

Dealing with this problem is tricky because there is no objective definition of what constitutes a spam email, so reliably categorising a message as spam or non-spam is hard, with no single perfect solution. However, the problem is now receiving serious attention worldwide, and several systems are available which use a variety of techniques to detect spam.

These notes describe the DoC email system. However, please note that the Dept has just decided to migrate to the College Exchange system over the next few months. Spam is handled in an entirely different way by the College Exchange system, and these notes do not cover that; they will be rewritten once many DoC users are using Exchange. Take a look at our Using Email page for some info about both email systems, and some links to Exchange information.

CSG have investigated several spam-detection techniques in the past - SpamAssassin and Dspam - and have now settled on a commercial anti-spam and anti-virus package called BrightMail from Symantec. BrightMail automatically downloads new anti-spam and anti-virus tests - updating itself every few minutes - and claims to largely solve the spam problem by detecting about 95% of all spam mail while having an astonishingly low 1-in-a-million false positive rate.

What does CSG's installation of BrightMail do?

On both Departmental mail servers the Exim mail server software now hands all email messages to BrightMail for checking before Exim's normal mail processing starts. BrightMail runs all its tests on the message and comes to a verdict.

BrightMail verdicts, and Exim's response to them, can be summarised as:

|| BrightMail Verdict || Exim Response ||
|| Non-spam || Leave unmarked; deliver normally ||
|| Known virus || Delete it silently! ||
|| Virus checking failure [rare] || Mark it by prefixing "[WARNING: NOT VIRUS CHECKED]" onto the Subject line, then deliver normally ||
|| Spam content - 99.9999% certain || Mark it by adding the headers X-BrightMail-Spam-Flag: YES and X-Spam-Flag: YES, then deliver normally ||
|| Spam from a blocked (recently blacklisted) IP address || Mark it by adding the headers X-BrightMail-Spam-Blocked: YES and X-Spam-Flag: YES, then deliver normally ||
|| Might be spam [rare] || Mark it by adding the headers X-BrightMail-Spam-Maybe: YES and X-Spam-Flag: YES, then deliver normally ||

Note that the X-Spam-Flag: YES header is added to maintain backwards compatibility with the previous SpamAssassin system.

Any of these headers may be used by your .forward file in order to handle spam specially; this is covered in the sections that follow.

Please understand clearly that this is all that BrightMail does. It scans messages, deletes viruses, and then (optionally) adds some of the headers as shown above. It does not delete spam email, store spam messages in a different place, register spam messages with off-site spam databases, or anything else like that.
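
As a rough illustration of how a filter can act on these marks, the sketch below keeps messages that could not be virus checked apart from normal mail, keyed on the Subject prefix listed above. The folder name Unchecked is purely hypothetical, and the rule assumes your IMAP folders live in ~/IMAP:

```exim
# Exim filter
# Sketch only: file messages that were NOT virus checked into their own folder.
# "Unchecked" is a hypothetical folder name; IMAP folders are assumed to be in ~/IMAP.
if $h_Subject: begins "[WARNING: NOT VIRUS CHECKED]"
then
    save IMAP/Unchecked
    finish
endif
```

The `# Exim filter` line must be the first line of the .forward file and should appear only once, so leave it out when adding a rule like this to an existing filter.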

So what happens to messages marked as Spam?

After the X-BrightMail-* and X-Spam-* headers are added, your email is sent to your home directory server, where a separate copy of Exim delivers the email by running through your ~/.forward file. This contains mail processing rules to determine what should happen to the incoming email. We have made sure that every DoC user has a ~/.forward file. By default this will divert marked Spam into your Spam IMAP folder (i.e. the file ~/IMAP/Spam) although, of course, you may change what this does.
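
A minimal sketch of what such a default ~/.forward might contain is shown here. It is an illustration of the behaviour just described, not a copy of the file CSG installs, which may differ in detail:

```exim
# Exim filter
# Sketch of the default behaviour described above (the real default file may differ):
# divert anything BrightMail marked as spam into the Spam IMAP folder.
if $h_X-Spam-Flag: is "YES"
then
    save IMAP/Spam      # relative to your home directory, i.e. ~/IMAP/Spam
    finish              # stop here; do not also deliver to the Inbox
endif
# Unmarked mail falls through and is delivered normally.
```

The first line, `# Exim filter`, is what tells Exim to treat the file as a filter rather than as a list of forwarding addresses. Exim also has a filter-testing mode: running `exim -bf ~/.forward` with a saved message on standard input reports what the filter would do without delivering anything, assuming the exim binary is available on the machine you are logged into.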

Don't worry if you've never heard of a .forward file before - you can stick to using the default one we've dropped in. However, if you're interested and would like to read up on the Exim .forward file, we have written a general purpose guide describing Exim's forward file - you're welcome to read it for more information, but if all you care about is spam checking, read on.

Note that this mail filtering happens portably on the server side, independent of what email client you use. (Many email clients also provide some type of rule-based mail filtering, but each differs significantly.) This document only describes the Exim portable server-side approach.

Ok, so what are my options for handling messages marked as spam?

First, bear in mind that BrightMail is based on heuristics and may mis-characterize an occasional email as spam. Symantec quote a 1-in-a-million false positive rate, and deliberately allow some probable spam through rather than falsely characterize non-spam as spam. Despite this bias, it is still possible that BrightMail may occasionally categorise an important legitimate email as spam. We have had a report of several legitimate emails from a Quantum Physics mailing list being marked erroneously as spam; well, it's a point of view:-)!

The three basic choices for dealing with spam messages are:

  • Trust BrightMail's 99.9999% certainty and delete all messages marked as spam:

    • This is an extreme option, which may cause you to lose the occasional non-spam email. Do this at your own risk.
  • Save all messages marked as spam into a single IMAP folder called Spam:
    • This is our preferred option, and it's the default behaviour if you haven't modified your .forward file. Your IMAP mail folder directory is called ~/IMAP, and the default spam folder name is Spam. The only problem with this option is that your spam folder will rapidly grow enormous, so you must still check it periodically and bulk delete whole swathes of spam. To limit this, we have set up a spam folder rotation scheme: every week, at midnight on the Sunday/Monday transition, your current Spam folder automatically becomes your previous Spam folder. Look at your ~/.folderrotate file to see the rule that does this.

  • Save each month's spam messages into a different IMAP folder:
    • This option has the advantage of allowing you to easily delete all of a previous month's spam by simply deleting an entire IMAP folder. Ask us if you're interested in this option; it's slightly more complicated to set up.
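
The first and third choices can be sketched as filter rules in the same style as the default. Both are illustrations rather than CSG-supplied files, and both assume your IMAP folders live in ~/IMAP. To delete marked messages outright:

```exim
# Exim filter
# Option 1 sketch: discard everything marked as spam (at your own risk).
if $h_X-Spam-Flag: is "YES"
then
    seen finish         # treat the message as dealt with, without saving it anywhere
endif
```

To save each month's spam separately, the date at the front of Exim's $tod_log timestamp can supply the folder suffix:

```exim
# Exim filter
# Option 3 sketch: file spam into a per-month folder named Spam-YYYY-MM.
if $h_X-Spam-Flag: is "YES"
then
    if $tod_log matches "^(....-..)-"   # captures the year and month, e.g. 2009-06, into $1
    then
        save IMAP/Spam-$1
        finish
    endif
endif
```

Use only one of these in a given .forward file. With the monthly variant, a new folder appears automatically the first time spam arrives in a new month.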

Customizing your spam filtering - some hints

As we have said above, BrightMail marks messages as spam with a very high degree of certainty. So there aren't many spam customizations needed.

However, there are still a few things you can customize. Here are a few hints and examples:

  1. Fixing specific false-positives:
    • Suppose you regularly find that a certain type of email that you wish to receive is characterised by BrightMail as spam. For example, the previous SpamAssassin detection scheme characterised all Easyjet booking confirmation emails as spam. If such a case occurs in future, you can add specific pre-spam filtering rules to your .forward file, whose condition matches a specific characteristic of the email (e.g. the From: or Subject: header) and whose action tells Exim to stop processing the .forward rules and deliver the email normally (to your .email file in your home directory, aka your IMAP "Inbox"). For instance:

            # Easyjet booking confirmations look like spam to spamassassin,
            # so pre-spam filter them
            if $h_From: contains "@easyjet.co.uk"
            then
              finish          # stop processing and deliver normally
            endif
  2. You might also decide to have separate rules matching each of the X-BrightMail-Spam-* headers. If so, use the following basic structure:

      # Separate Spam/Blocked/Maybe messages into 3 different IMAP folders.
      # i.e. you might want to:
      #       - divert to brightmail-spam folder if spam.
      #       - divert to brightmail-blocked folder if blocked.
      #       - divert to brightmail-maybe folder if may be spam.
      #
      if $h_X-BrightMail-Spam-Flag: is "YES"
      then
          save IMAP/brightmail-spam
          finish
      elif $h_X-BrightMail-Spam-Blocked: is "YES"
      then
          save IMAP/brightmail-blocked
          finish
      elif $h_X-BrightMail-Spam-Maybe: is "YES"
      then
          save IMAP/brightmail-maybe
          finish
      endif

      In particular you might decide that the X-BrightMail-Spam-Maybe case should be delivered normally. In this case, the final comment should say:

      #      - deliver normally to Inbox if may be spam.

      And the final elif should simply read:

      elif $h_X-BrightMail-Spam-Maybe: is "YES"
      then
          finish # deliver normally
      endif
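
When the two hints are combined, rule order matters: an exception has to come before the spam rules, because the first rule to reach `finish` ends processing. A sketch of a complete file along those lines, reusing the Easyjet example and assuming the default ~/IMAP/Spam folder:

```exim
# Exim filter
# 1. Exceptions first: deliver these normally even if they were marked as spam.
if $h_From: contains "@easyjet.co.uk"
then
    finish              # stop processing and deliver normally
endif

# 2. Only then the spam rule, so it never sees the exceptions above.
if $h_X-Spam-Flag: is "YES"
then
    save IMAP/Spam
    finish
endif
# Anything left over is delivered normally to the Inbox.
```

If the two blocks were swapped, the Easyjet messages would already have been saved to the Spam folder before the exception was ever tested.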
 
 

guides/email/spam (last edited 2009-06-17 16:45:15 by dcw)