How to ensure your vital data is safely backed up

Philosophy

CSG's standard model for reliable data storage is straightforward:

If you choose to step beyond this model, you are responsible for backing up data yourself.

(Please note that these classifications have no bearing on data confidentiality. Whilst important, protecting data from unauthorised access is a topic outside the scope of this document.)

What we do back up

The only filesystems that we back up are the following:

You can check when we last successfully backed up your home directory, and whether we currently back up your favourite shared volume (and, if so, when we last did so successfully), by checking this secure page to see your personal backup report.

So, if you store your vital data in your home directory, or a shared volume, and check that we are backing it up, your data is our responsibility and should be safe.

Things which are not backed up

Implications for users of laptops and custom desktop installations

All CSG Linux installations use, by default, your main network file-store as your local home directory (i.e. /homes/$USERNAME). As a result, anything you store there is backed up. Similarly, CSG Windows installations provide access to your main home directory via drive H:\.
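As a quick sanity check, the following minimal Python sketch tests whether a given path lives inside a location you believe is backed up. The function name is_backed_up_location and the example paths are hypothetical; the only detail taken from this page is the /homes/$USERNAME convention. It only compares paths, so it cannot tell you whether CSG is actually backing up that location.

```python
import os
from pathlib import Path


def is_backed_up_location(path, backed_up_roots):
    """Return True if `path` lives inside one of `backed_up_roots`.

    This is a pure path comparison: it does not contact any backup service.
    """
    resolved = Path(path).expanduser().resolve()
    for root in backed_up_roots:
        try:
            # relative_to() raises ValueError when `resolved` is outside `root`.
            resolved.relative_to(Path(root).resolve())
            return True
        except ValueError:
            continue
    return False


if __name__ == "__main__":
    # Assumption: the network home directory is /homes/$USERNAME, as described above.
    home_root = Path("/homes") / os.environ.get("USER", "unknown")
    # Hypothetical example paths; substitute the directories you actually work in.
    for candidate in ("~/thesis", "/tmp/scratch"):
        inside = is_backed_up_location(candidate, [home_root])
        print(f"{candidate}: {'inside' if inside else 'OUTSIDE'} {home_root}")
```

Symbolic links are resolved before the comparison, so a link inside your home directory that points at local scratch space is reported as outside it.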

However, if you choose to store data on your desktop computer's hard disk (especially if you've reinstalled your desktop with your own custom build), or are using your own laptop, your data will not be stored on any of these network shares by default. This means that you need to organise some suitable backup mechanism for any vital data you're working on. The rest of this document describes ways of performing such backups.

Backup strategies for standalone / custom machines

Mounting and working out of an existing reliable network file-store

If the computer you're using is constantly connected to the DoC network, then the ideal strategy for keeping your data safe is to mount some suitable network file-store onto your machine, and then conduct your work out of that area directly.

This may not be practical for some high-performance computing applications, where raw I/O performance or storage capacity become a factor; however, it is by far the simplest option.

Copying data to a durable network file-store

If the computer you're using is Internet-connected, but for some reason mounting a DoC network file-store directly is not possible (for example, if you're outside of the College firewall, or if you're only network-connected intermittently) then the next best thing would be to periodically copy your working data-set to one of the CSG network file-stores. It's really important that you set your backups to run automatically, e.g. via cron; in our experience, backup systems which aren't automated tend to be neglected.
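As an illustration of such a periodic copy, here is a minimal Python sketch that mirrors a local directory to a remote host using rsync over SSH. The choice of rsync is an assumption made for this example, not necessarily a tool CSG recommends, and the host name, user name and paths are hypothetical placeholders.

```python
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir, delete=False):
    """Build an rsync-over-SSH command that copies source_dir to the remote host."""
    command = ["rsync", "--archive", "--compress", "-e", "ssh"]
    if delete:
        # Also remove remote files that no longer exist locally.
        command.append("--delete")
    # A trailing slash tells rsync to copy the directory's contents,
    # rather than the directory itself.
    command.append(source_dir.rstrip("/") + "/")
    command.append(f"{remote_host}:{remote_dir}")
    return command


def run_backup(source_dir, remote_host, remote_dir):
    """Run the copy; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(build_rsync_command(source_dir, remote_host, remote_dir), check=True)


if __name__ == "__main__":
    # Hypothetical host and paths; substitute your own.
    run_backup("/home/alice/work", "alice@shell.example.org", "backups/work/")
```

To automate it, a crontab entry such as "0 2 * * * /usr/bin/python3 /home/alice/bin/backup.py" (the script path is again hypothetical) would run the copy every night at 02:00. For unattended use, SSH needs to authenticate without prompting, for example with a key pair.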

Tools which you could consider using include:

Copying data to separate media

A not-uncommon strategy for standalone hosts is for users to manually copy their vital data to USB memory sticks or removable hard-drives. Whilst simple and straightforward -- and when done carefully, effective -- there are several hazards to avoid:

As a result, while this method is useful for providing an extra level of protection in addition to some other backup mechanism, we generally advise against its exclusive use.

If you do decide to use this method, we suggest taking the following precautions:
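One precaution of this kind, offered here as an assumption of this sketch rather than as CSG's stated advice, is to verify each copy after making it. The minimal Python example below copies a directory to a destination (such as the mount point of a removable drive) and compares SHA-256 checksums of every file. The function names are hypothetical, and the dirs_exist_ok argument requires Python 3.8 or later.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir, dest_dir):
    """Copy source_dir to dest_dir; return relative paths of files whose copies differ."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    shutil.copytree(source_dir, dest_dir, dirs_exist_ok=True)
    mismatches = []
    for source_file in source_dir.rglob("*"):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        copied = dest_dir / relative
        # A missing copy or a differing checksum both count as a failed copy.
        if not copied.is_file() or sha256_of(source_file) != sha256_of(copied):
            mismatches.append(str(relative))
    return mismatches
```

An empty list means every file was copied and read back with a matching checksum. A read made immediately after the copy may be served from the operating system's cache, so for a stronger check, unmount and reattach the media before comparing checksums again.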

Emailing important data to yourself

Whilst not suitable for large amounts of data, a common and effective strategy is for you to email copies of documents and other comparatively small files to yourself -- either to your Imperial email account, or to some third-party email provider, such as Gmail. Whilst there are significant dangers with using this strategy for highly sensitive documents -- you should not consider email to be secure from tampering or eavesdropping -- this can be an immensely effective strategy for backing up highly valuable documents, such as a PhD thesis.

In fact, whilst we take a lot of care when looking after your data -- keeping multiple copies online and offline, keeping offline copies in a secure, fire-resistant safe, etc. -- there are some rare classes of event which, while highly unlikely, would destroy all of the copies of your data held by CSG but would not affect an agency like Google, which has an international distributed storage infrastructure.
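If you want to automate emailing a document to yourself, a minimal Python sketch using the standard library's email and smtplib modules might look like the following. The function names, addresses, SMTP host and port are all hypothetical; use the settings given by your own mail provider. As noted above, email should not be considered secure, so this is unsuitable for highly sensitive files.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_backup_email(sender, recipient, file_path):
    """Build an email message carrying file_path as an attachment."""
    path = Path(file_path)
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Backup copy of {path.name}"
    message.set_content(f"Automated backup copy of {path.name}.")
    # Guess the MIME type from the file name; fall back to generic binary data.
    mime_type, _ = mimetypes.guess_type(path.name)
    maintype, subtype = (mime_type or "application/octet-stream").split("/", 1)
    message.add_attachment(
        path.read_bytes(), maintype=maintype, subtype=subtype, filename=path.name
    )
    return message


def send_backup_email(message, smtp_host, smtp_port=587, username=None, password=None):
    """Send the message via an SMTP server that supports STARTTLS (an assumption)."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.starttls()
        if username:
            server.login(username, password)
        server.send_message(message)
```

For example, send_backup_email(build_backup_email("alice@example.org", "alice@example.org", "thesis.pdf"), "smtp.example.org") would mail thesis.pdf to yourself. Most providers limit attachment sizes, which is one reason this approach only suits comparatively small files.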

Publish it on the Internet

If the data you wish to preserve is not secret, then you should definitely consider publishing it on the world-wide web, e.g. via the Departmental personal web hosting services, or using the Departmental Research Publications (Pubs) service. There are a number of organisations, such as Google and archive.org, who index the entire visible web and archive it.

Additionally, if your publications are interesting, then they are likely to be mirrored by other interested parties on the Internet regardless.

As Linus Torvalds famously once wrote:

"Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"

(That said, please ensure that you have all of the necessary rights to publish the material of interest on the Internet -- and ensure that, as always, you comply with the appropriate regulations!)

guides/file-storage/backup-strategy (last edited 2011-10-26 15:08:29 by dwm)