Backing up your data

This document is for people at the ROE who have content on Linux desktops that is not normally backed up. Note: the /home partition is already archived. It is a step-by-step HOWTO for getting your content backed up automatically. If you follow it, your data will be backed up every day at the same time.

There are a number of machines used for providing backup space:

linnhe
mull (64TB)
nevis (18TB)

On these machines, the space available for backup storage is spread over a number of discs: /rsync*. You will need a directory on one of these storage areas in which to store your backups.

Step 1: Submit a helpdesk request for a backup directory on mull.

Make a note of which drive your backup directory is located on. It will be of the form /rsyncnn where nn is a two-digit number.

If you have completed Step 3 (below), then you can get nn via:

ssh mull "dir -d /rsync*/$USER"

We all use the same machine for our backups: mull. To optimise the network traffic and disc activity, it is important that we don't write to the drive at the same time. I maintain a list of time slots available for backups.

Step 2: Look for an available time slot from the list of time slots and email me (ert) with your selection so I can update the document.

This guide uses the tool rsync (remote synchronisation) for mirroring the contents of one drive to a different location.

To use rsync to back up a drive automatically, without any interaction, you must first have password-less logins set up in SSH.

Password-less logins are very secure, so long as you don't leave your computer unattended. Horst would be right in giving a good stern lecture on the importance of not leaving yourself logged in while away from your terminal. The simple answer is to lock your screen.

Step 3: Enable passwordless logins

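cd ~/.ssh
ssh-keygen -t rsa
<ret>
<ret>
<ret>
cat id_rsa.pub >> authorized_keys
(<ret> just means press return. Using cat with >> adds the new key to authorized_keys without overwriting any keys already in it.)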

Test:

ssh fetlar
(It should log you in without asking for a password)

If you get the error "agent admitted failure to sign using the key", then try ssh-add.
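
If you already have an authorized_keys file, you will want to add the new key to it rather than replace it. The following is a minimal sketch of one way to do that safely, assuming a POSIX shell. The function name install_pubkey is hypothetical and not part of OpenSSH.

```shell
# install_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Append the public key to authorized_keys unless that exact line is already there.
install_pubkey() {
    pubkey_file=$1
    auth_file=$2
    # Make sure the file exists so grep has something to read.
    touch "$auth_file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # With sshd's default settings, a key file writable by others is ignored.
    chmod 600 "$auth_file"
}

# Typical use after running ssh-keygen:
#   install_pubkey ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Running it twice leaves a single copy of the key, so it is safe to repeat.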

Step 4: Determine the correct command you need

rsync

Backups of a file system can be created and maintained by the rsync command. rsync creates and maintains a mirror of a file system. When it is run, it determines the differences between the two file systems and then copies newer files over to the mirror. So the first time it runs, it will simply copy everything. Subsequent calls will only copy files that have changed. The copying is one-way; the source tree is essentially read-only by default.

For example, I want to back up the entire directory tree under /garve/ert on my host machine, garve, to mull:/rsync44/ert such that /garve/ert is identical to mull:/rsync44/ert/garve.

Basic command (from garve):

rsync -a /garve/ert/ mull:/rsync44/ert/garve
or from mull:
rsync -a garve:/garve/ert/ /rsync44/ert/garve

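The "-a" means "archive": rsync recurses into directories and preserves symbolic links, permissions, modification times, group and owner (where permitted), and device and special files.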

If a file exists on mull such as /rsync44/ert/garve/foo/bar which doesn't exist on garve (perhaps it is from a previous backup) then the file is not touched. This is good for archiving data, but the mirrored filesystem will continue to grow as unmatched files accumulate. To avoid this, use the --delete option: if a file exists at the destination that does not exist at the source, it is deleted from the archive. Clearly this is a risk. If I delete a file accidentally and don't notice for a couple of days, the archive will also have lost it.
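
Because --delete removes files from the mirror, it can be worth previewing a run before trusting it to cron. The following is a sketch using rsync's --dry-run and --itemize-changes options. The function name preview_backup is made up for illustration.

```shell
# preview_backup SOURCE DEST
# Show what "rsync -a --delete" would copy or delete, without changing anything.
preview_backup() {
    # --dry-run: report only, transfer nothing.
    # --itemize-changes: one line per affected file; deletions are
    # listed with a "*deleting" prefix.
    rsync -a --delete --dry-run --itemize-changes "$1" "$2"
}

# Example, run on garve (paths taken from the example above):
#   preview_backup /garve/ert/ mull:/rsync44/ert/garve
```

The trailing slash on the source matters: /garve/ert/ copies the contents of ert into the destination, whereas /garve/ert would create a subdirectory called ert inside it.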

cron

We can run rsync by hand, but it would be more convenient if it ran automatically every day or every week. The cron daemon allows you to do this. Users maintain a file (a cron table) which the cron daemon scans, looking for commands to run. The cron table is a list of commands and the times to run them. To manipulate your cron table, use crontab.

A cron table entry has the form (see: man 5 crontab):

<minute> <hour> <day of month> <month> <day of week> command

The wildcard "*" matches all occurrences.

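So a crontab entry on mull like this:

30 05 * * * rsync -a --delete garve:/garve/ert/ /rsync44/ert/garve

says "every day in every month at 05:30, execute
rsync -a --delete garve:/garve/ert/ /rsync44/ert/garve"

Note the trailing slash on the source and the full destination directory. Without them, rsync would mirror into /rsync44/ert itself, and --delete would remove anything else stored there.

You may need to redirect any output from the command to /dev/null if you don't want email messages from cron every day. Do this like:

30 05 * * * rsync -a --delete garve:/garve/ert/ /rsync44/ert/garve >>& /dev/null
I have for the first line of my cron table:
SHELL=/bin/tcsh
which lets me use the csh redirection format ">>&". With cron's default shell, /bin/sh, the equivalent is "> /dev/null 2>&1".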
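
To make the field order concrete, the sketch below splits a crontab entry into its five time fields and the command. It is only an illustration of the layout described in man 5 crontab. The function name describe_cron_entry is hypothetical, and it does not validate the values.

```shell
# describe_cron_entry 'ENTRY'
# Print each field of a crontab entry on its own labelled line.
describe_cron_entry() {
    # read splits on whitespace; the last variable receives the rest of
    # the line, so the command keeps its own spaces. Passing the entry
    # through a pipe means the "*" wildcards are never expanded by the shell.
    printf '%s\n' "$1" | {
        read -r minute hour day_of_month month day_of_week command
        printf 'minute:       %s\n' "$minute"
        printf 'hour:         %s\n' "$hour"
        printf 'day of month: %s\n' "$day_of_month"
        printf 'month:        %s\n' "$month"
        printf 'day of week:  %s\n' "$day_of_week"
        printf 'command:      %s\n' "$command"
    }
}

describe_cron_entry '30 05 * * * rsync -a --delete garve:/garve/ert/ /rsync44/ert/garve'
```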

Step 5: Edit your cron table on mull and add the necessary entry, as described above.
ssh mull and type:

crontab -e

This will open an editor with your cron table. It uses the editor pointed to by your environment variable EDITOR. Otherwise it defaults to the vi editor.

Save and exit your editor. The cron table entry for the rsync command should be ready to go. You should check that it worked by sshing in to mull the following day and examining the contents of your backup directory.
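
File times in the backup directory can mislead when you check whether the job ran, because rsync -a preserves the modification times of the source files. One alternative is to have the job record when it ran and whether rsync exited with status 0 (success). The following is a sketch under that assumption. The function name run_backup, the RSYNC override variable and the log file name are all made up for illustration.

```shell
# run_backup SOURCE DEST LOGFILE
# Run the mirror and append one line to LOGFILE with the time and exit status.
# RSYNC may be set to another program for testing; it defaults to rsync.
run_backup() {
    src=$1
    dest=$2
    logfile=$3
    "${RSYNC:-rsync}" -a --delete "$src" "$dest"
    status=$?
    printf '%s %s -> %s exit=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$src" "$dest" "$status" >> "$logfile"
    return "$status"
}

# Example (on mull), using the paths from the example above:
#   run_backup garve:/garve/ert/ /rsync44/ert/garve "$HOME/backup.log"
```

To use it from cron, the function and the call would go in a small script, and the crontab entry would run that script instead of calling rsync directly.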

I encourage you to look over the man pages for rsync and crontab as there are other neat things you can do that may be more appropriate to your archive needs.

Tips

Put the rsync command in a crontab entry on mull rather than on the machine where your filesystem resides natively. This has two advantages: 1) you only need to maintain one crontab even if you back up items from more than one machine; 2) your content on spider (the web server) can be archived (you can ssh to spider but not from it).

Keep a copy of your crontab entry. Your crontab is installed in /var/spool/cron/crontabs/. The /var tree can be wiped clean at the administrator's discretion. After editing your crontab entry, copy it to your home directory via:

crontab -l > ~/.crontab
This dumps the contents of your crontab entry to the hidden file, .crontab. If you have crontab entries on more than one machine, append the machine name. For example:
crontab -l > ~/.crontab-garve
If you do lose your entry, it can be re-installed from your copy via
crontab ~/.crontab

If you have more than one directory to mirror, add multiple crontab entries (one for each directory) and stagger them by a minute or two. You will also need to have a separate directory for each backup. For example:

30 05 * * * rsync -a --delete garve:/garve1/ert /rsync44/ert/garve1 >>& /dev/null
31 05 * * * rsync -a --delete garve:/garve2/ert /rsync44/ert/garve2 >>& /dev/null
Don't forget to mkdir /rsync44/ert/garve1 and mkdir /rsync44/ert/garve2 first.
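
With more than a couple of directories, writing the staggered entries by hand gets tedious. The sketch below prints one entry per directory in the same form as the two lines above, for pasting into crontab -e. The function name stagger_entries is hypothetical. It assumes the first path component names the backup, and it does not handle the minute running past 59.

```shell
# stagger_entries HOUR MINUTE HOST DEST_ROOT DIR...
# Print one crontab line per directory, each one minute after the last.
# Pass HOUR and MINUTE without leading zeros (e.g. 5, not 05).
stagger_entries() {
    hour=$1
    minute=$2
    host=$3
    dest_root=$4
    shift 4
    for dir in "$@"; do
        # /garve1/ert -> garve1: strip the leading slash, then keep
        # everything before the next slash.
        name=${dir#/}
        name=${name%%/*}
        printf '%02d %02d * * * rsync -a --delete %s:%s %s/%s >>& /dev/null\n' \
            "$minute" "$hour" "$host" "$dir" "$dest_root" "$name"
        minute=$((minute + 1))
    done
}

stagger_entries 5 30 garve /rsync44/ert /garve1/ert /garve2/ert
```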

FAQ

Q: My crontab entry is there, but it doesn't execute.

A: Check that there is a newline (carriage return) at the end of the entry. A missing final newline is a particular problem for entries edited with emacs. See the BUGS section of the crontab man pages.
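
A quick way to test a saved copy of your crontab for this problem is sketched below. The function name ends_with_newline is made up for illustration.

```shell
# ends_with_newline FILE
# Succeed if FILE is empty or its last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the result is
    # empty exactly when the last byte is a newline (or the file is empty).
    [ -z "$(tail -c 1 "$1")" ]
}

# Example: repair a saved copy before installing it with "crontab FILE".
#   ends_with_newline ~/.crontab || echo >> ~/.crontab
```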

Q: rsync doesn't work on a Tru64 system. I get the error message:

mull: Connection refused
unexpected EOF in read_timeout

A: The default versions of SSH on the Tru64 systems are not the same as on mull. On the Tru64 systems, you need to explicitly use the version in /usr/local/bin which is OpenSSH, like on mull. Here is an example which backs up my home directory which is hosted on the Tru64 system reaxp05 on the filesystem /starusers/:

 rsync -avz --delete --rsh=/usr/local/bin/ssh /starusers/ert \
       mull:/rsync44/ert/home
Note that this isn't an issue if you run rsync from mull, in which case
 rsync -avz --delete --rsh=ssh reaxp0:/starusers/ert /rsync44/ert/home
works fine, provided that issuing this command from mull:
 ssh reaxp0 true
prints nothing. Any output here, typically from your shell startup files on the remote machine, gets mixed into the rsync data stream and you will get the error:
 protocol version mismatch - is your shell clean?

Q: How do I exclude files from being backed up?

A: Use the --exclude 'pattern' flag. For example, I want to exclude temporary backup files which have the form PM.bak.0, PM.bak.1, etc., or sometimes just PM.bak. The flags --exclude '*.bak.*' --exclude '*.bak' prevent these files from being backed up.

Q: How do I back up only certain files?

A: Exclude everything using --exclude '*', but include what you want using --include 'pattern'. The subtle trick is that since --exclude '*' excludes everything, including directories, you need to include all directories, then include the files you want backed up. The order matters, because rsync uses the first pattern that matches. For example, to back up all C source files, use:

--include "*/" --include "*.c" --exclude "*"
See the man pages for more details.
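
The flags above can be put together as a complete command, sketched below. The function name backup_c_sources is hypothetical. The --prune-empty-dirs option is an optional addition that is not in the flags above; it stops rsync creating directories that end up containing no C files.

```shell
# backup_c_sources SOURCE DEST
# Mirror only *.c files, keeping the directory layout.
backup_c_sources() {
    # Filter rules are checked in order and the first match wins:
    #   '*/'  keeps every directory so rsync can descend into it,
    #   '*.c' keeps the C sources,
    #   '*'   drops everything else.
    rsync -a --prune-empty-dirs \
        --include '*/' --include '*.c' --exclude '*' \
        "$1" "$2"
}

# Example, run on mull (paths taken from the earlier example):
#   backup_c_sources garve:/garve/ert/ /rsync44/ert/garve
```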

Q: How do I back up my laptop (or any machine) which is on the vDMZ?

A: Tunnel through moray.

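In the following, I back up the /home directory of my laptop, galactica, to mull. The command is run from my laptop. Note that I have a different username on my laptop than on the IfA system, so I need to add my IfA username. The --rsync-path option makes moray run "ssh mull rsync" in place of its own rsync, so the data is relayed on to mull.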

rsync -avz --delete --rsync-path="ssh mull rsync" \
     /home ert@moray.roe.ac.uk:/rsync44/ert/galactica

Q: How do I edit my crontab entry?

A: To edit your crontab:

crontab -e
To list your crontab:
crontab -l
Note when editing: The editor used will be the one pointed to by your EDITOR environment variable.

Q: Can I use the backup area for my laptop?

A: Yes. The backup area is for you to use for any work-related content you need to protect. How you perform a backup from your laptop is entirely up to you. Be aware that you need to be on the local network (via the ROE VPN, for example).