rsnapshot

5/23/11 update: click for instructions on how to install Rsnapshot on Puppy Linux
For step-by-step setup instructions, see the digitalocean webpage.

Background

In my opinion, rsnapshot is the best file backup tool there is. It's a Perl-based Linux tool chain built on rsync. If you're familiar with the Mac OS X world, it works much like the Time Machine program. In terms of utility and lucidity, it's right up there with IrfanView for graphics handling and Total Commander for directory synchronization, both of which I've used for nearly 10 years because I've found nothing better. For synchronizing work environments, I've settled on Unison because it propagates synchronized adds, mods, and deletes. Unison is a little more awkward to set up the first time, and it has a more analytical way of doing things, which is attractive once it's set up. For example, it propagates recent file deletions into the synchronized directory, too. This is not possible using only the native file systems of Windows, so Total Commander does not offer this feature. See my other static webpage for a comparison of unison, rsnapshot, and krusader.

Why Rsnapshot

  1. It does intelligent incremental backups, which use little disk space. It uses hard links to make each backup look like a full backup, so you don't have to root through lots of incremental files to find what you're looking for.
  2. It offers pre- and post-scripting options, so you could automatically compress and/or encrypt the destination folders even more if you wish.
  3. It's a tool-chain, relying on other Linux standards underneath it (rsync, cron).
  4. Many backup utilities center all their attention on the backup side, and it's a particular pain to do an actual file recovery if and when you're already stressed out because of data loss. With rsnapshot, it's trivial to recover files because they look just like another file directory to users, and can be directly copied wherever you need them.
  5. Easy to automate a schedule of interleaved periodic backups, similar to the way professional organizations do it.
  6. Easy to include off-site storage in the backup schedule because rsnapshot knows how to do rsync over an ssh link.
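
To make the hard-link idea in item 1 concrete, here is a minimal shell sketch. It is not how rsnapshot itself works internally (rsnapshot hands the job to rsync); the function names link_snapshot and inode_of and the directory names are made up for illustration, and the helper only handles a flat directory of regular files.

```shell
#!/bin/sh
# Hypothetical helper: make dst look like a full copy of src
# by hard-linking every regular file (flat directories only).
link_snapshot() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] && ln "$f" "$dst/"
    done
    return 0
}

# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

# Demo in a scratch directory.
work=$(mktemp -d)
mkdir "$work/snap.0"
echo "version 1" > "$work/snap.0/notes.txt"
echo "unchanged" > "$work/snap.0/todo.txt"

# snap.1 becomes the "older" snapshot; it costs almost no extra space.
link_snapshot "$work/snap.0" "$work/snap.1"

# Replace (rather than edit in place) one file in the newest snapshot.
rm "$work/snap.0/notes.txt"
echo "version 2" > "$work/snap.0/notes.txt"

# Both directories list both files; only notes.txt has two different inodes.
ls -li "$work/snap.0" "$work/snap.1"
rm -rf "$work"
```

In the listing, todo.txt shows the same inode number in both directories (one copy on disk), while notes.txt shows two different inodes (the old and the new version). Either directory can be browsed and copied from like an ordinary full backup.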

A Few Observations

  1. in the /etc/rsnapshot.conf file you specify how many hourly, daily, weekly, monthly, and yearly files you want to keep. Because these are incremental backups, the space required is much less than proportional to the number of backups you keep.
  2. the rsnapshot.conf file strangely calls the number of backups a backup interval. As an engineer and scientist, in addition to a computer programmer, I find this sloppiness with units confusing. An interval comes with units such as minutes, hours, or days; a backup count is a unitless number. One is the mathematical inverse of the other.
  3. when an interval-level backup (hourly, daily, weekly, monthly, yearly) runs, it takes possession of the oldest backup from the next lower level, depleting that level by one. For example, with "interval hourly 7", when the daily job runs, hourly.6 is renamed to daily.0.
  4. the second rsnapshot screen shot on the rsnapshot FAQ page shows that any interval level pulls up the oldest backup from the time layer below it, leaving the lower level with fewer backups than it's supposed to have. But to be precise, it doesn't take the oldest file; it takes the file whose number is one less than the lower level's count set in rsnapshot.conf. In other words, if rsnapshot.conf has "interval daily 3" then the weekly backup will take daily.2 (leaving behind daily.0, daily.1, and any stray higher-numbered directories such as daily.3 or daily.4).
  5. when backing up from Linux to a Windows partition (Mandriva 2009 can now read and write NTFS partitions after a nominal install), rsnapshot seems to work (files are deposited onto the NTFS partition), except that the lack of hard links causes errors in the /var/log/rsnapshot log. I'm not sure yet if all the desired files are present. At best you'll have to root through separate incremental backups until you find what you want.
  6. by default, Mandriva's cron table runs the shorter-period items first. A text script file called "rsnapshot" sits in each of the appropriate cron directories, so it runs at these times:
[root@axp log]# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root nice -n 19 run-parts --report /etc/cron.hourly
02 4 * * * root nice -n 19 run-parts --report /etc/cron.daily
22 4 * * 0 root nice -n 19 run-parts --report /etc/cron.weekly
42 4 1 * * root nice -n 19 run-parts --report /etc/cron.monthly

This results in:
    • hourly at 1 minute past
    • daily at 4:02 am
    • weekly Sunday at 4:22 am
    • monthly on the 1st at 4:42 am.
This default crontab execution order irks me in two ways as far as rsnapshot is concerned:
  1. running longer-interval backups after shorter-interval backups depletes a lower level file. In other words, the hourly will run, eventually generating, for example, 3 files (hourly.0, hourly.1, and hourly.2), and then when the daily runs, it will steal hourly.2. So after the next higher interval's script runs, you really have one fewer backup sitting around than you specified in rsnapshot.conf.
  2. actual file copies and hard-link rewrites are done with the lowest level (usually hourly.0). The hourly level should be run last in case it takes a long time, and so that it replenishes the missing hourly file taken by the daily run.
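
One way to address both complaints is to let a single script decide which levels are due and run them longest interval first, so the hourly run always goes last. The sketch below is only an illustration, not something rsnapshot ships. The function name levels_due is hypothetical, and the schedule (daily at hour 4, weekly on Sunday, monthly on the 1st) is borrowed from the crontab above. It also assumes the script is called once per hour in place of the separate cron.hourly, cron.daily, cron.weekly, and cron.monthly rsnapshot scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the rsnapshot levels that are due,
# longest interval first, so that hourly always runs last.
#   $1 = hour (0-23), $2 = day of week (0 = Sunday), $3 = day of month
levels_due() {
    hour=$1
    dow=$2
    dom=$3
    if [ "$hour" -eq 4 ]; then
        [ "$dom" -eq 1 ] && echo monthly
        [ "$dow" -eq 0 ] && echo weekly
        echo daily
    fi
    echo hourly
}

# Dry run: show what would be executed right now instead of running it.
now_hour=$(date +%H)
now_dom=$(date +%d)
# Strip one leading zero so "08" and "09" are plain decimal numbers.
for level in $(levels_due "${now_hour#0}" "$(date +%w)" "${now_dom#0}"); do
    echo "would run: rsnapshot $level"
done
```

Replacing the echo in the loop with the real rsnapshot call would turn the dry run into a working wrapper. With this ordering the daily run takes the oldest hourly first, and the hourly run that follows replenishes the missing file.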

Inspecting the rsnapshot backup directories, the rsnapshot log, and the rsnapshot.conf file, we can confirm several things:
[root@axp log]# ll /home/.snapshots
total 36
drwxr-xr-x 3 root root 4096 2009-03-28 02:01 daily.0/
drwxr-xr-x 3 root root 4096 2009-03-28 01:05 daily.1/
drwxr-xr-x 3 root root 4096 2009-03-28 01:04 daily.2/
drwxr-xr-x 3 root root 4096 2009-03-28 08:01 hourly.0/
drwxr-xr-x 3 root root 4096 2009-03-28 07:01 hourly.1/
drwxr-xr-x 3 root root 4096 2009-03-28 06:01 hourly.2/
drwxr-xr-x 3 root root 4096 2009-03-28 01:00 monthly.0/
drwxr-xr-x 3 root root 4096 2009-03-28 01:01 weekly.0/
drwxr-xr-x 3 root root 4096 2009-03-28 01:01 weekly.1/
[root@axp log]# cat rsnapshot | grep "completed" | tail -10
[28/Mar/2009:01:08:29] /usr/bin/rsnapshot daily: completed successfully
[28/Mar/2009:01:09:43] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:02:01:52] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:03:01:52] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:04:01:52] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:04:19:42] /usr/bin/rsnapshot daily: completed successfully
[28/Mar/2009:05:02:47] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:06:01:53] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:07:01:56] /usr/bin/rsnapshot hourly: completed successfully
[28/Mar/2009:08:01:48] /usr/bin/rsnapshot hourly: completed successfully
[root@axp log]# cat /etc/rsnapshot.conf | grep "interval"
interval hourly 3
interval daily 3
interval weekly 2
interval monthly 1
# Normally, when rsnapshot is called with its lowest interval
# intervals. With sync_first enabled, "rsnapshot sync" handles the file sync,
# and all interval calls simply rotate files. See the man page for more
# If enabled, rsnapshot will move the oldest directory for each interval
# to [interval_name].delete, then it will remove the lockfile and delete
[root@axp log]#

  • Hourlies were started one minute after the hour and took 52 seconds to complete.
  • When the daily ran at 4:02, the config file said "interval hourly 3", so the daily run took the hourly.2 file (the third and oldest of the hourlies, since numbering starts at 0). Since there were 3 of them, they would have been at 4:01, 3:01, and 2:01. Sure enough, notice daily.0 has a timestamp of 2:01.
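
The naming rule from the observations earlier (with "interval daily 3", the weekly run takes daily.2) can be written down as a small shell sketch. The function name promoted_snapshot is made up for illustration, and it assumes the "interval" lines have the format shown in the grep output above.

```shell
#!/bin/sh
# Hypothetical helper: given "interval <level> <count>" lines on stdin
# and a level name, print the snapshot that the next level up takes.
# Snapshots are numbered from 0, so a count of N means <level>.(N-1).
promoted_snapshot() {
    awk -v level="$1" '
        $1 == "interval" && $2 == level { print level "." ($3 - 1); found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# The interval lines from the rsnapshot.conf shown above.
conf='interval hourly 3
interval daily 3
interval weekly 2
interval monthly 1'

printf '%s\n' "$conf" | promoted_snapshot hourly   # prints hourly.2
printf '%s\n' "$conf" | promoted_snapshot daily    # prints daily.2
printf '%s\n' "$conf" | promoted_snapshot weekly   # prints weekly.1
```

Against a real system, the same function could be fed with the output of grep "^interval" /etc/rsnapshot.conf.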

Which of the backup files actually contain data (versus hard links to other backup files)? Two different commands can give the answer:
[root@axp .snapshots]# du -s *
54G daily.0
33M daily.1
24M daily.2
31M hourly.0
24M hourly.1
24M hourly.2
24M monthly.0
24M weekly.0
24M weekly.1
[root@axp .snapshots]# rsnapshot du
54G /home/.snapshots/hourly.0/
24M /home/.snapshots/hourly.1/
24M /home/.snapshots/hourly.2/
31M /home/.snapshots/daily.0/
33M /home/.snapshots/daily.1/
24M /home/.snapshots/daily.2/
24M /home/.snapshots/weekly.0/
24M /home/.snapshots/weekly.1/
24M /home/.snapshots/monthly.0/
54G total

Notice that in the rsnapshot du listing, hourly.0 always carries the bulk of the files, while all the higher backups show only incremental content. In fact, the directories other than hourly.0 are charged only for those files that have since been changed or deleted and no longer show up in the more recent backups. As an aside, when using hard links under Linux, the concept of "which" directory has the file is rather arbitrary. I'm not sure why the file volume shows up in one directory compared to the other. Remember, you can dive into any of the snapshots, and all the common files also appear there in addition to the unique files that are actually kept in that directory.
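
A likely explanation for the difference between the two listings: du counts a hard-linked file only once per run and charges it to whichever directory it scans first. "du -s *" scans in alphabetical order (daily.0 first), while the rsnapshot du listing starts with hourly.0. The sketch below reproduces the effect in a scratch directory. The names make_linked_pair and usage_kb, the directory layout, and the 1 MB test file are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create two directories that share one file
# through a hard link, the way two snapshots share an unchanged file.
make_linked_pair() {
    root=$1
    mkdir -p "$root/daily.0" "$root/hourly.0" || return 1
    # 1 MB of random data so the file occupies real disk blocks.
    dd if=/dev/urandom of="$root/hourly.0/big.bin" bs=1024 count=1024 2>/dev/null || return 1
    ln "$root/hourly.0/big.bin" "$root/daily.0/big.bin"
}

# Print the size in KB that du charges to each directory, in the order given.
usage_kb() {
    du -sk "$@" | awk '{print $1}'
}

work=$(mktemp -d)
make_linked_pair "$work"
# Same data, different scan order: the first directory listed gets the bulk.
du -sk "$work/daily.0" "$work/hourly.0"
du -sk "$work/hourly.0" "$work/daily.0"
rm -rf "$work"
```

In both runs the first directory named is charged roughly 1 MB and the second almost nothing, even though the same file is visible in both.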

The two du listings below show my backups just before and just after the hourly that ran from 9:01:04 to 9:03:33.
[root@axp .snapshots]# date
Sat Mar 28 08:58:56 EDT 2009
[root@axp .snapshots]# rsnapshot du
54G /home/.snapshots/hourly.0/
24M /home/.snapshots/hourly.1/
24M /home/.snapshots/hourly.2/
31M /home/.snapshots/daily.0/
33M /home/.snapshots/daily.1/
24M /home/.snapshots/daily.2/
24M /home/.snapshots/weekly.0/
24M /home/.snapshots/weekly.1/
24M /home/.snapshots/monthly.0/
54G total
[root@axp .snapshots]# date
Sat Mar 28 09:03:39 EDT 2009
[root@axp .snapshots]# cat /var/log/rsnapshot | grep "28/Mar/2009:09"
[28/Mar/2009:09:01:04] /usr/bin/rsnapshot hourly: started
[28/Mar/2009:09:01:04] echo 11162 > /var/run/rsnapshot.pid
[28/Mar/2009:09:01:04] mv /home/.snapshots/hourly.2/ /home/.snapshots/hourly.3/
[28/Mar/2009:09:01:04] mv /home/.snapshots/hourly.1/ /home/.snapshots/hourly.2/
[28/Mar/2009:09:01:04] mv /home/.snapshots/hourly.0/ /home/.snapshots/hourly.1/
[28/Mar/2009:09:01:04] mkdir -m 0755 -p /home/.snapshots/hourly.0/
[28/Mar/2009:09:01:04] /usr/bin/rsync -a --delete --numeric-ids --relative
--delete-excluded --exclude=/mnt/hdb2/.snapshots --exclude=/home/.snapshots
--exclude=/home/brian/ln-hda7 --exclude=/home/brian/ln-hdb1
--exclude=/home/brian/ln-hdb2 --exclude=.snapshots
--link-dest=/home/.snapshots/hourly.1/localhost/
/home /home/.snapshots/hourly.0/localhost/
[28/Mar/2009:09:03:26] /usr/bin/rsync -a --delete --numeric-ids --relative
--delete-excluded --exclude=/mnt/hdb2/.snapshots --exclude=/home/.snapshots
--exclude=/home/brian/ln-hda7 --exclude=/home/brian/ln-hdb1
--exclude=/home/brian/ln-hdb2
--link-dest=/home/.snapshots/hourly.1/localhost/
/etc /home/.snapshots/hourly.0/localhost/
[28/Mar/2009:09:03:33] /usr/bin/rsync -a --delete --numeric-ids --relative
--delete-excluded --exclude=/mnt/hdb2/.snapshots --exclude=/home/.snapshots
--exclude=/home/brian/ln-hda7 --exclude=/home/brian/ln-hdb1
--exclude=/home/brian/ln-hdb2
--link-dest=/home/.snapshots/hourly.1/localhost/
/usr/local /home/.snapshots/hourly.0/localhost/
[28/Mar/2009:09:03:33] touch /home/.snapshots/hourly.0/
[28/Mar/2009:09:03:33] rm -f /var/run/rsnapshot.pid
[28/Mar/2009:09:03:33] /usr/bin/rsnapshot hourly: completed successfully
[root@axp .snapshots]# rsnapshot du
54G /home/.snapshots/hourly.0/
43M /home/.snapshots/hourly.1/
24M /home/.snapshots/hourly.2/
24M /home/.snapshots/hourly.3/
31M /home/.snapshots/daily.0/
33M /home/.snapshots/daily.1/
24M /home/.snapshots/daily.2/
24M /home/.snapshots/weekly.0/
24M /home/.snapshots/weekly.1/
24M /home/.snapshots/monthly.0/
54G total
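
The "started" and "completed" lines in the log make it easy to time a run. As a sketch (the function name run_seconds is hypothetical, and it assumes the timestamp format shown in the log above and a run that starts and finishes on the same day):

```shell
#!/bin/sh
# Hypothetical helper: read rsnapshot log lines on stdin and print the
# duration in seconds of each completed run of the given level.
# Assumes timestamps like [28/Mar/2009:09:01:04] and no run crossing midnight.
run_seconds() {
    awk -v level="$1" '
        # Convert the HH:MM:SS part of a bracketed timestamp to seconds since midnight.
        function secs(stamp,   p) {
            sub(/\]$/, "", stamp)
            split(stamp, p, ":")
            return p[2] * 3600 + p[3] * 60 + p[4]
        }
        $3 == (level ":") && $4 == "started" { start = secs($1); have = 1 }
        $3 == (level ":") && $4 == "completed" && have { print secs($1) - start; have = 0 }
    '
}

# The two relevant lines from the 9:01 hourly run shown above.
printf '%s\n' \
    '[28/Mar/2009:09:01:04] /usr/bin/rsnapshot hourly: started' \
    '[28/Mar/2009:09:03:33] /usr/bin/rsnapshot hourly: completed successfully' |
    run_seconds hourly    # prints 149
```

On a live system the function could read the log directly, for example run_seconds hourly < /var/log/rsnapshot.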

Notice the total set of backup files is 54G. Since my .snapshots directory is in the system /home directory, the overall space used on my /home partition should be about 108 GB: 54 GB for the original copy of the files plus 54 GB for the backup set, which contains one copy of each file plus maybe some things that were deleted but not yet flushed out of the backup set.
[root@axp .snapshots]# df
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 11G 3.0G 6.7G 31% /
/dev/hda6 124G 108G 17G 87% /home
/dev/hda7 16G 8.0G 7.1G 54% /mnt/hda7
/dev/hdb1 59G 3.7G 55G 7% /mnt/hdb1
/dev/hdb2 70G 63G 6.6G 91% /mnt/hdb2

..and it is! Thanks for listening :-)

Created by brian. Last Modification: Saturday 23 of April, 2016 07:16:13 CDT by brian.