[Linux] Simplest possible snapshot-style backups using rsync

... which was inspired by Mike Rubel's web page.

Let's say I've got two hard drives. The first holds $HOME, the second is mounted at /mnt/hitachi (that happens to be the name of my second, backup HDD).

First, create an initial snapshot by copying all your files from $HOME to /mnt/hitachi:

#!/bin/bash
DIRNAME=/mnt/hitachi/bak-$(date +%Y%m%d-%H%M%S)/
rsync --delete -av "$HOME/" "$DIRNAME"

That creates a copy of your $HOME in, say, /mnt/hitachi/bak-20191014-121855.

Now create hard-links for all the files in the backup. The newest directory under /mnt/hitachi is found by sorting the directory names in reverse order ("ls -1ar"). "cp -al" creates hard-links instead of copying the file contents.

#!/bin/bash
path=/mnt/hitachi
DIRNAME=$(ls $path -1ar | grep bak | head -1)
NEWDIRNAME=/mnt/hitachi/bak-$(date +%Y%m%d-%H%M%S)
cp -al "$path/$DIRNAME" "$NEWDIRNAME"

This creates a second snapshot, say, /mnt/hitachi/bak-20191015-211458. The second snapshot is like a virtual copy: each file is physically stored only once, but it now has two hard-links, one from each snapshot.
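A quick way to convince yourself that both snapshots share the same data is to compare inode numbers and link counts. The sketch below works in a throwaway directory from mktemp (the names snap1, snap2 and file.txt are made up for the demo) and assumes GNU coreutils (cp -al, stat -c):

```shell
#!/bin/bash
# Demo in a temporary directory; nothing here touches the real backup drive.
demo=$(mktemp -d)
mkdir "$demo/snap1"
echo "hello" > "$demo/snap1/file.txt"

# Hard-link copy, the same command the snapshot script uses.
cp -al "$demo/snap1" "$demo/snap2"

# %i is the inode number, %h is the number of hard links (GNU stat).
inode1=$(stat -c %i "$demo/snap1/file.txt")
inode2=$(stat -c %i "$demo/snap2/file.txt")
links=$(stat -c %h "$demo/snap2/file.txt")

echo "inode in snap1: $inode1"
echo "inode in snap2: $inode2"
echo "link count:     $links"

rm -rf "$demo"
```

The two inode numbers should be identical and the link count should be 2: one file on disk, two names for it.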

Important: I used ls -t $path | head -n1 before. But it turns out that the cp -al command doesn't give the new directory a fresh timestamp (the -a option preserves the timestamps of the source directory). So now I'm using ls $path -1ar | grep bak | head -1 to get the newest directory, sorting by directory name.
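Parsing the output of ls works as long as the directory names contain no spaces or newlines. As an alternative, here is a minimal sketch that gets the same result with a shell glob. The function name newest_snapshot is made up for this example, and it assumes the bak-YYYYMMDD-HHMMSS naming used above, so that sorting by name is the same as sorting by date:

```shell
#!/bin/bash
# Print the name of the newest bak-* directory under $1, or nothing if
# there is none. Glob results come back sorted by name, so with the
# bak-YYYYMMDD-HHMMSS naming the last match is the newest snapshot.
newest_snapshot() {
    local dir=$1 d newest=""
    for d in "$dir"/bak-*/; do
        [ -d "$d" ] || continue   # the glob matched nothing
        newest=${d%/}             # strip the trailing slash
    done
    if [ -n "$newest" ]; then
        basename "$newest"
    fi
}

# Example: DIRNAME=$(newest_snapshot /mnt/hitachi)
```

The trailing slash in the pattern makes the glob match directories only, so a stray file named bak-something is ignored.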

Now update (or populate) the second (or newest) snapshot:

#!/bin/bash
path=/mnt/hitachi
DIRNAME=$(ls $path -1ar | grep bak | head -1)
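rsync -av "$HOME/" "$path/$DIRNAME/" --delete --ignore-errors

This script copies files that are absent from the last snapshot or have changed, and deletes the hard-links of files that no longer exist in $HOME. (A physical file is deleted only when no hard-links point to it anymore.) By default, rsync writes a changed file as a new file and renames it into place, so the older snapshots keep the old version.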

Run the second and the third script once a day or more often...

Now you have a set of snapshots. This is like preserving the history of a file or a directory, as a VCS does, or as Wikipedia tracks the history of each article.

If your backup HDD runs out of free space, just delete the oldest snapshot(s). Again, this will not delete files still linked from the newer snapshots.

Why are snapshots important? You may have deleted a file a couple of days ago AND made a backup since then. With snapshot-style backups, the file is still accessible if you have a snapshot made BEFORE the deletion. This can also help against ransomware: even if your backup system has backed up already encrypted files, the older snapshots may still hold intact versions...
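The claim that deleting an old snapshot doesn't hurt the newer ones is easy to check on a small scale. This is a sketch in a temporary directory (bak-old, bak-new and notes.txt are made-up names), again assuming GNU coreutils:

```shell
#!/bin/bash
demo=$(mktemp -d)
mkdir "$demo/bak-old"
echo "important data" > "$demo/bak-old/notes.txt"

# Second snapshot made of hard links to the first one.
cp -al "$demo/bak-old" "$demo/bak-new"

# Delete the oldest snapshot entirely.
rm -rf "$demo/bak-old"

# The file is still readable; only its link count dropped from 2 to 1.
content=$(cat "$demo/bak-new/notes.txt")
links=$(stat -c %h "$demo/bak-new/notes.txt")
echo "content: $content, links: $links"

rm -rf "$demo"
```

This prints "content: important data, links: 1". The data would only be freed once the last remaining link is removed as well.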

Also, Windows has a nice feature like this: https://en.wikipedia.org/wiki/Shadow_Copy.

Trimming

Script to delete oldest snapshot:

#!/bin/bash
path=/mnt/hitachi

echo Before trimming:
df --output=source,target,avail,pcent $path | grep -v Mounted

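# Oldest snapshot by name: directory timestamps are unreliable after cp -al, so ls -t can't be trusted here.
DIRNAME=$(ls $path -1a | grep bak | head -1)
# Nothing to trim: bail out, so that rm -rf never runs on $path itself.
[ -n "$DIRNAME" ] || exit 1
echo Going to trim $path/$DIRNAME
# rm -I doesn't work as I wanted, because rm asks about removing write-protected files...
read -p "Press enter to actually delete the files or Ctrl-C"
rm -rf "$path/$DIRNAME"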

echo After trimming:
df --output=source,target,avail,pcent $path | grep -v Mounted

However, you can delete just the largest files in the oldest snapshot, right? And even better, only files that are linked once, i.e., stored in only one snapshot. (Deleting files that are linked from several snapshots is pointless: you won't gain any space this way.) In my case, these are old VM images... Run this from inside the oldest snapshot directory:

find . -size +1G -type f -links 1 -print

And if you're sure:

find . -size +1G -type f -links 1 -delete

Thus, old snapshots are no longer full copies (the largest files can be missing). But your backup system can now store many more snapshots (or more history) for smaller files.
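Before running the -delete variant, it may help to see how much space each candidate would actually free. Here is a minimal sketch, assuming GNU find (for -printf). The function name list_trim_candidates is made up for this example:

```shell
#!/bin/bash
# List regular files under $1 that match the find -size argument $2
# (default +1G) and have exactly one hard link, biggest first.
# Output format: size in bytes, a tab, then the path.
list_trim_candidates() {
    local snapshot=$1 size=${2:-+1G}
    find "$snapshot" -type f -size "$size" -links 1 -printf '%s\t%p\n' | sort -rn
}

# Example (hypothetical snapshot name):
# list_trim_candidates /mnt/hitachi/bak-20191014-121855 +1G
```

Files that are still linked from another snapshot don't show up in the list, since removing them here would free nothing.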

Feedback from a reader

Date: Thu, 17 Oct 2019 08:25:07 +0300
From: Ciprian Dorin Craciun <ciprian(dot)craciun(at)gmail.com>

I've read you article 'Simplest possible snapshot-style backups using
rsync' (https://yurichev.com/blog/bak/), and found it interesting.
(Personally I use 'rdiff-backup'.)

...

Also please note that you can use just 'rsync' to obtain the same
results (without using the extra 'cp') by using '--compare-dest' or
'--link-dest'.

Moreover it would be useful to point out that if one changes a file in
a "snapshot", and it is linked, then all the other snapshots are also
affected.  Perhaps a 'chattr' with setting immutable could be useful
after backup to make sure this doesn't happen;  but before backups
that immutable flag must be removed so that hardlinks can be made.

Thanks for the article,
Ciprian.
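The --link-dest suggestion could look roughly like this. It is only a sketch of the reader's idea, not the scripts I actually run: the function name make_snapshot is made up, and it assumes the same bak-YYYYMMDD-HHMMSS naming as above and an absolute path for the backup root. rsync creates the new snapshot directory itself and hard-links every unchanged file to the previous snapshot, so the separate cp -al step isn't needed:

```shell
#!/bin/bash
# One-step snapshot with rsync --link-dest.
# Arguments: source directory and backup root (absolute paths),
# e.g. "$HOME" and /mnt/hitachi.
make_snapshot() {
    local src=$1 root=$2 d prev="" new
    # Newest existing snapshot by name; stays empty on the first run.
    for d in "$root"/bak-*/; do
        if [ -d "$d" ]; then
            prev=${d%/}
        fi
    done
    new=$root/bak-$(date +%Y%m%d-%H%M%S)
    if [ -n "$prev" ]; then
        # Unchanged files become hard links into $prev;
        # new or changed files are copied as usual.
        rsync -a --link-dest="$prev" "$src/" "$new/"
    else
        rsync -a "$src/" "$new/"
    fi
}

# Example: make_snapshot "$HOME" /mnt/hitachi
```

Since each run starts from an empty directory, --delete isn't needed: files removed from the source simply don't appear in the new snapshot.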

UPD: at reddit.

