TIP: Automating Website Backups - Part IIb (Reducing Backup Size Contd.)
Last time, I told you how to reduce the backup size. This short note develops the approaches discussed there a little further: basically, telling tar to back up only those files which have changed. For this, Linux has a command, “find”, which I am very fond of. You can give it the option “-newer” followed by a filename, and it will return the names of the files that were modified more recently than that file.
find ~ -type f -newer ~/backups/backup_x.tgz > files.txt
So, the command given above will find all the files that have changed since you took the backup “backup_x.tgz” and store their paths in “files.txt”. The “-type f” option makes sure that only regular files are listed and not directories. This matters because when tar is handed a directory name, it archives everything under that directory, changed or not, which defeats the purpose of the exercise.
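If you want to see “-newer” in action before pointing it at your real files, here is a small sketch you can run in a scratch directory. All the file and directory names in it are made up for the demo, and the fake “backup” is just an empty file with an old timestamp.

```shell
#!/bin/bash
# Sketch only: the directory and file names below are made up for the demo.
workdir=$(mktemp -d)
mkdir -p "$workdir/site/img"

# Stand-in for the last backup, with its timestamp pushed back to 1 Jan 2020.
touch -t 202001010000 "$workdir/backup_x.tgz"

# One file older than the "backup" and two files modified after it.
touch -t 201912310000 "$workdir/site/old.html"
touch "$workdir/site/new.html" "$workdir/site/img/logo.png"

# Only regular files newer than the reference file are listed;
# the "img" directory itself is skipped because of -type f.
find "$workdir/site" -type f -newer "$workdir/backup_x.tgz" | sort > "$workdir/files.txt"
cat "$workdir/files.txt"
```

The list should contain “new.html” and “img/logo.png” but not “old.html”.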
Now, all we have to do is give this “files.txt” to tar as an input to tell it which files to archive.
tar cvzpGf ~/backups/backup_$date.tgz -T files.txt
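A plain list with one name per line works for ordinary file names. If some of your file names contain odd characters (a newline, in the worst case), a possible variant is to have find write the list NUL-separated and tell tar to read it that way. This is only a sketch: it assumes GNU find and GNU tar, and the paths are made up for the demo.

```shell
#!/bin/bash
# Sketch only: assumes GNU tar and GNU find; all paths are made up for the demo.
workdir=$(mktemp -d)
mkdir -p "$workdir/site"
echo "hello" > "$workdir/site/index.html"
echo "notes" > "$workdir/site/my notes.txt"

# Write the file list NUL-separated so unusual file names survive intact.
find "$workdir/site" -type f -print0 > "$workdir/files.list"

# --null must come before -T so that tar reads the list as NUL-separated.
# tar may print a note about removing the leading "/" from names; that is normal.
tar czpf "$workdir/backup.tgz" --null -T "$workdir/files.list"

# List the archive contents to confirm both files went in.
tar tzf "$workdir/backup.tgz"
```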
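The “-T files.txt” option makes this happen: tar reads the names of the files to archive from “files.txt”. Moreover, I have introduced 2 new options here that were not present in our last part. They are:
- p – Tells tar to preserve file permissions
- G – Short for “--incremental”, which tells GNU tar to handle archives in its old incremental backup format. (If what you want is for tar to carry on when it hits a file it cannot read, the GNU tar option for that is “--ignore-failed-read”.)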
Apart from this, you can also take a look at the “-mtime” option of the find command, which selects files by how many days ago they were modified: “-mtime -x” lists files changed within the past x days, while a plain “-mtime x” matches only files changed exactly x days ago. There are other similar options available for find. Look at “man find” and take your pick. tar has similar options of its own (such as “--newer”), but I have had a lot of weird issues using them, so I’d recommend sticking with this two-step process of “find” followed by “tar”.
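The sign in front of the number is easy to get wrong, so here is a quick sketch that shows the difference. It assumes GNU touch (for the “-d” option) and GNU find, and the file names are made up.

```shell
#!/bin/bash
# Sketch only: assumes GNU touch and GNU find; file names are made up.
workdir=$(mktemp -d)
touch "$workdir/today.html"
touch -d "3 days ago" "$workdir/three_days.html"
touch -d "10 days ago" "$workdir/ten_days.html"

# -mtime -7 : age, counted in whole days, is less than 7
recent=$(find "$workdir" -type f -mtime -7 | sort)

# -mtime +7 : age, counted in whole days, is greater than 7
stale=$(find "$workdir" -type f -mtime +7)

echo "recent:"
echo "$recent"
echo "stale:"
echo "$stale"
```

For backups you would normally want the “-mtime -x” form, since that is the one that means “changed in the past x days”.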
Now, the commands and options mentioned above can be combined in countless ways to strike your own balance between disk space and ease of use. I’ll list a sample script here that makes a full backup on the first day of every week (Sunday, which “date +%w” reports as 0), and then makes an incremental backup on each of the remaining 6 days. So, you’ll save a lot of space (more than 5 times), but you will need all 7 backup files to make a full restore. (I’m listing just the backup part; you can add the “mutt” command yourself for e-mailing, as mentioned in Part I.)
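#!/bin/bash
# Day of the week as a number: 0 is Sunday, 6 is Saturday
date=`date +%w`
# Yesterday's archive, which the incremental backup is compared against
date2=$(($date-1))
prev=~/backups/backup_$date2.tgz
if [ "$date" -eq 0 ] || [ ! -e "$prev" ]; then
    # First day of the week, or no previous backup to compare against: full backup
    tar cvzpGf ~/backups/backup_$date.tgz ~/public_html
else
    # Other days: archive only the files changed since yesterday's backup
    find ~/public_html -type f -newer "$prev" > ~/files.txt
    tar cvzpGf ~/backups/backup_$date.tgz -T ~/files.txt
fi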
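Since a full restore needs every archive from the week, it may help to see the restore side too. The sketch below is not part of the backup script: it fakes a two-day week inside a temporary directory (all names are made up) and then unpacks the archives in day order, so that newer copies of a file overwrite older ones.

```shell
#!/bin/bash
# Sketch only: simulates a week with one full and one incremental archive.
workdir=$(mktemp -d)
mkdir -p "$workdir/public_html" "$workdir/backups" "$workdir/restore"
cd "$workdir" || exit 1

# Day 0: full backup.
echo "v1" > public_html/index.html
echo "about" > public_html/about.html
tar czpf backups/backup_0.tgz public_html

# Day 1: one file changes; only files newer than yesterday's archive are saved.
# The sleep keeps the timestamps apart on filesystems with coarse timestamps.
sleep 1
echo "v2" > public_html/index.html
find public_html -type f -newer backups/backup_0.tgz > files.txt
tar czpf backups/backup_1.tgz -T files.txt

# Restore: unpack the full backup first, then each incremental in day order.
# Days with no archive are simply skipped.
for day in 0 1 2 3 4 5 6; do
    archive="backups/backup_$day.tgz"
    if [ -e "$archive" ]; then
        tar xzpf "$archive" -C restore
    fi
done

cat restore/public_html/index.html
cat restore/public_html/about.html
```

One thing to keep in mind with this scheme: it only records files that were added or changed, so a file you deleted in the middle of the week will come back when you restore from the full backup.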
That’s it for today. Lemme know if you have any doubts, or if you would like to see any other questions answered in this series. The question that will be answered next time is:
**Q2:** Cron? Using tar and making up the script file is enough command line for me. Isn’t there an easy way?