TIP: Automating Website Backups - Part II (Reducing Backup Size)
Last week I wrote a short guide on “Automating Website Backups”. Jasvinder e-mailed me some good comments and suggestions about other aspects that should be addressed, so I thought of doing a few follow-ups to that post. These are written in FAQ fashion, to answer some of the questions that may arise after reading the first post. Each post will answer one of the questions listed below in detail.
Questions to be answered:
Q1: I have strict limits on webspace/email attachment etc. Can anything be done to reduce the backup size?
**Q2:** Cron? Using tar, making up the script file is enough command line for me. Isn’t there an easy way?
Q3: Well, backup is all hunky-dory. Now, how do I restore?
Want the answers? Read on.
Q1: I have strict limits on webspace/email attachment etc. Can anything be done to reduce the backup size?
**A1:** Sure, we can do quite a lot about it. If your concern is just the webspace, and you want a very simple restore process, you should reduce the number of backups that you keep on the server rather than the size of each backup (you have all the backups mailed out to you anyway). The number of backups can be reduced by changing the command given in step 1 of the previous post.
In that command, the _date +%w_ part ensured that at most 7 previous backups are stored on the server at any time (backup_0.tgz, backup_1.tgz,…). If you remove this part, each day’s backup overwrites the previous one, so you use space for just 1 backup. And since all the backups are e-mailed to you as well, you can still restore to any point in time. Keeping a backup on the server just saves some time if you do have to restore, since you don’t have to upload the backup file to the server first.
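
To make the difference between the two naming schemes concrete, here is a minimal sketch. It is not the command from the previous post; the variable names and the fixed name backup.tgz are assumptions made up for illustration, and only the backup_0.tgz style of name comes from the text above.

```shell
#!/bin/sh
# Illustration only: variable names and "backup.tgz" are hypothetical.

# date +%w prints the day of the week as a single digit,
# from 0 (Sunday) to 6 (Saturday).
day=$(date +%w)

# Rotating name: one file per weekday, so at most 7 backups pile up
# before the oldest one gets overwritten.
rotating_name="backup_${day}.tgz"

# Fixed name: every run overwrites the previous backup,
# so only 1 backup is ever stored.
fixed_name="backup.tgz"

echo "rotating: $rotating_name"
echo "fixed:    $fixed_name"
```

Whichever name you pick goes wherever your backup command names its output file.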
BUT, if you must restrict the size of the backup itself (the backup is too big for your e-mail to handle as an attachment, you are really pressed for webspace, or backing up all the files takes too much time/CPU on the host server), you can back up only the files that need it. What does that mean? Every site, whether it runs a CMS (a Content Management System like Joomla, WordPress or Drupal), a custom system, or hand-made pages, has some static files which don’t change with time, so they don’t need to be backed up every time. You can back up just the files that change or get added (like images, uploads etc).
So, all you have to do is determine which files you don’t want backed up, list them in a file, and pass that file to the _tar_ command with the _--exclude-from_ option. tar will dutifully leave those files out of the archive, leaving you with only the files you really need.
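
Here is a small sketch of how that could look. It builds a throwaway site in a temporary directory so that it is safe to run anywhere; the file names (index.php, photo.jpg, backup.tgz) are made up for the demo and are not taken from the previous post.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# A pretend site: one static file and one uploaded file (both hypothetical).
mkdir -p public_html/uploads
echo "static" > public_html/index.php
echo "photo"  > public_html/uploads/photo.jpg

# The exclude list: one pattern per line.
printf '%s\n' 'public_html/index.php' > exclude_these_files.txt

# Create a gzipped archive, skipping everything named in the list.
# The exclude option is given before the directory to archive.
tar -czf backup.tgz --exclude-from=exclude_these_files.txt public_html

# List the archive contents to check what actually went in.
tar -tzf backup.tgz
```

The listing should show the uploaded file but not the excluded one. On a real site you would drop the scratch-directory part and point tar at your actual folder and exclude list.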
Now, there are two ways to go about determining which files you don’t need. The first is to simply run “ls -R > exclude_these_files.txt” in the root of your home directory on the server (the folder that contains your public_html or www folder). Then open this file in your favourite editor and delete the names of the files that you do want to back up, leaving only the names of the files that you don’t want to back up. Generally, you will want to back up any files that you have put on the server yourself (images, archives, songs, any other uploads, any files that you modified, etc).
If you are not sure which files you should be backing up and you don’t want to back up everything, there is still hope for you, but only if you are reading this guide before starting a new site. In that case, before making any posts or customizations, run the ls command mentioned above. That gives you a list of the default files that always come with your CMS and don’t need to be backed up. However, if you ever change any of the default files, make sure you delete their names from exclude_these_files.txt.
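
One thing to keep in mind: ls -R prints a header line for each directory followed by bare file names, not full paths, so the list needs some cleaning up in the editor. As a possible alternative (my swap-in, not part of the original steps), find can write one full relative path per line. The sketch below uses a scratch directory and made-up file names.

```shell
#!/bin/sh
set -e

# Scratch directory with a pretend site (file names are hypothetical).
work=$(mktemp -d)
cd "$work"
mkdir -p public_html/images
echo "core" > public_html/index.php
echo "logo" > public_html/images/logo.png

# find prints one full relative path per line, without the directory
# headers and blank lines that ls -R adds. sort keeps the list tidy.
find public_html -type f | sort > exclude_these_files.txt

# Show the resulting list; edit it by hand exactly as described above.
cat exclude_these_files.txt
```

Because each line is a full path, deleting a line affects only that one file and not every file that happens to share its name.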
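
If remembering to update the list by hand sounds error-prone, something along these lines could help. This is only a sketch under a few assumptions: the exclude list holds one full path per line (as find produces, rather than raw ls -R output), and the helper file names changed_files.txt and exclude_new.txt are invented for the demo. It finds files modified after the list was written and removes them from the list.

```shell
#!/bin/sh
set -e

# Scratch directory with a pretend site (file names are hypothetical).
work=$(mktemp -d)
cd "$work"
mkdir -p public_html
echo "core"  > public_html/index.php
echo "theme" > public_html/style.css

# Snapshot of the default files, taken before any customization.
find public_html -type f | sort > exclude_these_files.txt

# Demo only: backdate the files and the snapshot so that the edit
# below is clearly newer than the snapshot.
touch -t 200001010000 public_html/index.php public_html/style.css
touch -t 200001020000 exclude_these_files.txt

# Simulate customizing one of the default files later on.
echo "my tweak" >> public_html/style.css

# Files modified after the snapshot was written.
find public_html -type f -newer exclude_these_files.txt > changed_files.txt

# Drop the changed files from the exclude list so they get backed up.
# -v inverts the match, -x matches whole lines, -F disables regex.
grep -vxF -f changed_files.txt exclude_these_files.txt > exclude_new.txt || true
mv exclude_new.txt exclude_these_files.txt

cat exclude_these_files.txt
```

After this runs, the modified style.css is no longer excluded, while the untouched index.php still is.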
That’s it for today. Lemme know if you have any doubts, or if you would like to see any other questions answered in this series. The question that will be answered next time is:
**Q2:** Cron? Using tar, making up the script file is enough command line for me. Isn’t there an easy way?